Age | Commit message (Collapse) | Author | Files | Lines |
|
Also, avoid using CO-RE features, as lskel doesn't support CO-RE, yet.
Include both light and libbpf skeleton in same file to test both of them
together.
In c48e51c8b07a ("bpf: selftests: Add selftests for module kfunc support"),
I added support for generating both lskel and libbpf skel for a BPF
object, however the name parameter for bpftool caused collisions when
included in same file together. This meant that every test needed a
separate file for a libbpf/light skeleton separation instead of
subtests.
Change that by appending a "_lskel" suffix to the name for files using
light skeleton, and convert all existing users.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
There are some instances where we don't use O_CLOEXEC when opening an
fd, fix these up. Otherwise, it is possible that a parallel fork causes
these fds to leak into a child process on execve.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add a simple wrapper for passing an fd and getting a new one >= 3 if it
is one of 0, 1, or 2. There are two primary reasons to make this change:
First, libbpf relies on the assumption a certain BPF fd is never 0 (e.g.
most recently noticed in [0]). Second, Alexei pointed out in [1] that
some environments reset stdin, stdout, and stderr if they notice an
invalid fd at these numbers. To protect against both these cases, switch
all internal BPF syscall wrappers in libbpf to always return an fd >= 3.
We only need to modify the syscall wrappers and not other code that
assumes a valid fd by doing >= 0, to avoid pointless churn, and because
it is still a valid assumption. The cost paid is two additional syscalls
if fd is in range [0, 2].
[0]: e31eec77e4ab ("bpf: selftests: Fix fd cleanup in get_branch_snapshot")
[1]: https://lore.kernel.org/bpf/CAADnVQKVKY8o_3aU8Gzke443+uHa-eGoM0h7W4srChMXU1S4Bg@mail.gmail.com
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This extends existing ksym relocation code to also support relocating
weak ksyms. Care needs to be taken to zero out the src_reg (currently
BPF_PSEUOD_BTF_ID, always set for gen_loader by bpf_object__relocate_data)
when the BTF ID lookup fails at runtime. This is not a problem for
libbpf as it only sets ext->is_set when BTF ID lookup succeeds (and only
proceeds in case of failure if ext->is_weak, leading to src_reg
remaining as 0 for weak unresolved ksym).
A pattern similar to emit_relo_kfunc_btf is followed of first storing
the default values and then jumping over actual stores in case of an
error. For src_reg adjustment, we also need to perform it when copying
the populated instruction, so depending on if copied insn[0].imm is 0 or
not, we decide to jump over the adjustment.
We cannot reach that point unless the ksym was weak and resolved and
zeroed out, as the emit_check_err will cause us to jump to cleanup
label, so we do not need to recheck whether the ksym is weak before
doing the adjustment after copying BTF ID and BTF FD.
This is consistent with how libbpf relocates weak ksym. Logging
statements are added to show the relocation result and aid debugging.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This uses the bpf_kallsyms_lookup_name helper added in previous patches
to relocate typeless ksyms. The return value ENOENT can be ignored, and
the value written to 'res' can be directly stored to the insn, as it is
overwritten to 0 on lookup failure. For repeating symbols, we can simply
copy the previously populated bpf_insn.
Also, we need to take care to not close fds for typeless ksym_desc, so
reuse the 'off' member's space to add a marker for typeless ksym and use
that to skip them in cleanup_relos.
We add a emit_ksym_relo_log helper that avoids duplicating common
logging instructions between typeless and weak ksym (for future commit).
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This helper allows us to get the address of a kernel symbol from inside
a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can
relocate typeless ksym vars.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Joanne Koong says:
====================
This patchset adds a new kind of bpf map: the bloom filter map.
Bloom filters are a space-efficient probabilistic data structure
used to quickly test whether an element exists in a set.
For a brief overview about how bloom filters work,
https://en.wikipedia.org/wiki/Bloom_filter
may be helpful.
One example use-case is an application leveraging a bloom filter
map to determine whether a computationally expensive hashmap
lookup can be avoided. If the element was not found in the bloom
filter map, the hashmap lookup can be skipped.
This patchset includes benchmarks for testing the performance of
the bloom filter for different entry sizes and different number of
hash functions used, as well as comparisons for hashmap lookups
with vs. without the bloom filter.
A high level overview of this patchset is as follows:
1/5 - kernel changes for adding bloom filter map
2/5 - libbpf changes for adding map_extra flags
3/5 - tests for the bloom filter map
4/5 - benchmarks for bloom filter lookup/update throughput and false positive
rate
5/5 - benchmarks for how hashmap lookups perform with vs. without the bloom
filter
v5 -> v6:
* in 1/5: remove "inline" from the hash function, add check in syscall to
fail out in cases where map_extra is not 0 for non-bloom-filter maps,
fix alignment matching issues, move "map_extra flags" comments to inside
the bpf_attr struct, add bpf_map_info map_extra changes here, add map_extra
assignment in bpf_map_get_info_by_fd, change hash value_size to u32 instead of
a u64
* in 2/5: remove bpf_map_info map_extra changes, remove TODO comment about
extending BTF arrays to cover u64s, cast to unsigned long long for %llx when
printing out map_extra flags
* in 3/5: use __type(value, ...) instead of __uint(value_size, ...) for values
and keys
* in 4/5: fix wrong bounds for the index when iterating through random values,
update commit message to include update+lookup benchmark results for 8 byte
and 64-byte value sizes, remove explicit global bool initializaton to false
for hashmap_use_bloom and count_false_hits variables
v4 -> v5:
* Change the "bitset map with bloom filter capabilities" to a bloom filter map
with max_entries signifying the number of unique entries expected in the bloom
filter, remove bitset tests
* Reduce verbiage by changing "bloom_filter" to "bloom", and renaming progs to
more concise names.
* in 2/5: remove "map_extra" from struct definitions that are frozen, create a
"bpf_create_map_params" struct to propagate map_extra to the kernel at map
creation time, change map_extra to __u64
* in 4/5: check pthread condition variable in a loop when generating initial
map data, remove "err" checks where not pragmatic, generate random values
for the hashmap in the setup() instead of in the bpf program, add check_args()
for checking that there aren't more requested entries than possible unique
entries for the specified value size
* in 5/5: Update commit message with updated benchmark data
v3 -> v4:
* Generalize the bloom filter map to be a bitset map with bloom filter
capabilities
* Add map_extra flags; pass in nr_hash_funcs through lower 4 bits of map_extra
for the bitset map
* Add tests for the bitset map (non-bloom filter) functionality
* In the benchmarks, stats are computed only as monotonic increases, and place
stats in a struct instead of as a percpu_array bpf map
v2 -> v3:
* Add libbpf changes for supporting nr_hash_funcs, instead of passing the
number of hash functions through map_flags.
* Separate the hashing logic in kernel/bpf/bloom_filter.c into a helper
function
v1 -> v2:
* Remove libbpf changes, and pass the number of hash functions through
map_flags instead.
* Default to using 5 hash functions if no number of hash functions
is specified.
* Use set_bit instead of spinlocks in the bloom filter bitmap. This
improved the speed significantly. For example, using 5 hash functions
with 100k entries, there was roughly a 35% speed increase.
* Use jhash2 (instead of jhash) for u32-aligned value sizes. This
increased the speed by roughly 5 to 15%. When using jhash2 on value
sizes non-u32 aligned (truncating any remainder bits), there was not
a noticeable difference.
* Add test for using the bloom filter as an inner map.
* Reran the benchmarks, updated the commit messages to correspond to
the new results.
====================
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call, this is sub-optimal and doesn't
match the regular, compiler generated, code.
Update the i386 JIT to emit code equal to what the compiler emits for
the regular kernel text (IOW. a plain THUNK call).
Update the x86_64 JIT to emit code similar to the result of compiler
and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags.
Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg),
while doing a THUNK call for RETPOLINE.
This removes the hard-coded retpoline thunks and shrinks the generated
code. Leaving a single retpoline thunk definition in the kernel.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Take an idea from the 32bit JIT, which uses the multi-pass nature of
the JIT to compute the instruction offsets on a prior pass in order to
compute the relative jump offsets on a later pass.
Application to the x86_64 JIT is slightly more involved because the
offsets depend on program variables (such as callee_regs_used and
stack_depth) and hence the computed offsets need to be kept in the
context of the JIT.
This removes, IMO quite fragile, code that hard-codes the offsets and
tries to compute the length of variable parts of it.
Convert both emit_bpf_tail_call_*() functions which have an out: label
at the end. Additionally emit_bpt_tail_call_direct() also has a poke
table entry, for which it computes the offset from the end (and thus
already relies on the previous pass to have computed addrs[i]), also
convert this to be a forward based offset.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently Linux prevents usage of retpoline,amd on !AMD hardware, this
is unfriendly and gets in the way of testing. Remove this restriction.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make sure we can see the text changes when booting with
'debug-alternative'.
Example output:
[ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
[ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
[ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
[ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Try and replace retpoline thunk calls with:
LFENCE
CALL *%\reg
for spectre_v2=retpoline,amd.
Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers prefix stuff the call with higher registers this replacement
will fail.
Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.
Note: it will also generate a correct replacement for the Jcc.d32
case, except unless the compilers start to prefix stuff that, it'll
never fit. Specifically:
Jncc.d8 1f
LFENCE
JMP *%\reg
1:
is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:
Jcc __x86_indirect_thunk_\reg
For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:
Jncc.d8 1f
JMP *%\reg
NOP
1:
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.
This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.
One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.
Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Stick all the retpolines in a single symbol and have the individual
thunks as inner labels, this should guarantee thunk order and layout.
Previously there were 16 (or rather 15 without rsp) separate symbols and
a toolchain might reasonably expect it could displace them however it
liked, with disregard for their relative position.
However, now they're part of a larger symbol. Any change to their
relative position would disrupt this larger _array symbol and thus not
be sound.
This is the same reasoning used for data symbols. On their own there
is no guarantee about their relative position wrt to one aonther, but
we're still able to do arrays because an array as a whole is a single
larger symbol.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Because it makes no sense to split the retpoline gunk over multiple
headers.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
subsequent usage to start with #undef, which is rude.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed, remove them.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Any one instruction can only ever call a single function, therefore
insn->mcount_loc_node is superfluous and can use insn->call_node.
This shrinks struct instruction, which is by far the most numerous
structure objtool creates.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Assume ALTERNATIVE()s know what they're doing and do not change, or
cause to change, instructions in .altinstr_replacement sections.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In order to avoid calling str*cmp() on symbol names, over and over, do
them all once upfront and store the result.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
|
|
The reset callback may clear the internal card detect interrupts, so
make sure to reenable them if needed.
Fixes: b4d86f37eacb ("mmc: renesas_sdhi: do hard reset if possible")
Reported-by: Biju Das <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
filter
This patch adds benchmark tests for comparing the performance of hashmap
lookups without the bloom filter vs. hashmap lookups with the bloom filter.
Checking the bloom filter first for whether the element exists should
overall enable a higher throughput for hashmap lookups, since if the
element does not exist in the bloom filter, we can avoid a costly lookup in
the hashmap.
On average, using 5 hash functions in the bloom filter tended to perform
the best across the widest range of different entry sizes. The benchmark
results using 5 hash functions (running on 8 threads on a machine with one
numa node, and taking the average of 3 runs) were roughly as follows:
value_size = 4 bytes -
10k entries: 30% faster
50k entries: 40% faster
100k entries: 40% faster
500k entres: 70% faster
1 million entries: 90% faster
5 million entries: 140% faster
value_size = 8 bytes -
10k entries: 30% faster
50k entries: 40% faster
100k entries: 50% faster
500k entres: 80% faster
1 million entries: 100% faster
5 million entries: 150% faster
value_size = 16 bytes -
10k entries: 20% faster
50k entries: 30% faster
100k entries: 35% faster
500k entres: 65% faster
1 million entries: 85% faster
5 million entries: 110% faster
value_size = 40 bytes -
10k entries: 5% faster
50k entries: 15% faster
100k entries: 20% faster
500k entres: 65% faster
1 million entries: 75% faster
5 million entries: 120% faster
Signed-off-by: Joanne Koong <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This patch adds benchmark tests for the throughput (for lookups + updates)
and the false positive rate of bloom filter lookups, as well as some
minor refactoring of the bash script for running the benchmarks.
These benchmarks show that as the number of hash functions increases,
the throughput and the false positive rate of the bloom filter decreases.
>From the benchmark data, the approximate average false-positive rates
are roughly as follows:
1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%
For reference data, the benchmarks run on one thread on a machine
with one numa node for 1 to 5 hash functions for 8-byte and 64-byte
values are as follows:
1 hash function:
50k entries
8-byte value
Lookups - 51.1 M/s operations
Updates - 33.6 M/s operations
False positive rate: 24.15%
64-byte value
Lookups - 15.7 M/s operations
Updates - 15.1 M/s operations
False positive rate: 24.2%
100k entries
8-byte value
Lookups - 51.0 M/s operations
Updates - 33.4 M/s operations
False positive rate: 24.04%
64-byte value
Lookups - 15.6 M/s operations
Updates - 14.6 M/s operations
False positive rate: 24.06%
500k entries
8-byte value
Lookups - 50.5 M/s operations
Updates - 33.1 M/s operations
False positive rate: 27.45%
64-byte value
Lookups - 15.6 M/s operations
Updates - 14.2 M/s operations
False positive rate: 27.42%
1 mil entries
8-byte value
Lookups - 49.7 M/s operations
Updates - 32.9 M/s operations
False positive rate: 27.45%
64-byte value
Lookups - 15.4 M/s operations
Updates - 13.7 M/s operations
False positive rate: 27.58%
2.5 mil entries
8-byte value
Lookups - 47.2 M/s operations
Updates - 31.8 M/s operations
False positive rate: 30.94%
64-byte value
Lookups - 15.3 M/s operations
Updates - 13.2 M/s operations
False positive rate: 30.95%
5 mil entries
8-byte value
Lookups - 41.1 M/s operations
Updates - 28.1 M/s operations
False positive rate: 31.01%
64-byte value
Lookups - 13.3 M/s operations
Updates - 11.4 M/s operations
False positive rate: 30.98%
2 hash functions:
50k entries
8-byte value
Lookups - 34.1 M/s operations
Updates - 20.1 M/s operations
False positive rate: 9.13%
64-byte value
Lookups - 8.4 M/s operations
Updates - 7.9 M/s operations
False positive rate: 9.21%
100k entries
8-byte value
Lookups - 33.7 M/s operations
Updates - 18.9 M/s operations
False positive rate: 9.13%
64-byte value
Lookups - 8.4 M/s operations
Updates - 7.7 M/s operations
False positive rate: 9.19%
500k entries
8-byte value
Lookups - 32.7 M/s operations
Updates - 18.1 M/s operations
False positive rate: 12.61%
64-byte value
Lookups - 8.4 M/s operations
Updates - 7.5 M/s operations
False positive rate: 12.61%
1 mil entries
8-byte value
Lookups - 30.6 M/s operations
Updates - 18.9 M/s operations
False positive rate: 12.54%
64-byte value
Lookups - 8.0 M/s operations
Updates - 7.0 M/s operations
False positive rate: 12.52%
2.5 mil entries
8-byte value
Lookups - 25.3 M/s operations
Updates - 16.7 M/s operations
False positive rate: 16.77%
64-byte value
Lookups - 7.9 M/s operations
Updates - 6.5 M/s operations
False positive rate: 16.88%
5 mil entries
8-byte value
Lookups - 20.8 M/s operations
Updates - 14.7 M/s operations
False positive rate: 16.78%
64-byte value
Lookups - 7.0 M/s operations
Updates - 6.0 M/s operations
False positive rate: 16.78%
3 hash functions:
50k entries
8-byte value
Lookups - 25.1 M/s operations
Updates - 14.6 M/s operations
False positive rate: 7.65%
64-byte value
Lookups - 5.8 M/s operations
Updates - 5.5 M/s operations
False positive rate: 7.58%
100k entries
8-byte value
Lookups - 24.7 M/s operations
Updates - 14.1 M/s operations
False positive rate: 7.71%
64-byte value
Lookups - 5.8 M/s operations
Updates - 5.3 M/s operations
False positive rate: 7.62%
500k entries
8-byte value
Lookups - 22.9 M/s operations
Updates - 13.9 M/s operations
False positive rate: 2.62%
64-byte value
Lookups - 5.6 M/s operations
Updates - 4.8 M/s operations
False positive rate: 2.7%
1 mil entries
8-byte value
Lookups - 19.8 M/s operations
Updates - 12.6 M/s operations
False positive rate: 2.60%
64-byte value
Lookups - 5.3 M/s operations
Updates - 4.4 M/s operations
False positive rate: 2.69%
2.5 mil entries
8-byte value
Lookups - 16.2 M/s operations
Updates - 10.7 M/s operations
False positive rate: 4.49%
64-byte value
Lookups - 4.9 M/s operations
Updates - 4.1 M/s operations
False positive rate: 4.41%
5 mil entries
8-byte value
Lookups - 18.8 M/s operations
Updates - 9.2 M/s operations
False positive rate: 4.45%
64-byte value
Lookups - 5.2 M/s operations
Updates - 3.9 M/s operations
False positive rate: 4.54%
4 hash functions:
50k entries
8-byte value
Lookups - 19.7 M/s operations
Updates - 11.1 M/s operations
False positive rate: 1.01%
64-byte value
Lookups - 4.4 M/s operations
Updates - 4.0 M/s operations
False positive rate: 1.00%
100k entries
8-byte value
Lookups - 19.5 M/s operations
Updates - 10.9 M/s operations
False positive rate: 1.00%
64-byte value
Lookups - 4.3 M/s operations
Updates - 3.9 M/s operations
False positive rate: 0.97%
500k entries
8-byte value
Lookups - 18.2 M/s operations
Updates - 10.6 M/s operations
False positive rate: 2.05%
64-byte value
Lookups - 4.3 M/s operations
Updates - 3.7 M/s operations
False positive rate: 2.05%
1 mil entries
8-byte value
Lookups - 15.5 M/s operations
Updates - 9.6 M/s operations
False positive rate: 1.99%
64-byte value
Lookups - 4.0 M/s operations
Updates - 3.4 M/s operations
False positive rate: 1.99%
2.5 mil entries
8-byte value
Lookups - 13.8 M/s operations
Updates - 7.7 M/s operations
False positive rate: 3.91%
64-byte value
Lookups - 3.7 M/s operations
Updates - 3.6 M/s operations
False positive rate: 3.78%
5 mil entries
8-byte value
Lookups - 13.0 M/s operations
Updates - 6.9 M/s operations
False positive rate: 3.93%
64-byte value
Lookups - 3.5 M/s operations
Updates - 3.7 M/s operations
False positive rate: 3.39%
5 hash functions:
50k entries
8-byte value
Lookups - 16.4 M/s operations
Updates - 9.1 M/s operations
False positive rate: 0.78%
64-byte value
Lookups - 3.5 M/s operations
Updates - 3.2 M/s operations
False positive rate: 0.77%
100k entries
8-byte value
Lookups - 16.3 M/s operations
Updates - 9.0 M/s operations
False positive rate: 0.79%
64-byte value
Lookups - 3.5 M/s operations
Updates - 3.2 M/s operations
False positive rate: 0.78%
500k entries
8-byte value
Lookups - 15.1 M/s operations
Updates - 8.8 M/s operations
False positive rate: 1.82%
64-byte value
Lookups - 3.4 M/s operations
Updates - 3.0 M/s operations
False positive rate: 1.78%
1 mil entries
8-byte value
Lookups - 13.2 M/s operations
Updates - 7.8 M/s operations
False positive rate: 1.81%
64-byte value
Lookups - 3.2 M/s operations
Updates - 2.8 M/s operations
False positive rate: 1.80%
2.5 mil entries
8-byte value
Lookups - 10.5 M/s operations
Updates - 5.9 M/s operations
False positive rate: 0.29%
64-byte value
Lookups - 3.2 M/s operations
Updates - 2.4 M/s operations
False positive rate: 0.28%
5 mil entries
8-byte value
Lookups - 9.6 M/s operations
Updates - 5.7 M/s operations
False positive rate: 0.30%
64-byte value
Lookups - 3.2 M/s operations
Updates - 2.7 M/s operations
False positive rate: 0.30%
Signed-off-by: Joanne Koong <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This patch adds test cases for bpf bloom filter maps. They include tests
checking against invalid operations by userspace, tests for using the
bloom filter map as an inner map, and a bpf program that queries the
bloom filter map for values added by a userspace program.
Signed-off-by: Joanne Koong <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This patch adds the libbpf infrastructure for supporting a
per-map-type "map_extra" field, whose definition will be
idiosyncratic depending on map type.
For example, for the bloom filter map, the lower 4 bits of
map_extra is used to denote the number of hash functions.
Please note that until libbpf 1.0 is here, the
"bpf_create_map_params" struct is used as a temporary
means for propagating the map_extra field to the kernel.
Signed-off-by: Joanne Koong <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This patch adds the kernel-side changes for the implementation of
a bpf bloom filter map.
The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations.These operations are exposed to userspace applications
through the already existing syscalls in the following way:
BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push
The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.
For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value, with a NULL key. In the
verifier layer, this requires us to modify the argument type of
a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
as well, in the syscall layer, we need to copy over the user value
so that in bpf_map_peek_elem, we know which specific value to query.
A few things to please take note of:
* If there are any concurrent lookups + updates, the user is
responsible for synchronizing this to ensure no false negative lookups
occur.
* The number of hashes to use for the bloom filter is configurable from
userspace. If no number is specified, the default used will be 5 hash
functions. The benchmarks later in this patchset can help compare the
performance of using different number of hashes on different entry
sizes. In general, using more hashes decreases both the false positive
rate and the speed of a lookup.
* Deleting an element in the bloom filter map is not supported.
* The bloom filter map may be used as an inner map.
* The "max_entries" size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than "max_entries", they may do so but they should
be aware that this may lead to a higher false positive rate.
Signed-off-by: Joanne Koong <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
* irq/misc-5.16:
: .
: Misc irqchip fixes for 5.16:
: - MAINTAINERS update for the ARM VIC DT binding
: - Allow drivers using the IRQCHIP_PLATFORM_DRIVER_BEGIN/END
: infrastructure to use COMPILE_TEST without CONFIG_OF
: - DT updates
: - Detangle h8300 linux/irqchip.h inclusion
: .
h8300: Fix linux/irqchip.h include mess
dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
irqchip: Fix compile-testing without CONFIG_OF
MAINTAINERS: update arm,vic.yaml reference
Signed-off-by: Marc Zyngier <[email protected]>
|
|
h8300 drags linux/irqchip.h from asm/irq.h, which is in general a bad
idea (asm/*.h should avoid dragging linux/*.h, as it is usually supposed
to work the other way around).
Move the inclusion of linux/irqchip.h to the single location where it
actually matters in the arch code.
Reported-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Pull drm fixes from Dave Airlie:
"Quiet but not too quiet, I blame Halloween.
The first set of amdgpu fixes missed last week, hence why this has a
few more of them, it's mostly display fixes for new GPUs and some
debugfs OOB stuff.
The i915 patches have one to remove a tracepoint possible issue before
it's a real problem, the others around cflush and display are cc'ed to
stable as well.
Otherwise it's just a few misc fixes.
Summary:
MAINTAINERS:
- Fix the path pattern
ttm:
- Fix fence leak in ttm_transfered_destroy.
core:
- Add GPD Win3 rotation quirk
i915:
- Remove unconditional clflushes
- Fix oops on boot due to sync state on disabled DP encoders
- Revert backend specific data added to tracepoints
- Remove useless and incorrect memory frequence calculation
panel:
- Add quirk for Aya Neo 2021
seltest:
- Reset property count for each drm damage selftest so full run will
work correctly.
amdgpu:
- Fix two potential out of bounds writes in debugfs
- Fix revision handling for Yellow Carp
- Display fixes for Yellow Carp
- Display fixes for DCN 3.1"
* tag 'drm-fixes-2021-10-29' of git://anongit.freedesktop.org/drm/drm: (21 commits)
MAINTAINERS: dri-devel is for all of drivers/gpu
drm/i915: Revert 'guc_id' from i915_request tracepoint
drm/amd/display: Fix deadlock when falling back to v2 from v3
drm/amd/display: Fallback to clocks which meet requested voltage on DCN31
drm/amdgpu: Fix even more out of bound writes from debugfs
drm: panel-orientation-quirks: Add quirk for GPD Win3
drm/i915/dp: Skip the HW readout of DPCD on disabled encoders
drm/i915: Catch yet another unconditioal clflush
drm/i915: Convert unconditional clflush to drm_clflush_virt_range()
drm/i915/selftests: Properly reset mock object propers for each test
drm: panel-orientation-quirks: Add quirk for Aya Neo 2021
drm/ttm: fix memleak in ttm_transfered_destroy
drm/amdgpu: support B0&B1 external revision id for yellow carp
drm/amd/display: Moved dccg init to after bios golden init
drm/amd/display: Increase watermark latencies for DCN3.1
drm/amd/display: increase Z9 latency to workaround underflow in Z9
drm/amd/display: Require immediate flip support for DCN3.1 planes
drm/amd/display: Fix prefetch bandwidth calculation for DCN3.1
drm/amd/display: Limit display scaling to up to true 4k for DCN 3.1
drm/amdgpu: fix out of bounds write
...
|
|
Somehow we only have a list of subdirectories, which apparently made
it harder for folks to find the gpu maintainers. Fix that.
References: https://lore.kernel.org/dri-devel/YXrAAZlxxStNFG%[email protected]/
Signed-off-by: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Steven Rostedt <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.15 final:
- Remove unconditional clflushes
- Fix oops on boot due to sync state on disabled DP encoders
- Revert backend specific data added to tracepoints
- Remove useless and incorrect memory frequence calculation
Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Align the FDINITRD line to the FDARGS one with tabs.
[ bp: Commit message. ]
Signed-off-by: Elyes HAOUAS <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
It's faster and easier to read if we tolerate cur_hctx being NULL in
the "when to flush" condition. Rename last_hctx to cur_hctx while at it,
as it better describes the role of that variable.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Return error code if devm_kmemdup() fails in ice_get_recp_frm_fw()
Fixes: fd2a6b71e300 ("ice: create advanced switch recipe")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Clang warns:
drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
default:
^
drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: note: insert 'break;' to avoid fall-through
default:
^
break;
1 error generated.
Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/1482
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Some devices have support for loading the PHY FW and in some cases this
can fail. When this fails, the FW will set the corresponding bit in the
link info structure. Also, the FW will send a link event if the correct
link event mask bit is set. Add support for printing an error message
when the PHY FW load fails during any link configuration flow and the
link event flow.
Since ice_check_module_power() is already doing something very similar
add a new function ice_check_link_cfg_err() so any failures reported in
the link info's link_cfg_err member can be printed in this one function.
Also, add the new ICE_FLAG_PHY_FW_LOAD_FAILED bit to the PF's flags so
we don't constantly print this error message during link polling if the
value never changed.
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Sunitha Mekala <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
This change adds support for changing MTU on port representor in
switchdev mode, by setting the min/max MTU values on port representor
netdev. Before it was possible to change the MTU only in a limited,
default range (68-1500).
Signed-off-by: Marcin Szycik <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Part of virtchannel messages are treated in different way in switchdev
mode to block configuring VFs from iavf driver side. This blocking was
done by doing nothing and returning success, event without sending
response.
Not sending response for opcodes that aren't supported in switchdev mode
leads to block iavf driver message handling. This happens for example
when vlan is configured at VF config time (VLAN module is already
loaded).
To get rid of it ice driver should answer for each VF message. In
switchdev mode:
- for adding/deleting VLAN driver should answer success without doing
anything to allow creating vlan device on VFs
- for enabling/disabling VLAN stripping and promiscuous mode driver
should answer not supported, this feature in switchdev can be only
set from host side
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Mostly reuse code from Geneve and VXLAN in TC parsing code. Add new GRE
header to match on correct fields. Create new dummy packets with GRE
fields.
Instead of checking if any encap values are presented in TC flower,
check if device is tunnel type or redirect is to tunnel device. This
will allow adding all combination of rules. For example filters only
with inner fields.
Return error in case device isn't tunnel but encap values are presented.
gre example:
- create tunnel device
ip l add $NVGRE_DEV type gretap remote $NVGRE_REM_IP local $VF1_IP \
dev $PF
- add tc filter (in switchdev mode)
tc filter add dev $NVGRE_DEV protocol ip parent ffff: flower dst_ip \
$NVGRE1_IP action mirred egress redirect dev $VF1_PR
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add definition of UDP tunnel dummy packets. Fill destination port value
in filter based on UDP tunnel port. Append tunnel flags to switch filter
definition in case of matching the tunnel.
Both VXLAN and Geneve are UDP tunnels, so only one new header is needed.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add definition for VXLAN and Geneve dummy packet. Define VXLAN and
Geneve type of fields to match on correct UDP tunnel header.
Parse tunnel specific fields from TC tool like outer MACs, outer IPs,
outer destination port and VNI. Save values and masks in outer header
struct and move header pointer to inner to simplify parsing inner
values.
There are two cases for redirect action:
- from uplink to VF - TC filter is added on tunnel device
- from VF to uplink - TC filter is added on PR, for this case check if
redirect device is tunnel device
VXLAN example:
- create tunnel device
ip l add $VXLAN_DEV type vxlan id $VXLAN_VNI dstport $VXLAN_PORT \
dev $PF
- add TC filter (in switchdev mode)
tc filter add dev $VXLAN_DEV protocol ip parent ffff: flower \
enc_dst_ip $VF1_IP enc_key_id $VXLAN_VNI action mirred egress \
redirect dev $VF1_PR
Geneve example:
- create tunnel device
ip l add $GENEVE_DEV type geneve id $GENEVE_VNI dstport $GENEVE_PORT \
remote $GENEVE_IP
- add TC filter (in switchdev mode)
tc filter add dev $GENEVE_DEV protocol ip parent ffff: flower \
enc_key_id $GENEVE_VNI dst_ip $GENEVE1_IP action mirred egress \
redirect dev $VF1_PR
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Implement indirect notification mechanism to support offloading TC rules
on tunnel devices.
Keep indirect block list in netdev priv. Notification will call setting
tc cls flower function. For now we can offload only ingress type. Return
not supported for other flow block binder.
Signed-off-by: Michal Swiatkowski <[email protected]>
Acked-by: Paul Menzel <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
include/net/sock.h
7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable")
4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout")
drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")
Adjacent code addition in both cases, keep both.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use 2-factor argument form kvcalloc() instead of kvzalloc().
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from WiFi (mac80211), and BPF.
Current release - regressions:
- skb_expand_head: adjust skb->truesize to fix socket memory
accounting
- mptcp: fix corrupt receiver key in MPC + data + checksum
Previous releases - regressions:
- multicast: calculate csum of looped-back and forwarded packets
- cgroup: fix memory leak caused by missing cgroup_bpf_offline
- cfg80211: fix management registrations locking, prevent list
corruption
- cfg80211: correct false positive in bridge/4addr mode check
- tcp_bpf: fix race in the tcp_bpf_send_verdict resulting in reusing
previous verdict
Previous releases - always broken:
- sctp: enhancements for the verification tag, prevent attackers from
killing SCTP sessions
- tipc: fix size validations for the MSG_CRYPTO type
- mac80211: mesh: fix HE operation element length check, prevent out
of bound access
- tls: fix sign of socket errors, prevent positive error codes being
reported from read()/write()
- cfg80211: scan: extend RCU protection in
cfg80211_add_nontrans_list()
- implement ->sock_is_readable() for UDP and AF_UNIX, fix poll() for
sockets in a BPF sockmap
- bpf: fix potential race in tail call compatibility check resulting
in two operations which would make the map incompatible succeeding
- bpf: prevent increasing bpf_jit_limit above max
- bpf: fix error usage of map_fd and fdget() in generic batch update
- phy: ethtool: lock the phy for consistency of results
- prevent infinite while loop in skb_tx_hash() when Tx races with
driver reconfiguring the queue <> traffic class mapping
- usbnet: fixes for bad HW conjured by syzbot
- xen: stop tx queues during live migration, prevent UAF
- net-sysfs: initialize uid and gid before calling
net_ns_get_ownership
- mlxsw: prevent Rx stalls under memory pressure"
* tag 'net-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
Revert "net: hns3: fix pause config problem after autoneg disabled"
mptcp: fix corrupt receiver key in MPC + data + checksum
riscv, bpf: Fix potential NULL dereference
octeontx2-af: Fix possible null pointer dereference.
octeontx2-af: Display all enabled PF VF rsrc_alloc entries.
octeontx2-af: Check whether ipolicers exists
net: ethernet: microchip: lan743x: Fix skb allocation failure
net/tls: Fix flipped sign in async_wait.err assignment
net/tls: Fix flipped sign in tls_err_abort() calls
net/smc: Correct spelling mistake to TCPF_SYN_RECV
net/smc: Fix smc_link->llc_testlink_time overflow
nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
vmxnet3: do not stop tx queues after netif_device_detach()
r8169: Add device 10ec:8162 to driver r8169
ptp: Document the PTP_CLK_MAGIC ioctl number
usbnet: fix error return code in usbnet_probe()
net: hns3: adjust string spaces of some parameters of tx bd info in debugfs
net: hns3: expand buffer len for some debugfs command
net: hns3: add more string spaces for dumping packets number of queue info in debugfs
net: hns3: fix data endian problem of some functions of debugfs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple of final driver specific fixes for v5.15, one fixing
potential ID collisions between two instances of the Altera driver and
one making Microwire full duplex mode actually work on pl022"
* tag 'spi-fix-v5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spl022: fix Microwire full duplex mode
spi: altera: Change to dynamic allocation of spi id
|