Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a preparatory change. A follow-up patch "bpf: verify callbacks
as if they are called unknown number of times" changes logic for
callbacks handling. While previously callbacks were verified as a
single function call, new scheme takes into account that callbacks
could be executed unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
for (int i = 0; i < 1000; i++) {
bpf_loop(nr_loops, empty_callback, NULL, 0);
__sync_add_and_fetch(&hits, nr_loops);
}
return 0;
}
W/o callbacks change verifier sees it as a 1000 calls to
empty_callback(). However, with callbacks change things become
exponential:
- i=0: state exploring empty_callback is scheduled with i=0 (a);
- i=1: state exploring empty_callback is scheduled with i=1;
...
- i=999: state exploring empty_callback is scheduled with i=999;
- state (a) is popped from stack;
- i=1: state exploring empty_callback is scheduled with i=1;
...
Avoid this issue by rewriting outer loop as bpf_loop().
Unfortunately, this adds a function call to a loop at runtime, which
negatively affects performance:
throughput latency
before: 149.919 ± 0.168 M ops/s, 6.670 ns/op
after : 137.040 ± 0.187 M ops/s, 7.297 ns/op
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This change prepares strobemeta for update in callbacks verification
logic. To allow bpf_loop() verification converge when multiple
callback iterations are considered:
- track offset inside strobemeta_payload->payload directly as scalar
value;
- at each iteration make sure that remaining
strobemeta_payload->payload capacity is sufficient for execution of
read_{map,str}_var functions;
- make sure that offset is tracked as unbound scalar between
iterations, otherwise verifier won't be able infer that bpf_loop
callback reaches identical states.
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This change prepares syncookie_{tc,xdp} for update in callbakcs
verification logic. To allow bpf_loop() verification converge when
multiple callback itreations are considered:
- track offset inside TCP payload explicitly, not as a part of the
pointer;
- make sure that offset does not exceed MAX_PACKET_OFF enforced by
verifier;
- make sure that offset is tracked as unbound scalar between
iterations, otherwise verifier won't be able infer that bpf_loop
callback reaches identical states.
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Report the number of workers in use to process the test batches.
Since the number is now subject to a limit, avoid users getting
confused.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In the spirit of failing early, timeout on unbounded loops that take
longer than 20 ticks to complete. Such loops are to ensure that objects
created are already visible so tests can proceed without any issues.
If a test setup takes more than 20 ticks to see an object, there's
definetely something wrong.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Instead of listing lingering ns pinned files and delete them one by one, leverage '-all'
from iproute2 to do it in a single process fork.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When pyroute2 is available, use the native netns delete routine instead
of calling iproute2 to do it. As forks are expensive with some kernel
configs, minimize its usage to avoid kselftests timeouts.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Surprisingly in kernel configs with most of the debug knobs turned on,
pre-allocating the test resources makes tdc run much slower overall than
when allocating resources on a per test basis.
As these knobs are used in kselftests in downstream CIs, let's go back
to the old way of doing things to avoid kselftests timeouts.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We have observed a lot of lock contention and test instability when running with >8 cores.
Enough to actually make the tests run slower than with fewer cores.
Cap the maximum cores of parallel tdc to 4 which showed in testing to
be a reasonable number for efficiency and stability in different kernel
config scenarios.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
scheduling") introduces multiple traffic bands, and per-band maximum
packet count.
Per-band limits ensures that packets in one class cannot fill the
entire qdisc and so cause DoS to the traffic in the other classes.
Verify this behavior:
1. set the limit to 10 per band
2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
Packets must remain queued for a period to trigger this behavior.
Use SO_TXTIME to store packets for 100 msec.
The test reuses existing upstream test infra. The script is a fork of
cmsg_time.sh. The scripts call cmsg_sender.
The test extends cmsg_sender with two arguments:
* '-P' SO_PRIORITY
There is a subtle difference between IPv4 and IPv6 stack behavior:
PF_INET/IP_TOS sets IP header bits and sk_priority
PF_INET6/IPV6_TCLASS sets IP header bits BUT NOT sk_priority
* '-n' num pkts
Send multiple packets in quick succession.
I first attempted a for loop in the script, but this is too slow in
virtualized environments, causing flakiness as the 100ms timeout is
reached and packets are dequeued.
Also do not wait for timestamps to be queued unless timestamps are
requested.
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ASSERT_EQ() macro sneakily expands to two statements, so the loop here
needs braces to ensure it captures both and actually terminates the test
upon failure. Where these tests are currently failing on my arm64 machine,
this reduces the number of logged lines from a rather unreasonable
~197,000 down to 10. While we're at it, we can also clean up the
tautologous "count" calculations whose assertions can never fail unless
mathematics and/or the C language become fundamentally broken.
Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Link: https://lore.kernel.org/r/90e083045243ef407dd592bb1deec89cd1f4ddf2.1700153535.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Joao Martins <[email protected]>
Tested-by: Joao Martins <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Reduce verboseness of test_progs' output in reg_bounds set of tests with
two changes.
First, instead of each different operator (<, <=, >, ...) being it's own
subtest, combine all different ops for the same (x, y, init_t, cond_t)
values into single subtest. Instead of getting 6 subtests, we get one
generic one, e.g.:
#192/53 reg_bounds_crafted/(s64)[0xffffffffffffffff; 0] (s64)<op> 0xffffffff00000000:OK
Second, for random generated test cases, treat all of them as a single
test to eliminate very verbose output with random values in them. So now
we'll just get one line per each combination of (init_t, cond_t),
instead of 6 x 25 = 150 subtests before this change:
#225 reg_bounds_rand_consts_s32_s32:OK
Given we reduce verboseness so much, it makes sense to do a bit more
random testing, so we also bump default number of random tests to 100,
up from 25. This doesn't increase runtime significantly, especially in
parallelized mode.
With all the above changes we still make sure that we have all the
information necessary for reproducing test case if it happens to fail.
That includes reporting random seed and specific operator that is
failing. Those will only be printed to console if related test/subtest
fails, so it doesn't have any added verboseness implications.
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Extend the existing tc_redirect selftest to also cover netkit devices
for exercising the bpf_redirect_peer() code paths, so that we have both
veth as well as netkit covered, all tests still pass after this change.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
No functional changes to the test case, but just renaming various functions,
variables, etc, to remove veth part of their name for making it more generic
and reusable later on (e.g. for netkit).
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
The sleepgraph tool currently fails:
File "/usr/bin/sleepgraph", line 4155
or re.match('psci: CPU(?P<cpu>[0-9]*) killed.*', msg)):
^
SyntaxError: unmatched ')'
Fixes: 34ea427e01ea ("PM: tools: sleepgraph: Recognize "CPU killed" messages")
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Add tests to specifically check for the refcount interactions of
hashtables created by u32. These tables should not be deleted when
referenced and the flush order should respect a tree like composition.
Signed-off-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Instead of always printing numbers as either decimals (and in some
cases, like for "imm=%llx", in hexadecimals), decide the form based on
actual values. For numbers in a reasonably small range (currently,
[0, U16_MAX] for unsigned values, and [S16_MIN, S16_MAX] for signed ones),
emit them as decimals. In all other cases, even for signed values,
emit them in hexadecimals.
For large values hex form is often times way more useful: it's easier to
see an exact difference between 0xffffffff80000000 and 0xffffffff7fffffff,
than between 18446744071562067966 and 18446744071562067967, as one
particular example.
Small values representing small pointer offsets or application
constants, on the other hand, are way more useful to be represented in
decimal notation.
Adjust reg_bounds register state parsing logic to take into account this
change.
Acked-by: Eduard Zingerman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Simplify BPF verifier log further by omitting default (and frequently
irrelevant) off=0 and imm=0 parts for non-SCALAR_VALUE registers. As can
be seen from fixed tests, this is often a visual noise for PTR_TO_CTX
register and even for PTR_TO_PACKET registers.
Omitting default values follows the rest of register state logic: we
omit default values to keep verifier log succinct and to highlight
interesting state that deviates from default one. E.g., we do the same
for var_off, when it's unknown, which gives no additional information.
Acked-by: Eduard Zingerman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
In complicated real-world applications, whenever debugging some
verification error through verifier log, it often would be very useful
to see map name for PTR_TO_MAP_VALUE register. Usually this needs to be
inferred from key/value sizes and maybe trying to guess C code location,
but it's not always clear.
Given verifier has the name, and it's never too long, let's just emit it
for ptr_to_map_key, ptr_to_map_value, and const_ptr_to_map registers. We
reshuffle the order a bit, so that map name, key size, and value size
appear before offset and immediate values, which seems like a more
logical order.
Current output:
R1_w=map_ptr(map=array_map,ks=4,vs=8,off=0,imm=0)
But we'll get rid of useless off=0 and imm=0 parts in the next patch.
Acked-by: Eduard Zingerman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Test that PCI reset works correctly by verifying that only the expected
reset methods are supported and that after issuing the reset the ifindex
of the port changes.
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Turbostat features are now table-driven (Rui Zhang)
- Add support for some new platforms (Sumeet Pawnikar, Rui Zhang)
- Gracefully run in configs when CPUs are limited (Rui Zhang, Srinivas
Pandruvada)
- misc minor fixes
[ This came in during the merge window, but sorting out the signed tag
took a while, so thus the late merge - Linus ]
* tag 'turbostat-2023.11.07' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (86 commits)
tools/power turbostat: version 2023.11.07
tools/power/turbostat: bugfix "--show IPC"
tools/power/turbostat: Add initial support for LunarLake
tools/power/turbostat: Add initial support for ArrowLake
tools/power/turbostat: Add initial support for GrandRidge
tools/power/turbostat: Add initial support for SierraForest
tools/power/turbostat: Add initial support for GraniteRapids
tools/power/turbostat: Add MSR_CORE_C1_RES support for spr_features
tools/power/turbostat: Move process to root cgroup
tools/power/turbostat: Handle cgroup v2 cpu limitation
tools/power/turbostat: Abstrct function for parsing cpu string
tools/power/turbostat: Handle offlined CPUs in cpu_subset
tools/power/turbostat: Obey allowed CPUs for system summary
tools/power/turbostat: Obey allowed CPUs for primary thread/core detection
tools/power/turbostat: Abstract several functions
tools/power/turbostat: Obey allowed CPUs during startup
tools/power/turbostat: Obey allowed CPUs when accessing CPU counters
tools/power/turbostat: Introduce cpu_allowed_set
tools/power/turbostat: Remove PC7/PC9 support on ADL/RPL
tools/power/turbostat: Enable MSR_CORE_C1_RES on recent Intel client platforms
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Thirteen hotfixes. Seven are cc:stable and the remainder pertain to
post-6.6 issues or aren't considered suitable for backporting"
* tag 'mm-hotfixes-stable-2023-11-17-14-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: more ptep_get() conversion
parisc: fix mmap_base calculation when stack grows upwards
mm/damon/core.c: avoid unintentional filtering out of schemes
mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors
mm/damon/sysfs-schemes: handle tried region directory allocation failure
mm/damon/sysfs-schemes: handle tried regions sysfs directory allocation failure
mm/damon/sysfs: check error from damon_sysfs_update_target()
mm: fix for negative counter: nr_file_hugepages
selftests/mm: add hugetlb_fault_after_madv to .gitignore
selftests/mm: restore number of hugepages
selftests: mm: fix some build warnings
selftests: mm: skip whole test instead of failure
mm/damon/sysfs: eliminate potential uninitialized variable warning
|
|
Rename verifier internal flag BPF_F_TEST_SANITY_STRICT to more neutral
BPF_F_TEST_REG_INVARIANTS. This is a follow up to [0].
A few selftests and veristat need to be adjusted in the same patch as
well.
[0] https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
like:
```
check.c: In function ‘cfi_alloc’:
check.c:294:33: error: allocation of insufficient size ‘1’ for type ‘struct cfi_state’ with size ‘320’ [-Werror=alloc-size]
294 | struct cfi_state *cfi = calloc(sizeof(struct cfi_state), 1);
| ^~~~~~
```
The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
```
So, just swap the number of members and size arguments to match the prototype, as
we're initialising 1 struct of size `sizeof(struct ...)`. GCC then sees we're not
doing anything wrong.
Signed-off-by: Sam James <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The blamed commit below introduced a typo causing 'gretap' test-case
failures:
./rtnetlink.sh -t kci_test_gretap -v
COMMAND: ip link add name test-dummy0 type dummy
COMMAND: ip link set test-dummy0 up
COMMAND: ip netns add testns
COMMAND: ip link help gretap 2>&1 | grep -q '^Usage:'
COMMAND: ip -netns testns link add dev gretap00 type gretap seq key 102 local 172.16.1.100 remote 172.16.1.200
COMMAND: ip -netns testns addr add dev gretap00 10.1.1.100/24
COMMAND: ip -netns testns link set dev gretap00 ups
Error: either "dev" is duplicate, or "ups" is a garbage.
COMMAND: ip -netns testns link del gretap00
COMMAND: ip -netns testns link add dev gretap00 type gretap external
COMMAND: ip -netns testns link del gretap00
FAIL: gretap
Fix it by using the correct keyword.
Fixes: 9c2a19f71515 ("kselftest: rtnetlink.sh: add verbose flag")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The sockets used by udpgso_bench_tx aren't always ready when
udpgso_bench_tx transmits packets. This issue is more prevalent in -rt
kernels, but can occur in both. Replace the hacky sleep calls with a
function that checks whether the ports in the namespace are ready for
use.
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Lucas Karpinski <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Leverage parallel tests in kselftests using all the available cpus.
We tested this in tuxsuite and locally extensively and it seems it's ready for prime time.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While running tdc tests in parallel it can race over the module loading
done by tc and fail the run with random errors.
So avoid this by preloading all modules before running tdc in kselftests.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As mentioned in the TC Workshop 0x17, our recent changes to tdc broke
downstream CI systems like tuxsuite. The issue is the classic problem
with rcu/workqueue objects where you can miss them if not enough wall time
has passed. The latter is subjective to the system and kernel config,
in my machine could be nanoseconds while in another could be microseconds
or more.
In order to make the suite deterministic, poll for the existence
of the objects in a reasonable manner. Talking netlink directly is the
the best solution in order to avoid paying the cost of multiple
'fork()' calls, so introduce a netlink based setup routine using
pyroute2. We leave the iproute2 one as a fallback when pyroute2 is not
available.
Also rework the iproute2 side to mimic the netlink routine where it
creates DEV0 as the peer of DEV1 and moves DEV1 into the net namespace.
This way when the namespace is deleted DEV0 is also deleted
automatically, leaving no margin for resource leaks.
Another bonus of this change is that our setup time sped up by a factor
of 2 when using netlink.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This argument would bypass the net namespace creation and run the test in
the root namespace, even if nsPlugin was specified.
Drop it as it's the same as commenting out the nsPlugin from a test and adds
additional complexity to the plugin code.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from BPF and netfilter.
Current release - regressions:
- core: fix undefined behavior in netdev name allocation
- bpf: do not allocate percpu memory at init stage
- netfilter: nf_tables: split async and sync catchall in two
functions
- mptcp: fix possible NULL pointer dereference on close
Current release - new code bugs:
- eth: ice: dpll: fix initial lock status of dpll
Previous releases - regressions:
- bpf: fix precision backtracking instruction iteration
- af_unix: fix use-after-free in unix_stream_read_actor()
- tipc: fix kernel-infoleak due to uninitialized TLV value
- eth: bonding: stop the device in bond_setup_by_slave()
- eth: mlx5:
- fix double free of encap_header
- avoid referencing skb after free-ing in drop path
- eth: hns3: fix VF reset
- eth: mvneta: fix calls to page_pool_get_stats
Previous releases - always broken:
- core: set SOCK_RCU_FREE before inserting socket into hashtable
- bpf: fix control-flow graph checking in privileged mode
- eth: ppp: limit MRU to 64K
- eth: stmmac: avoid rx queue overrun
- eth: icssg-prueth: fix error cleanup on failing initialization
- eth: hns3: fix out-of-bounds access may occur when coalesce info is
read via debugfs
- eth: cortina: handle large frames
Misc:
- selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45"
* tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits)
macvlan: Don't propagate promisc change to lower dev in passthru
net: sched: do not offload flows with a helper in act_ct
net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors
net/mlx5e: Check return value of snprintf writing to fw_version buffer
net/mlx5e: Reduce the size of icosq_str
net/mlx5: Increase size of irq name buffer
net/mlx5e: Update doorbell for port timestamping CQ before the software counter
net/mlx5e: Track xmit submission to PTP WQ after populating metadata map
net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe
net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload
net/mlx5e: Fix pedit endianness
net/mlx5e: fix double free of encap_header in update funcs
net/mlx5e: fix double free of encap_header
net/mlx5: Decouple PHC .adjtime and .adjphase implementations
net/mlx5: DR, Allow old devices to use multi destination FTE
net/mlx5: Free used cpus mask when an IRQ is released
Revert "net/mlx5: DR, Supporting inline WQE when possible"
bpf: Do not allocate percpu memory at init stage
net: Fix undefined behavior in netdev name allocation
dt-bindings: net: ethernet-controller: Fix formatting error
...
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2023-11-15
We've added 7 non-merge commits during the last 6 day(s) which contain
a total of 9 files changed, 200 insertions(+), 49 deletions(-).
The main changes are:
1) Do not allocate bpf specific percpu memory unconditionally, from Yonghong.
2) Fix precision backtracking instruction iteration, from Andrii.
3) Fix control flow graph checking, from Andrii.
4) Fix xskxceiver selftest build, from Anders.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Do not allocate percpu memory at init stage
selftests/bpf: add more test cases for check_cfg()
bpf: fix control-flow graph checking in privileged mode
selftests/bpf: add edge case backtracking logic test
bpf: fix precision backtracking instruction iteration
bpf: handle ldimm64 properly in check_cfg()
selftests: bpf: xskxceiver: ksft_print_msg: fix format type error
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
commit 116d57303a05 ("selftests/mm: add a new test for madv and hugetlb")
added a new test case, but, it didn't add the binary name in
tools/testing/selftests/mm/.gitignore.
Add hugetlb_fault_after_madv to tools/testing/selftests/mm/.gitignore.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 116d57303a05 ("selftests/mm: add a new test for madv and hugetlb")
Signed-off-by: Breno Leitao <[email protected]>
Reported-by: Ryan Roberts <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Andrew Morton <[email protected]>
|
|
The test mm `hugetlb_fault_after_madv` selftest needs one and only one
huge page to run, thus it sets `/proc/sys/vm/nr_hugepages` to 1.
The problem is that further tests require the previous number of hugepages
allocated in order to succeed.
Save the number of huge pages before changing it, and restore it once the
test finishes, so, further tests could run successfully.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 116d57303a05 ("selftests/mm: add a new test for madv and hugetlb")
Signed-off-by: Breno Leitao <[email protected]>
Reported-by: Ryan Roberts <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Andrew Morton <[email protected]>
|
|
Fix build warnings:
pagemap_ioctl.c:1154:38: warning: format `%s' expects a matching `char *' argument [-Wformat=]
pagemap_ioctl.c:1162:51: warning: format `%ld' expects argument of type `long int', but argument 2 has type `int' [-Wformat=]
pagemap_ioctl.c:1192:51: warning: format `%ld' expects argument of type `long int', but argument 2 has type `int' [-Wformat=]
pagemap_ioctl.c:1600:51: warning: format `%ld' expects argument of type `long int', but argument 2 has type `int' [-Wformat=]
pagemap_ioctl.c:1628:51: warning: format `%ld' expects argument of type `long int', but argument 2 has type `int' [-Wformat=]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Some architectures don't support userfaultfd. Skip running the whole test
on them instead of registering the failure.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
Reported-by: Ryan Roberts <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a simple verifier test that requires deriving reg bounds for one
register from another register that's not a constant. This is
a realistic example of iterating elements of an array with fixed maximum
number of elements, but smaller actual number of elements.
This small example was an original motivation for doing this whole patch
set in the first place, yes.
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add a new flag -r (--test-sanity), similar to -t (--test-states), to add
extra BPF program flags when loading BPF programs.
This allows to use veristat to easily catch sanity violations in
production BPF programs.
reg_bounds tests are also enforcing BPF_F_TEST_SANITY_STRICT flag now.
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Make sure to set BPF_F_TEST_SANITY_STRICT program flag by default across
most verifier tests (and a bunch of others that set custom prog flags).
There are currently two tests that do fail validation, if enforced
strictly: verifier_bounds/crossing_64_bit_signed_boundary_2 and
verifier_bounds/crossing_32_bit_signed_boundary_2. To accommodate them,
we teach test_loader a flag negation:
__flag(!<flagname>) will *clear* specified flag, allowing easy opt-out.
We apply __flag(!BPF_F_TEST_SANITY_STRICT) to these to tests.
Also sprinkle BPF_F_TEST_SANITY_STRICT everywhere where we already set
test-only BPF_F_TEST_RND_HI32 flag, for completeness.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add random cases generation to reg_bounds.c and run them without
SLOW_TESTS=1 to increase a chance of BPF CI catching latent issues.
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Now that verifier supports range vs range bounds adjustments, validate
that by checking each generated range against every other generated
range, across all supported operators (everything by JSET).
We also add few cases that were problematic during development either
for verifier or for selftest's range tracking implementation.
Note that we utilize the same trick with splitting everything into
multiple independent parallelizable tests, but init_t and cond_t. This
brings down verification time in parallel mode from more than 8 hours
down to less that 1.5 hours. 106 million cases were successfully
validate for range vs range logic, in addition to about 7 million range
vs const cases, added in earlier patch.
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Similar to kernel-side BPF verifier logic enhancements, use 32-bit
subrange knowledge for is_branch_taken() logic in reg_bounds selftests.
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add test to validate BPF verifier's register range bounds tracking logic.
The main bulk is a lot of auto-generated tests based on a small set of
seed values for lower and upper 32 bits of full 64-bit values.
Currently we validate only range vs const comparisons, but the idea is
to start validating range over range comparisons in subsequent patch set.
When setting up initial register ranges we treat registers as one of
u64/s64/u32/s32 numeric types, and then independently perform conditional
comparisons based on a potentially different u64/s64/u32/s32 types. This
tests lots of tricky cases of deriving bounds information across
different numeric domains.
Given there are lots of auto-generated cases, we guard them behind
SLOW_TESTS=1 envvar requirement, and skip them altogether otherwise.
With current full set of upper/lower seed value, all supported
comparison operators and all the combinations of u64/s64/u32/s32 number
domains, we get about 7.7 million tests, which run in about 35 minutes
on my local qemu instance without parallelization. But we also split
those tests by init/cond numeric types, which allows to rely on
test_progs's parallelization of tests with `-j` option, getting run time
down to about 5 minutes on 8 cores. It's still something that shouldn't
be run during normal test_progs run. But we can run it a reasonable
time, and so perhaps a nightly CI test run (once we have it) would be
a good option for this.
We also add a small set of tricky conditions that came up during
development and triggered various bugs or corner cases in either
selftest's reimplementation of range bounds logic or in verifier's logic
itself. These are fast enough to be run as part of normal test_progs
test run and are great for a quick sanity checking.
Let's take a look at test output to understand what's going on:
$ sudo ./test_progs -t reg_bounds_crafted
#191/1 reg_bounds_crafted/(u64)[0; 0xffffffff] (u64)< 0:OK
...
#191/115 reg_bounds_crafted/(u64)[0; 0x17fffffff] (s32)< 0:OK
...
#191/137 reg_bounds_crafted/(u64)[0xffffffff; 0x100000000] (u64)== 0:OK
Each test case is uniquely and fully described by this generated string.
E.g.: "(u64)[0; 0x17fffffff] (s32)< 0". This means that we
initialize a register (R6) in such a way that verifier knows that it can
have a value in [(u64)0; (u64)0x17fffffff] range. Another
register (R7) is also set up as u64, but this time a constant (zero in
this case). They then are compared using 32-bit signed < operation.
Resulting TRUE/FALSE branches are evaluated (including cases where it's
known that one of the branches will never be taken, in which case we
validate that verifier also determines this as a dead code). Test
validates that verifier's final register state matches expected state
based on selftest's own reg_state logic, implemented from scratch for
cross-checking purposes.
These test names can be conveniently used for further debugging, and if -vv
verboseness is requested we can get a corresponding verifier log (with
mark_precise logs filtered out as irrelevant and distracting). Example below is
slightly redacted for brevity, omitting irrelevant register output in
some places, marked with [...].
$ sudo ./test_progs -a 'reg_bounds_crafted/(u32)[0; U32_MAX] (s32)< -1' -vv
...
VERIFIER LOG:
========================
func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (05) goto pc+2
3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
4: (bc) w6 = w0 ; R0_w=scalar() R6_w=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
5: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
6: (bc) w7 = w0 ; R0_w=scalar() R7_w=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
7: (b4) w1 = 0 ; R1_w=0
8: (b4) w2 = -1 ; R2=4294967295
9: (ae) if w6 < w1 goto pc-9
9: R1=0 R6=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
10: (2e) if w6 > w2 goto pc-10
10: R2=4294967295 R6=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
11: (b4) w1 = -1 ; R1_w=4294967295
12: (b4) w2 = -1 ; R2_w=4294967295
13: (ae) if w7 < w1 goto pc-13 ; R1_w=4294967295 R7=4294967295
14: (2e) if w7 > w2 goto pc-14
14: R2_w=4294967295 R7=4294967295
15: (bc) w0 = w6 ; [...] R6=scalar(id=1,smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
16: (bc) w0 = w7 ; [...] R7=4294967295
17: (ce) if w6 s< w7 goto pc+3 ; R6=scalar(id=1,smin=0,smax=umax=4294967295,smin32=-1,var_off=(0x0; 0xffffffff)) R7=4294967295
18: (bc) w0 = w6 ; [...] R6=scalar(id=1,smin=0,smax=umax=4294967295,smin32=-1,var_off=(0x0; 0xffffffff))
19: (bc) w0 = w7 ; [...] R7=4294967295
20: (95) exit
from 17 to 21: [...]
21: (bc) w0 = w6 ; [...] R6=scalar(id=1,smin=umin=umin32=2147483648,smax=umax=umax32=4294967294,smax32=-2,var_off=(0x80000000; 0x7fffffff))
22: (bc) w0 = w7 ; [...] R7=4294967295
23: (95) exit
from 13 to 1: [...]
1: [...]
1: (b7) r0 = 0 ; R0_w=0
2: (95) exit
processed 24 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
=====================
Verifier log above is for `(u32)[0; U32_MAX] (s32)< -1` use cases, where u32
range is used for initialization, followed by signed < operator. Note
how we use w6/w7 in this case for register initialization (it would be
R6/R7 for 64-bit types) and then `if w6 s< w7` for comparison at
instruction #17. It will be `if R6 < R7` for 64-bit unsigned comparison.
Above example gives a good impression of the overall structure of a BPF
programs generated for reg_bounds tests.
In the future, this "framework" can be extended to test not just
conditional jumps, but also arithmetic operations. Adding randomized
testing is another possibility.
Some implementation notes. We basically have our own generics-like
operations on numbers, where all the numbers are stored in u64, but how
they are interpreted is passed as runtime argument enum num_t. Further,
`struct range` represents a bounds range, and those are collected
together into a minimal `struct reg_state`, which collects range bounds
across all four numberical domains: u64, s64, u32, s64.
Based on these primitives and `enum op` representing possible
conditional operation (<, <=, >, >=, ==, !=), there is a set of generic
helpers to perform "range arithmetics", which is used to maintain struct
reg_state. We simulate what verifier will do for reg bounds of R6 and R7
registers using these range and reg_state primitives. Simulated
information is used to determine branch taken conclusion and expected
exact register state across all four number domains.
Implementation of "range arithmetics" is more generic than what verifier
is currently performing: it allows range over range comparisons and
adjustments. This is the intended end goal of this patch set overall and verifier
logic is enhanced in subsequent patches in this series to handle range
vs range operations, at which point selftests are extended to validate
these conditions as well. For now it's range vs const cases only.
Note that tests are split into multiple groups by their numeric types
for initialization of ranges and for comparison operation. This allows
to use test_progs's -j parallelization to speed up tests, as we now have
16 groups of parallel running tests. Overall reduction of running time
that allows is pretty good, we go down from more than 30 minutes to
slightly less than 5 minutes running time.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add simple sanity checks that validate well-formed ranges (min <= max)
across u64, s64, u32, and s32 ranges. Also for cases when the value is
constant (either 64-bit or 32-bit), we validate that ranges and tnums
are in agreement.
These bounds checks are performed at the end of BPF_ALU/BPF_ALU64
operations, on conditional jumps, and for LDX instructions (where subreg
zero/sign extension is probably the most important to check). This
covers most of the interesting cases.
Also, we validate the sanity of the return register when manually
adjusting it for some special helpers.
By default, sanity violation will trigger a warning in verifier log and
resetting register bounds to "unbounded" ones. But to aid development
and debugging, BPF_F_TEST_SANITY_STRICT flag is added, which will
trigger hard failure of verification with -EFAULT on register bounds
violations. This allows selftests to catch such issues. veristat will
also gain a CLI option to enable this behavior.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Running the mp_join selftest manually with the following command line:
./mptcp_join.sh -z -C
leads to some failures:
002 fastclose server test
# ...
rtx [fail] got 1 MP_RST[s] TX expected 0
# ...
rstrx [fail] got 1 MP_RST[s] RX expected 0
The problem is really in the wrong expectations for the RST checks
implied by the csum validation. Note that the same check is repeated
explicitly in the same test-case, with the correct expectation and
pass successfully.
Address the issue explicitly setting the correct expectation for
the failing checks.
Reported-by: Xiumei Mu <[email protected]>
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: [email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-5-7b9cd6a7b7f4@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add selftests for cgroup1 hierarchy.
The result as follows,
$ tools/testing/selftests/bpf/test_progs --name=cgroup1_hierarchy
#36/1 cgroup1_hierarchy/test_cgroup1_hierarchy:OK
#36/2 cgroup1_hierarchy/test_root_cgid:OK
#36/3 cgroup1_hierarchy/test_invalid_level:OK
#36/4 cgroup1_hierarchy/test_invalid_cgid:OK
#36/5 cgroup1_hierarchy/test_invalid_hid:OK
#36/6 cgroup1_hierarchy/test_invalid_cgrp_name:OK
#36/7 cgroup1_hierarchy/test_invalid_cgrp_name2:OK
#36/8 cgroup1_hierarchy/test_sleepable_prog:OK
#36 cgroup1_hierarchy:OK
Summary: 1/8 PASSED, 0 SKIPPED, 0 FAILED
Besides, I also did some stress test similar to the patch #2 in this
series, as follows (with CONFIG_PROVE_RCU_LIST enabled):
- Continuously mounting and unmounting named cgroups in some tasks,
for example:
cgrp_name=$1
while true
do
mount -t cgroup -o none,name=$cgrp_name none /$cgrp_name
umount /$cgrp_name
done
- Continuously run this selftest concurrently,
while true; do ./test_progs --name=cgroup1_hierarchy; done
They can ran successfully without any RCU warnings in dmesg.
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
A new cgroup helper function, get_cgroup1_hierarchy_id(), has been
introduced to obtain the ID of a cgroup1 hierarchy based on the provided
cgroup name. This cgroup name can be obtained from the /proc/self/cgroup
file.
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Introduce a new helper function to retrieve the cgroup ID from a net_cls
cgroup directory.
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Include the current pid in the classid cgroup path. This way, different
testers relying on classid-based configurations will have distinct classid
cgroup directories, enabling them to run concurrently. Additionally, we
leverage the current pid as the classid, ensuring unique identification.
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
If the net_cls subsystem is already mounted, attempting to mount it again
in setup_classid_environment() will result in a failure with the error code
EBUSY. Despite this, tmpfs will have been successfully mounted at
/sys/fs/cgroup/net_cls. Consequently, the /sys/fs/cgroup/net_cls directory
will be empty, causing subsequent setup operations to fail.
Here's an error log excerpt illustrating the issue when net_cls has already
been mounted at /sys/fs/cgroup/net_cls prior to running
setup_classid_environment():
- Before that change
$ tools/testing/selftests/bpf/test_progs --name=cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup.procs
(cgroup_helpers.c:540: errno: No such file or directory) Opening cgroup classid: /sys/fs/cgroup/net_cls/cgroup-test-work-dir/net_cls.classid
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup-test-work-dir/cgroup.procs
run_test:FAIL:join_classid unexpected error: 1 (errno 2)
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 2)
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup.procs
#44 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
- After that change
$ tools/testing/selftests/bpf/test_progs --name=cgroup_v1v2
#44 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|