Age | Commit message (Collapse) | Author | Files | Lines |
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-08-16
We've added 17 non-merge commits during the last 6 day(s) which contain
a total of 20 files changed, 1179 insertions(+), 37 deletions(-).
The main changes are:
1) Add a BPF hook in sys_socket() to change the protocol ID
from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
applications, from Geliang Tang.
2) Follow-up/fallout fix from the SO_REUSEPORT + bpf_sk_assign work
to fix a splat on non-fullsock sks in inet[6]_steal_sock,
from Lorenz Bauer.
3) Improvements to struct_ops links to avoid forcing presence of
update/validate callbacks. Also add bpf_struct_ops fields documentation,
from David Vernet.
4) Ensure libbpf sets close-on-exec flag on gzopen, from Marco Vedovati.
5) Several new tcx selftest additions and bpftool link show support for
tcx and xdp links, from Daniel Borkmann.
6) Fix a smatch warning on uninitialized symbol in
bpf_perf_link_fill_kprobe, from Yafang Shao.
7) BPF selftest fixes e.g. misplaced break in kfunc_call test,
from Yipeng Zou.
8) Small cleanup to remove unused declaration bpf_link_new_file,
from Yue Haibing.
9) Small typo fix to bpftool's perf help message, from Daniel T. Lee.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add mptcpify test
selftests/bpf: Fix error checks of mptcp open_and_load
selftests/bpf: Add two mptcp netns helpers
bpf: Add update_socket_protocol hook
bpftool: Implement link show support for xdp
bpftool: Implement link show support for tcx
selftests/bpf: Add selftest for fill_link_info
bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe()
net: Fix slab-out-of-bounds in inet[6]_steal_sock
bpf: Document struct bpf_struct_ops fields
bpf: Support default .validate() and .update() behavior for struct_ops links
selftests/bpf: Add various more tcx test cases
selftests/bpf: Clean up fmod_ret in bench_rename test script
selftests/bpf: Fix repeat option when kfunc_call verification fails
libbpf: Set close-on-exec flag on gzopen
bpftool: fix perf help message
bpf: Remove unused declaration bpf_link_new_file()
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This reverts commit 90bc21aaef4adaefceda2d385756138fc247c0c2.
Patch was merged too hastily, Vladimir requested changes in:
https://lore.kernel.org/all/20230816121305.5dio5tk3chge2ndh@skbuf/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Geliang Tang says:
====================
As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto:
( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
forced to create and use MPTCP sockets instead of TCP ones via the
mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations
[2]:
- it doesn't work if the application is not using libc (e.g. GoLang
apps)
- in some envs, it might not be easy to set env vars / change the way
apps are launched, e.g. on Android
- mptcpize needs to be launched with all apps that want MPTCP: we could
have more control from BPF to enable MPTCP only for some apps or all the
ones of a netns or a cgroup, etc.
- it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to
mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id
from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1]
https://github.com/multipath-tcp/mptcp_net-next/wiki
[2]
https://github.com/multipath-tcp/mptcp_net-next/issues/79
v14:
- Use getsockopt(MPTCP_INFO) to verify mptcp protocol intead of using
nstat command.
v13:
- drop "Use random netns name for mptcp" patch.
v12:
- update diag_* log of update_socket_protocol.
- add 'ip netns show' after 'ip netns del' to check if there is
a test did not clean up its netns.
- return libbpf_get_error() instead of -EIO for the error from
open_and_load().
- Use getsockopt(SOL_PROTOCOL) to verify mptcp protocol intead of
using 'ss -tOni'.
v11:
- add comments about outputs of 'ss' and 'nstat'.
- use "err = verify_mptcpify()" instead of using =+.
v10:
- drop "#ifdef CONFIG_BPF_JIT".
- include vmlinux.h and bpf_tracing_net.h to avoid defining some
macros.
- drop unneeded checks for mptcp.
v9:
- update comment for 'update_socket_protocol'.
v8:
- drop the additional checks on the 'protocol' value after the
'update_socket_protocol()' call.
v7:
- add __weak and __diag_* for update_socket_protocol.
v6:
- add update_socket_protocol.
v5:
- add bpf_mptcpify helper.
v4:
- use lsm_cgroup/socket_create
v3:
- patch 8: char cmd[128]; -> char cmd[256];
v2:
- Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
====================
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Implement a new test program mptcpify: if the family is AF_INET or
AF_INET6, the type is SOCK_STREAM, and the protocol ID is 0 or
IPPROTO_TCP, set it to IPPROTO_MPTCP. It will be hooked in
update_socket_protocol().
Extend the MPTCP test base, add a selftest test_mptcpify() for the
mptcpify case. Open and load the mptcpify test prog to mptcpify the
TCP sockets dynamically, then use start_server() and connect_to_fd()
to create a TCP socket, but actually what's created is an MPTCP
socket, which can be verified through 'getsockopt(SOL_PROTOCOL)'
and 'getsockopt(MPTCP_INFO)'.
Acked-by: Yonghong Song <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/364e72f307e7bb38382ec7442c182d76298a9c41.1692147782.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Return libbpf_get_error(), instead of -EIO, for the error from
mptcp_sock__open_and_load().
Load success means prog_fd and map_fd are always valid. So drop these
unneeded ASSERT_GE checks for them in mptcp run_test().
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/db5fcb93293df9ab173edcbaf8252465b80da6f2.1692147782.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add two netns helpers for mptcp tests: create_netns() and
cleanup_netns(). Use them in test_base().
These new helpers will be re-used in the following commits
introducing new tests.
Acked-by: Yonghong Song <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/7506371fb6c417b401cc9d7365fe455754f4ba3f.1692147782.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add a hook named update_socket_protocol in __sys_socket(), for bpf
progs to attach to and update socket protocol. One user case is to
force legacy TCP apps to create and use MPTCP sockets instead of
TCP ones.
Define a fmod_ret set named bpf_mptcp_fmodret_ids, add the hook
update_socket_protocol into this set, and register it in
bpf_mptcp_kfunc_init().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Acked-by: Matthieu Baerts <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/ac84be00f97072a46f8a72b4e2be46cbb7fa5053.1692147782.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add support to dump XDP link information to bpftool. This reuses the
recently added show_link_ifindex_{plain,json}(). The XDP link info only
exposes the ifindex.
Below shows an example link dump output, and a cgroup link is included
for comparison, too:
# bpftool link
[...]
10: cgroup prog 2466
cgroup_id 1 attach_type cgroup_inet6_post_bind
[...]
16: xdp prog 2477
ifindex enp5s0(3)
[...]
Equivalent json output:
# bpftool link --json
[...]
{
"id": 10,
"type": "cgroup",
"prog_id": 2466,
"cgroup_id": 1,
"attach_type": "cgroup_inet6_post_bind"
},
[...]
{
"id": 16,
"type": "xdp",
"prog_id": 2477,
"devname": "enp5s0",
"ifindex": 3
}
[...]
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add support to dump tcx link information to bpftool. This adds a
common helper show_link_ifindex_{plain,json}() which can be reused
also for other link types. The plain text and json device output is
the same format as in bpftool net dump.
Below shows an example link dump output along with a cgroup link
for comparison:
# bpftool link
[...]
10: cgroup prog 1977
cgroup_id 1 attach_type cgroup_inet6_post_bind
[...]
13: tcx prog 2053
ifindex enp5s0(3) attach_type tcx_ingress
14: tcx prog 2080
ifindex enp5s0(3) attach_type tcx_egress
[...]
Equivalent json output:
# bpftool link --json
[...]
{
"id": 10,
"type": "cgroup",
"prog_id": 1977,
"cgroup_id": 1,
"attach_type": "cgroup_inet6_post_bind"
},
[...]
{
"id": 13,
"type": "tcx",
"prog_id": 2053,
"devname": "enp5s0",
"ifindex": 3,
"attach_type": "tcx_ingress"
},
{
"id": 14,
"type": "tcx",
"prog_id": 2080,
"devname": "enp5s0",
"ifindex": 3,
"attach_type": "tcx_egress"
}
[...]
Suggested-by: Yafang Shao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Acked-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add selftest for the fill_link_info of uprobe, kprobe and tracepoint.
The result:
$ tools/testing/selftests/bpf/test_progs --name=fill_link_info
#79/1 fill_link_info/kprobe_link_info:OK
#79/2 fill_link_info/kretprobe_link_info:OK
#79/3 fill_link_info/kprobe_invalid_ubuff:OK
#79/4 fill_link_info/tracepoint_link_info:OK
#79/5 fill_link_info/uprobe_link_info:OK
#79/6 fill_link_info/uretprobe_link_info:OK
#79/7 fill_link_info/kprobe_multi_link_info:OK
#79/8 fill_link_info/kretprobe_multi_link_info:OK
#79/9 fill_link_info/kprobe_multi_invalid_ubuff:OK
#79 fill_link_info:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
The test case for kprobe_multi won't be run on aarch64, as it is not
supported.
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The commit 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") leads
to the following Smatch static checker warning:
kernel/bpf/syscall.c:3416 bpf_perf_link_fill_kprobe()
error: uninitialized symbol 'type'.
That can happens when uname is NULL. So fix it by verifying the uname when we
really need to fill it.
Fixes: 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Kui-Feng Lee says:
====================
Remove expired routes with a separated list of routes.
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.
Background
==========
The size of a Linux IPv6 routing table can become a big problem if not
managed appropriately. Now, Linux has a garbage collector to remove
expired routes periodically. However, this may lead to a situation in
which the routing path is blocked for a long period due to an
excessive number of routes.
For example, years ago, there is a commit c7bb4b89033b ("ipv6: tcp:
drop silly ICMPv6 packet too big messages"). The root cause is that
malicious ICMPv6 packets were sent back for every small packet sent to
them. These packets add routes with an expiration time that prompts
the GC to periodically check all routes in the tables, including
permanent ones.
Why Route Expires
=================
Users can add IPv6 routes with an expiration time manually. However,
the Neighbor Discovery protocol may also generate routes that can
expire. For example, Router Advertisement (RA) messages may create a
default route with an expiration time. [RFC 4861] For IPv4, it is not
possible to set an expiration time for a route, and there is no RA, so
there is no need to worry about such issues.
Create Routes with Expires
==========================
You can create routes with expires with the command.
For example,
ip -6 route add 2001:b000:591::3 via fe80::5054:ff:fe12:3457 \
dev enp0s3 expires 30
The route that has been generated will be deleted automatically in 30
seconds.
GC of FIB6
==========
The function called fib6_run_gc() is responsible for performing
garbage collection (GC) for the Linux IPv6 stack. It checks for the
expiration of every route by traversing the trees of routing
tables. The time taken to traverse a routing table increases with its
size. Holding the routing table lock during traversal is particularly
undesirable. Therefore, it is preferable to keep the lock for the
shortest possible duration.
Solution
========
The cause of the issue is keeping the routing table locked during the
traversal of large trees. To solve this problem, we can create a separate
list of routes that have expiration. This will prevent GC from checking
permanent routes.
Result
======
We conducted a test to measure the execution times of fib6_gc_timer_cb()
and observed that it enhances the GC of FIB6. During the test, we added
permanent routes with the following numbers: 1000, 3000, 6000, and
9000. Additionally, we added a route with an expiration time.
Here are the average execution times for the kernel without the patch.
- 120020 ns with 1000 permanent routes
- 308920 ns with 3000 ...
- 581470 ns with 6000 ...
- 855310 ns with 9000 ...
The kernel with the patch consistently takes around 14000 ns to execute,
regardless of the number of permanent routes that are installed.
Major changes from v7:
- Fix warings raised by the patchwork.
Major changes from v6:
- Remove unnecessary check of tb6 in fib6_clean_expires_locked().
- Use ib6_clean_expires_locked() instead in fib6_purge_rt().
Major changes from v5:
- Change the order of adding new routes to the GC list and starting
GC timer.
- Remove time measurements from the test case.
- Stop forcing GC flush.
Major changes from v4:
- Detect existence of 'strace' in the test case.
Major changes from v3:
- Fix the type of arg according to feedback.
- Add 1k temporary routes and 5K permanent routes in the test case.
Measure time spending on GC with strace.
Major changes from v2:
- Remove unnecessary and incorrect sysctl restoring in the test case.
Major changes from v1:
- Moved gc_link to avoid creating a hole in fib6_info.
- Moved fib6_set_expires*() and fib6_clean_expires*() to the header
file and inlined. And removed duplicated lines.
- Added a test case.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add 1000 IPv6 routes with expiration time (w/ and w/o additional 5000
permanet routes in the background.) Wait for a few seconds to make sure
they are removed correctly.
The expected output of the test looks like the following example.
> Fib6 garbage collection test
> TEST: ipv6 route garbage collection [ OK ]
Signed-off-by: Kui-Feng Lee <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.
Signed-off-by: Kui-Feng Lee <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
On some I219 devices, ethernet cable plugging detection only works once
from PCI D3 state. Subsequent cable plugging does set PME bit correctly,
but device still doesn't get woken up.
Since I219 connects to the root complex directly, it relies on platform
firmware (ACPI) to wake it up. In this case, the GPE from _PRW only
works for first cable plugging but fails to notify the driver for
subsequent plugging events.
The issue was originally found on CNP, but the same issue can be found
on ADL too. So workaround the issue by continuing use PME poll after
first ACPI wake. As PME poll is always used, the runtime suspend
restriction for CNP can also be removed.
Signed-off-by: Kai-Heng Feng <[email protected]>
Tested-by: Naama Meir <[email protected]>
Acked-by: Sasha Neftin <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now there are two indicators of socket memory pressure sit inside
struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
memory reclaim pressure in memcg->memory and ->tcpmem respectively.
When in legacy mode (cgroupv1), the socket memory is charged into
->tcpmem which is independent of ->memory, so socket_pressure has
nothing to do with socket's pressure at all. Things could be worse
by taking socket_pressure into consideration in legacy mode, as a
pressure in ->memory can lead to premature reclamation/throttling
in socket.
While for the default mode (cgroupv2), the socket memory is charged
into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.
So {socket,tcpmem}_pressure are only used in default/legacy mode
respectively for indicating socket memory pressure. This patch fixes
the pieces of code that make mixed use of both.
Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Abel Wu <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Take over maintainership of the nfp driver from Simon as he
is moving away from Corigine.
Signed-off-by: Louis Peens <[email protected]>
Acked-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds MQPRIO Qdisc offload in full 'channel' mode which allows
not only setting up pri:tc mapping, but also configuring TX shapers on
external port FIFOs. The K3 CPSW MQPRIO Qdisc offload is expected to work
with VLAN/priority tagged packets. Non-tagged packets have to be mapped
only to TC0.
- TX traffic classes must be rated starting from TC that has highest
priority and with no gaps
- Traffic classes are used starting from 0, that has highest priority
- min_rate defines Committed Information Rate (guaranteed)
- max_rate defines Excess Information Rate (non guaranteed) and offloaded
as (max_rate[i] - tcX_min_rate[i])
- VLAN/priority tagged packets mapped to TC0 will exit switch with VLAN tag
priority 0
The configuration example:
ethtool -L eth1 tx 5
ethtool --set-priv-flags eth1 p0-rx-ptype-rrobin off
tc qdisc add dev eth1 parent root handle 100: mqprio num_tc 3 \
map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 hw 1 mode channel \
shaper bw_rlimit min_rate 0 100mbit 200mbit max_rate 0 101mbit 202mbit
tc qdisc replace dev eth2 handle 100: parent root mqprio num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 hw 1
ip link add link eth1 name eth1.100 type vlan id 100
ip link set eth1.100 type vlan egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
In the above example two ports share the same TX CPPI queue 0 for low
priority traffic. 3 traffic classes are defined for eth1 and mapped to:
TC0 - low priority, TX CPPI queue 0 -> ext Port 1 fifo0, no rate limit
TC1 - prio 2, TX CPPI queue 1 -> ext Port 1 fifo1, CIR=100Mbit/s, EIR=1Mbit/s
TC2 - prio 3, TX CPPI queue 2 -> ext Port 1 fifo2, CIR=200Mbit/s, EIR=2Mbit/s
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Roger Quadros <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Eric Dumazet says:
====================
inet: socket lock and data-races avoidance
In this series, I converted 20 bits in "struct inet_sock" and made
them truly atomic.
This allows to implement many IP_ socket options in a lockless
fashion (no need to acquire socket lock), and fixes data-races
that were showing up in various KCSAN reports.
I also took care of IP_TTL/IP_MINTTL, but left few other options
for another series.
v4: Rebased after recent mptcp changes.
Added Reviewed-by: tags from Simon (thanks !)
v3: fixed patch 7, feedback from build bot about ipvs set_mcast_loop()
v2: addressed a feedback from a build bot in patch 9 by removing
unused issk variable in mptcp_setsockopt_sol_ip_set_transparent()
Added Acked-by: tags from Soheil (thanks !)
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
inet->min_ttl is already read with READ_ONCE().
Implementing IP_MINTTL socket option set/read
without holding the socket lock is easy.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ip_select_ttl() is racy, because it reads inet->uc_ttl
without proper locking.
Add READ_ONCE()/WRITE_ONCE() annotations while
allowing IP_TTL socket option to be set/read without
holding the socket lock.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make room in struct inet_sock by removing this bit field,
using one available bit in inet_flags instead.
Also move local_port_range to fill the resulting hole,
saving 8 bytes on 64bit arches.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_BIND_ADDRESS_NO_PORT socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_NODEFRAG socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We move single bit fields to inet->inet_flags to avoid races.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_TRANSPARENT socket option can now be set/read
without locking the socket.
v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transparent()
v4: rebased after commit 3f326a821b99 ("mptcp: change the mpc check helper to return a sk")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_MULTICAST_ALL socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_MULTICAST_LOOP socket option can now be set/read
without locking the socket.
v3: fix build bot error reported in ipvs set_mcast_loop()
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_HDRINCL socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_FREEBIND socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_RECVERR_RFC4884 socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IP_RECVERR socket option can now be set/get without locking the socket.
This patch potentially avoid data-races around inet->recverr.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now we have inet->inet_flags, we can set following options
without having to hold the socket lock:
IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS,
IP_PASSSEC, IP_RECVORIGDSTADDR, IP_RECVFRAGSIZE.
ip_sock_set_pktinfo() no longer hold the socket lock.
Similarly we can get the following options whithout holding
the socket lock:
IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS,
IP_PASSSEC, IP_RECVORIGDSTADDR, IP_CHECKSUM, IP_RECVFRAGSIZE.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Various inet fields are currently racy.
do_ip_setsockopt() and do_ip_getsockopt() are mostly holding
the socket lock, but some (fast) paths do not.
Use a new inet->inet_flags to hold atomic bits in the series.
Remove inet->cmsg_flags, and use instead 9 bits from inet_flags.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Ruan Jinjie says:
====================
net: Remove redundant of_match_ptr() macro
Since these net drivers depend on CONFIG_OF, there is
no need to wrap the macro of_match_ptr() here.
Changes in v3:
- Collect responses from v1 and v2.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <[email protected]>
Acked-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <[email protected]>
Acked-by: Linus Walleij <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use the module_misc_device macro to simplify the code, which is the
same as declaring with module_init() and module_exit().
Signed-off-by: Li Zetao <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jijie Shao says:
====================
hns3: refactor registers information for ethtool -d
refactor registers information for ethtool -d
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
In the original RPU query command, the status register values of
multiple RPU tunnels are accumulated by default, which is unreasonable.
This patch Fix it by querying the specified tunnel ID.
The tunnel number of the device can be obtained from firmware
during initialization.
Fixes: ddb54554fa51 ("net: hns3: add DFX registers information for ethtool -d")
Signed-off-by: Jijie Shao <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The dump register function is being refactored.
The third step in refactoring is to support tlv info in regs data for
HNS3 PF driver.
Currently, if we use "ethtool -d" to dump regs value,
the output is as follows:
offset1: 00 01 02 03 04 05 ...
offset2:10 11 12 13 14 15 ...
......
We can't get the value of a register directly.
This patch deletes the original separator information and
add tag_len_value information in regs data.
ethtool can parse register data in key-value format by -d command.
a patch will be added to the ethtool to parse regs data
in the following format:
reg1 : value2
reg2 : value2
......
Signed-off-by: Jijie Shao <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The dump register function is being refactored.
The second step in refactoring is to support tlv info in regs data for
HNS3 PF driver.
Currently, if we use "ethtool -d" to dump regs value,
the output is as follows:
offset1: 00 01 02 03 04 05 ...
offset2:10 11 12 13 14 15 ...
......
We can't get the value of a register directly.
This patch deletes the original separator information and
add tag_len_value information in regs data.
ethtool can parse register data in key-value format by -d command.
a patch will be added to the ethtool to parse regs data
in the following format:
reg1 : value2
reg2 : value2
......
Signed-off-by: Jijie Shao <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The dump register function is being refactored.
The first step in refactoring is put the dump regs function
into a separate file.
Signed-off-by: Jijie Shao <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Wei Fang says:
====================
net: fec: add XDP_TX feature support
This patch set is to support the XDP_TX feature of FEC driver, the first
patch is add initial XDP_TX support, and the second patch improves the
performance of XDP_TX by not using xdp_convert_buff_to_frame(). Please
refer to the commit message of each patch for more details.
====================
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As suggested by Jesper and Alexander, we can avoid converting xdp_buff
to xdp_frame in case of XDP_TX to save a bunch of CPU cycles, so that
we can further improve the XDP_TX performance.
Before this patch on i.MX8MP-EVK board, the performance shows as follows.
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 353918 pkt/s
proto 17: 352923 pkt/s
proto 17: 353900 pkt/s
proto 17: 352672 pkt/s
proto 17: 353912 pkt/s
proto 17: 354219 pkt/s
After applying this patch, the performance is improved.
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 369261 pkt/s
proto 17: 369267 pkt/s
proto 17: 369206 pkt/s
proto 17: 369214 pkt/s
proto 17: 369126 pkt/s
proto 17: 369272 pkt/s
Signed-off-by: Wei Fang <[email protected]>
Suggested-by: Alexander Lobakin <[email protected]>
Suggested-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The XDP_TX feature is not supported before, and all the frames
which are deemed to do XDP_TX action actually do the XDP_DROP
action. So this patch adds the XDP_TX support to FEC driver.
I tested the performance of XDP_TX in XDP_DRV mode and XDP_SKB
mode respectively on i.MX8MP-EVK platform, and as suggested by
Jesper, I also tested the performance of XDP_REDIRECT on the
same platform. And the test steps and results are as follows.
XDP_TX test:
Step 1: One board is used as generator and connects to switch,and
the FEC port of DUT also connects to the switch. Both boards with
flow control off. Then the generator runs the
pktgen_sample03_burst_single_flow.sh script to generate and send
burst traffic to DUT. Note that the size of packet was set to 64
bytes and the procotol of packet was UDP in my test scenario. In
addition, the SMAC of the packet need to be different from the MAC
of the generator, because the xdp2 program will swap the DMAC and
SMAC of the packet and send it back to the generator. If the SMAC
of the generated packet is the MAC of the generator, the generator
will receive the returned traffic which increase the CPU loading
and significantly degrade the transmit speed of the generator, and
finally it affects the test of XDP_TX performance.
Step 2: The DUT runs the xdp2 program to transmit received UDP
packets back out on the same port where they were received.
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 353918 pkt/s
proto 17: 352923 pkt/s
proto 17: 353900 pkt/s
proto 17: 352672 pkt/s
proto 17: 353912 pkt/s
proto 17: 354219 pkt/s
root@imx8mpevk:~# ./xdp2 -S eth0
proto 17: 160604 pkt/s
proto 17: 160708 pkt/s
proto 17: 160564 pkt/s
proto 17: 160684 pkt/s
proto 17: 160640 pkt/s
proto 17: 160720 pkt/s
The above results show that the XDP_TX performance of XDP_DRV mode
is much better than XDP_SKB mode, more than twice that of XDP_SKB
mode, which is in line with our expectation.
XDP_REDIRECT test:
Step1: Both the generator and the FEC port of the DUT connet to the
switch port. All the ports with flow control off, then the generator
runs the pktgen script to generate and send burst traffic to DUT.
Note that the size of packet was set to 64 bytes and the procotol of
packet was UDP in my test scenario.
Step2: The DUT runs the xdp_redirect program to redirect the traffic
from the FEC port to the FEC port itself.
root@imx8mpevk:~# ./xdp_redirect eth0 eth0
Redirecting from eth0 (ifindex 2; driver fec) to eth0
(ifindex 2; driver fec)
Summary 232,302 rx/s 0 err,drop/s 232,344 xmit/s
Summary 234,579 rx/s 0 err,drop/s 234,577 xmit/s
Summary 235,548 rx/s 0 err,drop/s 235,549 xmit/s
Summary 234,704 rx/s 0 err,drop/s 234,703 xmit/s
Summary 235,504 rx/s 0 err,drop/s 235,504 xmit/s
Summary 235,223 rx/s 0 err,drop/s 235,224 xmit/s
Summary 234,509 rx/s 0 err,drop/s 234,507 xmit/s
Summary 235,481 rx/s 0 err,drop/s 235,482 xmit/s
Summary 234,684 rx/s 0 err,drop/s 234,683 xmit/s
Summary 235,520 rx/s 0 err,drop/s 235,520 xmit/s
Summary 235,461 rx/s 0 err,drop/s 235,461 xmit/s
Summary 234,627 rx/s 0 err,drop/s 234,627 xmit/s
Summary 235,611 rx/s 0 err,drop/s 235,611 xmit/s
Packets received : 3,053,753
Average packets/s : 234,904
Packets transmitted : 3,053,792
Average transmit/s : 234,907
Compared the performance of XDP_TX with XDP_REDIRECT, XDP_TX is also
much better than XDP_REDIRECT. It's also in line with our expectation.
Signed-off-by: Wei Fang <[email protected]>
Suggested-by: Jesper Dangaard Brouer <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When run command "ip netns delete client", device link1_1 has been
deleted. So, it is no need to delete link1_1 again. Remove it.
Signed-off-by: Zhengchao Shao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|