2018-08-13  lan743x: Remove duplicated include from lan743x_ptp.c  (Yue Haibing, 1 file, -1/+0)

    Remove duplicated include.

    Signed-off-by: Yue Haibing <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  virtio_net: remove duplicated include from virtio_net.c  (YueHaibing, 1 file, -1/+0)

    Remove duplicated include linux/netdevice.h.

    Signed-off-by: YueHaibing <[email protected]>
    Acked-by: Michael S. Tsirkin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  packet: switch kvzalloc to allocate memory  (Li RongQing, 2 files, -32/+13)

    The patch includes the following changes:

    * Use modern kvzalloc()/kvfree() instead of custom allocations.
    * Remove the order argument from alloc_pg_vec; it can be derived
      from req.
    * Remove the order argument from free_pg_vec; free_pg_vec now uses
      kvfree, which does not need an order argument.
    * Remove pg_vec_order from struct packet_ring_buffer; there is no
      longer a need to save/restore 'order'.
    * Remove the variable 'order' from packet_set_ring; it is now
      unused.

    Signed-off-by: Zhang Yu <[email protected]>
    Signed-off-by: Li RongQing <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

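    A minimal sketch of the allocation pattern this moves to
    (illustrative, not the exact af_packet code):

        /*
         * kvzalloc() first attempts a physically contiguous kzalloc()
         * and transparently falls back to vzalloc() for large sizes;
         * kvfree() releases memory from either allocator. Callers no
         * longer need to pick an allocator or carry a page order.
         */
        void *block = kvzalloc(req->tp_block_size, GFP_KERNEL);

        if (!block)
                return -ENOMEM;
        /* ... use the ring block ... */
        kvfree(block);  /* safe for kmalloc()ed and vmalloc()ed memory */
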
2018-08-13  net: Change the layout of structure trace_event_raw_fib_table_lookup  (Zong Li, 1 file, -1/+1)

    There is an unaligned access in the structure
    'trace_event_raw_fib_table_lookup'. In include/trace/events/fib.h,
    a memory operation casts the 'src' data member to a pointer and
    then stores a value through it:

        p32 = (__be32 *) __entry->src;
        *p32 = flp->saddr;

    The offset of 'src' in struct trace_event_raw_fib_table_lookup is
    not four-byte aligned. Some architectures do not permit unaligned
    accesses and have to pay the price of handling them in the
    exception handler. Adjust the layout of the structure to avoid this
    case.

    Fixes: 9f323973c915 ("net/ipv4: Udate fib_table_lookup tracepoint")
    Signed-off-by: Zong Li <[email protected]>
    Acked-by: David Ahern <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

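    A small userspace illustration of the hazard being fixed (the
    struct and field names are invented for the example; the kernel
    structure differs):

        #include <stddef.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        struct evt {
                uint16_t tb_id;    /* 2 bytes...                            */
                uint8_t  src[4];   /* ...leaving src at offset 2, which is  */
        };                         /* not 4-byte aligned                    */

        int main(void)
        {
                struct evt e;
                uint32_t saddr = 0x0a000001;

                /* The risky pattern: *(uint32_t *)e.src = saddr;
                 * On strict-alignment CPUs that store can trap and be
                 * fixed up (slowly) in the exception handler. memcpy(),
                 * or realigning the field as the patch does, avoids it.
                 */
                memcpy(e.src, &saddr, sizeof(saddr));
                printf("src sits at offset %zu\n", offsetof(struct evt, src));
                return 0;
        }
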
2018-08-13  Merge branch 'net-sched-actions-rename-for-grep-ability-and-consistency'  (David S. Miller, 13 files, -44/+44)

    Jamal Hadi Salim says:

    ====================
    net: sched: actions rename for grep-ability and consistency

    Having a structure (for example tcf_mirred) and a function with the
    same name is not good for readability or grepability. This long
    overdue patchset improves that and makes sure there is consistency
    across all actions.
    ====================

    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_mirred method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_vlan method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_skbmod method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -2/+2)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_skbedit method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_simple method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_police method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -8/+8)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_pedit method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_nat method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_ipt method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -4/+4)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_gact method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_csum method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_bpf method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: sched: act_connmark method rename for grep-ability and consistency  (Jamal Hadi Salim, 1 file, -3/+3)

    Signed-off-by: Jamal Hadi Salim <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  cpumask: make cpumask_next_wrap available without smp  (Willem de Bruijn, 1 file, -0/+7)

    The kbuild robot shows a build failure on machines without
    CONFIG_SMP:

        drivers/net/virtio_net.c:1916:10: error: implicit declaration
        of function 'cpumask_next_wrap'

    cpumask_next_wrap is exported from lib/cpumask.o, which is only
    built on SMP:

        lib-$(CONFIG_SMP) += cpumask.o

    As with the other functions there, also define it as a static
    inline in the NR_CPUS==1 branch of include/linux/cpumask.h. If wrap
    is true and next == start, return nr_cpumask_bits, i.e. 1. Else
    wrap across the range of valid cpus, here [0].

    Fixes: 2ca653d607ce ("virtio_net: Stripe queue affinities across cores.")
    Signed-off-by: Willem de Bruijn <[email protected]>
    Tested-by: Krzysztof Kozlowski <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

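    A sketch of what such a uniprocessor stub can look like, following
    the description above (the in-tree version may differ in detail):

        /* NR_CPUS == 1: the only valid cpu is 0, so iteration either
         * yields cpu 0, or, once wrap is set and we are back at cpu 0,
         * terminates by returning nr_cpumask_bits (which is 1 here).
         */
        static inline unsigned int cpumask_next_wrap(int n,
                                                     const struct cpumask *mask,
                                                     int start, bool wrap)
        {
                return (wrap && n == 0);
        }
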
2018-08-13  r8169: don't use MSI-X on RTL8168g  (Heiner Kallweit, 1 file, -0/+5)

    There have been two reports that the network doesn't come back on
    resume from suspend when using MSI-X. Both cases affect the same
    chip version (RTL8168g - version 40), on different systems. Falling
    back to MSI fixes the issue. Even though we don't really have proof
    yet that this chip version is to blame, let's disable MSI-X for it.

    Reported-by: Steve Dodd <[email protected]>
    Reported-by: Lou Reed <[email protected]>
    Tested-by: Steve Dodd <[email protected]>
    Tested-by: Lou Reed <[email protected]>
    Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
    Signed-off-by: Heiner Kallweit <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  Merge branch 'nixge-Minor-cleanups'  (David S. Miller, 1 file, -11/+0)

    Moritz Fischer says:

    ====================
    net: nixge: Minor cleanups

    In preparation of my 64-bit support series, here is some minor
    cleanup that gets rid of unnecessary accesses to the descriptor
    application fields. I've confirmed that the hardware does not
    access these fields in any of our configurations.
    ====================

    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: nixge: Don't store skb in app4 field of descriptor  (Moritz Fischer, 1 file, -1/+0)

    Don't store the skb in the app4 field of the descriptor since it is
    not used anywhere, including by hardware.

    Signed-off-by: Moritz Fischer <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net: nixge: Do not zero application specific fields in desc  (Moritz Fischer, 1 file, -10/+0)

    Do not zero the application specific fields in the DMA descriptors.
    The hardware ignores them, so software should too.

    Signed-off-by: Moritz Fischer <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cache  (Wei Wang, 1 file, -1/+1)

    In the l2tp code, if it is an L2TP_UDP_ENCAP tunnel, tunnel->sk
    points to a UDP socket. Userspace could call sendmsg() on both this
    tunnel and the UDP socket itself concurrently. As l2tp_xmit_skb()
    holds the socket lock and calls __sk_dst_check() to refresh
    sk->sk_dst_cache, while udpv6_sendmsg() is lockless and calls
    sk_dst_check() to refresh sk->sk_dst_cache, there could be a race
    that causes the dst cache to be freed multiple times. So fix the
    l2tp side to always call sk_dst_check(), which guarantees that
    xchg() is used when refreshing sk->sk_dst_cache, avoiding the race.

    Syzkaller reported stack trace:

    BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
    BUG: KASAN: use-after-free in atomic_fetch_add_unless include/linux/atomic.h:575 [inline]
    BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 [inline]
    BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline]
    BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
    Read of size 4 at addr ffff8801aea9a880 by task syz-executor129/4829

    CPU: 0 PID: 4829 Comm: syz-executor129 Not tainted 4.18.0-rc7-next-20180802+ #30
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
     print_address_description+0x6c/0x20b mm/kasan/report.c:256
     kasan_report_error mm/kasan/report.c:354 [inline]
     kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
     check_memory_region_inline mm/kasan/kasan.c:260 [inline]
     check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
     kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
     atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
     atomic_fetch_add_unless include/linux/atomic.h:575 [inline]
     atomic_add_unless include/linux/atomic.h:597 [inline]
     dst_hold_safe include/net/dst.h:308 [inline]
     ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
     rt6_get_pcpu_route net/ipv6/route.c:1249 [inline]
     ip6_pol_route+0x354/0xd20 net/ipv6/route.c:1922
     ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2098
     fib6_rule_lookup+0x283/0x890 net/ipv6/fib6_rules.c:122
     ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2126
     ip6_dst_lookup_tail+0x1278/0x1da0 net/ipv6/ip6_output.c:978
     ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
     ip6_sk_dst_lookup_flow+0x5ed/0xc50 net/ipv6/ip6_output.c:1117
     udpv6_sendmsg+0x2163/0x36b0 net/ipv6/udp.c:1354
     inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
     sock_sendmsg_nosec net/socket.c:622 [inline]
     sock_sendmsg+0xd5/0x120 net/socket.c:632
     ___sys_sendmsg+0x51d/0x930 net/socket.c:2115
     __sys_sendmmsg+0x240/0x6f0 net/socket.c:2210
     __do_sys_sendmmsg net/socket.c:2239 [inline]
     __se_sys_sendmmsg net/socket.c:2236 [inline]
     __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2236
     do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x446a29
    Code: e8 ac b8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f4de5532db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 00000000006dcc38 RCX: 0000000000446a29
    RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003
    RBP: 00000000006dcc30 R08: 00007f4de5533700 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dcc3c
    R13: 00007ffe2b830fdf R14: 00007f4de55339c0 R15: 0000000000000001

    Fixes: 71b1391a4128 ("l2tp: ensure sk->dst is still valid")
    Reported-by: [email protected]
    Signed-off-by: Wei Wang <[email protected]>
    Signed-off-by: Martin KaFai Lau <[email protected]>
    Cc: Guillaume Nault <[email protected]>
    Cc: David Ahern <[email protected]>
    Cc: Cong Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  ipv6: Add icmp_echo_ignore_all support for ICMPv6  (Virgile Jarry, 5 files, -4/+22)

    Preventing the kernel from responding to ICMP Echo Request messages
    can be useful in several ways. The sysctl parameter
    'icmp_echo_ignore_all' can be used to prevent the kernel from
    responding to IPv4 ICMP echo requests. For IPv6 pings, such a
    sysctl kernel parameter did not exist. Add the ability to prevent
    the kernel from responding to IPv6 ICMP echo requests through the
    new sysctl parameter /proc/sys/net/ipv6/icmp/echo_ignore_all, and
    update the documentation to reflect this change.

    Signed-off-by: Virgile Jarry <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

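    A minimal userspace sketch of toggling the new knob, equivalent to
    "sysctl -w net.ipv6.icmp.echo_ignore_all=1" (needs a kernel with
    this patch, and root):

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/proc/sys/net/ipv6/icmp/echo_ignore_all",
                              O_WRONLY);

                if (fd < 0) {
                        perror("open");      /* kernel too old, or not root */
                        return 1;
                }
                if (write(fd, "1", 1) != 1)  /* "1" = ignore all echo requests */
                        perror("write");
                close(fd);
                return 0;
        }
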
2018-08-13  Merge branch 'net-tls-Combined-memory-allocation-for-decryption-request'  (David S. Miller, 2 files, -100/+142)

    Vakul Garg says:

    ====================
    net/tls: Combined memory allocation for decryption request

    This patch does a combined memory allocation from the heap for the
    scatterlists, aead_request, aad and iv used on the tls record
    decryption path. In the present code, aead_request is allocated
    from the heap, while the scatterlists are allocated either on the
    heap or on the stack depending on a runtime condition. This is
    inefficient as it may require multiple kmalloc/kfree calls. The
    initialization vector passed in the decryption request is allocated
    on the stack, which is a problem since stack memory is not dma-able
    by crypto accelerators. One combined memory allocation for each
    decryption request fixes both of the above issues. It also paves
    the way for submitting multiple async decryption requests while a
    previous one is still pending, i.e. being processed or queued.
    ====================

    Signed-off-by: David S. Miller <[email protected]>

2018-08-13  net/tls: Combined memory allocation for decryption request  (Vakul Garg, 2 files, -100/+142)

    Preparing a decryption request requires several memory chunks
    (aead_req, sgin, sgout, iv, aad). For submitting the decrypt
    request to an accelerator, the buffers read by the accelerator must
    be dma-able and must not come from the stack. The buffers for aad
    and iv could each be kmalloced separately, but that is inefficient.
    This patch does one combined allocation per decryption request and
    then segments it into aead_req || sgin || sgout || iv || aad.

    Signed-off-by: Vakul Garg <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

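    An illustrative sketch of the combined layout (variable names and
    sizing are assumptions, not the exact tls code; real code must also
    respect alignment between the segments):

        /* One heap allocation carved into
         * aead_req || sgin || sgout || iv || aad,
         * so every buffer, including the iv, is dma-able.
         */
        u8 *p = kzalloc(sizeof(struct aead_request) +
                        crypto_aead_reqsize(aead) +
                        sizeof(struct scatterlist) * (n_sgin + n_sgout) +
                        iv_len + aad_len, GFP_KERNEL);

        if (!p)
                return -ENOMEM;

        aead_req = (struct aead_request *)p;
        p += sizeof(struct aead_request) + crypto_aead_reqsize(aead);
        sgin = (struct scatterlist *)p;
        p += sizeof(struct scatterlist) * n_sgin;
        sgout = (struct scatterlist *)p;
        p += sizeof(struct scatterlist) * n_sgout;
        iv = p;
        aad = p + iv_len;
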
2018-08-13  Bluetooth: mediatek: pass correct size to h4_recv_buf()  (Dan Carpenter, 1 file, -1/+1)

    We're supposed to pass the number of elements in the mtk_recv_pkts,
    not the number of bytes.

    Fixes: 7237c4c9ec92 ("Bluetooth: mediatek: Add protocol support for MediaTek serial devices")
    Signed-off-by: Dan Carpenter <[email protected]>
    Signed-off-by: Marcel Holtmann <[email protected]>

2018-08-13  Merge branch 'bpf-ancestor-cgroup-id'  (Daniel Borkmann, 9 files, -5/+404)

    Andrey Ignatov says:

    ====================
    This patch set adds a new BPF helper, bpf_skb_ancestor_cgroup_id,
    that returns the id of the cgroup v2 ancestor, at the given
    ancestor_level, of the cgroup associated with the skb. The helper
    is useful to implement policies in TC based on cgroups that are
    higher in the hierarchy than the immediate cgroup associated with
    the skb.

    v1->v2:
    - more reliable check for testing IPv6 to become ready in selftest.
    ====================

    Signed-off-by: Daniel Borkmann <[email protected]>

2018-08-13  selftests/bpf: Selftest for bpf_skb_ancestor_cgroup_id  (Andrey Ignatov, 4 files, -3/+302)

    Add selftests for the bpf_skb_ancestor_cgroup_id helper.

    test_skb_cgroup_id.sh prepares a testing interface and adds a tc
    qdisc and filter for it using the BPF object compiled from the
    test_skb_cgroup_id_kern.c program.

    The BPF program in test_skb_cgroup_id_kern.c gets the ancestor
    cgroup id using the new helper at different levels of the cgroup
    hierarchy that the skb belongs to, including the root level and a
    non-existing level, and saves it to a map where the key is the
    level of the corresponding cgroup and the value is its id.

    To trigger the BPF program, the user space program
    test_skb_cgroup_id_user is run. It adds itself into the testing
    cgroup and sends a UDP datagram to the link-local multicast address
    of the testing interface. Then it reads the cgroup ids saved in the
    kernel for different levels from the BPF map and compares them with
    those in user space. They must be equal for every level of
    ancestry.

    Example of run:

        # ./test_skb_cgroup_id.sh
        Wait for testing link-local IP to become available ... OK
        Note: 8 bytes struct bpf_elf_map fixup performed due to size mismatch!
        [PASS]

    Signed-off-by: Andrey Ignatov <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>

2018-08-13  selftests/bpf: Add cgroup id helpers to bpf_helpers.h  (Andrey Ignatov, 1 file, -0/+4)

    Add bpf_skb_cgroup_id and bpf_skb_ancestor_cgroup_id helpers to
    bpf_helpers.h to use them in tests and samples.

    Signed-off-by: Andrey Ignatov <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>

2018-08-13  bpf: Sync bpf.h to tools/  (Andrey Ignatov, 1 file, -1/+20)

    Sync skb_ancestor_cgroup_id() related bpf UAPI changes to tools/.

    Signed-off-by: Andrey Ignatov <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>

2018-08-13  bpf: Introduce bpf_skb_ancestor_cgroup_id helper  (Andrey Ignatov, 3 files, -1/+78)

    == Problem description ==

    It's useful to be able to identify the cgroup associated with an
    skb in TC so that a policy can be applied to the skb, and the
    existing bpf_skb_cgroup_id helper can help with this. Though in
    real life, the cgroup hierarchy and the hierarchy to apply a policy
    to don't map 1:1.

    It's often the case that there is a container with a corresponding
    cgroup, but there are many more sub-cgroups inside the container,
    e.g. because controlling resources for its subsystems is delegated
    to the containerized application, or to separate the application
    inside the container from infra that belongs to the
    containerization system (e.g. sshd). At the same time it may be
    useful to apply a policy to the container as a whole.

    If multiple containers like this are run on a host (which is often
    the case) and many of them have sub-cgroups, it may not be possible
    to apply a per-container policy in TC with the existing helpers
    bpf_skb_under_cgroup or bpf_skb_cgroup_id:

    * bpf_skb_cgroup_id returns the id of the immediate cgroup
      associated with the skb, i.e. if it's a sub-cgroup inside a
      container, it can't be used to identify the container's cgroup;

    * bpf_skb_under_cgroup can work only with one cgroup and doesn't
      scale, i.e. if there are N containers on a host and a policy has
      to be applied to M of them (0 <= M <= N), it'd require M calls to
      bpf_skb_under_cgroup, and, if M changes, it'd require rebuilding
      and loading a new BPF program.

    == Solution ==

    The patch introduces a new helper, bpf_skb_ancestor_cgroup_id, that
    can be used to get the id of the cgroup v2 ancestor, at a specified
    level of the cgroup hierarchy, of the cgroup associated with the
    skb. That way an admin can place all containers on one level of the
    cgroup hierarchy (which is good practice in general and already
    used in many configurations) and identify the specific cgroup on
    this level no matter what sub-cgroup the skb is associated with.

    E.g. given the cgroup hierarchy:

        root/
        root/container1/
        root/container1/app11/
        root/container1/app11/sub-app-a/
        root/container1/app12/
        root/container2/
        root/container2/app21/
        root/container2/app22/
        root/container2/app22/sub-app-b/

    then for an skb associated with root/container1/app11/sub-app-a/
    it's possible to get the ancestor at level 1, which is container1,
    and apply the policy for this container, or apply another policy if
    it's container2. Policies can be kept e.g. in a hash map where the
    key is a container cgroup id and the value is an action.

    The levels where container cgroups are created are usually known in
    advance, whereas the cgroup hierarchy inside a container may be
    hard to predict, especially when its creation is delegated to the
    containerized application.

    == Implementation details ==

    The helper gets the ancestor by walking parents up to the specified
    level. Another option would be to get a different kind of "id" from
    cgroup->ancestor_ids[level] and use it with idr_find() to get the
    struct cgroup of the ancestor. But that would require a radix
    lookup, which doesn't seem to be better (at least it's not
    obviously better).

    The format of the return value of the new helper is the same as
    that of bpf_skb_cgroup_id.

    Signed-off-by: Andrey Ignatov <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>

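    A hypothetical tc classifier sketch using the new helper (the map
    name and policy are illustrative; the helper declaration is
    provided by the bpf_helpers.h patch in this series): look up a
    verdict keyed by the id of the skb's level-1 ancestor cgroup, i.e.
    the per-container cgroup.

        #include <linux/bpf.h>
        #include <linux/pkt_cls.h>
        #include "bpf_helpers.h"

        struct bpf_map_def SEC("maps") container_policy = {
                .type        = BPF_MAP_TYPE_HASH,
                .key_size    = sizeof(__u64),   /* container cgroup id */
                .value_size  = sizeof(__u32),   /* verdict */
                .max_entries = 1024,
        };

        SEC("classifier")
        int cls_container_policy(struct __sk_buff *skb)
        {
                /* level 1 = direct child of root, where containers live */
                __u64 cgid = bpf_skb_ancestor_cgroup_id(skb, 1);
                __u32 *verdict = bpf_map_lookup_elem(&container_policy, &cgid);

                if (verdict && *verdict == 0)
                        return TC_ACT_SHOT;     /* drop this container's traffic */
                return TC_ACT_OK;
        }

        char _license[] SEC("license") = "GPL";
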
2018-08-13  bpf: decouple btf from seq bpf fs dump and enable more maps  (Daniel Borkmann, 12 files, -44/+75)

    Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the
    basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty print for
    hash/lru_hash maps") enabled support for BTF and dumping via BPF fs
    for array and hash/lru maps. However, both can be decoupled from
    each other such that regular BPF maps can be supported for
    attaching BTF key/value information, while not all maps necessarily
    need to dump via the map_seq_show_elem() callback.

    The basic sanity check which is a prerequisite for all maps is that
    key/value size has to match in any case, and some maps can have
    extra checks via the map_check_btf() callback, e.g. probing certain
    types or indicating no support in general. With that we can also
    enable retrieving BTF info for per-cpu map types and lpm.

    Signed-off-by: Daniel Borkmann <[email protected]>
    Acked-by: Alexei Starovoitov <[email protected]>
    Acked-by: Yonghong Song <[email protected]>

2018-08-11  Merge branch 'ip-faster-in-order-IP-fragments'  (David S. Miller, 3 files, -42/+149)

    Peter Oskolkov says:

    ====================
    ip: faster in-order IP fragments

    Added "Signed-off-by" in v2.
    ====================

    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  ip: process in-order fragments efficiently  (Peter Oskolkov, 2 files, -42/+70)

    This patch changes the runtime behavior of the IP defrag queue:
    incoming in-order fragments are added to the end of the current
    list/"run" of in-order fragments at the tail.

    On some workloads, UDP stream performance is substantially
    improved:

        RX: ./udp_stream -F 10 -T 2 -l 60
        TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60

    with this patchset applied on a 10Gbps receiver:

        throughput=9524.18
        throughput_units=Mbit/s

    upstream (net-next):

        throughput=4608.93
        throughput_units=Mbit/s

    Reported-by: Willem de Bruijn <[email protected]>
    Signed-off-by: Peter Oskolkov <[email protected]>
    Cc: Eric Dumazet <[email protected]>
    Cc: Florian Westphal <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  ip: add helpers to process in-order fragments faster.  (Peter Oskolkov, 2 files, -0/+79)

    This patch introduces several helper functions/macros that will be
    used in the follow-up patch. No runtime changes yet.

    The new logic (fully implemented in the second patch) is as
    follows:

    * Nodes in the rb-tree will now contain not single fragments, but
      lists of consecutive fragments ("runs").

    * At each point in time, the current "active" run at the tail is
      maintained/tracked. Fragments that arrive in-order, adjacent to
      the previous tail fragment, are added to this tail run without
      triggering the re-balancing of the rb-tree.

    * If a fragment arrives out of order with the offset _before_ the
      tail run, it is inserted into the rb-tree as a single fragment.

    * If a fragment arrives after the current tail fragment (with a
      gap), it starts a new "tail" run, and is inserted into the
      rb-tree at the end as the head of the new run.

    skb->cb is used to store additional information needed here
    (suggested by Eric Dumazet).

    Reported-by: Willem de Bruijn <[email protected]>
    Signed-off-by: Peter Oskolkov <[email protected]>
    Cc: Eric Dumazet <[email protected]>
    Cc: Florian Westphal <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

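    A hedged sketch of the shape this bookkeeping can take (names and
    details are simplified from the description above, not copied from
    the patch):

        /* Per-fragment state lives in skb->cb; each rb-tree node is
         * the head of a "run" of consecutive fragments chained through
         * next_frag, with the run's total length cached in the head.
         */
        struct ipfrag_skb_cb {
                struct inet_skb_parm    h;
                struct sk_buff          *next_frag;   /* next skb in this run */
                int                     frag_run_len; /* bytes covered by run */
        };

        /* Appending an in-order fragment to the tail run is O(1) and
         * never rebalances the rb-tree.
         */
        static void append_to_run(struct sk_buff *run_head, struct sk_buff *skb)
        {
                FRAG_CB(run_head)->frag_run_len += skb->len;
                FRAG_CB(run_head)->next_frag = skb; /* simplified: the real
                                                     * code links at the tail */
        }
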
2018-08-11  Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 19 files, -71/+92)

2018-08-11  Merge branch 'Remove-rtnl-lock-dependency-from-all-action-implementations'  (David S. Miller, 17 files, -122/+214)

    Vlad Buslov says:

    ====================
    Remove rtnl lock dependency from all action implementations

    Currently, all netlink protocol handlers for updating rules,
    actions and qdiscs are protected with a single global rtnl lock
    which removes any possibility for parallelism. This patch set is a
    second step to remove the rtnl lock dependency from the TC rules
    update path.

    Recently, the new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED
    was added. Handlers registered with this flag are called without
    RTNL taken. The end goal is to have the rule update handlers
    (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) registered with the UNLOCKED
    flag to allow parallel execution. However, there is no intention to
    completely remove or split the rtnl lock itself. This patch set
    addresses specific problems in the implementation of tc actions
    that prevent their control path from being executed concurrently.
    Additional changes are required to refactor the classifiers API and
    individual classifiers for parallel execution. This patch set lays
    the groundwork to eventually register rule update handlers as
    rtnl-unlocked.

    The action API is already prepared for parallel execution by the
    previous patch set, which means that action ops that use the action
    API for their implementation (delete, search, etc.) do not require
    additional modifications. The action API implements
    concurrency-safe reference counting and guarantees that
    cleanup/delete is called only once, after the last reference to the
    action is released.

    The goal of this change is to update the APIs of specific actions
    that access action private state directly, in order to be
    independent from external locking. The general approach is to
    re-use the existing tcf_lock spinlock (used by some action
    implementations to synchronize the control path with the data
    path) to protect action private state from concurrent
    modification. If an action has an rcu-protected pointer, the tcf
    spinlock is used to protect its update code, instead of relying on
    the rtnl lock.

    Some actions need to determine the rtnl mutex status in order to
    release it. For example, the ife action can load additional kernel
    modules (meta ops) and must make sure that no locks are held during
    module load. In such cases the 'rtnl_held' argument is used to
    conditionally release the rtnl mutex.

    Changes from V1 to V2:
    - Patch 12:
      - new patch
    - Patch 14:
      - refactor gen_new_estimator() to reuse stats_lock when
        re-assigning the rate estimator statistics pointer
    - Remove mirred and tunnel_key helper function changes. (to be
      submitted as a standalone patch)
    ====================

    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_police: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -3/+6)

    Use the tcf spinlock to protect police action private data from
    concurrent modification during dump. (init already uses the tcf
    spinlock when changing police action state.)

    Pass the tcf spinlock as the estimator lock argument to
    gen_replace_estimator() during action init.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

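    The common pattern across this series, sketched (illustrative, not
    a verbatim diff): the dump path takes a consistent snapshot of the
    action's parameters under the per-action tcf_lock instead of
    depending on the global rtnl mutex.

        spin_lock_bh(&police->tcf_lock);
        opt.action = police->tcf_action;  /* copy fields under the lock */
        /* ... snapshot the remaining parameters into opt ... */
        spin_unlock_bh(&police->tcf_lock);

        return nla_put(skb, TCA_POLICE_TBF, sizeof(opt), &opt);
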
2018-08-11  net: core: protect rate estimator statistics pointer with lock  (Vlad Buslov, 2 files, -10/+15)

    Extend gen_new_estimator() to also take stats_lock when
    re-assigning the rate estimator statistics pointer. (to be used by
    unlocked actions)

    Rename 'stats_lock' to 'lock' and change the argument description
    to explain that it is now also used for the control path.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_mirred: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -27/+51)

    Re-introduce the mirred list spinlock, which was removed some time
    ago, in order to protect the list from concurrent modifications,
    instead of relying on the rtnl lock.

    Use the tcf spinlock to protect mirred action private data from
    concurrent modification in init and dump. Rearrange access to
    mirred data so that it is only performed while holding the lock.

    Rearrange net dev access to always hold a reference while working
    with it, instead of relying on the rtnl lock.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: extend action ops with put_dev callback  (Vlad Buslov, 3 files, -1/+13)

    As a preparation for removing the dependency on the rtnl lock from
    the rules update path, all users of shared objects must take a
    reference while working with them.

    Extend action ops with a put_dev() API to be used on the net device
    returned by get_dev().

    Modify the mirred action (the only action that implements the
    get_dev callback):
    - Take a reference to the net device in get_dev.
    - Implement the put_dev API that releases the reference to the net
      device.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_vlan: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -12/+15)

    Use the tcf spinlock to protect vlan action private data from
    concurrent modification during dump and init. Use an rcu swap
    operation to reassign the params pointer under protection of the
    tcf lock. (The old params value is not used by init, so there is no
    need for a standalone rcu dereference step.)

    Remove the rtnl assertion that is no longer necessary.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

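    A sketch of the swap-under-spinlock idiom used here and in the
    tunnel_key patch below (simplified; the in-tree code may use a
    dedicated rcu swap helper, and the field names are assumptions):

        spin_lock_bh(&v->tcf_lock);
        v->tcf_action = parm->action;
        old = rcu_dereference_protected(v->vlan_p,
                                        lockdep_is_held(&v->tcf_lock));
        rcu_assign_pointer(v->vlan_p, new_params);  /* publish new params */
        spin_unlock_bh(&v->tcf_lock);

        if (old)
                kfree_rcu(old, rcu);  /* free only after a grace period,
                                       * keeping lockless datapath readers
                                       * safe */
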
2018-08-11  net: sched: act_tunnel_key: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -13/+13)

    Use the tcf lock to protect tunnel key action struct private data
    from concurrent modification in init and dump. Use an rcu swap
    operation to reassign the params pointer under protection of the
    tcf lock. (The old params value is not used by init, so there is no
    need for a standalone rcu dereference step.)

    Remove the rtnl lock assertion that is no longer required.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_skbmod: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -5/+9)

    Move the read of the skbmod_p rcu pointer to be protected by the
    tcf spinlock. Use the tcf spinlock to protect private skbmod data
    from concurrent modification during dump.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_simple: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -1/+5)

    Use the tcf spinlock to protect private simple action data from
    concurrent modification during dump. (simple init already uses the
    tcf spinlock when changing action state.)

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_sample: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -2/+10)

    Use the tcf spinlock to protect private sample action data from
    concurrent modification during dump and init.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_pedit: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -20/+20)

    Rearrange the pedit init code to only access pedit action data
    while holding the tcf spinlock. Change the keys allocation type to
    atomic to allow it to execute while holding the tcf spinlock. Take
    the tcf spinlock in the dump function when accessing pedit action
    data.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

2018-08-11  net: sched: act_ipt: remove dependency on rtnl lock  (Vlad Buslov, 1 file, -0/+3)

    Use the tcf spinlock to protect ipt action private data from
    concurrent modification during dump. ipt init already takes the tcf
    spinlock when modifying ipt state.

    Signed-off-by: Vlad Buslov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>