Age | Commit message | Author | Files | Lines
2018-08-13 | cxgb4: remove set but not used variable 'spd' | YueHaibing | 1 | -8/+0
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'print_port_info': drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:5147:14: warning: variable 'spd' set but not used [-Wunused-but-set-variable] variable 'spd' is set but not used since commit 547fd27241a8 ("cxgb4: Warn if device doesn't have enough PCI bandwidth") Signed-off-by: YueHaibing <[email protected]> Acked-by: Ganesh Goudar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | lan743x: lan743x: Remove duplicated include from lan743x_ptp.c | Yue Haibing | 1 | -1/+0
Remove duplicated include. Signed-off-by: Yue Haibing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | virtio_net: remove duplicated include from virtio_net.c | YueHaibing | 1 | -1/+0
Remove duplicated include linux/netdevice.h Signed-off-by: YueHaibing <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | packet: switch kvzalloc to allocate memory | Li RongQing | 2 | -32/+13
This patch includes the following changes:
* Use modern kvzalloc()/kvfree() instead of custom allocations.
* Remove the order argument from alloc_pg_vec; it can be derived from req.
* Remove the order argument from free_pg_vec; free_pg_vec now uses kvfree, which does not need an order argument.
* Remove pg_vec_order from struct packet_ring_buffer; there is no longer any need to save/restore 'order'.
* Remove the variable 'order' from packet_set_ring; it is now unused.
Signed-off-by: Zhang Yu <[email protected]> Signed-off-by: Li RongQing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: Change the layout of structure trace_event_raw_fib_table_lookup | Zong Li | 1 | -1/+1
The structure 'trace_event_raw_fib_table_lookup' is accessed unaligned. In include/trace/events/fib.h, the 'src' data member is cast to a pointer and a value is stored through that pointer: p32 = (__be32 *) __entry->src; *p32 = flp->saddr; The offset of 'src' in structure trace_event_raw_fib_table_lookup is not four-byte aligned. Some architectures do not permit unaligned access and have to pay the price of handling it in an exception handler. Adjust the layout of the structure to avoid this case. Fixes: 9f323973c915 ("net/ipv4: Udate fib_table_lookup tracepoint") Signed-off-by: Zong Li <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
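The alignment problem described in this commit can be illustrated with a small user-space sketch. The two structures below are hypothetical stand-ins, not the real tracepoint layout: they only show how three single-byte fields in front of a 4-byte array push it to an odd offset, and how moving one more byte-sized field ahead of it restores 4-byte alignment. The helper `store_addr` is likewise an assumption, showing an alignment-safe store via memcpy() rather than the cast used in the tracepoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 'src' lands at offset 7, which is not 4-byte aligned. */
struct entry_misaligned {
	uint32_t tb_id;
	uint8_t  tos;
	uint8_t  scope;
	uint8_t  flags;
	uint8_t  src[4];
	uint8_t  proto;
};

/* Hypothetical fix: 'proto' fills the gap, so 'src' starts at offset 8. */
struct entry_aligned {
	uint32_t tb_id;
	uint8_t  tos;
	uint8_t  scope;
	uint8_t  flags;
	uint8_t  proto;
	uint8_t  src[4];
};

/* Returns nonzero when a 32-bit store at this member offset is aligned. */
static int offset_is_aligned(size_t offset)
{
	return offset % sizeof(uint32_t) == 0;
}

/* Alignment-safe alternative to "*(uint32_t *)dst = addr". */
static void store_addr(uint8_t dst[4], uint32_t addr)
{
	memcpy(dst, &addr, sizeof(addr));
}
```

Checking `offsetof(struct entry_misaligned, src)` against `offsetof(struct entry_aligned, src)` with `offset_is_aligned()` shows the difference between the two layouts.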
2018-08-13 | Merge branch 'net-sched-actions-rename-for-grep-ability-and-consistency' | David S. Miller | 13 | -44/+44
Jamal Hadi Salim says: ==================== net: sched: actions rename for grep-ability and consistency Having a structure (example tcf_mirred) and a function with the same name is not good for readability or grepability. This long overdue patchset improves it and make sure there is consistency across all actions ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_mirred method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_vlan method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_skbmod method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -2/+2
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_skbedit method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_simple method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_police method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -8/+8
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_pedit method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_nat method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_ipt method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -4/+4
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_gact method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_sum method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_bpf method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: sched: act_connmark method rename for grep-ability and consistency | Jamal Hadi Salim | 1 | -3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | cpumask: make cpumask_next_wrap available without smp | Willem de Bruijn | 1 | -0/+7
The kbuild robot shows build failure on machines without CONFIG_SMP: drivers/net/virtio_net.c:1916:10: error: implicit declaration of function 'cpumask_next_wrap' cpumask_next_wrap is exported from lib/cpumask.o, which has lib-$(CONFIG_SMP) += cpumask.o same as other functions, also define it as static inline in the NR_CPUS==1 branch in include/linux/cpumask.h. If wrap is true and next == start, return nr_cpumask_bits, or 1. Else wrap across the range of valid cpus, here [0]. Fixes: 2ca653d607ce ("virtio_net: Stripe queue affinities across cores.") Signed-off-by: Willem de Bruijn <[email protected]> Tested-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
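The single-CPU behaviour described in this commit can be sketched in user space as follows. This is a minimal model of the rule stated in the message, not the kernel's code: the function name `next_wrap_up` and the constant `NR_CPUMASK_BITS_UP` are hypothetical, and the cpumask argument is omitted because only CPU 0 can be set when NR_CPUS == 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nr_cpumask_bits on a uniprocessor build. */
#define NR_CPUMASK_BITS_UP 1u

/*
 * n is the previously visited CPU (-1 to begin), start is where the walk
 * began. With one CPU the next candidate is always 0, so the walk stops
 * (returns the "no more CPUs" value) only when wrapping is requested and
 * CPU 0 has already been visited.
 */
static unsigned int next_wrap_up(int n, int start, bool wrap)
{
	(void)start; /* only CPU 0 exists, so start is necessarily 0 */
	return (wrap && n == 0) ? NR_CPUMASK_BITS_UP : 0u;
}
```

A caller that loops while the result is below `NR_CPUMASK_BITS_UP` visits CPU 0 exactly once when `wrap` is true.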
2018-08-13 | r8169: don't use MSI-X on RTL8168g | Heiner Kallweit | 1 | -0/+5
There have been two reports that network doesn't come back on resume from suspend when using MSI-X. Both cases affect the same chip version (RTL8168g - version 40), on different systems. Falling back to MSI fixes the issue. Even though we don't really have a proof yet that the network chip version is to blame, let's disable MSI-X for this version. Reported-by: Steve Dodd <[email protected]> Reported-by: Lou Reed <[email protected]> Tested-by: Steve Dodd <[email protected]> Tested-by: Lou Reed <[email protected]> Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling") Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | Merge branch 'nixge-Minor-cleanups' | David S. Miller | 1 | -11/+0
Moritz Fischer says: ==================== net: nixge: Minor cleanups In preparation for my 64-bit support series, here is some minor cleanup that gets rid of unnecessary accesses to the descriptor application fields. I've confirmed that the hardware does not access the fields in all our configurations. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: nixge: Don't store skb in app4 field of descriptor | Moritz Fischer | 1 | -1/+0
Don't store skb in app4 field of descriptor since it is not being used anywhere (including hardware). Signed-off-by: Moritz Fischer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net: nixge: Do not zero application specific fields in desc | Moritz Fischer | 1 | -10/+0
Do not zero application specific fields in DMA descriptors. The hardware does ignore them, so should software. Signed-off-by: Moritz Fischer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cache | Wei Wang | 1 | -1/+1
In l2tp code, if it is a L2TP_UDP_ENCAP tunnel, tunnel->sk points to a UDP socket. User could call sendmsg() on both this tunnel and the UDP socket itself concurrently. As l2tp_xmit_skb() holds socket lock and call __sk_dst_check() to refresh sk->sk_dst_cache, while udpv6_sendmsg() is lockless and call sk_dst_check() to refresh sk->sk_dst_cache, there could be a race and cause the dst cache to be freed multiple times. So we fix l2tp side code to always call sk_dst_check() to garantee xchg() is called when refreshing sk->sk_dst_cache to avoid race conditions. Syzkaller reported stack trace: BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: use-after-free in atomic_fetch_add_unless include/linux/atomic.h:575 [inline] BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 [inline] BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline] BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029 Read of size 4 at addr ffff8801aea9a880 by task syz-executor129/4829 CPU: 0 PID: 4829 Comm: syz-executor129 Not tainted 4.18.0-rc7-next-20180802+ #30 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] atomic_fetch_add_unless include/linux/atomic.h:575 [inline] atomic_add_unless include/linux/atomic.h:597 [inline] dst_hold_safe include/net/dst.h:308 [inline] ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029 rt6_get_pcpu_route net/ipv6/route.c:1249 [inline] 
ip6_pol_route+0x354/0xd20 net/ipv6/route.c:1922 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2098 fib6_rule_lookup+0x283/0x890 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2126 ip6_dst_lookup_tail+0x1278/0x1da0 net/ipv6/ip6_output.c:978 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079 ip6_sk_dst_lookup_flow+0x5ed/0xc50 net/ipv6/ip6_output.c:1117 udpv6_sendmsg+0x2163/0x36b0 net/ipv6/udp.c:1354 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x51d/0x930 net/socket.c:2115 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2210 __do_sys_sendmmsg net/socket.c:2239 [inline] __se_sys_sendmmsg net/socket.c:2236 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2236 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x446a29 Code: e8 ac b8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4de5532db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000006dcc38 RCX: 0000000000446a29 RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003 RBP: 00000000006dcc30 R08: 00007f4de5533700 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dcc3c R13: 00007ffe2b830fdf R14: 00007f4de55339c0 R15: 0000000000000001 Fixes: 71b1391a4128 ("l2tp: ensure sk->dst is still valid") Reported-by: [email protected] Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Cc: Guillaume Nault <[email protected]> Cc: David Ahern <[email protected]> Cc: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | ipv6: Add icmp_echo_ignore_all support for ICMPv6 | Virgile Jarry | 5 | -4/+22
Preventing the kernel from responding to ICMP Echo Requests messages can be useful in several ways. The sysctl parameter 'icmp_echo_ignore_all' can be used to prevent the kernel from responding to IPv4 ICMP echo requests. For IPv6 pings, such a sysctl kernel parameter did not exist. Add the ability to prevent the kernel from responding to IPv6 ICMP echo requests through the use of the following sysctl parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all. Update the documentation to reflect this change. Signed-off-by: Virgile Jarry <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | Merge branch 'net-tls-Combined-memory-allocation-for-decryption-request' | David S. Miller | 2 | -100/+142
Vakul Garg says: ==================== net/tls: Combined memory allocation for decryption request This patch does a combined memory allocation from heap for scatterlists, aead_request, aad and iv for the tls record decryption path. In the present code, aead_request is allocated from heap, while scatterlists are conditionally allocated on heap or on stack. This is inefficient as it may require multiple kmalloc/kfree calls. The initialization vector passed in the crypto request is allocated on stack. This is a problem since stack memory is not dma-able from crypto accelerators. Doing one combined memory allocation for each decryption request fixes both of the above issues. It also paves the way for submitting multiple async decryption requests while the previous one is pending, i.e. being processed or queued. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-13 | net/tls: Combined memory allocation for decryption request | Vakul Garg | 2 | -100/+142
For preparing decryption request, several memory chunks are required (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to an accelerator, it is required that the buffers which are read by the accelerator must be dma-able and not come from stack. The buffers for aad and iv can be separately kmalloced each, but it is inefficient. This patch does a combined allocation for preparing decryption request and then segments into aead_req || sgin || sgout || iv || aad. Signed-off-by: Vakul Garg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
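The combined layout aead_req || sgin || sgout || iv || aad can be sketched in user space as a single heap allocation carved into segments. This is only an illustration of the idea: `struct dec_mem`, `dec_mem_alloc`, `dec_mem_free` and the byte sizes passed in are hypothetical, calloc() stands in for the kernel allocator, and rounding each segment up to the platform's maximum alignment is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pointers into one allocation; only 'base' is ever freed. */
struct dec_mem {
	void *base;
	unsigned char *req;
	unsigned char *sgin;
	unsigned char *sgout;
	unsigned char *iv;
	unsigned char *aad;
};

/* Round n up so that the following segment starts suitably aligned. */
static size_t seg_round_up(size_t n)
{
	size_t a = _Alignof(max_align_t);

	return (n + a - 1) / a * a;
}

/* Allocate all five segments at once; returns 0 on success, -1 on failure. */
static int dec_mem_alloc(struct dec_mem *m, size_t req_sz, size_t sgin_sz,
			 size_t sgout_sz, size_t iv_sz, size_t aad_sz)
{
	size_t off_sgin  = seg_round_up(req_sz);
	size_t off_sgout = off_sgin + seg_round_up(sgin_sz);
	size_t off_iv    = off_sgout + seg_round_up(sgout_sz);
	size_t off_aad   = off_iv + seg_round_up(iv_sz);
	unsigned char *p = calloc(1, off_aad + aad_sz);

	if (!p)
		return -1;
	m->base  = p;
	m->req   = p;
	m->sgin  = p + off_sgin;
	m->sgout = p + off_sgout;
	m->iv    = p + off_iv;
	m->aad   = p + off_aad;
	return 0;
}

/* One free releases every segment, including iv and aad. */
static void dec_mem_free(struct dec_mem *m)
{
	free(m->base);
	m->base = NULL;
}
```

Because iv and aad live in the same heap block as the request, none of the buffers handed to an accelerator come from the stack, and setup and teardown each cost a single allocator call.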
2018-08-13 | Merge branches 'fixes', 'misc' and 'spectre' into for-linus | Russell King | 16 | -149/+105
Conflicts: arch/arm/include/asm/uaccess.h Signed-off-by: Russell King <[email protected]>
2018-08-13 | ARM: 8778/1: clkdev: don't call __of_clk_get_by_name() unnecessarily from clk_get() | Bartosz Golaszewski | 1 | -1/+1
The way this function is implemented caused some confusion when converting the TI DaVinci platform to using the common clock framework. Current kernel supports booting DaVinci boards both in device tree as well as legacy, board-file mode. In the latter, we always end up calling clk_get_sys() as of_node is NULL and __of_clk_get_by_name() returns -ENOENT. It was not obvious at first glance how clk_get(dev, NULL) will work in board-file mode since we always call __of_clk_get_by_name(). Let's make it clearer by checking if of_node is NULL and skipping right to clk_get_sys(). Cc: Sekhar Nori <[email protected]> Cc: Kevin Hilman <[email protected]> Cc: David Lechner <[email protected]> Reviewed-by: David Lechner <[email protected]> Reviewed-by: Sekhar Nori <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Russell King <[email protected]>
2018-08-13 | Documentation: remove dynamic-resolution-notes reference to non-existent file | Harish Jenny K N | 1 | -3/+2
File dt-object-internal.txt does not exist. This patch removes a reference to it. Signed-off-by: Harish Jenny K N <[email protected]> Reviewed-by: Frank Rowand <[email protected]> Signed-off-by: Rob Herring <[email protected]>
2018-08-13 | Bluetooth: mediatek: pass correct size to h4_recv_buf() | Dan Carpenter | 1 | -1/+1
We're supposed to pass the number of elements in the mtk_recv_pkts, not the number of bytes. Fixes: 7237c4c9ec92 ("Bluetooth: mediatek: Add protocol support for MediaTek serial devices") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
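The difference between the number of bytes in an array and its number of elements is easy to see in a small sketch. The `struct pkt_desc` type and the `recv_pkts` table below are hypothetical stand-ins for the driver's packet table; `ARRAY_SIZE` is defined locally with the usual sizeof-ratio idiom.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical packet descriptor; the field names are illustrative only. */
struct pkt_desc {
	int type;
	int hlen;
	int maxlen;
};

static const struct pkt_desc recv_pkts[] = {
	{ 1, 4, 1024 },
	{ 2, 3, 64 },
	{ 3, 2, 255 },
};

/* Size in bytes: the wrong value for a "count" parameter. */
static size_t pkts_bytes(void)
{
	return sizeof(recv_pkts);
}

/* Number of elements: what a "count" parameter expects. */
static size_t pkts_count(void)
{
	return ARRAY_SIZE(recv_pkts);
}
```

Passing the byte size where a count is expected would let a consumer iterate far past the last table entry, which is the kind of mistake this commit corrects.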
2018-08-13 | Merge tag 'asoc-v4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus | Takashi Iwai | 984 | -6606/+21040
ASoC: Updates for v4.19 A fairly big update, including quite a bit of core activity this time around (which is good to see) along with a fairly large set of new drivers. - A new snd_pcm_stop_xrun() helper which is now used in several drivers. - Support for providing name prefixes to generic component nodes. - Quite a few fixes for DPCM as it gains a bit wider use and more robust testing. - Generalization of the DIO2125 support to a simple amplifier driver. - Accessory detection support for the audio graph card. - DT support for PXA AC'97 devices. - Quirks for a number of new x86 systems. - Support for AM Logic Meson, Everest ES7154, Intel systems with RT5682, Qualcomm QDSP6 and WCD9335, Realtek RT5682 and TI TAS5707.
2018-08-13 | parisc: Fix and improve kernel stack unwinding | Helge Deller | 10 | -232/+88
This patchset fixes and improves stack unwinding a lot: 1. Show backward stack traces with up to 30 callsites 2. Add callinfo to ENTRY_CFI() such that every assembler function will get an entry in the unwind table 3. Use constants instead of numbers in call_on_stack() 4. Do not depend on CONFIG_KALLSYMS to generate backtraces. 5. Speed up backtrace generation Make sure you have this patch to GNU as installed: https://sourceware.org/ml/binutils/2018-07/msg00474.html Without this patch, unwind info in the kernel is often wrong for various functions. Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: Remove unnecessary barriers from spinlock.h | John David Anglin | 1 | -6/+2
Now that mb() is an instruction barrier, it will slow performance if we issue unnecessary barriers. The spinlock defines have a number of unnecessary barriers.  The __ldcw() define is both a hardware and compiler barrier.  The mb() barriers in the routines using __ldcw() serve no purpose. The only barrier needed is the one in arch_spin_unlock().  We need to ensure all accesses are complete prior to releasing the lock. Signed-off-by: John David Anglin <[email protected]> Cc: [email protected] # 4.0+ Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: Remove ordered stores from syscall.S | John David Anglin | 1 | -12/+12
Now that we use a sync prior to releasing the locks in syscall.S, we don't need the PA 2.0 ordered stores used to release some locks.  Using an ordered store, potentially slows the release and subsequent code. There are a number of other ordered stores and loads that serve no purpose.  I have converted these to normal stores. Signed-off-by: John David Anglin <[email protected]> Cc: [email protected] # 4.0+ Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: prefer _THIS_IP_ and _RET_IP_ statement expressions | Nick Desaulniers | 1 | -2/+2
As part of the effort to reduce the code duplication between _THIS_IP_ and current_text_addr(), let's consolidate callers of current_text_addr() to use _THIS_IP_. Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature | Helge Deller | 3 | -0/+112
Some parts of the HAVE_REGS_AND_STACK_ACCESS_API feature are needed for the rseq syscall. This patch adds the most important parts, and as long as we don't support kprobes, we should be fine. Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: Drop architecture-specific ENOTSUP define | Helge Deller | 3 | -13/+2
parisc is the only Linux architecture which has defined a value for ENOTSUP. All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers. Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives problems with userspace programs which expect both to be the same. One such example is a build error in the libuv package, as can be seen in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237. Since we dropped HP-UX support, there is no real benefit in keeping an own value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the kernel sources. glibc needs no patch, it reuses the exported headers. Signed-off-by: Helge Deller <[email protected]>
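User-space code that must work whether or not ENOTSUP and EOPNOTSUPP share a value can avoid depending on either assumption. The helper below is a hypothetical sketch, not taken from libuv or from this patch: it compares against both constants with a logical OR, which compiles and behaves the same way in both situations, whereas a switch statement listing both as case labels fails to compile on platforms where the values are equal.

```c
#include <assert.h>
#include <errno.h>

/*
 * Returns nonzero if err means "operation not supported", regardless of
 * whether the platform gives ENOTSUP and EOPNOTSUPP the same value.
 */
static int is_not_supported(int err)
{
	return err == ENOTSUP || err == EOPNOTSUPP;
}
```

A call such as `is_not_supported(errno)` after a failed system call then needs no per-architecture special casing.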
2018-08-13 | parisc: use generic dma_noncoherent_ops | Christoph Hellwig | 4 | -139/+16
Switch to the generic noncoherent direct mapping implementation. Fix sync_single_for_cpu to skip the cache flush unless the transfer is to the device, to match the more tested unmap_single path which should have the same cache coherency implications. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: always use flush_kernel_dcache_range for DMA cache maintainance | Christoph Hellwig | 1 | -3/+3
Currently the S/G list based DMA ops use flush_kernel_vmap_range, which contains a few UP optimizations, while the rest of the DMA operations use flush_kernel_dcache_range. The single vs sg operations are supposed to have the same effect, so they should use the same routines. Use the more conservative version for now, but if people more familiar with parisc think the vmap version is generally fine for DMA we should switch all interfaces over to it. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | parisc: merge pcx_dma_ops and pcxl_dma_ops | Christoph Hellwig | 4 | -59/+43
The only difference is that pcxl supports dma coherent allocations, while pcx only supports non-consistent allocations and otherwise fails. But dma_alloc* is not in the fast path, and merging these two allows an easy migration path to the generic dma-noncoherent implementation, so do it. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-08-13 | kconfig: fix the rule of mainmenu_stmt symbol | Masahiro Yamada | 1 | -2/+2
The rule of mainmenu_stmt does not have debug print of zconf_lineno(), but if it had, it would print a wrong line number for the same reason as commit b2d00d7c61c8 ("kconfig: fix line numbers for if-entries in menu tree"). The mainmenu_stmt does not need to eat following empty lines because they are reduced to common_stmt. Signed-off-by: Masahiro Yamada <[email protected]>
2018-08-13 | Merge branch 'bpf-ancestor-cgroup-id' | Daniel Borkmann | 9 | -5/+404
Andrey Ignatov says: ==================== This patch set adds new BPF helper bpf_skb_ancestor_cgroup_id that returns id of cgroup v2 that is ancestor of cgroup associated with the skb at the ancestor_level. The helper is useful to implement policies in TC based on cgroups that are upper in hierarchy than immediate cgroup associated with skb. v1->v2: - more reliable check for testing IPv6 to become ready in selftest. ==================== Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-13 | selftests/bpf: Selftest for bpf_skb_ancestor_cgroup_id | Andrey Ignatov | 4 | -3/+302
Add selftests for bpf_skb_ancestor_cgroup_id helper. test_skb_cgroup_id.sh prepares testing interface and adds tc qdisc and filter for it using BPF object compiled from test_skb_cgroup_id_kern.c program. BPF program in test_skb_cgroup_id_kern.c gets ancestor cgroup id using the new helper at different levels of cgroup hierarchy that skb belongs to, including root level and non-existing level, and saves it to the map where the key is the level of corresponding cgroup and the value is its id. To trigger BPF program, user space program test_skb_cgroup_id_user is run. It adds itself into testing cgroup and sends UDP datagram to link-local multicast address of testing interface. Then it reads cgroup ids saved in kernel for different levels from the BPF map and compares them with those in user space. They must be equal for every level of ancestry. Example of run: # ./test_skb_cgroup_id.sh Wait for testing link-local IP to become available ... OK Note: 8 bytes struct bpf_elf_map fixup performed due to size mismatch! [PASS] Signed-off-by: Andrey Ignatov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-13 | selftests/bpf: Add cgroup id helpers to bpf_helpers.h | Andrey Ignatov | 1 | -0/+4
Add bpf_skb_cgroup_id and bpf_skb_ancestor_cgroup_id helpers to bpf_helpers.h to use them in tests and samples. Signed-off-by: Andrey Ignatov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-13 | bpf: Sync bpf.h to tools/ | Andrey Ignatov | 1 | -1/+20
Sync skb_ancestor_cgroup_id() related bpf UAPI changes to tools/. Signed-off-by: Andrey Ignatov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-13 | bpf: Introduce bpf_skb_ancestor_cgroup_id helper | Andrey Ignatov | 3 | -1/+78
== Problem description == It's useful to be able to identify cgroup associated with skb in TC so that a policy can be applied to this skb, and existing bpf_skb_cgroup_id helper can help with this. Though in real life cgroup hierarchy and hierarchy to apply a policy to don't map 1:1. It's often the case that there is a container and corresponding cgroup, but there are many more sub-cgroups inside container, e.g. because it's delegated to containerized application to control resources for its subsystems, or to separate application inside container from infra that belongs to containerization system (e.g. sshd). At the same time it may be useful to apply a policy to container as a whole. If multiple containers like this are run on a host (what is often the case) and many of them have sub-cgroups, it may not be possible to apply per-container policy in TC with existing helpers such as bpf_skb_under_cgroup or bpf_skb_cgroup_id: * bpf_skb_cgroup_id will return id of immediate cgroup associated with skb, i.e. if it's a sub-cgroup inside container, it can't be used to identify container's cgroup; * bpf_skb_under_cgroup can work only with one cgroup and doesn't scale, i.e. if there are N containers on a host and a policy has to be applied to M of them (0 <= M <= N), it'd require M calls to bpf_skb_under_cgroup, and, if M changes, it'd require to rebuild & load new BPF program. == Solution == The patch introduces new helper bpf_skb_ancestor_cgroup_id that can be used to get id of cgroup v2 that is an ancestor of cgroup associated with skb at specified level of cgroup hierarchy. That way admin can place all containers on one level of cgroup hierarchy (what is a good practice in general and already used in many configurations) and identify specific cgroup on this level no matter what sub-cgroup skb is associated with. E.g. 
if there is a cgroup hierarchy:
  root/
  root/container1/
  root/container1/app11/
  root/container1/app11/sub-app-a/
  root/container1/app12/
  root/container2/
  root/container2/app21/
  root/container2/app22/
  root/container2/app22/sub-app-b/
then, having an skb associated with root/container1/app11/sub-app-a/, it's possible to get the ancestor at level 1, which is container1, and apply the policy for this container, or apply another policy if it's container2. Policies can be kept e.g. in a hash map where the key is a container cgroup id and the value is an action. Levels where container cgroups are created are usually known in advance, whereas the cgroup hierarchy inside a container may be hard to predict, especially when its creation is delegated to the containerized application. == Implementation details == The helper gets the ancestor by walking parents up to the specified level. Another option would be to get a different kind of "id" from cgroup->ancestor_ids[level] and use it with idr_find() to get the struct cgroup for the ancestor. But that would require a radix lookup, which doesn't seem to be better (at least it's not obviously better). The format of the return value of the new helper is the same as that of bpf_skb_cgroup_id. Signed-off-by: Andrey Ignatov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
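The parent walk described under "Implementation details" can be modelled in user space as below. This is a sketch of the idea only, not the kernel helper: `struct cg`, its fields and `ancestor_id` are hypothetical, and returning 0 when the requested level does not exist on the path to the root is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cgroup node: level 0 is the root, parent is NULL there. */
struct cg {
	uint64_t id;
	int level;
	struct cg *parent;
};

/*
 * Walk parents from c up to ancestor_level and return that ancestor's id.
 * Returns 0 if the level is negative or deeper than c itself.
 */
static uint64_t ancestor_id(const struct cg *c, int ancestor_level)
{
	if (ancestor_level < 0)
		return 0;
	while (c && c->level > ancestor_level)
		c = c->parent;
	return (c && c->level == ancestor_level) ? c->id : 0;
}
```

With the example hierarchy, a node for sub-app-a at level 3 resolves to container1 when asked for level 1, so a policy table keyed by container cgroup id needs one lookup regardless of how deep the sub-cgroup is.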
2018-08-13 | bpf: decouple btf from seq bpf fs dump and enable more maps | Daniel Borkmann | 12 | -44/+75
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty print for hash/lru_hash maps") enabled support for BTF and dumping via BPF fs for array and hash/lru map. However, both can be decoupled from each other such that regular BPF maps can be supported for attaching BTF key/value information, while not all maps necessarily need to dump via map_seq_show_elem() callback. The basic sanity check which is a prerequisite for all maps is that key/value size has to match in any case, and some maps can have extra checks via map_check_btf() callback, e.g. probing certain types or indicating no support in general. With that we can also enable retrieving BTF info for per-cpu map types and lpm. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Yonghong Song <[email protected]>
2018-08-12 | Linux 4.18 | Linus Torvalds | 1 | -1/+1