aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-09-30bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event arraySong Liu1-2/+17
Currently, perf event in perf event array is removed from the array when the map fd used to add the event is closed. This behavior makes it difficult to the share perf events with perf event array. Introduce perf event map that keeps the perf event open with a new flag BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not removed when the original map fd is closed. Instead, the perf event will stay in the map until 1) it is explicitly removed from the array; or 2) the array is freed. Signed-off-by: Song Liu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-30bpf, net: Rework cookie generator as per-cpu oneDaniel Borkmann1-1/+1
With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Cc: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-29bpf: Fix context type resolving for extension programsToke Høiland-Jørgensen1-1/+8
Eelco reported we can't properly access arguments if the tracing program is attached to extension program. Having following program: SEC("classifier/test_pkt_md_access") int test_pkt_md_access(struct __sk_buff *skb) with its extension: SEC("freplace/test_pkt_md_access") int test_pkt_md_access_new(struct __sk_buff *skb) and tracing that extension with: SEC("fentry/test_pkt_md_access_new") int BPF_PROG(fentry, struct sk_buff *skb) It's not possible to access skb argument in the fentry program, with following error from verifier: ; int BPF_PROG(fentry, struct sk_buff *skb) 0: (79) r1 = *(u64 *)(r1 +0) invalid bpf_context access off=0 size=8 The problem is that btf_ctx_access gets the context type for the traced program, which is in this case the extension. But when we trace extension program, we want to get the context type of the program that the extension is attached to, so we can access the argument properly in the trace program. This version of the patch is tweaked slightly from Jiri's original one, since the refactoring in the previous patches means we have to get the target prog type from the new variable in prog->aux instead of directly from the target prog. Reported-by: Eelco Chaudron <[email protected]> Suggested-by: Jiri Olsa <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf: Support attaching freplace programs to multiple attach pointsToke Høiland-Jørgensen2-16/+126
This enables support for attaching freplace programs to multiple attach points. It does this by amending the UAPI for bpf_link_Create with a target btf ID that can be used to supply the new attachment point along with the target program fd. The target must be compatible with the target that was supplied at program load time. The implementation reuses the checks that were factored out of check_attach_btf_id() to ensure compatibility between the BTF types of the old and new attachment. If these match, a new bpf_tracing_link will be created for the new attach target, allowing multiple attachments to co-exist simultaneously. The code could theoretically support multiple-attach of other types of tracing programs as well, but since I don't have a use case for any of those, there is no API support for doing so. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attachToke Høiland-Jørgensen7-253/+289
In preparation for allowing multiple attachments of freplace programs, move the references to the target program and trampoline into the bpf_tracing_link structure when that is created. To do this atomically, introduce a new mutex in prog->aux to protect writing to the two pointers to target prog and trampoline, and rename the members to make it clear that they are related. With this change, it is no longer possible to attach the same tracing program multiple times (detaching in-between), since the reference from the tracing program to the target disappears on the first attach. However, since the next patch will let the caller supply an attach target, that will also make it possible to attach to the same place multiple times. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf/preload: Make sure Makefile cleans up after itself, and add .gitignoreToke Høiland-Jørgensen2-0/+6
The Makefile in bpf/preload builds a local copy of libbpf, but does not properly clean up after itself. This can lead to subsequent compilation failures, since the feature detection cache is kept around which can lead subsequent detection to fail. Fix this by properly setting clean-files, and while we're at it, also add a .gitignore for the directory to ignore the build artifacts. Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Add bpf_seq_printf_btf helperAlan Maguire3-2/+36
A helper is added to allow seq file writing of kernel data structures using vmlinux BTF. Its signature is long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); Flags and struct btf_ptr definitions/use are identical to the bpf_snprintf_btf helper, and the helper returns 0 on success or a negative error value. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Bump iter seq size to support BTF representation of large data structuresAlan Maguire1-2/+2
BPF iter size is limited to PAGE_SIZE; if we wish to display BTF-based representations of larger kernel data structures such as task_struct, this will be insufficient. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Add bpf_snprintf_btf helperAlan Maguire3-0/+70
A helper is added to support tracing kernel type information in BPF using the BPF Type Format (BTF). Its signature is long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); struct btf_ptr * specifies - a pointer to the data to be traced - the BTF id of the type of data pointed to - a flags field is provided for future use; these flags are not to be confused with the BTF_F_* flags below that control how the btf_ptr is displayed; the flags member of the struct btf_ptr may be used to disambiguate types in kernel versus module BTF, etc; the main distinction is the flags relate to the type and information needed in identifying it; not how it is displayed. For example a BPF program with a struct sk_buff *skb could do the following: static struct btf_ptr b = { }; b.ptr = skb; b.type_id = __builtin_btf_type_id(struct sk_buff, 1); bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0); Default output looks like this: (struct sk_buff){ .transport_header = (__u16)65535, .mac_header = (__u16)65535, .end = (sk_buff_data_t)192, .head = (unsigned char *)0x000000007524fd8b, .data = (unsigned char *)0x000000007524fd8b, .truesize = (unsigned int)768, .users = (refcount_t){ .refs = (atomic_t){ .counter = (int)1, }, }, } Flags modifying display are as follows: - BTF_F_COMPACT: no formatting around type information - BTF_F_NONAME: no struct/union member names/types - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values; equivalent to %px. - BTF_F_ZERO: show zero-valued struct/union members; they are not displayed by default Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Move to generic BTF show support, apply it to seq files/stringsAlan Maguire1-102/+905
generalize the "seq_show" seq file support in btf.c to support a generic show callback of which we support two instances; the current seq file show, and a show with snprintf() behaviour which instead writes the type data to a supplied string. Both classes of show function call btf_type_show() with different targets; the seq file or the string to be written. In the string case we need to track additional data - length left in string to write and length to return that we would have written (a la snprintf). By default show will display type information, field members and their types and values etc, and the information is indented based upon structure depth. Zeroed fields are omitted. Show however supports flags which modify its behaviour: BTF_SHOW_COMPACT - suppress newline/indent. BTF_SHOW_NONAME - suppress show of type and member names. BTF_SHOW_PTR_RAW - do not obfuscate pointer values. BTF_SHOW_UNSAFE - do not copy data to safe buffer before display. BTF_SHOW_ZERO - show zeroed values (by default they are not shown). Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Provide function to get vmlinux BTF informationAlan Maguire1-6/+12
It will be used later for BPF structure display support Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: verifier: refactor check_attach_btf_id()Toke Høiland-Jørgensen2-87/+118
The check_attach_btf_id() function really does three things: 1. It performs a bunch of checks on the program to ensure that the attachment is valid. 2. It stores a bunch of state about the attachment being requested in the verifier environment and struct bpf_prog objects. 3. It allocates a trampoline for the attachment. This patch splits out (1.) and (3.) into separate functions which will perform the checks, but return the computed values instead of directly modifying the environment. This is done in preparation for reusing the checks when the actual attachment is happening, which will allow tracing programs to have multiple (compatible) attachments. This also fixes a bug where a bunch of checks were skipped if a trampoline already existed for the tracing target. Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-28bpf: change logging calls from verbose() to bpf_log() and use log pointerToke Høiland-Jørgensen2-28/+28
In preparation for moving code around, change a bunch of references to env->log (and the verbose() logging helper) to use bpf_log() and a direct pointer to struct bpf_verifier_log. While we're touching the function signature, mark the 'prog' argument to bpf_check_type_match() as const. Also enhance the bpf_verifier_log_needed() check to handle NULL pointers for the log struct so we can re-use the code with logging disabled. Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-28bpf: disallow attaching modify_return tracing functions to other BPF programsToke Høiland-Jørgensen1-0/+5
From the checks and commit messages for modify_return, it seems it was never the intention that it should be possible to attach a tracing program with expected_attach_type == BPF_MODIFY_RETURN to another BPF program. However, check_attach_modify_return() will only look at the function name, so if the target function starts with "security_", the attach will be allowed even for bpf2bpf attachment. Fix this oversight by also blocking the modification if a target program is supplied. Fixes: 18644cec714a ("bpf: Fix use-after-free in fmod_ret check") Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-28bpf: sockmap: Enable map_update_elem from bpf_iterLorenz Bauer1-1/+1
Allow passing a pointer to a BTF struct sock_common* when updating a sockmap or sockhash. Since BTF pointers can fault and therefore be NULL at runtime we need to add an additional !sk check to sock_map_update_elem. Since we may be passed a request or timewait socket we also need to check sk_fullsock. Doing this allows calling map_update_elem on sockmap from bpf_iter context, which uses BTF pointers. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf, cpumap: Remove rcpu pointer from cpu_map_build_skb signatureLorenzo Bianconi1-3/+2
Get rid of bpf_cpu_map_entry pointer in cpu_map_build_skb routine signature since it is no longer needed. Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/33cb9b7dc447de3ea6fd6ce713ac41bca8794423.1601292015.git.lorenzo@kernel.org
2020-09-28bpf: Enable BPF_PROG_TEST_RUN for raw_tracepointSong Liu2-1/+2
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs the target program on a specific CPU. This is achieved by a new flag in bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for BPF programs that handle perf_event and other percpu resources, as the program can access these resource locally. Signed-off-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-25bpf, verifier: Remove redundant var_off.value ops in scalar known reg casesJohn Fastabend1-4/+2
In BPF_AND and BPF_OR alu cases we have this pattern when the src and dst tnum is a constant. 1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off) 2 scalar32_min_max_[op] 3 if (known) return 4 scalar_min_max_[op] 5 if (known) 6 __mark_reg_known(dst_reg, dst_reg->var_off.value [op] src_reg.var_off.value) The result is in 1 we calculate the var_off value and store it in the dst_reg. Then in 6 we duplicate this logic doing the op again on the value. The duplication comes from the the tnum_[op] handlers because they have already done the value calcuation. For example this is tnum_and(). struct tnum tnum_and(struct tnum a, struct tnum b) { u64 alpha, beta, v; alpha = a.value | a.mask; beta = b.value | b.mask; v = a.value & b.value; return TNUM(v, alpha & beta & ~v); } So lets remove the redundant op calculation. Its confusing for readers and unnecessary. Its also not harmful because those ops have the property, r1 & r1 = r1 and r1 | r1 = r1. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-25bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMONMartin KaFai Lau1-2/+2
This patch changes the bpf_sk_storage_*() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. A micro benchmark has been done on a "cgroup_skb/egress" bpf program which does a bpf_sk_storage_get(). It was driven by netperf doing a 4096 connected UDP_STREAM test with 64bytes packet. The stats from "kernel.bpf_stats_enabled" shows no meaningful difference. The sk_storage_get_btf_proto, sk_storage_delete_btf_proto, btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are no longer needed, so they are removed. Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Lorenz Bauer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-25bpf: Enable bpf_skc_to_* sock casting helper to networking prog typeMartin KaFai Lau1-2/+32
There is a constant need to add more fields into the bpf_tcp_sock for the bpf programs running at tc, sock_ops...etc. A current workaround could be to use bpf_probe_read_kernel(). However, other than making another helper call for reading each field and missing CO-RE, it is also not as intuitive to use as directly reading "tp->lsndtime" for example. While already having perfmon cap to do bpf_probe_read_kernel(), it will be much easier if the bpf prog can directly read from the tcp_sock. This patch tries to do that by using the existing casting-helpers bpf_skc_to_*() whose func_proto returns a btf_id. For example, the func_proto of bpf_skc_to_tcp_sock returns the btf_id of the kernel "struct tcp_sock". These helpers are also added to is_ptr_cast_function(). It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id. That will keep the ref-tracking works properly. The bpf_skc_to_* helpers are made available to most of the bpf prog types in filter.c. The bpf_skc_to_* helpers will be limited by perfmon cap. This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON. The helper accepting this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON]) or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_*() helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will accept pointer obtained from skb->sk. Instead of specifying both arg_type and arg_btf_id in the same func_proto which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is always the same. Discussion in this thread: https://lore.kernel.org/bpf/[email protected]/ The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is expecting a PTR_TO_BTF_ID which could be NULL. This is the same behavior as the existing helper taking ARG_PTR_TO_BTF_ID. The _SOCK_COMMON part means the helper is also expecting the legacy SOCK_COMMON pointer. By excluding the _OR_NULL part, the bpf prog cannot call helper with a literal NULL which doesn't make sense in most cases. e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO_*_OR_NULL reg has to do a NULL check first before passing into the helper or else the bpf prog will be rejected. This behavior is nothing new and consistent with the current expectation during bpf-prog-load. [ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace ARG_PTR_TO_SOCK* of other existing helpers later such that those existing helpers can take the PTR_TO_BTF_ID returned by the bpf_skc_to_*() helpers. The only special case is bpf_sk_lookup_assign() which can accept a literal NULL ptr. It has to be handled specially in another follow up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ] [ When converting the older helpers that take ARG_PTR_TO_SOCK* in the later patch, if the kernel does not support BTF, ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON because no reg->type could have PTR_TO_BTF_ID in this case. It is not a concern for the newer-btf-only helper like the bpf_skc_to_*() here though because these helpers must require BTF vmlinux to begin with. ] Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-25bpf: Move the PTR_TO_BTF_ID check to check_reg_type()Martin KaFai Lau1-29/+31
check_reg_type() checks whether a reg can be used as an arg of a func_proto. For PTR_TO_BTF_ID, the check is actually not completely done until the reg->btf_id is pointing to a kernel struct that is acceptable by the func_proto. Thus, this patch moves the btf_id check into check_reg_type(). "arg_type" and "arg_btf_id" are passed to check_reg_type() instead of "compatible". The compatible_reg_types[] usage is localized in check_reg_type() now. The "if (!btf_id) verbose(...); " is also removed since it won't happen. Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Lorenz Bauer <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-23Merge branch 'rtt-speedup.2020.09.16a' of ↵Alexei Starovoitov1-12/+41
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into bpf-next Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-23Revert "bpf: Fix potential call bpf_link_free() in atomic context"Alexei Starovoitov1-2/+6
This reverts commit 31f23a6a181c81543b10a1a9056b0e6c7ef1c747. This change made many selftests/bpf flaky: flow_dissector, sk_lookup, sk_assign and others. There was no issue in the code. Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller10-245/+523
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-09-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 95 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 4211 insertions(+), 2040 deletions(-). The main changes are: 1) Full multi function support in libbpf, from Andrii. 2) Refactoring of function argument checks, from Lorenz. 3) Make bpf_tail_call compatible with functions (subprograms), from Maciej. 4) Program metadata support, from YiFei. 5) bpf iterator optimizations, from Yonghong. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-23bpf: Explicitly size compatible_reg_typesLorenz Bauer1-2/+1
Arrays with designated initializers have an implicit length of the highest initialized value plus one. I used this to ensure that newly added entries in enum bpf_reg_type get a NULL entry in compatible_reg_types. This is difficult to understand since it requires knowledge of the peculiarities of designated initializers. Use __BPF_ARG_TYPE_MAX to size the array instead. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller20-74/+172
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <[email protected]>
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-12/+7
Pull networking fixes from Jakub Kicinski: - fix failure to add bond interfaces to a bridge, the offload-handling code was too defensive there and recent refactoring unearthed that. Users complained (Ido) - fix unnecessarily reflecting ECN bits within TOS values / QoS marking in TCP ACK and reset packets (Wei) - fix a deadlock with bpf iterator. Hopefully we're in the clear on this front now... (Yonghong) - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel) - fix AQL on mt76 devices with FW rate control and add a couple of AQL issues in mac80211 code (Felix) - fix authentication issue with mwifiex (Maximilian) - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro) - fix exception handling for multipath routes via same device (David Ahern) - revert back to a BH spin lock flavor for nsid_lock: there are paths which do require the BH context protection (Taehee) - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke) - fix ife module load deadlock (Cong) - make an adjustment to netlink reply message type for code added in this release (the sole change touching uAPI here) (Michal) - a number of fixes for small NXP and Microchip switches (Vladimir) [ Pull request acked by David: "you can expect more of this in the future as I try to delegate more things to Jakub" ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits) net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU net: Update MAINTAINERS for MediaTek switch driver net/mlx5e: mlx5e_fec_in_caps() returns a boolean net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock net/mlx5e: kTLS, Fix leak on resync error flow net/mlx5e: kTLS, Add missing dma_unmap in RX resync net/mlx5e: kTLS, Fix napi sync and possible use-after-free net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() net/mlx5e: Fix multicast counter not up-to-date in "ip -s" net/mlx5e: Fix endianness when calculating pedit mask first bit net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported net/mlx5e: CT: Fix freeing ct_label mapping net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready net/mlx5e: Use synchronize_rcu to sync with NAPI net/mlx5e: Use RCU to protect rq->xdp_prog ...
2020-09-22Merge tag 'trace-v5.9-rc5' of ↵Linus Torvalds5-28/+53
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Check kprobe is enabled before unregistering from ftrace as it isn't registered when disabled. - Remove kprobes enabled via command-line that is on init text when freed. - Add missing RCU synchronization for ftrace trampoline symbols removed from kallsyms. - Free trampoline on error path if ftrace_startup() fails. - Give more space for the longer PID numbers in trace output. - Fix a possible double free in the histogram code. - A couple of fixes that were discovered by sparse. * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: bootconfig: init: make xbc_namebuf static kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot tracing: fix double free ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer tracing: Make the space reserved for the pid wider ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms ftrace: Free the trampoline when ftrace_startup() fails kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
2020-09-21selftests/bpf: Fix stat probe in d_path testJiri Olsa1-0/+8
Some kernels builds might inline vfs_getattr call within fstat syscall code path, so fentry/vfs_getattr trampoline is not called. Add security_inode_getattr to allowlist and switch the d_path test stat trampoline to security_inode_getattr. Keeping dentry_open and filp_close, because they are in their own files, so unlikely to be inlined, but in case they are, adding security_file_open. Adding flags that indicate trampolines were called and failing the test if any of them got missed, so it's easier to identify the issue next time. Fixes: e4d1af4b16f8 ("selftests/bpf: Add test for d_path helper") Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Use a table to drive helper arg type checksLorenz Bauer1-74/+109
The mapping between bpf_arg_type and bpf_reg_type is encoded in a big hairy if statement that is hard to follow. The debug output also leaves to be desired: if a reg_type doesn't match we only print one of the options, instead printing all the valid ones. Convert the if statement into a table which is then used to drive type checking. If none of the reg_types match we print all options, e.g.: R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Hoist type checking for nullable arg typesLorenz Bauer1-34/+30
check_func_arg has a plethora of weird if statements with empty branches. They work around the fact that *_OR_NULL argument types should accept a SCALAR_VALUE register, as long as it's value is 0. These statements make it difficult to reason about the type checking logic. Instead, skip more detailed type checking logic iff the register is 0, and the function expects a nullable type. This allows simplifying the type checking itself. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Check ARG_PTR_TO_SPINLOCK register type in check_func_argLorenz Bauer1-14/+14
Move the check for PTR_TO_MAP_VALUE to check_func_arg, where all other checking is done as well. Move the invocation of process_spin_lock away from the register type checking, to allow a future refactoring. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Set meta->raw_mode for pointers close to useLorenz Bauer1-1/+5
If we encounter a pointer to memory, we set meta->raw_mode depending on the type of memory we point at. What isn't obvious is that this information is only used when the next memory size argument is encountered. Move the assignment closer to where it's used, and add a comment that explains what is going on. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Make context access check genericLorenz Bauer1-3/+4
Always check context access if the register we're operating on is PTR_TO_CTX, rather than relying on ARG_PTR_TO_CTX. This allows simplifying the arg_type checking section of the function. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Make reference tracking genericLorenz Bauer1-16/+10
Instead of dealing with reg->ref_obj_id individually for every arg type that needs it, rely on the fact that ref_obj_id is zero if the register is not reference tracked. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Make BTF pointer type checking genericLorenz Bauer1-18/+20
Perform BTF type checks if the register we're working on contains a BTF pointer, rather than if the argument is for a BTF pointer. This is easier to understand, and allows removing the code from the arg_type checking section of the function. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Allow specifying a BTF ID per argument in function protosLorenz Bauer5-53/+32
Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of IDs, one for each argument. This array is only accessed up to the highest numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id is a function pointer that is called by the verifier if present. It gets the actual BTF ID of the register, and the argument number we're currently checking. It turns out that the only user check_arg_btf_id ignores the argument, and is simply used to check whether the BTF ID has a struct sock_common at it's start. Replace both of these mechanisms with an explicit BTF ID for each argument in a function proto. Thanks to btf_struct_ids_match this is very flexible: check_arg_btf_id can be replaced by requiring struct sock_common. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21bpf: Check scalar or invalid register in check_helper_mem_accessLorenz Bauer1-13/+11
Move the check for a NULL or zero register to check_helper_mem_access. This makes check_stack_boundary easier to understand. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21btf: Make btf_set_contains take a const pointerLorenz Bauer1-1/+1
bsearch doesn't modify the contents of the array, so we can take a const pointer. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-21Merge branch 'rcu/urgent' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This contains a single commit that fixes a bug that was introduced in the last merge window. This bug causes a compiler warning complaining about show_rcu_tasks_classic_gp_kthread() being an unused static function in !SMP kernels. The fix is straightforward, just adding an 'inline' to make this a static inline function, thus avoiding the warning. This bug was reported by Laurent Pinchart, who would like it fixed sooner rather than later" * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()
2020-09-21bpf: Fix potential call bpf_link_free() in atomic contextMuchun Song1-6/+2
The in_atomic() macro cannot always detect atomic context, in particular, it cannot know about held spinlocks in non-preemptible kernels. Although, there is no user call bpf_link_put() with holding spinlock now, be on the safe side, so we can avoid this in the future. Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-20Merge tag 'core_urgent_for_v5.9_rc6' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull syscall tracing fix from Borislav Petkov: "Fix the seccomp syscall rewriting so that trace and audit see the rewritten syscall number, from Kees Cook" * tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: core/entry: Report syscall correctly for trace and audit
2020-09-20Merge tag 'locking_urgent_for_v5.9_rc6' of ↵Linus Torvalds3-8/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: "Two fixes from the locking/urgent pile: - Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter Zijlstra) - Make percpu-rwsem operations on the semaphore's ->read_count IRQ-safe because it can be used in an IRQ context (Hou Tao)" * tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count locking/lockdep: Fix "USED" <- "IN-NMI" inversions
2020-09-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-4/+10
Merge fixes from Andrew Morton: "15 patches. Subsystems affected by this patch series: mailmap, mm/hotfixes, mm/thp, mm/memory-hotplug, misc, kcsan" * emailed patches from Andrew Morton <[email protected]>: kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments' fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype stackleak: let stack_erasing_sysctl take a kernel pointer buffer ftrace: let ftrace_enable_sysctl take a kernel pointer buffer mm/memory_hotplug: drain per-cpu pages again during memory offline selftests/vm: fix display of page size in map_hugetlb mm/thp: fix __split_huge_pmd_locked() for migration PMD kprobes: fix kill kprobe which has been marked as gone tmpfs: restore functionality of nr_inodes=0 mlock: fix unevictable_pgs event counts on THP mm: fix check_move_unevictable_pages() on THP mm: migration of hugetlbfs page skip memcg ksm: reinstate memcg charge on copied pages mailmap: add older email addresses for Kees Cook
2020-09-19stackleak: let stack_erasing_sysctl take a kernel pointer bufferTobias Klauser1-1/+1
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of stack_erasing_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces) kernel/stackleak.c:31:50: expected void * kernel/stackleak.c:31:50: got void [noderef] __user *buffer Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-09-19ftrace: let ftrace_enable_sysctl take a kernel pointer bufferTobias Klauser1-2/+1
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of ftrace_enable_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces) kernel/trace/ftrace.c:7544:43: expected void * kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-09-19kprobes: fix kill kprobe which has been marked as goneMuchun Song1-1/+8
If a kprobe is marked as gone, we should not kill it again. Otherwise, we can disarm the kprobe more than once. In that case, the statistics of kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not work. Fixes: e8386a0cb22f ("kprobes: support probing module __exit function") Co-developed-by: Chengming Zhou <[email protected]> Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Chengming Zhou <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: "Naveen N . Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: David S. Miller <[email protected]> Cc: Song Liu <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-09-18Merge tag 's390-5.9-6' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix order in trace_hardirqs_off_caller() to make locking state consistent even if the IRQ tracer calls into lockdep again. Touches common code. Acked-by Peter Zijlstra. - Correctly handle secure storage violation exception to avoid kernel panic triggered by user space misbehaviour. - Switch the idle->seqcount over to using raw_write_*() to avoid "suspicious RCU usage". - Fix memory leaks on hard unplug in pci code. - Use kvmalloc instead of kmalloc for larger allocations in zcrypt. - Add few missing __init annotations to static functions to avoid section mismatch complains when functions are not inlined. * tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: add 3f program exception handler lockdep: fix order in trace_hardirqs_off_caller() s390/pci: fix leak of DMA tables on hard unplug s390/init: add missing __init annotations s390/zcrypt: fix kmalloc 256k failure s390/idle: fix suspicious RCU usage
2020-09-19bpf: Use hlist_add_head_rcu when linking to local_storageMartin KaFai Lau1-1/+1
The local_storage->list will be traversed by rcu reader in parallel. Thus, hlist_add_head_rcu() is needed in bpf_selem_link_storage_nolock(). This patch fixes it. This part of the code has recently been refactored in bpf-next and this patch makes changes to the new file "bpf_local_storage.c". Instead of using the original offending commit in the Fixes tag, the commit that created the file "bpf_local_storage.c" is used. A separate fix has been provided to the bpf tree. Fixes: 450af8d0f6be ("bpf: Split bpf_local_storage to bpf_sk_storage") Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-18kprobes: tracing/kprobes: Fix to kill kprobes on initmem after bootMasami Hiramatsu1-0/+22
Since kprobe_event= cmdline option allows user to put kprobes on the functions in initmem, kprobe has to make such probes gone after boot. Currently the probes on the init functions in modules will be handled by module callback, but the kernel init text isn't handled. Without this, kprobes may access non-exist text area to disable or remove it. Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2 Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter") Cc: Jonathan Corbet <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>