aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-09-29Merge branch 'octeontx2-af-cleanup-and-extend-parser-config'David S. Miller5-214/+557
Stanislaw Kardach says: ==================== octeontx2-af: cleanup and extend parser config Current KPU configuration data is spread over multiple files which makes it hard to read. Clean this up by gathering all configuration data in a single structure and also in a single file (npc_profile.h). This should increase the readability of KPU handling code (since it always references same structure), simplify updates to the CAM key extraction code and allow abstracting out the configuration source. Additionally extend and fix the parser config to support additional DSA types, NAT-T-ESP and IPv6 fields. Patch 1 ensures that CUSTOMx LTYPEs are not aliased with meaningful LTYPEs where possible. Patch 2 gathers all KPU profile related data into a single struct and creates an adapter structure which provides an interface to the KPU profile for the octeontx2-af driver. Patches 3-4 add support for Extended DSA, eDSA and Forward DSA. Patches 5-6 adds IPv6 fields to CAM key extraction and optimize the parser performance for fragmented IPv6 packets. Patch 7 refactors ESP handling in the parser to support NAT-T-ESP. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: add parser support for NAT-T-ESPKiran Kumar K2-29/+65
Add support for NAT-T-ESP to KPU parser configuration. NAT ESP is a UDP based protocol. So move ESP to LE so that both UDP and ESP can be extracted. Signed-off-by: Kiran Kumar K <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: optimize parsing of IPv6 fragmentsAbhijit Ayarekar1-40/+40
IPv6 fragmented packet may not contain completed layer 4 information. So stop KPU parsing after setting ipv6 fragmentation flag. Signed-off-by: Abhijit Ayarekar <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: Add IPv6 fields to default MKEXVidhya Vidhyaraman1-0/+5
Added some IPv6 protocol fields to the default MKEX profile. They include everything from the beginning of IP header and up to source address. The pattern occupies full KW2 in MCAM entry. Only one out of two LD registers for this protocol is used. Signed-off-by: Vidhya Vidhyaraman <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: fix Extended DSA and eDSA parsingSatha Rao1-3/+3
KPU profile interpret Extended DSA and eDSA by looking source dev. This was incorrect and it restricts to use few source device ids and also created confusion while parsing regular DSA tag. With below patch lookup was based on bit 12 of Word0. This is always zero for DSA tag and it should be one for Extended DSA and eDSA. Signed-off-by: Satha Rao <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: add parser support for Forward DSAHariprasad Kelam2-0/+145
Marvell Prestera switches supports distributed switch architecture by inserting Forward DSA tag of 4 bytes right after ethernet SMAC. This tag don't have a tpid field. This patch provides parser and extraction support for the same. Default ldata extraction profile added for FDSA such that Src_port is extracted and placed inplace of vlanid field. Like extended DSA and eDSA tags,a special PKIND of 62 is used for this tag. Signed-off-by: Hariprasad Kelam <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: cleanup KPU config dataStanislaw Kardach5-144/+301
Refactor KPU related NPC code gathering all configuration data in a structured format and putting it in one place (npc_profile.h). This increases readability and makes it easier to extend the profile configuration (as opposed to jumping between multiple header and source files). To do this: * Gather all KPU profile related data into a single adapter struct. * Convert the built-in MKEX definition to a structured one to streamline the MKEX loading. * Convert LT default register configuration into a structure, keeping default protocol settings in same file where identifiers for those protocols are defined. * Add a single point for KPU profile loading, so that its source may change in the future once proper interfaces for loading such config are in place. Signed-off-by: Stanislaw Kardach <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29octeontx2-af: fix LD CUSTOM LTYPE aliasingStanislaw Kardach1-2/+2
Since LD contains LTYPE definitions tweaked toward efficient NIX_AF_RX_FLOW_KEY_ALG(0..31)_FIELD(0..4) usage, the original location of NPC_LT_LD_CUSTOM0/1 was aliased with MPLS_IN_* definitions. Moving custom frame to value 6 and 7 removes the aliasing at the cost of custom frames being also considered when TCP/UDP RSS algo is configured. However since the goal of CUSTOM frames is to classify them to a separate set of RQs, this cost is acceptable. Signed-off-by: Stanislaw Kardach <[email protected]> Acked-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-29selftests: Add selftest for disallowing modify_return attachment to freplaceToke Høiland-Jørgensen3-1/+71
This adds a selftest that ensures that modify_return tracing programs cannot be attached to freplace programs. The security_ prefix is added to the freplace program because that would otherwise let it pass the check for modify_return. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29selftests/bpf: Adding test for arg dereference in extension traceJiri Olsa3-0/+154
Adding test that setup following program: SEC("classifier/test_pkt_md_access") int test_pkt_md_access(struct __sk_buff *skb) with its extension: SEC("freplace/test_pkt_md_access") int test_pkt_md_access_new(struct __sk_buff *skb) and tracing that extension with: SEC("fentry/test_pkt_md_access_new") int BPF_PROG(fentry, struct sk_buff *skb) The test verifies that the tracing program can dereference skb argument properly. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29selftests: Add test for multiple attachments of freplace programToke Høiland-Jørgensen2-32/+139
This adds a selftest for attaching an freplace program to multiple targets simultaneously. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29libbpf: Add support for freplace attachment in bpf_link_createToke Høiland-Jørgensen5-9/+60
This adds support for supplying a target btf ID for the bpf_link_create() operation, and adds a new bpf_program__attach_freplace() high-level API for attaching freplace functions with a target. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf: Fix context type resolving for extension programsToke Høiland-Jørgensen1-1/+8
Eelco reported we can't properly access arguments if the tracing program is attached to extension program. Having following program: SEC("classifier/test_pkt_md_access") int test_pkt_md_access(struct __sk_buff *skb) with its extension: SEC("freplace/test_pkt_md_access") int test_pkt_md_access_new(struct __sk_buff *skb) and tracing that extension with: SEC("fentry/test_pkt_md_access_new") int BPF_PROG(fentry, struct sk_buff *skb) It's not possible to access skb argument in the fentry program, with following error from verifier: ; int BPF_PROG(fentry, struct sk_buff *skb) 0: (79) r1 = *(u64 *)(r1 +0) invalid bpf_context access off=0 size=8 The problem is that btf_ctx_access gets the context type for the traced program, which is in this case the extension. But when we trace extension program, we want to get the context type of the program that the extension is attached to, so we can access the argument properly in the trace program. This version of the patch is tweaked slightly from Jiri's original one, since the refactoring in the previous patches means we have to get the target prog type from the new variable in prog->aux instead of directly from the target prog. Reported-by: Eelco Chaudron <[email protected]> Suggested-by: Jiri Olsa <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf: Support attaching freplace programs to multiple attach pointsToke Høiland-Jørgensen5-20/+142
This enables support for attaching freplace programs to multiple attach points. It does this by amending the UAPI for bpf_link_Create with a target btf ID that can be used to supply the new attachment point along with the target program fd. The target must be compatible with the target that was supplied at program load time. The implementation reuses the checks that were factored out of check_attach_btf_id() to ensure compatibility between the BTF types of the old and new attachment. If these match, a new bpf_tracing_link will be created for the new attach target, allowing multiple attachments to co-exist simultaneously. The code could theoretically support multiple-attach of other types of tracing programs as well, but since I don't have a use case for any of those, there is no API support for doing so. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attachToke Høiland-Jørgensen8-259/+298
In preparation for allowing multiple attachments of freplace programs, move the references to the target program and trampoline into the bpf_tracing_link structure when that is created. To do this atomically, introduce a new mutex in prog->aux to protect writing to the two pointers to target prog and trampoline, and rename the members to make it clear that they are related. With this change, it is no longer possible to attach the same tracing program multiple times (detaching in-between), since the reference from the tracing program to the target disappears on the first attach. However, since the next patch will let the caller supply an attach target, that will also make it possible to attach to the same place multiple times. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29Merge branch 'libbpf: support loading/storing any BTF'Alexei Starovoitov6-98/+419
Andrii Nakryiko says: ==================== Add support for loading and storing BTF in either little- or big-endian integer encodings, regardless of host endianness. This allows users of libbpf to not care about endianness when they don't want to and transparently open/load BTF of any endianness. libbpf will preserve original endianness and will convert output raw data as necessary back to original endianness, if necessary. This allows tools like pahole to be ignorant to such issues during cross-compilation. While working with BTF data in memory, the endianness is always native to the host. Convetion can happen only during btf__get_raw_data() call, and only in a raw data copy. Additionally, it's possible to force output BTF endianness through new btf__set_endianness() API. This which allows to create flexible tools doing arbitrary conversions of BTF endianness, just by relying on libbpf. Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Tony Ambardar <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Luka Perkov <[email protected]> ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-29selftests/bpf: Test BTF's handling of endiannessAndrii Nakryiko1-0/+101
Add selftests juggling endianness back and forth to validate BTF's handling of endianness convertions internally. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29libbpf: Support BTF loading and raw data output in both endiannessAndrii Nakryiko3-64/+255
Teach BTF to recognized wrong endianness and transparently convert it internally to host endianness. Original endianness of BTF will be preserved and used during btf__get_raw_data() to convert resulting raw data to the same endianness and a source raw_data. This means that little-endian host can parse big-endian BTF with no issues, all the type data will be presented to the client application in native endianness, but when it's time for emitting BTF to persist it in a file (e.g., after BTF deduplication), original non-native endianness will be preserved and stored. It's possible to query original endianness of BTF data with new btf__endianness() API. It's also possible to override desired output endianness with btf__set_endianness(), so that if application needs to load, say, big-endian BTF and store it as little-endian BTF, it's possible to manually override this. If btf__set_endianness() was used to change endianness, btf__endianness() will reflect overridden endianness. Given there are no known use cases for supporting cross-endianness for .BTF.ext, loading .BTF.ext in non-native endianness is not supported. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29selftests/bpf: Move and extend ASSERT_xxx() testing macrosAndrii Nakryiko2-34/+63
Move existing ASSERT_xxx() macros out of btf_write selftest into test_progs.h to use across all selftests. Also expand a set of macros for typical cases. Now there are the following macros: - ASSERT_EQ() -- check for equality of two integers; - ASSERT_STREQ() -- check for equality of two C strings; - ASSERT_OK() -- check for successful (zero) return result; - ASSERT_ERR() -- check for unsuccessful (non-zero) return result; - ASSERT_NULL() -- check for NULL pointer; - ASSERT_OK_PTR() -- check for a valid pointer; - ASSERT_ERR_PTR() -- check for NULL or negative error encoded in a pointer. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29selftests: Make sure all 'skel' variables are declared staticToke Høiland-Jørgensen2-2/+2
If programs in prog_tests using skeletons declare the 'skel' variable as global but not static, that will lead to linker errors on the final link of the prog_tests binary due to duplicate symbols. Fix a few instances of this. Fixes: b18c1f0aa477 ("bpf: selftest: Adapt sock_fields test to use skel and global variables") Fixes: 9a856cae2217 ("bpf: selftest: Add test_btf_skc_cls_ingress") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29xsk: Fix a documentation mistake in xsk_queue.hCiara Loftus1-1/+1
After 'peeking' the ring, the consumer, not the producer, reads the data. Fix this mistake in the comments. Fixes: 15d8c9162ced ("xsk: Add function naming comments and reorder functions") Signed-off-by: Ciara Loftus <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29selftests/bpf_iter: Don't fail test due to missing __builtin_btf_type_idToke Høiland-Jørgensen1-3/+9
The new test for task iteration in bpf_iter checks (in do_btf_read()) if it should be skipped due to missing __builtin_btf_type_id. However, this 'skip' verdict is not propagated to the caller, so the parent test will still fail. Fix this by also skipping the rest of the parent test if the skip condition was reached. Fixes: b72091bd4ee4 ("selftests/bpf: Add test for bpf_seq_printf_btf helper") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Alan Maguire <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29bpf/preload: Make sure Makefile cleans up after itself, and add .gitignoreToke Høiland-Jørgensen2-0/+6
The Makefile in bpf/preload builds a local copy of libbpf, but does not properly clean up after itself. This can lead to subsequent compilation failures, since the feature detection cache is kept around which can lead subsequent detection to fail. Fix this by properly setting clean-files, and while we're at it, also add a .gitignore for the directory to ignore the build artifacts. Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29Merge branch 'selftests/bpf: BTF-based kernel data display'Alexei Starovoitov4-2/+52
Alan Maguire says: ==================== Resolve issues in bpf selftests introduced with BTF-based kernel data display selftests; these are - a warning introduced in snprintf_btf.c; and - compilation failures with old kernels vmlinux.h ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-29selftests/bpf: Ensure snprintf_btf/bpf_iter tests compatibility with old ↵Alan Maguire3-1/+51
vmlinux.h Andrii reports that bpf selftests relying on "struct btf_ptr" and BTF_F_* values will not build as vmlinux.h for older kernels will not include "struct btf_ptr" or the BTF_F_* enum values. Undefine and redefine them to work around this. Fixes: b72091bd4ee4 ("selftests/bpf: Add test for bpf_seq_printf_btf helper") Fixes: 076a95f5aff2 ("selftests/bpf: Add bpf_snprintf_btf helper tests") Reported-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-29selftests/bpf: Fix unused-result warning in snprintf_btf.cAlan Maguire1-1/+1
Daniel reports: + system("ping -c 1 127.0.0.1 > /dev/null"); This generates the following new warning when compiling BPF selftests: [...] EXT-OBJ [test_progs] cgroup_helpers.o EXT-OBJ [test_progs] trace_helpers.o EXT-OBJ [test_progs] network_helpers.o EXT-OBJ [test_progs] testing_helpers.o TEST-OBJ [test_progs] snprintf_btf.test.o /root/bpf-next/tools/testing/selftests/bpf/prog_tests/snprintf_btf.c: In function ‘test_snprintf_btf’: /root/bpf-next/tools/testing/selftests/bpf/prog_tests/snprintf_btf.c:30:2: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result] system("ping -c 1 127.0.0.1 > /dev/null"); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [...] Fixes: 076a95f5aff2 ("selftests/bpf: Add bpf_snprintf_btf helper tests") Reported-by: Daniel Borkmann <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf, selftests: Fix cast to smaller integer type 'int' warning in raw_tpJohn Fastabend1-1/+1
Fix warning in bpf selftests, progs/test_raw_tp_test_run.c:18:10: warning: cast to smaller integer type 'int' from 'struct task_struct *' [-Wpointer-to-int-cast] Change int type cast to long to fix. Discovered with gcc-9 and llvm-11+ where llvm was recent main branch. Fixes: 09d8ad16885ee ("selftests/bpf: Add raw_tp_test_run") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/160134424745.11199.13841922833336698133.stgit@john-Precision-5820-Tower
2020-09-28Merge branch 'libbpf: BTF writer APIs'Alexei Starovoitov4-5/+1128
Andrii Nakryiko says: ==================== This patch set introduces a new set of BTF APIs to libbpf that allow to conveniently produce BTF types and strings. These APIs will allow libbpf to do more intrusive modifications of program's BTF (by rewriting it, at least as of right now), which is necessary for the upcoming libbpf static linking. But they are complete and generic, so can be adopted by anyone who has a need to produce BTF type information. One such example outside of libbpf is pahole, which was actually converted to these APIs (locally, pending landing of these changes in libbpf) completely and shows reduction in amount of custom pahole code necessary and brings nice savings in memory usage (about 370MB reduction at peak for my kernel configuration) and even BTF deduplication times (one second reduction, 23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of "uncompressed" raw BTF data. Time reduction comes from faster string search and deduplication by relying on hashmap instead of BST used by pahole's own code. Consequently, these APIs are already tested on real-world complicated kernel BTF, but there is also pretty extensive selftest doing extra validations. Selftests in patch #3 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros that are useful for writing shorter and less repretitive selftests. I decided to keep them local to that selftest for now, but if they prove to be useful in more contexts we should move them to test_progs.h. And few more (e.g., inequality tests) macros are probably necessary to have a more complete set. Cc: Arnaldo Carvalho de Melo <[email protected]> v2->v3: - resending original patches #7-9 as patches #1-3 due to merge conflict; v1->v2: - fixed comments (John); - renamed btf__append_xxx() into btf__add_xxx() (Alexei); - added btf__find_str() in addition to btf__add_str(); - btf__new_empty() now sets kernel FD to -1 initially. ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-28selftests/bpf: Test BTF writing APIsAndrii Nakryiko2-4/+282
Add selftests for BTF writer APIs. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28libbpf: Add btf__str_by_offset() as a more generic variant of name_by_offsetAndrii Nakryiko3-1/+8
BTF strings are used not just for names, they can be arbitrary strings used for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology is too specific and might be misleading. Instead, introduce btf__str_by_offset() API which uses generic string terminology. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28libbpf: Add BTF writing APIsAndrii Nakryiko3-0/+838
Add APIs for appending new BTF types at the end of BTF object. Each BTF kind has either one API of the form btf__add_<kind>(). For types that have variable amount of additional items (struct/union, enum, func_proto, datasec), additional API is provided to emit each such item. E.g., for emitting a struct, one would use the following sequence of API calls: btf__add_struct(...); btf__add_field(...); ... btf__add_field(...); Each btf__add_field() will ensure that the last BTF type is of STRUCT or UNION kind and will automatically increment that type's vlen field. All the strings are provided as C strings (const char *), not a string offset. This significantly improves usability of BTF writer APIs. All such strings will be automatically appended to string section or existing string will be re-used, if such string was already added previously. Each API attempts to do all the reasonable validations, like enforcing non-empty names for entities with required names, proper value bounds, various bit offset restrictions, etc. Type ID validation is minimal because it's possible to emit a type that refers to type that will be emitted later, so libbpf has no way to enforce such cases. User must be careful to properly emit all the necessary types and specify type IDs that will be valid in the finally generated BTF. Each of btf__add_<kind>() APIs return new type ID on success or negative value on error. APIs like btf__add_field() that emit additional items return zero on success and negative value on error. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28net/sched: cls_u32: Replace one-element array with flexible-array memberGustavo A. R. Silva1-4/+4
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Refactor the code according to the use of a flexible-array member in struct tc_u_hnode and use the struct_size() helper to calculate the size for the allocations. Commit 5778d39d070b ("net_sched: fix struct tc_u_hnode layout in u32") makes it clear that the code is expected to dynamically allocate divisor + 1 entries for ->ht[] in tc_uhnode. Also, based on other observations, as the piece of code below: 1232 for (h = 0; h <= ht->divisor; h++) { 1233 for (n = rtnl_dereference(ht->ht[h]); 1234 n; 1235 n = rtnl_dereference(n->next)) { 1236 if (tc_skip_hw(n->flags)) 1237 continue; 1238 1239 err = u32_reoffload_knode(tp, n, add, cb, 1240 cb_priv, extack); 1241 if (err) 1242 return err; 1243 } 1244 } we can assume that, in general, the code is actually expecting to allocate that extra space for the one-element array in tc_uhnode, everytime it allocates memory for instances of tc_uhnode or tc_u_common structures. That's the reason for passing '1' as the last argument for struct_size() in the allocation for _root_ht_ and _tp_c_, and 'divisor + 1' in the allocation code for _ht_. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays Tested-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/lkml/5f7062af.z3T9tn9yIPv6h5Ny%[email protected]/ Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28qed/qed_ll2: Replace one-element array with flexible-array memberGustavo A. R. Silva2-14/+12
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Refactor the code according to the use of a flexible-array member in struct qed_ll2_tx_packet, instead of a one-element array and use the struct_size() helper to calculate the size for the allocations. Commit f5823fe6897c ("qed: Add ll2 option to limit the number of bds per packet") was used as a reference point for these changes. Also, it's important to notice that flexible-array members should occur last in any structure, and structures containing such arrays and that are members of other structures, must also occur last in the containing structure. That's why _cur_completing_packet_ is now moved to the bottom in struct qed_ll2_tx_queue. _descq_mem_ and _cur_send_packet_ are also moved for unification. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays Tested-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/lkml/5f707198.PA1UCZ8MYozYZYAR%[email protected]/ Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28stmmac: intel: Adding ref clock 1us tic for LPI cntrRusaimi Amira Ruslan3-0/+19
Adding reference clock (1us tic) for all LPI timer on Intel platforms. The reference clock is derived from ptp clk. This also enables all LPI counter. Signed-off-by: Rusaimi Amira Ruslan <[email protected]> Signed-off-by: Voon Weifeng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28Merge branch 'net-ipa-miscellaneous-cleanups'David S. Miller7-100/+34
Alex Elder says: ==================== net: ipa: miscellaneous cleanups This series contains some minor cleanups I've been meaning to get around to for a while. The first few remove the definitions of some currently-unused symbols. Several fix some warnings that are reported when the build is done with "W=2". All are simple and have no effect on the operation of the code. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: fix two commentsAlex Elder2-2/+2
In ipa_uc_response_hdlr() a comment uses the wrong function name when it describes where a clock reference is taken. Fix this. Also fix the comment in ipa_uc_response_hdlr() to correctly refer to ipa_uc_setup(), which is where the clock reference described here is taken. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: rename a phandle variableAlex Elder1-4/+4
When "W=2" is supplied to the build command, we get a warning about shadowing a global declaration (of a typedef) for a variable defined in ipa_probe(). Rename the variable to get rid of the warning. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: fix two mild warningsAlex Elder1-3/+2
Fix two spots where a variable "channel_id" is unnecessarily redefined inside loops in "gsi.c". This is warned about if "W=2" is added to the build command. Note that this problem is harmless, so there's no need to backport it as a bugfix. Remove a comment in gsi_init() about waking the system; the GSI interrupt does not wake the system any more. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: share field mask values for GSI general interruptAlex Elder2-16/+7
The GSI general interrupt is managed by three registers: enable; status; and clear. The three registers have same set of field bits at the same locations. Use a common set of field masks for all three registers to avoid duplication. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: share field mask values for GSI global interruptAlex Elder2-17/+8
The GSI global interrupt is managed by three registers: enable; status; and clear. The three registers have same set of field bits at the same locations. Use a common set of field masks for all three registers to avoid duplication. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: share field mask values for GSI interrupt typeAlex Elder2-14/+7
The GSI interrupt type register and interrupt type mask register have the same field bits at the same locations. Use a common set of field masks for both registers rather than essentially duplicating them. The only place the interrupt mask register uses any of these is in gsi_irq_enable(). Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: remove unused status structure field masksAlex Elder1-27/+0
Most of the field masks used for fields in a status structure are unused. Remove their definitions; we can add them back again when we actually use them to handle arriving status messages. These are warned about if "W=2" is added to the build command. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: kill unused status exceptionsAlex Elder1-11/+3
Only the deaggregation status exception type is ever actually used. If any other status exception type is reported we basically ignore it, and consume the packet. Remove the unused definitions of status exception type symbols; they can be added back when we actually handle them. Separately, two consecutive if statements test the same condition near the top of ipa_endpoint_suspend_one(). Instead, use a single test with a block that combines the previously-separate lines of code. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: kill unused status opcodesAlex Elder1-5/+1
Three status opcodes are not currently supported. Symbols representing their numeric values are defined but never used. Remove those unused definitions; they can be defined again when they actually get used. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28net: ipa: kill definition of TRE_FLAGS_IEOB_FMASKAlex Elder1-1/+0
In "gsi_trans.c", the field mask TRE_FLAGS_IEOB_FMASK is defined but never used. Although there's no harm in defining this, remove it for now and redefine it at some future date if it becomes needed. This is warned about if "W=2" is added to the build command. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-28Merge branch 'bpf: add helpers to support BTF-based kernel'Alexei Starovoitov15-117/+1659
Alan Maguire says: ==================== This series attempts to provide a simple way for BPF programs (and in future other consumers) to utilize BPF Type Format (BTF) information to display kernel data structures in-kernel. The use case this functionality is applied to here is to support a snprintf()-like helper to copy a BTF representation of kernel data to a string, and a BPF seq file helper to display BTF data for an iterator. There is already support in kernel/bpf/btf.c for "show" functionality; the changes here generalize that support from seq-file specific verifier display to the more generic case and add another specific use case; rather than seq_printf()ing the show data, it is copied to a supplied string using a snprintf()-like function. Other future consumers of the show functionality could include a bpf_printk_btf() function which printk()ed the data instead. Oops messaging in particular would be an interesting application for such functionality. The above potential use case hints at a potential reply to a reasonable objection that such typed display should be solved by tracing programs, where the in-kernel tracing records data and the userspace program prints it out. While this is certainly the recommended approach for most cases, I believe having an in-kernel mechanism would be valuable also. Critically in BPF programs it greatly simplifies debugging and tracing of such data to invoking a simple helper. One challenge raised in an earlier iteration of this work - where the BTF printing was implemented as a printk() format specifier - was that the amount of data printed per printk() was large, and other format specifiers were far simpler. Here we sidestep that concern by printing components of the BTF representation as we go for the seq file case, and in the string case the snprintf()-like operation is intended to be a basis for perf event or ringbuf output. The reasons for avoiding bpf_trace_printk are that 1. bpf_trace_printk() strings are restricted in size and cannot display anything beyond trivial data structures; and 2. bpf_trace_printk() is for debugging purposes only. As Alexei suggested, a bpf_trace_puts() helper could solve this in the future but it still would be limited by the 1000 byte limit for traced strings. Default output for an sk_buff looks like this (zeroed fields are omitted): (struct sk_buff){ .transport_header = (__u16)65535, .mac_header = (__u16)65535, .end = (sk_buff_data_t)192, .head = (unsigned char *)0x000000007524fd8b, .data = (unsigned char *)0x000000007524fd8b, .truesize = (unsigned int)768, .users = (refcount_t){ .refs = (atomic_t){ .counter = (int)1, }, }, } Flags can modify aspects of output format; see patch 3 for more details. Changes since v6: - Updated safe data size to 32, object name size to 80. This increases the number of safe copies done, but performance is not a key goal here. WRT name size the largest type name length in bpf-next according to "pahole -s" is 64 bytes, so that still gives room for additional type qualifiers, parens etc within the name limit (Alexei, patch 2) - Remove inlines and converted as many #defines to functions as was possible. In a few cases - btf_show_type_value[s]() specifically - I left these as macros as btf_show_type_value[s]() prepends and appends format strings to the format specifier (in order to include indentation, delimiters etc so a macro makes that simpler (Alexei, patch 2) - Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2) - Removed clang loop unroll in BTF snprintf test (Alexei) - switched to using bpf_core_type_id_kernel(type) as suggested by Andrii, and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4) - Added skip logic if __builtin_btf_type_id is not available (patches 4,8) - Bumped limits on bpf iters to support printing larger structures (Alexei, patch 5) - Updated overflow bpf_iter tests to reflect new iter max size (patch 6) - Updated seq helper to use type id only (Alexei, patch 7) - Updated BTF task iter test to use task struct instead of struct fs_struct since new limits allow a task_struct to be displayed (patch 8) - Fixed E2BIG handling in iter task (Alexei, patch 8) Changes since v5: - Moved btf print prepare into patch 3, type show seq with flags into patch 2 (Alexei, patches 2,3) - Fixed build bot warnings around static declarations and printf attributes - Renamed functions to snprintf_btf/seq_printf_btf (Alexei, patches 3-6) Changes since v4: - Changed approach from a BPF trace event-centric design to one utilizing a snprintf()-like helper and an iter helper (Alexei, patches 3,5) - Added tests to verify BTF output (patch 4) - Added support to tests for verifying BTF type_id-based display as well as type name via __builtin_btf_type_id (Andrii, patch 4). - Augmented task iter tests to cover the BTF-based seq helper. Because a task_struct's BTF-based representation would overflow the PAGE_SIZE limit on iterator data, the "struct fs_struct" (task->fs) is displayed for each task instead (Alexei, patch 6). Changes since v3: - Moved to RFC since the approach is different (and bpf-next is closed) - Rather than using a printk() format specifier as the means of invoking BTF-enabled display, a dedicated BPF helper is used. This solves the issue of printk() having to output large amounts of data using a complex mechanism such as BTF traversal, but still provides a way for the display of such data to be achieved via BPF programs. Future work could include a bpf_printk_btf() function to invoke display via printk() where the elements of a data structure are printk()ed one at a time. Thanks to Petr Mladek, Andy Shevchenko and Rasmus Villemoes who took time to look at the earlier printk() format-specifier-focused version of this and provided feedback clarifying the problems with that approach. - Added trace id to the bpf_trace_printk events as a means of separating output from standard bpf_trace_printk() events, ensuring it can be easily parsed by the reader. - Added bpf_trace_btf() helper tests which do simple verification of the various display options. Changes since v2: - Alexei and Yonghong suggested it would be good to use probe_kernel_read() on to-be-shown data to ensure safety during operation. Safe copy via probe_kernel_read() to a buffer object in "struct btf_show" is used to support this. A few different approaches were explored including dynamic allocation and per-cpu buffers. The downside of dynamic allocation is that it would be done during BPF program execution for bpf_trace_printk()s using %pT format specifiers. The problem with per-cpu buffers is we'd have to manage preemption and since the display of an object occurs over an extended period and in printk context where we'd rather not change preemption status, it seemed tricky to manage buffer safety while considering preemption. The approach of utilizing stack buffer space via the "struct btf_show" seemed like the simplest approach. The stack size of the associated functions which have a "struct btf_show" on their stack to support show operation (btf_type_snprintf_show() and btf_type_seq_show()) stays under 500 bytes. The compromise here is the safe buffer we use is small - 256 bytes - and as a result multiple probe_kernel_read()s are needed for larger objects. Most objects of interest are smaller than this (e.g. "struct sk_buff" is 224 bytes), and while task_struct is a notable exception at ~8K, performance is not the priority for BTF-based display. (Alexei and Yonghong, patch 2). - safe buffer use is the default behaviour (and is mandatory for BPF) but unsafe display - meaning no safe copy is done and we operate on the object itself - is supported via a 'u' option. - pointers are prefixed with 0x for clarity (Alexei, patch 2) - added additional comments and explanations around BTF show code, especially around determining whether objects such zeroed. Also tried to comment safe object scheme used. (Yonghong, patch 2) - added late_initcall() to initialize vmlinux BTF so that it would not have to be initialized during printk operation (Alexei, patch 5) - removed CONFIG_BTF_PRINTF config option as it is not needed; CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and determining behaviour of type-based printk can be done via retrieval of BTF data; if it's not there BTF was unavailable or broken (Alexei, patches 4,6) - fix bpf_trace_printk test to use vmlinux.h and globals via skeleton infrastructure, removing need for perf events (Andrii, patch 8) Changes since v1: - changed format to be more drgn-like, rendering indented type info along with type names by default (Alexei) - zeroed values are omitted (Arnaldo) by default unless the '0' modifier is specified (Alexei) - added an option to print pointer values without obfuscation. The reason to do this is the sysctls controlling pointer display are likely to be irrelevant in many if not most tracing contexts. Some questions on this in the outstanding questions section below... - reworked printk format specifer so that we no longer rely on format %pT<type> but instead use a struct * which contains type information (Rasmus). This simplifies the printk parsing, makes use more dynamic and also allows specification by BTF id as well as name. - removed incorrect patch which tried to fix dereferencing of resolved BTF info for vmlinux; instead we skip modifiers for the relevant case (array element type determination) (Alexei). - fixed issues with negative snprintf format length (Rasmus) - added test cases for various data structure formats; base types, typedefs, structs, etc. - tests now iterate through all typedef, enum, struct and unions defined for vmlinux BTF and render a version of the target dummy value which is either all zeros or all 0xff values; the idea is this exercises the "skip if zero" and "print everything" cases. - added support in BPF for using the %pT format specifier in bpf_trace_printk() - added BPF tests which ensure %pT format specifier use works (Alexei). ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-09-28selftests/bpf: Add test for bpf_seq_printf_btf helperAlan Maguire2-0/+124
Add a test verifying iterating over tasks and displaying BTF representation of task_struct succeeds. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Add bpf_seq_printf_btf helperAlan Maguire6-2/+56
A helper is added to allow seq file writing of kernel data structures using vmlinux BTF. Its signature is long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); Flags and struct btf_ptr definitions/use are identical to the bpf_snprintf_btf helper, and the helper returns 0 on success or a negative error value. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28selftests/bpf: Fix overflow tests to reflect iter size increaseAlan Maguire1-7/+7
bpf iter size increase to PAGE_SIZE << 3 means overflow tests assuming page size need to be bumped also. Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-28bpf: Bump iter seq size to support BTF representation of large data structuresAlan Maguire1-2/+2
BPF iter size is limited to PAGE_SIZE; if we wish to display BTF-based representations of larger kernel data structures such as task_struct, this will be insufficient. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]