2021-01-14  perf: Add build id data in mmap2 event  (Jiri Olsa, 2 files, -9/+65)
Adding support to carry build id data in mmap2 event. The build id data replaces the maj/min/ino/ino_generation fields, which are also used to identify the map's binary, so it's ok to replace them with build id data:

    union {
        struct {
            u32 maj;
            u32 min;
            u64 ino;
            u64 ino_generation;
        };
        struct {
            u8  build_id_size;
            u8  __reserved_1;
            u16 __reserved_2;
            u8  build_id[20];
        };
    };

The replaced maj/min/ino/ino_generation fields give us a size of 24 bytes. We use 20 bytes for build id data, 1 byte for size and the rest is unused. There's a new misc bit for mmap2 to signal there's build id data in it:

    #define PERF_RECORD_MISC_MMAP_BUILD_ID (1 << 14)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-4-jolsa@kernel.org
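As an illustration of how a consumer could use the new layout, here is a minimal sketch (not part of the patch): it picks between the two union members based on the misc bit. Only the union layout and the misc flag come from the commit above; the wrapper struct and function names are invented for the example.

    #include <stdint.h>
    #include <string.h>

    #define PERF_RECORD_MISC_MMAP_BUILD_ID (1 << 14)  /* from the commit */

    /* Mirrors the union that replaces maj/min/ino/ino_generation in mmap2. */
    struct mmap2_ids {
        union {
            struct {
                uint32_t maj;
                uint32_t min;
                uint64_t ino;
                uint64_t ino_generation;
            };
            struct {
                uint8_t  build_id_size;
                uint8_t  __reserved_1;
                uint16_t __reserved_2;
                uint8_t  build_id[20];
            };
        };
    };

    /* Copy out the build id if the record carries one; return its length,
     * or 0 to tell the caller to fall back to maj/min/ino matching.
     */
    static size_t mmap2_build_id(uint16_t misc, const struct mmap2_ids *ids,
                                 uint8_t out[20])
    {
        size_t len = ids->build_id_size;

        if (!(misc & PERF_RECORD_MISC_MMAP_BUILD_ID))
            return 0;
        if (len > sizeof(ids->build_id))   /* defensive clamp */
            len = sizeof(ids->build_id);
        memcpy(out, ids->build_id, len);
        return len;
    }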
2021-01-14  bpf: Add size arg to build_id_parse function  (Jiri Olsa, 3 files, -10/+24)
It's possible to have other build id types (other than the default SHA1). Currently there's also ld support for MD5 build ids. Add a size argument to the build_id_parse function that returns (if defined) the size of the parsed build id, so we can recognize the build id type. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210114134044.1418404-3-jolsa@kernel.org
2021-01-14  bpf: Move stack_map_get_build_id into lib  (Jiri Olsa, 4 files, -140/+153)
Moving stack_map_get_build_id into lib with declaration in the linux/buildid.h header:

    int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);

This function returns the build id for a given struct vm_area_struct. There is no functional change to the stack_map_get_build_id function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210114134044.1418404-2-jolsa@kernel.org
2021-01-14  Merge branch 'Atomics for eBPF'  (Alexei Starovoitov, 44 files, -212/+1466)
Brendan Jackman says:

====================

There's still one unresolved review comment from John [3] which I will resolve with a followup patch.

Differences from v6->v7 [1]:
* Fixed riscv build error detected by 0-day robot.

Differences from v5->v6 [1]:
* Carried Björn Töpel's ack for RISC-V code, plus a couple more acks from Yonghong.
* Doc fixups.
* Trivial cleanups.

Differences from v4->v5 [1]:
* Fixed bogus type casts in interpreter that led to warnings from the 0day robot.
* Dropped feature-detection for Clang per Andrii's suggestion in [4]. The selftests will now fail to build unless you have llvm-project commit 286daafd6512. The ENABLE_ATOMICS_TEST macro is still needed to support the no_alu32 tests.
* Carried some Acks from John and Yonghong.
* Dropped confusing usage of __atomic_exchange from prog_test in favour of __sync_lock_test_and_set.
* [Really] got rid of all the forest of instruction macros (BPF_ATOMIC_FETCH_ADD and friends); now there's just BPF_ATOMIC_OP to define all the instructions as we use them in the verifier tests. This makes the atomic ops less special in that API, and I don't think the resulting usage is actually any harder to read.

Differences from v3->v4 [1]:
* Added one Ack from Yonghong. He acked some other patches but those have now changed non-trivially so I didn't add those acks.
* Fixups to commit messages.
* Fixed disassembly and comments: first arg to atomic_fetch_* is a pointer.
* Improved prog_test efficiency. BPF progs are now all loaded in a single call, then the skeleton is re-used for each subtest.
* Dropped use of tools/build/feature in favour of a one-liner in the Makefile.
* Dropped the commit that created an emit_neg helper in the x86 JIT. It's not used any more (it wasn't used in v3 either).
* Combined all the different filter.h macros (used to be BPF_ATOMIC_ADD, BPF_ATOMIC_FETCH_ADD, BPF_ATOMIC_AND, etc.) into just BPF_ATOMIC32 and BPF_ATOMIC64.
* Removed some references to BPF_STX_XADD from tools/, samples/ and lib/ that I missed before.

Differences from v2->v3 [1]:
* More minor fixes and naming/comment changes.
* Dropped atomic subtract: compilers can implement this by preceding an atomic add with a NEG instruction (which is what the x86 JIT did under the hood anyway).
* Dropped the use of -mcpu=v4 in the Clang BPF command-line; there is no longer an architecture version bump. Instead a feature test is added to Kbuild - it builds a source file to check if Clang supports BPF atomics.
* Fixed the prog_test so it no longer breaks test_progs-no_alu32. This requires some ifdef acrobatics to avoid complicating the prog_tests model where the same userspace code exercises both the normal and no_alu32 BPF test objects, using the same skeleton header.

Differences from v1->v2 [1]:
* Fixed mistakes in the netronome driver.
* Added sub, add, or, xor operations.
* The above led to some refactors to keep things readable. (Maybe I should have just waited until I'd implemented these before starting the review...)
* Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which include the BPF_FETCH flag.
* Added a bit of documentation. Suggestions welcome for more places to dump this info...

The prog_test that's added depends on Clang/LLVM features added by Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184).

This only includes a JIT implementation for x86_64 - I don't plan to implement JIT support myself for other architectures.

Operations
==========

This patchset adds atomic operations to the eBPF instruction set. The use-case that motivated this work was a trivial and efficient way to generate globally-unique cookies in BPF progs, but I think it's obvious that these features are pretty widely applicable. The instructions that are added here can be summarised with this list of kernel operations:

* atomic[64]_[fetch_]add
* atomic[64]_[fetch_]and
* atomic[64]_[fetch_]or
* atomic[64]_xchg
* atomic[64]_cmpxchg

The following are left out of scope for this effort:

* 16 and 8 bit operations
* Explicit memory barriers

Encoding
========

I originally planned to add new values for bpf_insn.opcode. This was rather unpleasant: the opcode space has holes in it but no entire instruction classes [2]. Yonghong Song had a better idea: use the immediate field of the existing STX XADD instruction to encode the operation. This works nicely, without breaking existing programs, because the immediate field is currently reserved-must-be-zero, and extra-nicely because BPF_ADD happens to be zero.

Note that this of course makes immediate-source atomic operations impossible. It's hard to imagine a measurable speedup from such instructions, and if it existed it would certainly not benefit x86, which has no support for them.

The BPF_OP opcode fields are re-used in the immediate, and an additional flag BPF_FETCH is used to mark instructions that should fetch a pre-modification value from memory.

So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid breaking userspace builds), and where we previously had .imm = 0, we now have .imm = BPF_ADD (which is 0).

Operands
========

Reg-source eBPF instructions only have two operands, while these atomic operations have up to four. To avoid needing to encode additional operands:

- One of the input registers is re-used as an output register (e.g. atomic_fetch_add both reads from and writes to the source register).
- Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of the operands.

This approach also allows the new eBPF instructions to map directly to single x86 instructions.

[1] Previous iterations:
    v1: https://lore.kernel.org/bpf/20201123173202.1335708-1-jackmanb@google.com/
    v2: https://lore.kernel.org/bpf/20201127175738.1085417-1-jackmanb@google.com/
    v3: https://lore.kernel.org/bpf/X8kN7NA7bJC7aLQI@google.com/
    v4: https://lore.kernel.org/bpf/20201207160734.2345502-1-jackmanb@google.com/
    v5: https://lore.kernel.org/bpf/20201215121816.1048557-1-jackmanb@google.com/
    v6: https://lore.kernel.org/bpf/20210112154235.2192781-1-jackmanb@google.com/

[2] Visualisation of eBPF opcode space:
    https://gist.github.com/bjackman/00fdad2d5dfff601c1918bc29b16e778

[3] Comment from John about propagating bounds in verifier:
    https://lore.kernel.org/bpf/5fcf0fbcc8aa8_9ab320853@john-XPS-13-9370.notmuch/

[4] Mail from Andrii about not supporting old Clang in selftests:
    https://lore.kernel.org/bpf/CAEf4BzYBddPaEzRUs=jaWSo5kbf=LZdb7geAUVj85GxLQztuAQ@mail.gmail.com/

====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
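To make the encoding above concrete, here is a small sketch (mine, not from the series) of how a 64-bit atomic add is assembled with and without BPF_FETCH, using only the UAPI constants; the helper name is illustrative.

    #include <stdbool.h>
    #include <linux/bpf.h>  /* BPF_STX, BPF_DW, BPF_ATOMIC, BPF_ADD, BPF_FETCH */

    /* Encode "lock *(u64 *)(dst_reg + off) += src_reg"; with fetch == true the
     * pre-modification value is additionally loaded back into src_reg.
     */
    static struct bpf_insn atomic_add64(int dst_reg, int src_reg, short off,
                                        bool fetch)
    {
        return (struct bpf_insn) {
            .code    = BPF_STX | BPF_DW | BPF_ATOMIC,
            .dst_reg = dst_reg,
            .src_reg = src_reg,
            .off     = off,
            /* BPF_ADD == 0, so the legacy XADD encoding is unchanged. */
            .imm     = BPF_ADD | (fetch ? BPF_FETCH : 0),
        };
    }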
2021-01-14  bpf: Document new atomic instructions  (Brendan Jackman, 1 file, -0/+31)
Document new atomic instructions. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-12-jackmanb@google.com
2021-01-14  bpf: Add tests for new BPF atomic operations  (Brendan Jackman, 9 files, -0/+881)
The prog_test that's added depends on Clang/LLVM features added by Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184). Note the use of a define called ENABLE_ATOMICS_TESTS: this is used to:

- Avoid breaking the build for people on old versions of Clang
- Avoid needing separate lists of test objects for no_alu32, where atomics are not supported even if Clang has the feature.

The atomics_test.o BPF object is built unconditionally both for test_progs and test_progs-no_alu32. For test_progs, if Clang supports atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper test code. Otherwise, progs and global vars are defined anyway, as stubs; this means that the skeleton user code still builds. The atomics_test.o userspace object is built once and used for both test_progs and test_progs-no_alu32. A variable called skip_tests is defined in the BPF object's data section, which tells the userspace object whether to skip the atomics test. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-11-jackmanb@google.com
2021-01-14  bpf: Add bitwise atomic instructions  (Brendan Jackman, 6 files, -5/+87)
This adds instructions for

    atomic[64]_[fetch_]and
    atomic[64]_[fetch_]or
    atomic[64]_[fetch_]xor

All these operations are isomorphic enough to implement with the same verifier, interpreter, and x86 JIT code, hence being a single commit. The main interesting thing here is that x86 doesn't directly support the fetch_ version of these operations, so we need to generate a CMPXCHG loop in the JIT. This requires the use of two temporary registers; IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com
2021-01-14  bpf: Pull out a macro for interpreting atomic ALU operations  (Brendan Jackman, 1 file, -41/+39)
Since the atomic operations that are added in subsequent commits are all isomorphic with BPF_ADD, pull out a macro to avoid the interpreter becoming dominated by lines of atomic-related code. Note that this sacrifices interpreter performance (combining STX_ATOMIC_W and STX_ATOMIC_DW into a single switch case means that we need an extra conditional branch to differentiate them) in favour of compact and (relatively!) simple C code. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-9-jackmanb@google.com
2021-01-14  bpf: Add instructions for atomic_[cmp]xchg  (Brendan Jackman, 8 files, -4/+70)
This adds two atomic opcodes, both of which include the BPF_FETCH flag. XCHG without the BPF_FETCH flag would naturally encode atomic_set. This is not supported because it would be of limited value to userspace (it doesn't imply any barriers). CMPXCHG without BPF_FETCH would be an atomic compare-and-write. We don't have such an operation in the kernel so it isn't provided to BPF either. There are two significant design decisions made for the CMPXCHG instruction:

- This operation fundamentally has 3 operands, but we only have two register fields, so the operand we compare against (the kernel's API calls it 'old') is hard-coded to be R0. x86 has a similar design (and A64 doesn't have this problem). A potential alternative might be to encode the other operand's register number in the immediate field.
- The kernel's atomic_cmpxchg returns the old value, while the C11 userspace APIs return a boolean indicating the comparison result. Which should BPF do? A64 returns the old value. x86 returns the old value in the hard-coded register (and also sets a flag). That means return-old-value is easier to JIT, so that's what we use.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
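For a sense of how these semantics surface in BPF C, here is a minimal sketch assuming a Clang new enough to lower the __sync builtin to BPF_CMPXCHG (the llvm-project commit mentioned in the cover letter); the section, variable and program names are illustrative, not from the patch set.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    long guard = 0;                       /* global in the BPF object */

    SEC("raw_tp/sys_enter")
    int take_guard_once(void *ctx)
    {
        /* Compiles to a single BPF_CMPXCHG; per the design above, the old
         * value comes back in R0 (the return value of the builtin).
         */
        long old = __sync_val_compare_and_swap(&guard, 0, 1);

        if (old == 0)
            bpf_printk("won the race, runs once across all CPUs");
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";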
2021-01-14  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction  (Brendan Jackman, 8 files, -9/+56)
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC instructions, in order to have the previous value of the atomically-modified memory location loaded into the src register after an atomic op is carried out. Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
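A companion sketch for BPF_FETCH (again mine, assuming an atomics-capable Clang): as I understand the backend behaviour, whether the builtin's result is used decides whether a plain BPF_ADD or BPF_ADD | BPF_FETCH is emitted. Names are illustrative.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    __u64 counter = 0;

    SEC("raw_tp/sys_enter")
    int bump(void *ctx)
    {
        /* Result ignored: plain BPF_ATOMIC | BPF_ADD (the old XADD). */
        __sync_fetch_and_add(&counter, 1);

        /* Result used: BPF_ADD | BPF_FETCH, so the previous value is
         * loaded back into the source register.
         */
        __u64 ticket = __sync_fetch_and_add(&counter, 1);

        return ticket & 1;
    }

    char LICENSE[] SEC("license") = "GPL";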
2021-01-14  bpf: Move BPF_STX reserved field check into BPF_STX verifier code  (Brendan Jackman, 1 file, -7/+6)
I can't find a reason why this code is in resolve_pseudo_ldimm64; since I'll be modifying it in a subsequent commit, tidy it up. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-6-jackmanb@google.com
2021-01-14  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm  (Brendan Jackman, 35 files, -152/+291)
A subsequent patch will add additional atomic operations. These new operations will use the same opcode field as the existing XADD, with the immediate discriminating different operations. In preparation, rename the instruction mode to BPF_ATOMIC and start calling the zero immediate BPF_ADD. This is possible (doesn't break existing valid BPF progs) because the immediate field is currently reserved MBZ and BPF_ADD is zero. All uses are removed from the tree but the BPF_XADD definition is kept around to avoid breaking builds for people including kernel headers. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
2021-01-14  bpf: x86: Factor out a lookup table for some ALU opcodes  (Brendan Jackman, 1 file, -18/+15)
A later commit will need to look up a subset of these opcodes. To avoid duplicating code, pull out a table. The shift opcodes won't be needed by that later commit, but they're already duplicated, so fold them into the table anyway. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-4-jackmanb@google.com
2021-01-14  bpf: x86: Factor out emission of REX byte  (Brendan Jackman, 1 file, -16/+23)
The JIT case for encoding atomic ops is about to get more complicated. In order to make the review & resulting code easier, let's factor out some shared helpers. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-3-jackmanb@google.com
2021-01-14  bpf: x86: Factor out emission of ModR/M for *(reg + off)  (Brendan Jackman, 1 file, -18/+25)
The case for JITing atomics is about to get more complicated. Let's factor out some common code to make the review and result more readable. NB the atomics code doesn't yet use the new helper - a subsequent patch will add its use as a side-effect of other changes. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-2-jackmanb@google.com
2021-01-13  tools/bpftool: Add -Wall when building BPF programs  (Ian Rogers, 1 file, -1/+1)
No additional warnings are generated by enabling this, but having it enabled will help avoid regressions. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210113223609.3358812-2-irogers@google.com
2021-01-13  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static  (Ian Rogers, 1 file, -1/+1)
Add inline to __always_inline, making it match linux/compiler.h. Adding this avoids an unused function warning on bpf_tail_call_static when compiling with -Wall. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210113223609.3358812-1-irogers@google.com
2021-01-13  Merge branch 'selftests/bpf: Some build fixes'  (Andrii Nakryiko, 1 file, -17/+41)
Jean-Philippe Brucker says:

====================

A few fixes for cross-building the selftests out of tree. This will enable wider automated testing on various Arm hardware.

Changes since v1 [1]:
* Use wildcard in patch 5
* Move the MAKE_DIRS declaration in patch 1

[1] https://lore.kernel.org/bpf/20210112135959.649075-1-jean-philippe@linaro.org/

====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-01-13  selftests/bpf: Install btf_dump test cases  (Jean-Philippe Brucker, 1 file, -1/+2)
The btf_dump test cannot access the original source files for comparison when running the selftests out of tree, causing several failures:

    awk: btf_dump_test_case_syntax.c: No such file or directory
    ...

Add those files to $(TEST_FILES) to have "make install" pick them up. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210113163319.1516382-6-jean-philippe@linaro.org
2021-01-13  selftests/bpf: Fix installation of urandom_read  (Jean-Philippe Brucker, 1 file, -1/+1)
For out-of-tree builds, $(TEST_CUSTOM_PROGS) require the $(OUTPUT) prefix, otherwise the kselftest lib doesn't know how to install them:

    rsync: [sender] link_stat "tools/testing/selftests/bpf/urandom_read" failed: No such file or directory (2)

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210113163319.1516382-5-jean-philippe@linaro.org
2021-01-13  selftests/bpf: Move generated test files to $(TEST_GEN_FILES)  (Jean-Philippe Brucker, 1 file, -4/+3)
During an out-of-tree build, attempting to install the $(TEST_FILES) into the $(OUTPUT) directory fails, because the objects were already generated into $(OUTPUT):

    rsync: [sender] link_stat "tools/testing/selftests/bpf/test_lwt_ip_encap.o" failed: No such file or directory (2)
    rsync: [sender] link_stat "tools/testing/selftests/bpf/test_tc_edt.o" failed: No such file or directory (2)

Use $(TEST_GEN_FILES) instead. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210113163319.1516382-4-jean-philippe@linaro.org
2021-01-13  selftests/bpf: Fix out-of-tree build  (Jean-Philippe Brucker, 1 file, -1/+1)
When building out-of-tree, the .skel.h files are generated into the $(OUTPUT) directory, rather than $(CURDIR). Add $(OUTPUT) to the include paths. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210113163319.1516382-3-jean-philippe@linaro.org
2021-01-13  selftests/bpf: Enable cross-building  (Jean-Philippe Brucker, 1 file, -11/+35)
Build bpftool and resolve_btfids using the host toolchain when cross-compiling, since they are executed during build to generate the selftests. Add a host build directory in order to build both host and target version of libbpf. Build host tools using $(HOSTCC) defined in Makefile.include. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210113163319.1516382-2-jean-philippe@linaro.org
2021-01-12  Merge branch 'Support kernel module ksym variables'  (Alexei Starovoitov, 14 files, -84/+305)
Andrii Nakryiko says:

====================

Add support for using kernel module global variables (__ksym externs in BPF programs). The BPF verifier will now support ldimm64 with src_reg=BPF_PSEUDO_BTF_ID and a non-zero insn[1].imm field, specifying the module BTF's FD. In such a case, the module BTF object, similarly to BPF maps referenced from ldimm64 with src_reg=BPF_PSEUDO_MAP_FD, will be recorded in bpf_program's auxiliary data and the refcnt will be increased for both the BTF object itself and its kernel module. This makes sure the kernel module won't be unloaded from under an active attached BPF program. These refcounts will be dropped when the BPF program is unloaded.

A new selftest validates all this is working as intended. bpf_testmod.ko is extended with a per-CPU variable. The selftest expects the latest pahole changes (soon to be released as v1.20) to generate per-CPU variable BTF info for the kernel module.

v2->v3:
- added comments, addressed feedback (Yonghong, Hao);

v1->v2:
- fixed a few compiler warnings, posted as separate pre-patches;

rfc->v1:
- use sys_membarrier(MEMBARRIER_CMD_GLOBAL) (Alexei).

Cc: Hao Luo <haoluo@google.com>

====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-12  selftests/bpf: Test kernel module ksym externs  (Andrii Nakryiko, 3 files, -0/+60)
Add per-CPU variable to bpf_testmod.ko and use those from new selftest to validate it works end-to-end. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-8-andrii@kernel.org
2021-01-12  libbpf: Support kernel module ksym externs  (Andrii Nakryiko, 1 file, -18/+32)
Add support for searching for ksym externs not just in vmlinux BTF, but across all module BTFs, similarly to how it's done for CO-RE relocations. Kernels that expose module BTFs through sysfs are assumed to support new ldimm64 instruction extension with BTF FD provided in insn[1].imm field, so no extra feature detection is performed. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-7-andrii@kernel.org
2021-01-12  bpf: Support BPF ksym variables in kernel modules  (Andrii Nakryiko, 6 files, -30/+194)
Add support for directly accessing kernel module variables from BPF programs using special ldimm64 instructions. This functionality builds upon vmlinux ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow specifying kernel module BTF's FD in insn[1].imm field. During BPF program load time, verifier will resolve FD to BTF object and will take reference on BTF object itself and, for module BTFs, corresponding module as well, to make sure it won't be unloaded from under running BPF program. The mechanism used is similar to how bpf_prog keeps track of used bpf_maps. One interesting change is also in how per-CPU variable is determined. The logic is to find .data..percpu data section in provided BTF, but both vmlinux and module each have their own .data..percpu entries in BTF. So for module's case, the search for DATASEC record needs to look at only module's added BTF types. This is implemented with custom search function. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-6-andrii@kernel.org
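A short BPF-side sketch of what this enables. The per-CPU variable name matches the one the selftest adds to bpf_testmod, but treat the whole snippet as illustrative rather than the actual test code.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Declared in the bpf_testmod kernel module; resolved via module BTF. */
    extern const int bpf_testmod_ksym_percpu __ksym;

    int seen_value = 0;

    SEC("raw_tp/sys_enter")
    int read_module_percpu(void *ctx)
    {
        /* The ldimm64 for this extern carries the module BTF FD in
         * insn[1].imm; the verifier takes a reference on the module so it
         * cannot be unloaded while the program is loaded.
         */
        int *p = bpf_this_cpu_ptr(&bpf_testmod_ksym_percpu);

        seen_value = *p;
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";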
2021-01-12  selftests/bpf: Sync RCU before unloading bpf_testmod  (Andrii Nakryiko, 3 files, -33/+12)
If some of the subtests use module BTFs through ksyms, they will cause bpf_prog to take a refcount on the bpf_testmod module, which will prevent it from successfully unloading. The module's refcnt is decremented when bpf_prog is freed, which generally happens in an RCU callback. So we need to trigger synchronize_rcu() in the kernel, which can be achieved nicely with the membarrier(MEMBARRIER_CMD_SHARED) or membarrier(MEMBARRIER_CMD_GLOBAL) syscall. So do that in kernel_sync_rcu() and make it available to other tests inside test_progs. This synchronize_rcu() is called before attempting to unload bpf_testmod. Fixes: 9f7fa225894c ("selftests/bpf: Add bpf_testmod kernel module for testing") Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-5-andrii@kernel.org
2021-01-12  bpf: Declare __bpf_free_used_maps() unconditionally  (Andrii Nakryiko, 1 file, -2/+3)
__bpf_free_used_maps() is always defined in kernel/bpf/core.c, while include/linux/bpf.h is guarding it behind CONFIG_BPF_SYSCALL. Move it out of that guard region and fix compiler warning. Fixes: a2ea07465c8d ("bpf: Fix missing prog untrack in release_maps") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-4-andrii@kernel.org
2021-01-12  bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args  (Andrii Nakryiko, 1 file, -1/+1)
BPF interpreter uses extra input argument, so re-casts __bpf_call_base into __bpf_call_base_args. Avoid compiler warning about incompatible function prototypes by casting to void * first. Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-3-andrii@kernel.org
2021-01-12  bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h  (Andrii Nakryiko, 1 file, -0/+3)
Add bpf_patch_call_args() prototype. This function is called from BPF verifier and only if CONFIG_BPF_JIT_ALWAYS_ON is not defined. This fixes compiler warning about missing prototype in some kernel configurations. Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-2-andrii@kernel.org
2021-01-12  bpf: Extend bind v4/v6 selftests for mark/prio/bindtoifindex  (Daniel Borkmann, 2 files, -8/+76)
Extend existing cgroup bind4/bind6 tests to add coverage for setting and retrieving SO_MARK, SO_PRIORITY and SO_BINDTOIFINDEX at the bind hook. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/384fdc90e5fa83f8335a37aa90fa2f5f3661929c.1610406333.git.daniel@iogearbox.net
2021-01-12  bpf: Allow to retrieve sol_socket opts from sock_addr progs  (Daniel Borkmann, 1 file, -2/+23)
The _bpf_setsockopt() is able to set some of the SOL_SOCKET level options, however, _bpf_getsockopt() has little support to actually retrieve them. This small patch adds a few misc options such as SO_MARK, SO_PRIORITY and SO_BINDTOIFINDEX. For the latter, a getter and setter are added. The mark and priority in particular allow to retrieve the options from BPF cgroup hooks to then implement custom behavior / settings on the syscall hooks compared to other sockets that stick to the defaults, for example. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/cba44439b801e5ddc1170e5be787f4dc93a2d7f9.1610406333.git.daniel@iogearbox.net
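A small sketch of the new getter in use from a cgroup bind hook (one of the "syscall hooks" mentioned above). SOL_SOCKET/SO_MARK are spelled out because the BPF-side headers don't define them; the values are the common asm-generic ones and should be verified for your platform. Program and section names are illustrative.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define SOL_SOCKET 1    /* asm-generic value; verify for your arch */
    #define SO_MARK    36

    SEC("cgroup/bind4")
    int log_bind_mark(struct bpf_sock_addr *ctx)
    {
        __u32 mark = 0;

        /* Newly supported read-out of an SOL_SOCKET option at the bind hook. */
        if (!bpf_getsockopt(ctx, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)))
            bpf_printk("bind4 with mark %u", mark);

        return 1;   /* 1 = allow the bind to proceed */
    }

    char LICENSE[] SEC("license") = "GPL";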
2021-01-12  bpf: Fix a verifier message for alloc size helper arg  (Brendan Jackman, 1 file, -1/+1)
The error message here is misleading, the argument will be rejected unless it is a known constant. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210112123913.2016804-1-jackmanb@google.com
2021-01-12  bpf: Clarify return value of probe str helpers  (Brendan Jackman, 2 files, -10/+10)
When the buffer is too small to contain the input string, these helpers return the length of the buffer, not the length of the original string. This tries to make the docs totally clear about that, since "the length of the [copied ]string" could also refer to the length of the input. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112123422.2011234-1-jackmanb@google.com
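A sketch of the now-documented contract, using bpf_probe_read_user_str() from a kprobe; the probed symbol, arch define and global names are illustrative, the point is the return-value handling.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #define __TARGET_ARCH_x86   /* normally passed via -D on the command line */
    #include <bpf/bpf_tracing.h>

    char buf[64];
    long copied = 0;

    SEC("kprobe/do_sys_openat2")
    int BPF_KPROBE(trace_open, int dfd, const char *filename)
    {
        long n = bpf_probe_read_user_str(buf, sizeof(buf), filename);

        if (n < 0) {
            /* fault while reading user memory */
        } else if (n == sizeof(buf)) {
            /* Truncated: buf is still NUL-terminated, but n is the buffer
             * length, not the length of the original string.
             */
            copied = n - 1;
        } else {
            /* Full copy: n counts the trailing NUL. */
            copied = n - 1;
        }
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";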
2021-01-08  libbpf: Clarify kernel type use with USER variants of CORE reading macros  (Andrii Nakryiko, 1 file, -6/+39)
Add comments clarifying that USER variants of CO-RE reading macro are still only going to work with kernel types, defined in kernel or kernel module BTF. This should help preventing invalid use of those macro to read user-defined types (which doesn't work with CO-RE). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210108194408.3468860-1-andrii@kernel.org
2021-01-08  selftests/bpf: Remove duplicate include in test_lsm  (Menglong Dong, 1 file, -1/+0)
'unistd.h' included in 'selftests/bpf/prog_tests/test_lsm.c' is duplicated. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210105152047.6070-1-dong.menglong@zte.com.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08  net, xdp: Introduce xdp_prepare_buff utility routine  (Lorenzo Bianconi, 28 files, -152/+105)
Introduce xdp_prepare_buff utility routine to initialize per-descriptor xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08  net, xdp: Introduce xdp_init_buff utility routine  (Lorenzo Bianconi, 28 files, -72/+68)
Introduce xdp_init_buff utility routine to initialize xdp_buff fields const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on xdp_init_buff in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
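A driver-side sketch of how this routine and xdp_prepare_buff() (previous entry) divide the work. The xdp_init_buff()/xdp_prepare_buff() signatures are as I recall them from include/net/xdp.h and should be checked against the tree; the ring types and helpers are invented placeholders.

    #include <net/xdp.h>

    /* Invented placeholders to show the calling pattern only. */
    struct my_rx_ring;
    extern bool my_ring_next(struct my_rx_ring *ring, void **hard_start,
                             unsigned int *len);
    extern void my_run_xdp(struct my_rx_ring *ring, struct xdp_buff *xdp);

    #define MY_RX_HEADROOM 256

    static void my_napi_poll_rx(struct my_rx_ring *ring, u32 frame_sz,
                                struct xdp_rxq_info *rxq)
    {
        struct xdp_buff xdp;
        void *hard_start;
        unsigned int len;

        /* Fields constant for the whole NAPI run: frame_sz and rxq. */
        xdp_init_buff(&xdp, frame_sz, rxq);

        while (my_ring_next(ring, &hard_start, &len)) {
            /* Per-descriptor fields: data, data_end, data_meta. */
            xdp_prepare_buff(&xdp, hard_start, MY_RX_HEADROOM, len, true);
            my_run_xdp(ring, &xdp);
        }
    }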
2021-01-08  bpf: Replace fput with sockfd_put in sock map  (Zheng Yongjun, 1 file, -1/+1)
The function sockfd_lookup uses fget on the value that is stored in the file field of the returned structure, so fput should ultimately be applied to this value. This can be done directly, but it seems better to use the specific macro sockfd_put, which does the same thing. The cleanup was done using the following semantic patch (http://www.emn.fr/x-info/coccinelle/):

    // <smpl>
    @@
    expression s;
    @@

       s = sockfd_lookup(...)
       ...
    +  sockfd_put(s);
    ?- fput(s->file);
    // </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201229134834.22962-1-zhengyongjun3@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08  bpf: Remove unnecessary <argp.h> include from preload/iterators  (Leah Neukirchen, 1 file, -1/+1)
This program does not use argp (which is a glibcism). Instead include <errno.h> directly, which was pulled in by <argp.h>. Signed-off-by: Leah Neukirchen <leah@vuxu.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201216100306.30942-1-leah@vuxu.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08  selftests/bpf: Add tests for user- and non-CO-RE BPF_CORE_READ() variants  (Andrii Nakryiko, 2 files, -0/+114)
Add selftests validating that newly added variations of BPF_CORE_READ(), for use with user-space addresses and for non-CO-RE reads, work as expected. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201218235614.2284956-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08  libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family  (Andrii Nakryiko, 1 file, -0/+38)
BPF_CORE_READ(), in addition to handling CO-RE relocations, also allows much nicer way to read data structures with nested pointers. Instead of writing a sequence of bpf_probe_read() calls to follow links, one can just write BPF_CORE_READ(a, b, c, d) to effectively do a->b->c->d read. This is a welcome ability when porting BCC code, which (in most cases) allows exactly the intuitive a->b->c->d variant. This patch adds non-CO-RE variants of BPF_CORE_READ() family of macros for cases where CO-RE is not supported (e.g., old kernels). In such cases, the property of shortening a sequence of bpf_probe_read()s to a simple BPF_PROBE_READ(a, b, c, d) invocation is still desirable, especially when porting BCC code to libbpf. Yet, no CO-RE relocation is going to be emitted. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201218235614.2284956-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
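A sketch contrasting the two spellings; the field chain, section and global names are illustrative, both macros live in bpf/bpf_core_read.h.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    pid_t parent_tgid = 0;

    SEC("raw_tp/sched_switch")
    int on_switch(void *ctx)
    {
        struct task_struct *task = (void *)bpf_get_current_task();

        /* CO-RE variant: emits relocations, tolerates layout changes. */
        parent_tgid = BPF_CORE_READ(task, real_parent, tgid);

        /* Non-CO-RE variant added here: same a->b->c convenience, but plain
         * probe-read calls and no relocations - for kernels or toolchains
         * without CO-RE support.
         */
        parent_tgid = BPF_PROBE_READ(task, real_parent, tgid);

        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";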
2021-01-08  libbpf: Add user-space variants of BPF_CORE_READ() family of macros  (Andrii Nakryiko, 1 file, -39/+59)
Add BPF_CORE_READ_USER(), BPF_CORE_READ_USER_STR() and their _INTO() variations to allow reading CO-RE-relocatable kernel data structures from the user-space. One of such cases is reading input arguments of syscalls, while reaping the benefits of CO-RE relocations w.r.t. handling 32/64 bit conversions and handling missing/new fields in UAPI data structs. Suggested-by: Gilad Reti <gilad.reti@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201218235614.2284956-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
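And a sketch of the _USER flavour reading a kernel-defined struct that lives in user memory (a syscall argument); as the "Clarify kernel type use" entry above notes, the type itself still has to come from kernel/module BTF. Probed symbol and arch define are illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #define __TARGET_ARCH_x86
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>

    unsigned short last_family = 0;

    SEC("kprobe/__sys_connect")
    int BPF_KPROBE(on_connect, int fd, struct sockaddr *uaddr)
    {
        /* uaddr points into user space; BPF_CORE_READ_USER does the
         * user-memory probe read while still CO-RE-relocating the
         * struct sockaddr field offset.
         */
        last_family = BPF_CORE_READ_USER(uaddr, sa_family);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";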
2021-01-08  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski, 112 files, -477/+1303)
Trivial conflict in CAN on file rename.

Conflicts:
    drivers/net/can/m_can/tcan4x5x-core.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08  Merge tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds, 66 files, -309/+746)
Pull more networking fixes from Jakub Kicinski:
 "Slightly lighter pull request to get back into the Thursday cadence.

  Current release - always broken:
   - can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions
   - dsa: hellcreek: fix led_classdev build errors

  Previous releases - regressions:
   - ipv6: fib: flush exceptions when purging route to avoid netdev reference leak
   - ip_tunnels: fix pmtu check in nopmtudisc mode
   - ip: always refragment ip defragmented packets to avoid MTU issues when forwarding through tunnels, correct "packet too big" message is prohibitively tricky to generate
   - s390/qeth: fix locking for discipline setup / removal and during recovery to prevent both deadlocks and races
   - mlx5: Use port_num 1 instead of 0 when delete a RoCE address

  Previous releases - always broken:
   - cdc_ncm: correct overhead calculation in delayed_ndp_size to prevent out of bound accesses with Huawei 909s-120 LTE module
   - fix stmmac dwmac-sun8i suspend/resume:
       - PHY being left powered off
       - MAC syscon configuration being reset
       - reference to the reset controller being improperly dropped
   - qrtr: fix null-ptr-deref in qrtr_ns_remove
   - can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
   - mlx5e: CT: Use per flow counter when CT flow accounting is enabled
   - mlx5e: Fix SWP offsets when vlan inserted by driver

  Misc:
   - bpf: Fix a task_iter bug caused by a bpf -> net merge conflict resolution

  And the usual many fixes to various error paths"

* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
  s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
  s390/qeth: fix locking for discipline setup / removal
  s390/qeth: fix deadlock during recovery
  selftests: fib_nexthops: Fix wrong mausezahn invocation
  nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
  nexthop: Unlink nexthop group entry in error path
  nexthop: Fix off-by-one error in error path
  octeontx2-af: fix memory leak of lmac and lmac->name
  chtls: Fix chtls resources release sequence
  chtls: Added a check to avoid NULL pointer dereference
  chtls: Replace skb_dequeue with skb_peek
  chtls: Avoid unnecessary freeing of oreq pointer
  chtls: Fix panic when route to peer not configured
  chtls: Remove invalid set_tcb call
  chtls: Fix hardware tid leak
  net: ip: always refragment ip defragmented packets
  net: fix pmtu check in nopmtudisc mode
  selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
  docs: octeontx2: tune rst markup
  ...
2021-01-08  Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Linus Torvalds, 2 files, -1/+3)
Pull crypto fixes from Herbert Xu:
 "This fixes a functional bug in arm/chacha-neon as well as a potential
  buffer overflow in ecdh"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
  crypto: arm/chacha-neon - add missing counter increment
2021-01-08  poll: fix performance regression due to out-of-line __put_user()  (Linus Torvalds, 1 file, -3/+11)
The kernel test robot reported a -5.8% performance regression on the "poll2" test of will-it-scale, and bisected it to commit d55564cfc222 ("x86: Make __put_user() generate an out-of-line call").

I didn't expect an out-of-line __put_user() to matter, because no normal core code should use that non-checking legacy version of user access any more. But I had overlooked the very odd poll() usage, which does a __put_user() to update the 'revents' values of the poll array.

Now, Al Viro correctly points out that instead of updating just the 'revents' field, it would be much simpler to just copy the _whole_ pollfd entry, and then we could just use "copy_to_user()" on the whole array of entries, the same way we use "copy_from_user()" a few lines earlier to get the original values.

But that is not what we've traditionally done, and I worry that threaded applications might be concurrently modifying the other fields of the pollfd array. So while Al's suggestion is simpler - and perhaps worth trying in the future - this instead keeps the "just update revents" model.

To fix the performance regression, use the modern "unsafe_put_user()" instead of __put_user(), with the proper "user_write_access_begin()" guarding in place. This improves code generation enormously.

Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
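For reference, a condensed sketch of the access pattern described above, modelled on the commit message rather than the exact fs/select.c hunk; the helper name is illustrative.

    #include <linux/errno.h>
    #include <linux/poll.h>
    #include <linux/uaccess.h>

    /* Write back only the revents fields inside one user-access window, so
     * each store is an inlined unsafe_put_user() instead of an out-of-line
     * __put_user() call.
     */
    static int write_revents(struct pollfd __user *ufds,
                             const struct pollfd *fds, unsigned int nfds)
    {
        unsigned int i;

        if (!user_write_access_begin(ufds, nfds * sizeof(*ufds)))
            return -EFAULT;
        for (i = 0; i < nfds; i++)
            unsafe_put_user(fds[i].revents, &ufds[i].revents, Efault);
        user_write_access_end();
        return 0;

    Efault:
        user_write_access_end();
        return -EFAULT;
    }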
2021-01-08Revert "init/console: Use ttynull as a fallback when there is no console"Petr Mladek5-30/+18
This reverts commit 757055ae8dedf5333af17b3b5b4b70ba9bc9da4e. The commit caused that ttynull was used as the default console on several systems[1][2][3]. As a result, the console was blank even when a better alternative existed. It happened when there was no console configured on the command line and ttynull_init() was the first initcall calling register_console(). Or it happened when /dev/ did not exist when console_on_rootfs() was called. It was not able to open /dev/console even though a console driver was registered. It tried to add ttynull console but it obviously did not help. But ttynull became the preferred console and was used by /dev/console when it was available later. The commit tried to fix a historical problem that have been there for ages. The primary motivation was the commit 3cffa06aeef7ece30f6 ("printk/console: Allow to disable console output by using console="" or console=null"). It provided a clean solution for a workaround that was widely used and worked only by chance. This revert causes that the console="" or console=null command line options will again work only by chance. These options will cause that a particular console will be preferred and the default (tty) ones will not get enabled. There will be no console registered at all. As a result there won't be stdin, stdout, and stderr for the init process. But it worked exactly this way even before. The proper solution has to fulfill many conditions: + Register ttynull only when explicitly required or as the ultimate fallback. + ttynull should get associated with /dev/console but it must not become preferred console when used as a fallback. Especially, it must still be possible to replace it by a better console later. Such a change requires clean up of the register_console() code. Otherwise, it would be even harder to follow. Especially, the use of has_preferred_console and CON_CONSDEV flag is tricky. The clean up is risky. The ordering of consoles is not well defined. And any changes tend to break existing user settings. Do the revert at the least risky solution for now. [1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/ [2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/ [3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/ Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Vineet Gupta <vgupta@synopsys.com> Reported-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-07  Merge tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (Jakub Kicinski, 13 files, -66/+122)
Saeed Mahameed says:

====================
mlx5 fixes 2021-01-07

* tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
  net/mlx5e: Fix two double free cases
  net/mlx5: Release devlink object if adev fails
  net/mlx5e: ethtool, Fix restriction of autoneg with 56G
  net/mlx5e: In skb build skip setting mark in switchdev mode
  net/mlx5: E-Switch, fix changing vf VLANID
  net/mlx5e: Fix SWP offsets when vlan inserted by driver
  net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
  net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
  net/mlx5e: Add missing capability check for uplink follow
  net/mlx5: Check if lag is supported before creating one
====================

Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>