Age | Commit message (Collapse) | Author | Files | Lines | |
---|---|---|---|---|---|
2020-07-21 | bpf, riscv: Use compressed instructions in the rv64 JIT | Luke Nelson | 1 | -134/+147 | |
This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Cc: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-07-21 | bpf, riscv: Modify JIT ctx to support compressed instructions | Luke Nelson | 1 | -6/+6 | |
This patch makes the necessary changes to struct rv_jit_context and to bpf_int_jit_compile to support compressed riscv (RVC) instructions in the BPF JIT. It changes the JIT image to be u16 instead of u32, since RVC instructions are 2 bytes as opposed to 4. It also changes ctx->offset and ctx->ninsns to refer to 2-byte instructions rather than 4-byte ones. The riscv PC is required to be 16-bit aligned with or without RVC, so this is sufficient to refer to any valid riscv offset. The code for computing jump offsets in bytes is updated accordingly, and factored into a new "ninsns_rvoff" function to simplify the code. Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-05-06 | bpf, riscv: Optimize BPF_JSET BPF_K using andi on RV64 | Luke Nelson | 1 | -8/+19 | |
This patch optimizes BPF_JSET BPF_K by using a RISC-V andi instruction when the BPF immediate fits in 12 bits, instead of first loading the immediate to a temporary register. Examples of generated code with and without this optimization: BPF_JMP_IMM(BPF_JSET, R1, 2, 1) without optimization: 20: li t1,2 24: and t1,a0,t1 28: bnez t1,0x30 BPF_JMP_IMM(BPF_JSET, R1, 2, 1) with optimization: 20: andi t1,a0,2 24: bnez t1,0x2c BPF_JMP32_IMM(BPF_JSET, R1, 2, 1) without optimization: 20: li t1,2 24: mv t2,a0 28: slli t2,t2,0x20 2c: srli t2,t2,0x20 30: slli t1,t1,0x20 34: srli t1,t1,0x20 38: and t1,t2,t1 3c: bnez t1,0x44 BPF_JMP32_IMM(BPF_JSET, R1, 2, 1) with optimization: 20: andi t1,a0,2 24: bnez t1,0x2c In these examples, because the upper 32 bits of the sign-extended immediate are 0, BPF_JMP BPF_JSET and BPF_JMP32 BPF_JSET are equivalent and therefore the JIT produces identical code for them. Co-developed-by: Xi Wang <[email protected]> Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-05-06 | bpf, riscv: Optimize BPF_JMP BPF_K when imm == 0 on RV64 | Luke Nelson | 1 | -5/+10 | |
This patch adds an optimization to BPF_JMP (32- and 64-bit) BPF_K for when the BPF immediate is zero. When the immediate is zero, the code can directly use the RISC-V zero register instead of loading a zero immediate to a temporary register first. Co-developed-by: Xi Wang <[email protected]> Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-05-06 | bpf, riscv: Optimize FROM_LE using verifier_zext on RV64 | Luke Nelson | 1 | -6/+14 | |
This patch adds two optimizations for BPF_ALU BPF_END BPF_FROM_LE in the RV64 BPF JIT. First, it enables the verifier zero-extension optimization to avoid zero extension when imm == 32. Second, it avoids generating code for imm == 64, since it is equivalent to a no-op. Co-developed-by: Xi Wang <[email protected]> Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-05-06 | bpf, riscv: Enable missing verifier_zext optimizations on RV64 | Luke Nelson | 1 | -4/+4 | |
Commit 66d0d5a854a6 ("riscv: bpf: eliminate zero extension code-gen") added support for the verifier zero-extension optimization on RV64 and commit 46dd3d7d287b ("bpf, riscv: Enable zext optimization for more RV64G ALU ops") enabled it for more instruction cases. However, BPF_LSH BPF_X and BPF_{LSH,RSH,ARSH} BPF_K are still missing the optimization. This patch enables the zero-extension optimization for these remaining cases. Co-developed-by: Xi Wang <[email protected]> Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-04-08 | riscv, bpf: Fix offset range checking for auipc+jalr on RV64 | Luke Nelson | 1 | -17/+32 | |
The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] | |||||
2020-03-05 | riscv, bpf: Factor common RISC-V JIT code | Luke Nelson | 1 | -0/+1103 | |
This patch factors out code that can be used by both the RV64 and RV32 BPF JITs to a common bpf_jit.h and bpf_jit_core.c. Move struct definitions and macro-like functions to header. Rename rv_sb_insn/rv_uj_insn to rv_b_insn/rv_j_insn to match the RISC-V specification. Move reusable functions emit_body() and bpf_int_jit_compile() to bpf_jit_core.c with minor simplifications. Rename emit_insn() and build_{prologue,epilogue}() to be prefixed with "bpf_jit_" as they are no longer static. Rename bpf_jit_comp.c to bpf_jit_comp64.c to be more explicit. Co-developed-by: Xi Wang <[email protected]> Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] |