aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-07-21enetc: Refine buffer descriptor ring sizesClaudiu Manoil2-4/+5
It's time to differentiate between Rx and Tx ring sizes. Not only Tx rings are processed differently than Rx rings, but their default number also differs - i.e. up to 8 Tx rings per device (8 traffic classes) vs. 2 Rx rings (one per CPU). So let's set Tx rings sizes to half the size of the Rx rings for now, to be conservative. The default ring sizes were decreased as well (to the next lower power of 2), to reduce the memory footprint, buffering etc., since the measurements I've made so far show that the rings are very unlikely to get full. This change also anticipates the introduction of the dynamic interrupt moderation (dim) algorithm which operates on maximum packet thresholds of 256 packets for Rx and 128 packets for Tx. Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21net-sysfs: add a newline when printing 'tx_timeout' by sysfsXiongfeng Wang1-1/+1
When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to add a newline for easy reading. root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout 0root@syzkaller:~# Signed-off-by: Xiongfeng Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21net: mdio-mux-gpio: use devm_gpiod_get_array()Jisheng Zhang1-8/+3
Use devm_gpiod_get_array() to simplify the error handling and exit code path. Signed-off-by: Jisheng Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21net: ethernet: ravb: exit if re-initialization fails in tx timeoutYoshihiro Shimoda1-2/+24
According to the report of [1], this driver is possible to cause the following error in ravb_tx_timeout_work(). ravb e6800000.ethernet ethernet: failed to switch device to config mode This error means that the hardware could not change the state from "Operation" to "Configuration" while some tx and/or rx queue are operating. After that, ravb_config() in ravb_dmac_init() will fail, and then any descriptors will be not allocaled anymore so that NULL pointer dereference happens after that on ravb_start_xmit(). To fix the issue, the ravb_tx_timeout_work() should check the return values of ravb_stop_dma() and ravb_dmac_init(). If ravb_stop_dma() fails, ravb_tx_timeout_work() re-enables TX and RX and just exits. If ravb_dmac_init() fails, just exits. [1] https://lore.kernel.org/linux-renesas-soc/[email protected]/ Reported-by: Dirk Behme <[email protected]> Signed-off-by: Yoshihiro Shimoda <[email protected]> Reviewed-by: Sergei Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21Merge branch 'udp-Fix-reuseport-selection-with-connected-sockets'David S. Miller3-12/+19
Kuniyuki Iwashima says: ==================== udp: Fix reuseport selection with connected sockets. This patch set addresses two issues which happen when both connected and unconnected sockets are in the same UDP reuseport group. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-21udp: Improve load balancing for SO_REUSEPORT.Kuniyuki Iwashima2-12/+18
Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index | connected | reciprocal_scale | result --------------------------------------------- 0 | no | 20% | 40% 1 | no | 20% | 20% 2 | yes | 20% | 0% 3 | no | 20% | 40% 4 | yes | 20% | 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <[email protected]> Reviewed-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21udp: Copy has_conns in reuseport_grow().Kuniyuki Iwashima1-0/+1
If an unconnected socket in a UDP reuseport group connect()s, has_conns is set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all sockets in udp_hslot looking for the connected socket with the highest score. However, when the number of sockets bound to the port exceeds max_socks, reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2() to return without scanning all sockets, resulting in that packets sent to connected sockets may be distributed to unconnected sockets. Therefore, reuseport_grow() should copy has_conns. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <[email protected]> Reviewed-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21bpftool: Use only nftw for file tree parsingTony Ambardar2-59/+82
The bpftool sources include code to walk file trees, but use multiple frameworks to do so: nftw and fts. While nftw conforms to POSIX/SUSv3 and is widely available, fts is not conformant and less common, especially on non-glibc systems. The inconsistent framework usage hampers maintenance and portability of bpftool, in particular for embedded systems. Standardize code usage by rewriting one fts-based function to use nftw and clean up some related function warnings by extending use of "const char *" arguments. This change helps in building bpftool against musl for OpenWrt. Also fix an unsafe call to dirname() by duplicating the string to pass, since some implementations may directly alter it. The same approach is used in libbpf.c. Signed-off-by: Tony Ambardar <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21Merge branch 'bpf_iter-BTF_ID-at-build-time'Alexei Starovoitov12-73/+153
Yonghong Song says: ==================== Commit 5a2798ab32ba ("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros") implemented a mechanism to compute btf_ids at kernel build time which can simplify kernel implementation and reduce runtime overhead by removing in-kernel btf_id calculation. This patch set tried to use this mechanism to compute btf_ids for bpf_skc_to_*() helpers and for btf_id_or_null ctx arguments specified during bpf iterator registration. Please see individual patch for details. Changelogs: v1 -> v2: - v1 ([1]) is only for bpf_skc_to_*() helpers. This version expanded it to cover ctx btf_id_or_null arguments - abandoned the change of "extern u32 name[]" to "static u32 name[]" for BPF_ID_LIST local "name" definition. gcc 9 incurred a compilation error. [1]: https://lore.kernel.org/bpf/[email protected]/T ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-07-21samples/bpf, selftests/bpf: Use bpf_probe_read_kernelIlya Leoshkevich10-16/+33
A handful of samples and selftests fail to build on s390, because after commit 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") bpf_probe_read is not available anymore. Fix by using bpf_probe_read_kernel. Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21bpf: net: Use precomputed btf_id for bpf iteratorsYonghong Song8-9/+38
One additional field btf_id is added to struct bpf_ctx_arg_aux to store the precomputed btf_ids. The btf_id is computed at build time with BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions. All existing bpf iterators are changed to used pre-compute btf_ids. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21selftests/bpf: Fix test_lwt_seg6local.sh hangsIlya Leoshkevich1-1/+1
OpenBSD netcat (Debian patchlevel 1.195-2) does not seem to react to SIGINT for whatever reason, causing prefix.pl to hang after test_lwt_seg6local.sh exits due to netcat inheriting test_lwt_seg6local.sh's file descriptors. Fix by using SIGTERM instead. Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21bpf: Make btf_sock_ids globalYonghong Song3-28/+62
tcp and udp bpf_iter can reuse some socket ids in btf_sock_ids, so make it global. I put the extern definition in btf_ids.h as a central place so it can be easily discovered by developers. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21Merge branch 'compressed-JITed-insn'Alexei Starovoitov4-153/+643
Luke Nelson says: ==================== his patch series enables using compressed riscv (RVC) instructions in the rv64 BPF JIT. RVC is a standard riscv extension that adds a set of compressed, 2-byte instructions that can replace some regular 4-byte instructions for improved code density. This series first modifies the JIT to support using 2-byte instructions (e.g., in jump offset computations), then adds RVC encoding and helper functions, and finally uses the helper functions to optimize the rv64 JIT. I used our formal verification framework, Serval, to verify the correctness of the RVC encodings and their uses in the rv64 JIT. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra RFC -> v1: - From Björn Töpel: * Changed RVOFF macro to static inline "ninsns_rvoff". * Changed return type of rvc_ functions from u32 to u16. * Changed sizeof(u16) to sizeof(*ctx->insns). * Factored unsigned immediate checks into helper functions (is_8b_uint, etc.) * Changed to use IS_ENABLED instead of #ifdef to check if RVC is enabled. * Changed type of immediate arguments to rvc_* encoding to u32 to avoid issues from promotion of u16 to signed int. * Cleaned up RVC checks in emit_{addi,slli,srli,srai}. + Wrapped lines at 100 instead of 80 columns for increased clarity. + Move !imm checks into each branch instead of checking separately. + Strengthed checks for c.{slli,srli,srai} to check that imm < XLEN. Otherwise, imm could be non-zero but the lower XLEN bits could all be zero, leading to invalid RVC encoding. * Changed emit_imm to sign-extend the 12-bit value in "lower" + The immediate checks for emit_{addiw,li,addi} use signed comparisons, so this enables the RVC variants to be used more often (e.g., if val == -1, then lower should be -1 as opposed to 4095). ==================== Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2020-07-21bpf: Add BTF_ID_LIST_GLOBAL in btf_ids.hYonghong Song3-14/+39
Existing BTF_ID_LIST used a local static variable to store btf_ids. This patch provided a new macro BTF_ID_LIST_GLOBAL to store btf_ids in a global variable which can be shared among multiple files. The existing BTF_ID_LIST is still retained. Two reasons. First, BTF_ID_LIST is also used to build btf_ids for helper arguments which typically is an array of 5. Since typically different helpers have different signature, it makes little sense to share them. Second, some current computed btf_ids are indeed local. If later those btf_ids are shared between different files, they can use BTF_ID_LIST_GLOBAL then. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21tools/bpf: Sync btf_ids.h to toolsYonghong Song2-1/+11
Sync kernel header btf_ids.h to tools directory. Also define macro CONFIG_DEBUG_INFO_BTF before including btf_ids.h in prog_tests/resolve_btfids.c since non-stub definitions for BTF_ID_LIST etc. macros are defined under CONFIG_DEBUG_INFO_BTF. This prevented test_progs from failing. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21bpf: Compute bpf_skc_to_*() helper socket btf ids at build timeYonghong Song3-36/+18
Currently, socket types (struct tcp_sock, udp_sock, etc.) used by bpf_skc_to_*() helpers are computed when vmlinux_btf is first built in the kernel. Commit 5a2798ab32ba ("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros") implemented a mechanism to compute btf_ids at kernel build time which can simplify kernel implementation and reduce runtime overhead by removing in-kernel btf_id calculation. This patch did exactly this, removing in-kernel btf_id computation and utilizing build-time btf_id computation. If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will define an array with size of 5, which is not enough for btf_sock_ids. So define its own static array if CONFIG_DEBUG_INFO_BTF is not defined. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21tools/bpftool: Fix error handing in do_skeleton()YueHaibing1-1/+4
Fix pass 0 to PTR_ERR, also dump more err info using libbpf_strerror. Fixes: 5dc7a8b21144 ("bpftool, selftests/bpf: Embed object file inside skeleton") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21bpf, riscv: Use compressed instructions in the rv64 JITLuke Nelson1-134/+147
This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Cc: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21libbpf bpf_helpers: Use __builtin_offsetof for offsetofIan Rogers1-1/+1
The non-builtin route for offsetof has a dependency on size_t from stdlib.h/stdint.h that is undeclared and may break targets. The offsetof macro in bpf_helpers may disable the same macro in other headers that have a #ifdef offsetof guard. Rather than add additional dependencies improve the offsetof macro declared here to use the builtin that is available since llvm 3.7 (the first with a BPF backend). Signed-off-by: Ian Rogers <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21bpf, riscv: Add encodings for compressed instructionsLuke Nelson1-0/+452
This patch adds functions for encoding and emitting compressed riscv (RVC) instructions to the BPF JIT. Some regular riscv instructions can be compressed into an RVC instruction if the instruction fields meet some requirements. For example, "add rd, rs1, rs2" can be compressed into "c.add rd, rs2" when rd == rs1. To make using RVC encodings simpler, this patch also adds helper functions that selectively emit either a regular instruction or a compressed instruction if possible. For example, emit_add will produce a "c.add" if possible and regular "add" otherwise. Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21s390/bpf: Use bpf_skip() in bpf_jit_prologue()Ilya Leoshkevich1-4/+5
Now that we have bpf_skip() for emitting nops, use it in bpf_jit_prologue() in order to reduce code duplication. Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21bpf, riscv: Modify JIT ctx to support compressed instructionsLuke Nelson4-19/+44
This patch makes the necessary changes to struct rv_jit_context and to bpf_int_jit_compile to support compressed riscv (RVC) instructions in the BPF JIT. It changes the JIT image to be u16 instead of u32, since RVC instructions are 2 bytes as opposed to 4. It also changes ctx->offset and ctx->ninsns to refer to 2-byte instructions rather than 4-byte ones. The riscv PC is required to be 16-bit aligned with or without RVC, so this is sufficient to refer to any valid riscv offset. The code for computing jump offsets in bytes is updated accordingly, and factored into a new "ninsns_rvoff" function to simplify the code. Signed-off-by: Luke Nelson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21s390/bpf: Tolerate not converging code shrinkingIlya Leoshkevich1-1/+26
"BPF_MAXINSNS: Maximum possible literals" unnecessarily falls back to the interpreter because of failing sanity check in bpf_set_addr. The problem is that there are a lot of branches that can be shrunk, and doing so opens up the possibility to shrink even more. This process does not converge after 3 passes, causing code offsets to change during the codegen pass, which must never happen. Fix by inserting nops during codegen pass in order to preserve code offets. Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches") Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21s390/bpf: Use brcl for jumping to exit_ip if necessaryIlya Leoshkevich1-2/+6
"BPF_MAXINSNS: Maximum possible literals" test causes panic with bpf_jit_harden = 2. The reason is that BPF_JMP | BPF_EXIT is always emitted as brc, however, after removal of JITed image size limitations, brcl might be required. Fix by using brcl when necessary. Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches") Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21s390/bpf: Fix sign extension in branch_kuIlya Leoshkevich1-15/+4
Both signed and unsigned variants of BPF_JMP | BPF_K require sign-extending the immediate. JIT emits cgfi for the signed case, which is correct, and clgfi for the unsigned case, which is not correct: clgfi zero-extends the immediate. s390 does not provide an instruction that does sign-extension and unsigned comparison at the same time. Therefore, fix by first loading the sign-extended immediate into work register REG_1 and proceeding as if it's BPF_X. Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches") Reported-by: Seth Forshee <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Tested-by: Seth Forshee <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21selftests: bpf: test_kmod.sh: Fix running out of srctreeIlya Leoshkevich1-3/+9
When running out of srctree, relative path to lib/test_bpf.ko is different than when running in srctree. Check $building_out_of_srctree environment variable and use a different relative path if needed. Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21btrfs: fix mount failure caused by race with umountBoris Burkov1-0/+8
It is possible to cause a btrfs mount to fail by racing it with a slow umount. The crux of the sequence is generic_shutdown_super not yet calling sop->put_super before btrfs_mount_root calls btrfs_open_devices. If that occurs, btrfs_open_devices will decide the opened counter is non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to 0. From here, mount will call sget which will result in grab_super trying to take the super block umount semaphore. That semaphore will be held by the slow umount, so mount will block. Before up-ing the semaphore, umount will delete the super block, resulting in mount's sget reliably allocating a new one, which causes the mount path to dutifully fill it out, and increment total_rw_bytes a second time, which causes the mount to fail, as we see double the expected bytes. Here is the sequence laid out in greater detail: CPU0 CPU1 down_write sb->s_umount btrfs_kill_super kill_anon_super(sb) generic_shutdown_super(sb); shrink_dcache_for_umount(sb); sync_filesystem(sb); evict_inodes(sb); // SLOW btrfs_mount_root btrfs_scan_one_device fs_devices = device->fs_devices fs_info->fs_devices = fs_devices // fs_devices-opened makes this a no-op btrfs_open_devices(fs_devices, mode, fs_type) s = sget(fs_type, test, set, flags, fs_info); find sb in s_instances grab_super(sb); down_write(&s->s_umount); // blocks sop->put_super(sb) // sb->fs_devices->opened == 2; no-op spin_lock(&sb_lock); hlist_del_init(&sb->s_instances); spin_unlock(&sb_lock); up_write(&sb->s_umount); return 0; retry lookup don't find sb in s_instances (deleted by CPU0) s = alloc_super return s; btrfs_fill_super(s, fs_devices, data) open_ctree // fs_devices total_rw_bytes improperly set! btrfs_read_chunk_tree read_one_dev // increment total_rw_bytes again!! super_total_bytes < fs_devices->total_rw_bytes // ERROR!!! To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree before the calls to read_one_dev, while holding the sb umount semaphore and the uuid mutex. To reproduce, it is sufficient to dirty a decent number of inodes, then quickly umount and mount. for i in $(seq 0 500) do dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1 done umount /mnt/foo& mount /mnt/foo does the trick for me. CC: [email protected] # 4.4+ Signed-off-by: Boris Burkov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-07-21btrfs: fix page leaks after failure to lock page for delallocRobbie Ko1-1/+2
When locking pages for delalloc, we check if it's dirty and mapping still matches. If it does not match, we need to return -EAGAIN and release all pages. Only the current page was put though, iterate over all the remaining pages too. CC: [email protected] # 4.14+ Reviewed-by: Filipe Manana <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Robbie Ko <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-07-21btrfs: qgroup: fix data leak caused by race between writeback and truncateQu Wenruo1-13/+10
[BUG] When running tests like generic/013 on test device with btrfs quota enabled, it can normally lead to data leak, detected at unmount time: BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 ------------[ cut here ]------------ WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace caf08beafeca2392 ]--- BTRFS error (device dm-3): qgroup reserved space leaked [CAUSE] In the offending case, the offending operations are: 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0 The following sequence of events could happen after the writev(): CPU1 (writeback) | CPU2 (truncate) ----------------------------------------------------------------- btrfs_writepages() | |- extent_write_cache_pages() | |- Got page for 1003520 | | 1003520 is Dirty, no writeback | | So (!clear_page_dirty_for_io()) | | gets called for it | |- Now page 1003520 is Clean. | | | btrfs_setattr() | | |- btrfs_setsize() | | |- truncate_setsize() | | New i_size is 18388 |- __extent_writepage() | | |- page_offset() > i_size | |- btrfs_invalidatepage() | |- Page is clean, so no qgroup | callback executed This means, the qgroup reserved data space is not properly released in btrfs_invalidatepage() as the page is Clean. [FIX] Instead of checking the dirty bit of a page, call btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage(). As qgroup rsv are completely bound to the QGROUP_RESERVED bit of io_tree, not bound to page status, thus we won't cause double freeing anyway. Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero") CC: [email protected] # 4.14+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-07-21drm/amdgpu: Fix NULL dereference in dpm sysfs handlersPaweł Gronowski1-6/+3
NULL dereference occurs when string that is not ended with space or newline is written to some dpm sysfs interface (for example pp_dpm_sclk). This happens because strsep replaces the tmp with NULL if the delimiter is not present in string, which is then dereferenced by tmp[0]. Reproduction example: sudo sh -c 'echo -n 1 > /sys/class/drm/card0/device/pp_dpm_sclk' Signed-off-by: Paweł Gronowski <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2020-07-21drm/amd/powerplay: fix a crash when overclocking Vega MQiu Wenbo1-4/+6
Avoid kernel crash when vddci_control is SMU7_VOLTAGE_CONTROL_NONE and vddci_voltage_table is empty. It has been tested on Intel Hades Canyon (i7-8809G). Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208489 Fixes: ac7822b0026f ("drm/amd/powerplay: add smumgr support for VEGAM (v2)") Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Qiu Wenbo <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2020-07-21btrfs: fix double free on ulist after backref resolution failureFilipe Manana1-0/+1
At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots argument to point to it. However if later we fail due to an error returned by find_parent_nodes(), we free that ulist but leave a dangling pointer in the **roots argument. Upon receiving the error, a caller of this function can attempt to free the same ulist again, resulting in an invalid memory access. One such scenario is during qgroup accounting: btrfs_qgroup_account_extents() --> calls btrfs_find_all_roots() passes &new_roots (a stack allocated pointer) to btrfs_find_all_roots() --> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe() passing &new_roots to it --> allocates ulist and assigns its address to **roots (which points to new_roots from btrfs_qgroup_account_extents()) --> find_parent_nodes() returns an error, so we free the ulist and leave **roots pointing to it after returning --> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned an error and jumps to the label 'cleanup', which just tries to free again the same ulist Stack trace example: ------------[ cut here ]------------ BTRFS: tree first key check failed WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs] Modules linked in: dm_snapshot dm_thin_pool (...) CPU: 1 PID: 1763215 Comm: fsstress Tainted: G W 5.8.0-rc3-btrfs-next-64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs] Code: 28 5b 5d (...) RSP: 0018:ffffb89b473779a0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff90397759bf08 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff RBP: ffff9039a419c000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffb89b43301000 R12: 000000000000005e R13: ffffb89b47377a2e R14: ffffb89b473779af R15: 0000000000000000 FS: 00007fc47e1e1000(0000) GS:ffff9039ac200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc47e1df000 CR3: 00000003d9e4e001 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: read_block_for_search+0xf6/0x350 [btrfs] btrfs_next_old_leaf+0x242/0x650 [btrfs] resolve_indirect_refs+0x7cf/0x9e0 [btrfs] find_parent_nodes+0x4ea/0x12c0 [btrfs] btrfs_find_all_roots_safe+0xbf/0x130 [btrfs] btrfs_qgroup_account_extents+0x9d/0x390 [btrfs] btrfs_commit_transaction+0x4f7/0xb20 [btrfs] btrfs_sync_file+0x3d4/0x4d0 [btrfs] do_fsync+0x38/0x70 __x64_sys_fdatasync+0x13/0x20 do_syscall_64+0x5c/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc47e2d72e3 Code: Bad RIP value. RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0 softirqs last enabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace 8639237550317b48 ]--- BTRFS error (device sdc): tree first key mismatch detected, bytenr=62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024) general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 1763215 Comm: fsstress Tainted: G W 5.8.0-rc3-btrfs-next-64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:ulist_release+0x14/0x60 [btrfs] Code: c7 07 00 (...) RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840 FS: 00007fc47e1e1000(0000) GS:ffff9039ac600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8c1c0a51c8 CR3: 00000003d9e4e004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ulist_free+0x13/0x20 [btrfs] btrfs_qgroup_account_extents+0xf3/0x390 [btrfs] btrfs_commit_transaction+0x4f7/0xb20 [btrfs] btrfs_sync_file+0x3d4/0x4d0 [btrfs] do_fsync+0x38/0x70 __x64_sys_fdatasync+0x13/0x20 do_syscall_64+0x5c/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc47e2d72e3 Code: Bad RIP value. RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50 Modules linked in: dm_snapshot dm_thin_pool (...) ---[ end trace 8639237550317b49 ]--- RIP: 0010:ulist_release+0x14/0x60 [btrfs] Code: c7 07 00 (...) RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840 FS: 00007fc47e1e1000(0000) GS:ffff9039ad200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6a776f7d40 CR3: 00000003d9e4e002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after it frees the ulist. Fixes: 8da6d5815c592b ("Btrfs: added btrfs_find_all_roots()") CC: [email protected] # 4.4+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-07-21RDMA/mlx5: Prevent prefetch from racing with implicit destructionJason Gunthorpe1-3/+19
Prefetch work in mlx5_ib_prefetch_mr_work can be queued and able to run concurrently with destruction of the implicit MR. The num_deferred_work was intended to serialize this, but there is a race: CPU0 CPU1 mlx5_ib_free_implicit_mr() xa_erase(odp_mkeys) synchronize_srcu() __xa_erase(implicit_children) mlx5_ib_prefetch_mr_work() pagefault_mr() pagefault_implicit_mr() implicit_get_child_mr() xa_cmpxchg() atomic_dec_and_test(num_deferred_mr) wait_event(imr->q_deferred_work) ib_umem_odp_release(odp_imr) kfree(odp_imr) At this point in mlx5_ib_free_implicit_mr() the implicit_children list is supposed to be empty forever so that destroy_unused_implicit_child_mr() and related are not and will not be running. Since it is not empty the destroy_unused_implicit_child_mr() flow ends up touching deallocated memory as mlx5_ib_free_implicit_mr() already tore down the imr parent. The solution is to flush out the prefetch wq by driving num_deferred_work to zero after creation of new prefetch work is blocked. Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-21bpf: cpumap: Fix possible rcpu kthread hungLorenzo Bianconi1-4/+7
Fix the following cpumap kthread hung. The issue is currently occurring when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not BPF_XDP_CPUMAP as expected_attach_type) $./test_progs -n 101 101/1 cpumap_with_progs:OK 101 xdp_cpumap_attach:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds. [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212 [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000 [ 370.003228] Call Trace: [ 370.003930] __schedule+0x5c7/0xf50 [ 370.004901] ? io_schedule_timeout+0xb0/0xb0 [ 370.005934] ? static_obj+0x31/0x80 [ 370.006788] ? mark_held_locks+0x24/0x90 [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0 [ 370.008930] schedule+0x6f/0x160 [ 370.009728] schedule_preempt_disabled+0x14/0x20 [ 370.010829] kthread+0x17b/0x240 [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 370.011944] ret_from_fork+0x1f/0x30 [ 370.012348] Showing all locks held in the system: [ 370.013025] 1 lock held by khungtaskd/33: [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3 [ 370.014461] ============================================= Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Reported-by: Jakub Sitnicki <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Tested-by: Jakub Sitnicki <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/e54f2aabf959f298939e5507b09c48f8c2e380be.1595170625.git.lorenzo@kernel.org
2020-07-21bpf, netns: Fix build without CONFIG_INETJakub Sitnicki1-0/+4
When CONFIG_NET is set but CONFIG_INET isn't, build fails with: ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_unneed': kernel/bpf/net_namespace.c:32: undefined reference to `bpf_sk_lookup_enabled' ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_need': kernel/bpf/net_namespace.c:43: undefined reference to `bpf_sk_lookup_enabled' This is because without CONFIG_INET bpf_sk_lookup_enabled symbol is not available. Wrap references to bpf_sk_lookup_enabled with preprocessor conditionals. Fixes: 1559b4aa1db4 ("inet: Run SK_LOOKUP BPF program on socket lookup") Reported-by: Randy Dunlap <[email protected]> Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Randy Dunlap <[email protected]> # build-tested Link: https://lore.kernel.org/bpf/[email protected]
2020-07-21Merge tag 'timers-v5.8-rc7' of ↵Thomas Gleixner2-10/+58
https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent Pull a timer chip fix from Daniel Lezcano: - Fix kernel panic at suspend / resume time on TI am3/am4 (Tony Lindgren)
2020-07-21Merge tag 'sound-5.8-rc7' of ↵Linus Torvalds29-92/+266
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into master Pull sound fixes from Takashi Iwai: "This became fairly large, containing mostly the collection of ASoC fixes that slipped from the previous request, so I sent now a bit earlier than usual. But all changes look small and mostly device-specific, hence nothing to worry too much. Majority of changes are for x86 based platforms and their CODEC drivers, in order to address some issues hit by their recent tests and fuzzing. The rest are other ASoC device-specific fixes (imx, qcom, wm8974, amd, rockchip) as well as a trivial fix for a kernel WARNING hit by syzkaller" * tag 'sound-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits) ALSA: hda/realtek: Fixed ALC298 sound bug by adding quirk for Samsung Notebook Pen S ALSA: info: Drop WARN_ON() from buffer NULL sanity check ASoC: rt5682: Report the button event in the headset type only ASoC: Intel: bytcht_es8316: Add missed put_device() ASoC: rt5682: Enable Vref2 under using PLL2 ASoC: rt286: fix unexpected interrupt happens ASoC: wm8974: remove unsupported clock mode ASoC: wm8974: fix Boost Mixer Aux Switch ASoC: SOF: core: fix null-ptr-deref bug during device removal ASoc: codecs: max98373: remove Idle_bias_on to let codec suspend ASoC: codecs: max98373: Removed superfluous volume control from chip default ASoC: topology: fix tlvs in error handling for widget_dmixer ASoC: topology: fix kernel oops on route addition error ASoC: SOF: imx: add min/max channels for SAI/ESAI on i.MX8/i.MX8M ASoC: Intel: bdw-rt5677: fix non BE conversion ASoC: soc-dai: set dai_link dpcm_ flags with a helper MAINTAINERS: Add Shengjiu to reviewer list of sound/soc/fsl ASoC: core: Remove only the registered component in devm functions MAINTAINERS: Change Maintainer for some at91 drivers ASoC: dt-bindings: simple-card: Fix 'make dt_binding_check' warnings ...
2020-07-21clocksource/drivers/timer-ti-dm: Fix suspend and resume for am3 and am4Tony Lindgren2-10/+58
Carlos Hernandez <[email protected]> reported that we now have a suspend and resume regresssion on am3 and am4 compared to the earlier kernels. While suspend and resume works with v5.8-rc3, we now get errors with rtcwake: pm33xx pm33xx: PM: Could not transition all powerdomains to target state ... rtcwake: write error This is because we now fail to idle the system timer clocks that the idle code checks and the error gets propagated to the rtcwake. Turns out there are several issues that need to be fixed: 1. Ignore no-idle and no-reset configured timers for the ti-sysc interconnect target driver as otherwise it will keep the system timer clocks enabled 2. Toggle the system timer functional clock for suspend for am3 and am4 (but not for clocksource on am3) 3. Only reconfigure type1 timers in dmtimer_systimer_disable() 4. Use of_machine_is_compatible() instead of of_device_is_compatible() for checking the SoC type Fixes: 52762fbd1c47 ("clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support") Reported-by: Carlos Hernandez <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> Tested-by: Carlos Hernandez <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-21s390/cpum_cf,perf: change DFLT_CCERROR counter nameThomas Richter2-3/+3
Change the counter name DLFT_CCERROR to DLFT_CCFINISH on IBM z15. This counter counts completed DEFLATE instructions with exit code 0, 1 or 2. Since exit code 0 means success and exit code 1 or 2 indicate errors, change the counter name to avoid confusion. This counter is incremented each time the DEFLATE instruction completed regardless if an error was detected or not. Fixes: d68d5d51dc89 ("s390/cpum_cf: Add new extended counters for IBM z15") Fixes: e7950166e402 ("perf vendor events s390: Add new deflate counters for IBM z15") Cc: [email protected] # v5.7 Signed-off-by: Thomas Richter <[email protected]> Reviewed-by: Sumanth Korikkar <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2020-07-20riscv: kasan: use local_tlb_flush_all() to avoid uninitialized __sbi_rfenceVincent Chen1-2/+2
It fails to boot the v5.8-rc4 kernel with CONFIG_KASAN because kasan_init and kasan_early_init use uninitialized __sbi_rfence as executing the tlb_flush_all(). Actually, at this moment, only the CPU which is responsible for the system initialization enables the MMU. Other CPUs are parking at the .Lsecondary_start. Hence the tlb_flush_all() is able to be replaced by local_tlb_flush_all() to avoid using uninitialized __sbi_rfence. Signed-off-by: Vincent Chen <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
2020-07-20tipc: allow to build NACK message in link timeout functionTung Nguyen1-1/+1
Commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") eliminated sending of the 'gap' indicator in regular ACK messages and only allowed to build NACK message with enabled probe/probe_reply. However, necessary correction for building NACK message was missed in tipc_link_timeout() function. This leads to significant delay and link reset (due to retransmission failure) in lossy environment. This commit fixes it by setting the 'probe' flag to 'true' when the receive deferred queue is not empty. As a result, NACK message will be built to send back to another peer. Fixes: 02288248b051 ("tipc: eliminate gap indicator from ACK messages") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Tung Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21exfat: fix name_hash computation on big endian systemsIlya Ponetayev1-4/+4
On-disk format for name_hash field is LE, so it must be explicitly transformed on BE system for proper result. Fixes: 370e812b3ec1 ("exfat: add nls operations") Cc: [email protected] # v5.7 Signed-off-by: Chen Minqiang <[email protected]> Signed-off-by: Ilya Ponetayev <[email protected]> Reviewed-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-07-21exfat: fix wrong size update of stream entry by typoHyeongseok Kim1-1/+1
The stream.size field is updated to the value of create timestamp of the file entry. Fix this to use correct stream entry pointer. Fixes: 29bbb14bfc80 ("exfat: fix incorrect update of stream entry in __exfat_truncate()") Signed-off-by: Hyeongseok Kim <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-07-21exfat: fix wrong hint_stat initialization in exfat_find_dir_entry()Namjae Jeon1-1/+1
We found the wrong hint_stat initialization in exfat_find_dir_entry(). It should be initialized when cluster is EXFAT_EOF_CLUSTER. Fixes: ca06197382bd ("exfat: add directory operations") Cc: [email protected] # v5.7 Reviewed-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-07-21exfat: fix overflow issue in exfat_cluster_to_sector()Namjae Jeon1-1/+1
An overflow issue can occur while calculating sector in exfat_cluster_to_sector(). It needs to cast clus's type to sector_t before left shifting. Fixes: 1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers") Cc: [email protected] # v5.7 Reviewed-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-07-20net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TXBixuan Cui1-1/+1
Fix the warning: [-Werror=-Wframe-larger-than=] drivers/net/ethernet/neterion/vxge/vxge-main.c: In function'VXGE_COMPLETE_VPATH_TX.isra.37': drivers/net/ethernet/neterion/vxge/vxge-main.c:119:1: warning: the frame size of 1056 bytes is larger than 1024 bytes Dropping the NR_SKB_COMPLETED to 16 is appropriate that won't have much impact on performance and functionality. Signed-off-by: Bixuan Cui <[email protected]> Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-20net: ag71xx: add missed clk_disable_unprepare in error path of probeHuang Guobin1-1/+2
The ag71xx_mdio_probe() forgets to call clk_disable_unprepare() when of_reset_control_get_exclusive() failed. Add the missed call to fix it. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Huang Guobin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-20net/sched: act_ct: fix restore the qdisc_skb_cb after defragwenxu1-2/+14
The fragment packets do defrag in tcf_ct_handle_fragments will clear the skb->cb which make the qdisc_skb_cb clear too. So the qdsic_skb_cb should be store before defrag and restore after that. It also update the pkt_len after all the fragments finish the defrag to one packet and make the following actions counter correct. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-20net: dsa: use the ETH_MIN_MTU and ETH_DATA_LEN default valuesVladimir Oltean1-3/+0
Now that DSA supports MTU configuration, undo the effects of commit 8b1efc0f83f1 ("net: remove MTU limits on a few ether_setup callers") and let DSA interfaces use the default min_mtu and max_mtu specified by ether_setup(). This is more important for min_mtu: since DSA is Ethernet, the minimum MTU is the same as of any other Ethernet interface, and definitely not zero. For the max_mtu, we have a callback through which drivers can override that, if they want to. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>