Age | Commit message (Collapse) | Author | Files | Lines |
|
s/bounts/bounds/
Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
When we attach any cgroup hook, the rest (even if unused/unattached) start
to contribute small overhead. In particular, the one we want to avoid is
__cgroup_bpf_run_filter_skb which does two redirections to get to
the cgroup and pushes/pulls skb.
Let's split cgroup_bpf_enabled to be per-attach to make sure
only used attach types trigger.
I've dropped some existing high-level cgroup_bpf_enabled in some
places because BPF_PROG_CGROUP_XXX_RUN macros usually have another
cgroup_bpf_enabled check.
I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for
GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type]
has to be constant and known at compile time.
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
When we attach a bpf program to cgroup/getsockopt any other getsockopt()
syscall starts incurring kzalloc/kfree cost.
Let add a small buffer on the stack and use it for small (majority)
{s,g}etsockopt values. The buffer is small enough to fit into
the cache line and cover the majority of simple options (most
of them are 4 byte ints).
It seems natural to do the same for setsockopt, but it's a bit more
involved when the BPF program modifies the data (where we have to
kmalloc). The assumption is that for the majority of setsockopt
calls (which are doing pure BPF options or apply policy) this
will bring some benefit as well.
Without this patch (we remove about 1% __kmalloc):
3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt
|
--3.30%--__cgroup_bpf_run_filter_getsockopt
|
--0.81%--__kmalloc
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
call in do_tcp_getsockopt using the on-stack data. This removes
3% overhead for locking/unlocking the socket.
Without this patch:
3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt
|
--3.30%--__cgroup_bpf_run_filter_getsockopt
|
--0.81%--__kmalloc
With the patch applied:
0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern
Note, exporting uapi/tcp.h requires removing netinet/tcp.h
from test_progs.h because those headers have confliciting
definitions.
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
llvm patch https://reviews.llvm.org/D84002 permitted
to emit empty rodata datasec if the elf .rodata section
contains read-only data from local variables. These
local variables will be not emitted as BTF_KIND_VARs
since llvm converted these local variables as
static variables with private linkage without debuginfo
types. Such an empty rodata datasec will make
skeleton code generation easy since for skeleton
a rodata struct will be generated if there is a
.rodata elf section. The existence of a rodata
btf datasec is also consistent with the existence
of a rodata map created by libbpf.
The btf with such an empty rodata datasec will fail
in the kernel though as kernel will reject a datasec
with zero vlen and zero size. For example, for the below code,
int sys_enter(void *ctx)
{
int fmt[6] = {1, 2, 3, 4, 5, 6};
int dst[6];
bpf_probe_read(dst, sizeof(dst), fmt);
return 0;
}
We got the below btf (bpftool btf dump ./test.o):
[1] PTR '(anon)' type_id=0
[2] FUNC_PROTO '(anon)' ret_type_id=3 vlen=1
'ctx' type_id=1
[3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[4] FUNC 'sys_enter' type_id=2 linkage=global
[5] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
[6] ARRAY '(anon)' type_id=5 index_type_id=7 nr_elems=4
[7] INT '__ARRAY_SIZE_TYPE__' size=4 bits_offset=0 nr_bits=32 encoding=(none)
[8] VAR '_license' type_id=6, linkage=global-alloc
[9] DATASEC '.rodata' size=0 vlen=0
[10] DATASEC 'license' size=0 vlen=1
type_id=8 offset=0 size=4
When loading the ./test.o to the kernel with bpftool,
we see the following error:
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
...
[6] ARRAY (anon) type_id=5 index_type_id=7 nr_elems=4
[7] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
[8] VAR _license type_id=6 linkage=1
[9] DATASEC .rodata size=24 vlen=0 vlen == 0
libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring.
Basically, libbpf changed .rodata datasec size to 24 since elf .rodata
section size is 24. The kernel then rejected the BTF since vlen = 0.
Note that the above kernel verifier failure can be worked around with
changing local variable "fmt" to a static or global, optionally const, variable.
This patch permits a datasec with vlen = 0 in kernel.
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Introduce __xdp_build_skb_from_frame utility routine to build
the skb from xdp_frame. Rely on __xdp_build_skb_from_frame in
cpumap code.
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Link: https://lore.kernel.org/bpf/4f9f4c6b3dd3933770c617eb6689dbc0c6e25863.1610475660.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Conflicts:
drivers/net/can/dev.c
commit 03f16c5075b2 ("can: dev: can_restart: fix use after free bug")
commit 3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
Code move.
drivers/net/dsa/b53/b53_common.c
commit 8e4052c32d6b ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
commit b7a9e0da2d1c ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")
Field rename.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
can trees.
Current release - regressions:
- nfc: nci: fix the wrong NCI_CORE_INIT parameters
Current release - new code bugs:
- bpf: allow empty module BTFs
Previous releases - regressions:
- bpf: fix signed_{sub,add32}_overflows type handling
- tcp: do not mess with cloned skbs in tcp_add_backlog()
- bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach
- bpf: don't leak memory in bpf getsockopt when optlen == 0
- tcp: fix potential use-after-free due to double kfree()
- mac80211: fix encryption issues with WEP
- devlink: use right genl user_ptr when handling port param get/set
- ipv6: set multicast flag on the multicast route
- tcp: fix TCP_USER_TIMEOUT with zero window
Previous releases - always broken:
- bpf: local storage helpers should check nullness of owner ptr passed
- mac80211: fix incorrect strlen of .write in debugfs
- cls_flower: call nla_ok() before nla_next()
- skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"
* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: systemport: free dev before on error path
net: usb: cdc_ncm: don't spew notifications
net: mscc: ocelot: Fix multicast to the CPU port
tcp: Fix potential use-after-free due to double kfree()
bpf: Fix signed_{sub,add32}_overflows type handling
can: peak_usb: fix use after free bugs
can: vxcan: vxcan_xmit: fix use after free bug
can: dev: can_restart: fix use after free bug
tcp: fix TCP socket rehash stats mis-accounting
net: dsa: b53: fix an off by one in checking "vlan->vid"
tcp: do not mess with cloned skbs in tcp_add_backlog()
selftests: net: fib_tests: remove duplicate log test
net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
sh_eth: Fix power down vs. is_opened flag ordering
net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
netfilter: rpfilter: mask ecn bits before fib lookup
udp: mask TOS bits in udp_v4_early_demux()
xsk: Clear pool even for inactive queues
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
...
|
|
Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via copy/paste issue, also
given prior to 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
the signature of signed_sub_overflows() had s64 a and s64 b as its input args
whereas now they are truncated to s32. Thus restore proper types. Also, the case
of signed_add32_overflows() is not consistent to signed_sub32_overflows(). Both
have s32 as inputs, therefore align the former.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
Pull task_work fix from Jens Axboe:
"The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
task_work run we had in get_signal().
This caused a regression for some setups, since we're relying on eg
____fput() being run to close and release, for example, a pipe and
wake the other end.
For 5.11, I prefer the simple solution of just reinstating the
unconditional run, even if it conceptually doesn't make much sense -
if you need that kind of guarantee, you should be using TWA_SIGNAL
instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
ensure that other potential gotchas/assumptions for task_work don't
regress for 5.11.
We're looking into further simplifying the task_work notifications for
5.12 which would resolve that too"
* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
task_work: unconditionally run task_work from get_signal()
|
|
I assume this was obtained by copy/paste. Point it to bpf_map_peek_elem()
instead of bpf_map_pop_elem(). In practice it may have been less likely
hit when under JIT given shielded via 84430d4232c3 ("bpf, verifier: avoid
retpoline for map push/pop/peek operation").
Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Mircea Cirjaliu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Mauricio Vasquez <[email protected]>
Link: https://lore.kernel.org/bpf/AM7PR02MB6082663DFDCCE8DA7A6DD6B1BBA30@AM7PR02MB6082.eurprd02.prod.outlook.com
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-01-16
1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.
2) Add support for using kernel module global variables (__ksym externs in BPF
programs) retrieved via module's BTF, from Andrii Nakryiko.
3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
stored in mmap2 event for perf, from Jiri Olsa.
4) Various fixes for cross-building BPF sefltests out-of-tree which then will
unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.
5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.
6) Clean up driver's XDP buffer init and split into two helpers to init per-
descriptor and non-changing fields during processing, from Lorenzo Bianconi.
7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
perf: Add build id data in mmap2 event
bpf: Add size arg to build_id_parse function
bpf: Move stack_map_get_build_id into lib
bpf: Document new atomic instructions
bpf: Add tests for new BPF atomic operations
bpf: Add bitwise atomic instructions
bpf: Pull out a macro for interpreting atomic ALU operations
bpf: Add instructions for atomic_[cmp]xchg
bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
bpf: Move BPF_STX reserved field check into BPF_STX verifier code
bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
bpf: x86: Factor out a lookup table for some ALU opcodes
bpf: x86: Factor out emission of REX byte
bpf: x86: Factor out emission of ModR/M for *(reg + off)
tools/bpftool: Add -Wall when building BPF programs
bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
selftests/bpf: Install btf_dump test cases
selftests/bpf: Fix installation of urandom_read
selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
selftests/bpf: Fix out-of-tree build
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-01-16
1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in
link creation's error path causing a refcount underflow, from Jiri Olsa.
2) Fix BTF validation errors for the case where kernel modules don't declare
any new types and end up with an empty BTF, from Andrii Nakryiko.
3) Fix BPF local storage helpers to first check their {task,inode} owners for
being NULL before access, from KP Singh.
4) Fix a memory leak in BPF setsockopt handling for the case where optlen is
zero and thus temporary optval buffer should be freed, from Stanislav Fomichev.
5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for
raw_tracepoint caused by too big ctx_size_in, from Song Liu.
6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL}
registers were spilled to stack but not recognized, from Gilad Reti.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
MAINTAINERS: Update my email address
selftests/bpf: Add verifier test for PTR_TO_MEM spill
bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
bpf: Reject too big ctx_size_in for raw_tp test run
libbpf: Allow loading empty BTFs
bpf: Allow empty module BTFs
bpf: Don't leak memory in bpf getsockopt when optlen == 0
bpf: Update local storage test to check handling of null ptrs
bpf: Fix typo in bpf_inode_storage.c
bpf: Local storage helpers should check nullness of owner ptr passed
bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Adding support to carry build id data in mmap2 event.
The build id data replaces maj/min/ino/ino_generation
fields, which are also used to identify map's binary,
so it's ok to replace them with build id data:
union {
struct {
u32 maj;
u32 min;
u64 ino;
u64 ino_generation;
};
struct {
u8 build_id_size;
u8 __reserved_1;
u16 __reserved_2;
u8 build_id[20];
};
};
Replaced maj/min/ino/ino_generation fields give us size
of 24 bytes. We use 20 bytes for build id data, 1 byte
for size and rest is unused.
There's new misc bit for mmap2 to signal there's build
id data in it:
#define PERF_RECORD_MISC_MMAP_BUILD_ID (1 << 14)
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
It's possible to have other build id types (other than default SHA1).
Currently there's also ld support for MD5 build id.
Adding size argument to build_id_parse function, that returns (if defined)
size of the parsed build id, so we can recognize the build id type.
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Moving stack_map_get_build_id into lib with
declaration in linux/buildid.h header:
int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);
This function returns build id for given struct vm_area_struct.
There is no functional change to stack_map_get_build_id function.
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This adds instructions for
atomic[64]_[fetch_]and
atomic[64]_[fetch_]or
atomic[64]_[fetch_]xor
All these operations are isomorphic enough to implement with the same
verifier, interpreter, and x86 JIT code, hence being a single commit.
The main interesting thing here is that x86 doesn't directly support
the fetch_ version these operations, so we need to generate a CMPXCHG
loop in the JIT. This requires the use of two temporary registers,
IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Since the atomic operations that are added in subsequent commits are
all isomorphic with BPF_ADD, pull out a macro to avoid the
interpreter becoming dominated by lines of atomic-related code.
Note that this sacrificies interpreter performance (combining
STX_ATOMIC_W and STX_ATOMIC_DW into single switch case means that we
need an extra conditional branch to differentiate them) in favour of
compact and (relatively!) simple C code.
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH woulud be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.
There are two significant design decisions made for the CMPXCHG
instruction:
- To solve the issue that this operation fundamentally has 3
operands, but we only have two register fields. Therefore the
operand we compare against (the kernel's API calls it 'old') is
hard-coded to be R0. x86 has similar design (and A64 doesn't
have this problem).
A potential alternative might be to encode the other operand's
register number in the immediate field.
- The kernel's atomic_cmpxchg returns the old value, while the C11
userspace APIs return a boolean indicating the comparison
result. Which should BPF do? A64 returns the old value. x86 returns
the old value in the hard-coded register (and also sets a
flag). That means return-old-value is easier to JIT, so that's
what we use.
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.
Suggested-by: Yonghong Song <[email protected]>
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
I can't find a reason why this code is in resolve_pseudo_ldimm64;
since I'll be modifying it in a subsequent commit, tidy it up.
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.
In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.
This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.
All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add support for pointer to mem register spilling, to allow the verifier
to track pointers to valid memory addresses. Such pointers are returned
for example by a successful call of the bpf_ringbuf_reserve helper.
The patch was partially contributed by CyberArk Software, Inc.
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Suggested-by: Yonghong Song <[email protected]>
Signed-off-by: Gilad Reti <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: KP Singh <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add support for directly accessing kernel module variables from BPF programs
using special ldimm64 instructions. This functionality builds upon vmlinux
ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
specifying kernel module BTF's FD in insn[1].imm field.
During BPF program load time, verifier will resolve FD to BTF object and will
take reference on BTF object itself and, for module BTFs, corresponding module
as well, to make sure it won't be unloaded from under running BPF program. The
mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
One interesting change is also in how per-CPU variable is determined. The
logic is to find .data..percpu data section in provided BTF, but both vmlinux
and module each have their own .data..percpu entries in BTF. So for module's
case, the search for DATASEC record needs to look at only module's added BTF
types. This is implemented with custom search function.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Hao Luo <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The error message here is misleading, the argument will be rejected unless
it is a known constant.
Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Some modules don't declare any new types and end up with an empty BTF,
containing only valid BTF header and no types or strings sections. This
currently causes BTF validation error. There is nothing wrong with such BTF,
so fix the issue by allowing module BTFs with no types or strings.
Fixes: 36e68442d1af ("bpf: Load and verify kernel module BTFs")
Reported-by: Christopher William Snowhill <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
optlen == 0 indicates that the kernel should ignore BPF buffer
and use the original one from the user. We, however, forget
to free the temporary buffer that we've allocated for BPF.
Fixes: d8fe449a9c51 ("bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE")
Reported-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Fix "gurranteed" -> "guaranteed" in bpf_inode_storage.c
Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: KP Singh <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The verifier allows ARG_PTR_TO_BTF_ID helper arguments to be NULL, so
helper implementations need to check this before dereferencing them.
This was already fixed for the socket storage helpers but not for task
and inode.
The issue can be reproduced by attaching an LSM program to
inode_rename hook (called when moving files) which tries to get the
inode of the new file without checking for its nullness and then trying
to move an existing file to a new path:
mv existing_file new_file_does_not_exist
The report including the sample program and the steps for reproducing
the bug:
https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com
Fixes: 4cf1bc1f1045 ("bpf: Implement task local storage")
Fixes: 8ea636848aca ("bpf: Implement bpf_local_storage for inodes")
Reported-by: Gilad Reti <[email protected]>
Signed-off-by: KP Singh <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The bpf_tracing_prog_attach error path calls bpf_prog_put
on prog, which causes refcount underflow when it's called
from link_create function.
link_create
prog = bpf_prog_get <-- get
...
tracing_bpf_link_attach(prog..
bpf_tracing_prog_attach(prog..
out_put_prog:
bpf_prog_put(prog); <-- put
if (ret < 0)
bpf_prog_put(prog); <-- put
Removing bpf_prog_put call from bpf_tracing_prog_attach
and making sure its callers call it instead.
Fixes: 4a1e7c0c63e0 ("bpf: Support attaching freplace programs to multiple attach points")
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Blacklist properly on all archs.
The code to blacklist notrace functions for kprobes was not using the
right kconfig option, which caused some archs (powerpc) to possibly
not blacklist them"
* tag 'trace-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobes: Do the notrace functions check without kprobes on ftrace
|
|
Enable the notrace function check on the architecture which doesn't
support kprobes on ftrace but support dynamic ftrace. This notrace
function check is not only for the kprobes on ftrace but also
sw-breakpoint based kprobes.
Thus there is no reason to limit this check for the arch which
supports kprobes on ftrace.
This also changes the dependency of Kconfig. Because kprobe event
uses the function tracer's address list for identifying notrace
function, if the CONFIG_DYNAMIC_FTRACE=n, it can not check whether
the target function is notrace or not.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/161007957862.114704.4512260007555399463.stgit@devnote2
Cc: [email protected]
Fixes: 45408c4f92506 ("tracing: kprobes: Prohibit probing on notrace function")
Acked-by: Naveen N. Rao <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.11-rc3. Nothing major,
just resolving some reported issues:
- cleanup some remaining mentions of the ION drivers that were
removed in 5.11-rc1
- comedi driver bugfix
- two error path memory leak fixes
All have been in linux-next for a while with no reported issues"
* tag 'staging-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: ION: remove some references to CONFIG_ION
staging: mt7621-dma: Fix a resource leak in an error handling path
Staging: comedi: Return -EFAULT if copy_to_user() fails
staging: spmi: hisi-spmi-controller: Fix some error handling paths
|
|
This program does not use argp (which is a glibcism). Instead include <errno.h>
directly, which was pulled in by <argp.h>.
Signed-off-by: Leah Neukirchen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Song reported a boot regression in a kvm image with 5.11-rc, and bisected
it down to the below patch. Debugging this issue, turns out that the boot
stalled when a task is waiting on a pipe being released. As we no longer
run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
task goes idle without running the task_work. This prevents ->release()
from being called on the pipe, which another boot task is waiting on.
For now, re-instate the unconditional task_work run from get_signal().
For 5.12, we'll collapse TWA_RESUME and TWA_SIGNAL, as it no longer
makes sense to have a distinction between the two. This will turn
task_work notification into a simple boolean, whether to notify or not.
Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
Reported-by: Song Liu <[email protected]>
Tested-by: John Stultz <[email protected]>
Tested-by: Douglas Anderson <[email protected]>
Tested-by: Sedat Dilek <[email protected]> # LLVM/Clang version 11.0.1
Signed-off-by: Jens Axboe <[email protected]>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2021-01-07
We've added 4 non-merge commits during the last 10 day(s) which contain
a total of 4 files changed, 14 insertions(+), 7 deletions(-).
The main changes are:
1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong.
2) Fix resolve_btfids for multiple type hierarchies, from Jiri.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpftool: Fix compilation failure for net.o with older glibc
tools/resolve_btfids: Warn when having multiple IDs for single type
bpf: Fix a task_iter bug caused by a merge conflict resolution
selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
With commit e722a295cf49 ("staging: ion: remove from the tree"), ION and
its corresponding config CONFIG_ION is gone. Remove stale references
from drivers/staging/media/atomisp/pci and from the recommended Android
kernel config.
Fixes: e722a295cf49 ("staging: ion: remove from the tree")
Cc: Hridya Valsaraju <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Matthias Maennich <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, wireless and bpf
trees.
Current release - regressions:
- mt76: fix NULL pointer dereference in mt76u_status_worker and
mt76s_process_tx_queue
- net: ipa: fix interconnect enable bug
Current release - always broken:
- netfilter: fixes possible oops in mtype_resize in ipset
- ath11k: fix number of coding issues found by static analysis tools
and spurious error messages
Previous releases - regressions:
- e1000e: re-enable s0ix power saving flows for systems with the
Intel i219-LM Ethernet controllers to fix power use regression
- virtio_net: fix recursive call to cpus_read_lock() to avoid a
deadlock
- ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()
- sysfs: take the rtnl lock around XPS configuration
- xsk: fix memory leak for failed bind and rollback reservation at
NETDEV_TX_BUSY
- r8169: work around power-saving bug on some chip versions
Previous releases - always broken:
- dcb: validate netlink message in DCB handler
- tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
to prevent unnecessary retries
- vhost_net: fix ubuf refcount when sendmsg fails
- bpf: save correct stopping point in file seq iteration
- ncsi: use real net-device for response handler
- neighbor: fix div by zero caused by a data race (TOCTOU)
- bareudp: fix use of incorrect min_headroom size and a false
positive lockdep splat from the TX lock
- mvpp2:
- clear force link UP during port init procedure in case
bootloader had set it
- add TCAM entry to drop flow control pause frames
- fix PPPoE with ipv6 packet parsing
- fix GoP Networking Complex Control config of port 3
- fix pkt coalescing IRQ-threshold configuration
- xsk: fix race in SKB mode transmit with shared cq
- ionic: account for vlan tag len in rx buffer len
- stmmac: ignore the second clock input, current clock framework does
not handle exclusive clock use well, other drivers may reconfigure
the second clock
Misc:
- ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
existing scheme"
* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
r8169: work around power-saving bug on some chip versions
net: usb: qmi_wwan: add Quectel EM160R-GL
selftests: mlxsw: Set headroom size of correct port
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
ibmvnic: fix: NULL pointer dereference.
docs: networking: packet_mmap: fix old config reference
docs: networking: packet_mmap: fix formatting for C macros
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
bareudp: Fix use of incorrect min_headroom size
bareudp: set NETIF_F_LLTX flag
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
atlantic: remove architecture depends
erspan: fix version 1 check in gre_parse_header()
net: hns: fix return value check in __lb_other_process()
net: sched: prevent invalid Scell_log shift count
net: neighbor: fix a crash caused by mod zero
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This is a fix for a regression in the v5.10 merge window, but it was
reported quite late in the v5.10 process, plus generating and testing
the fix took some time.
The regression is due to commit 36dadef23fcc ("kprobes: Init kprobes
in early_initcall") which on powerpc can use RCU Tasks before
initialization, resulting in boot failures.
The fix is straightforward, simply moving initialization of RCU Tasks
before the early_initcall()s. The fix has been exposed to -next and
kbuild test robot testing, and has been tested by the PowerPC guys"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu-tasks: Move RCU-tasks initialization to before early_initcall()
|
|
Latest bpf tree has a bug for bpf_iter selftest:
$ ./test_progs -n 4/25
test_bpf_sk_storage_get:PASS:bpf_iter_bpf_sk_storage_helpers__open_and_load 0 nsec
test_bpf_sk_storage_get:PASS:socket 0 nsec
...
do_dummy_read:PASS:read 0 nsec
test_bpf_sk_storage_get:FAIL:bpf_map_lookup_elem map value wasn't set correctly
(expected 1792, got -1, err=0)
#4/25 bpf_sk_storage_get:FAIL
#4 bpf_iter:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 2 FAILED
When doing merge conflict resolution, Commit 4bfc4714849d missed to
save curr_task to seq_file private data. The task pointer in seq_file
private data is passed to bpf program. This caused NULL-pointer task
passed to bpf program which will immediately return upon checking
whether task pointer is NULL.
This patch added back the assignment of curr_task to seq_file private
data and fixed the issue.
Fixes: 4bfc4714849d ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
|
|
For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Cc: [email protected] # 5.5+
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-12-28
The following pull-request contains BPF updates for your *net* tree.
There is a small merge conflict between bpf tree commit 69ca310f3416
("bpf: Save correct stopping point in file seq iteration") and net tree
commit 66ed594409a1 ("bpf/task_iter: In task_file_seq_get_next use
task_lookup_next_fd_rcu"). The get_files_struct() does not exist anymore
in net, so take the hunk in HEAD and add the `info->tid = curr_tid` to
the error path:
[...]
curr_task = task_seq_get_next(ns, &curr_tid, true);
if (!curr_task) {
info->task = NULL;
info->tid = curr_tid;
return NULL;
}
/* set info->task and info->tid */
[...]
We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 11 files changed, 75 insertions(+), 20 deletions(-).
The main changes are:
1) Various AF_XDP fixes such as fill/completion ring leak on failed bind and
fixing a race in skb mode's backpressure mechanism, from Magnus Karlsson.
2) Fix latency spikes on lockdep enabled kernels by adding a rescheduling
point to BPF hashtab initialization, from Eric Dumazet.
3) Fix a splat in task iterator by saving the correct stopping point in the
seq file iteration, from Jonathan Lemon.
4) Fix BPF maps selftest by adding retries in case hashtab returns EBUSY
errors on update/deletes, from Andrii Nakryiko.
5) Fix BPF selftest error reporting to something more user friendly if the
vmlinux BTF cannot be found, from Kamal Mostafa.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull workqueue update from Tejun Heo:
"The same as the cgroup tree - one commit which was scheduled for the
5.11 merge window.
All the commit does is avoding spurious worker wakeups from workqueue
allocation / config change path to help cpuisol use cases"
* 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Kick a worker based on the actual activation of delayed works
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"These three patches were scheduled for the merge window but I forgot
to send them out. Sorry about that.
None of them are significant and they fit well in a fix pull request
too - two are cosmetic and one fixes a memory leak in the mount option
parsing path"
* 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Fix memory leak when parsing multiple source parameters
cgroup/cgroup.c: replace 'of->kn->priv' with of_cft()
kernel: cgroup: Mundane spelling fixes throughout the file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Misc fixes/updates:
- Fix static keys usage in module __init sections
- Add separate MAINTAINERS entry for static branches/calls
- Fix lockdep splat with CONFIG_PREEMPTIRQ_EVENTS=y tracing"
* tag 'locking-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
softirq: Avoid bad tracing / lockdep interaction
jump_label/static_call: Add MAINTAINERS
jump_label: Fix usage in module __init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Update/fix two CPU sanity checks in the hotplug and the boot code, and
fix a typo in the Kconfig help text.
[ Context: the first two commits are the result of an ongoing
annotation+review work of (intentional) tick_do_timer_cpu() data
races reported by KCSAN, but the annotations aren't fully cooked
yet ]"
* tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix spelling mistake in Kconfig "fullfill" -> "fulfill"
tick/sched: Remove bogus boot "safety" check
tick: Remove pointless cpu valid check in hotplug code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a context switch performance regression"
* tag 'sched-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Optimize finish_lock_switch()
|
|
Commit 64a1b95bb9fe ("genirq: Restrict export of irq_to_desc()") removed
the export of irq_to_desc() unless powerpc KVM is being built, because
there is still a use of irq_to_desc() in modular code there.
However it used:
#ifdef CONFIG_KVM_BOOK3S_64_HV
Which doesn't work when that symbol is =m, leading to a build failure:
ERROR: modpost: "irq_to_desc" [arch/powerpc/kvm/kvm-hv.ko] undefined!
Fix it by checking for the definedness of the correct symbol which is
CONFIG_KVM_BOOK3S_64_HV_MODULE.
Fixes: 64a1b95bb9fe ("genirq: Restrict export of irq_to_desc()")
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"This is the second attempt after the first one failed miserably and
got zapped to unblock the rest of the interrupt related patches.
A treewide cleanup of interrupt descriptor (ab)use with all sorts of
racy accesses, inefficient and disfunctional code. The goal is to
remove the export of irq_to_desc() to prevent these things from
creeping up again"
* tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
genirq: Restrict export of irq_to_desc()
xen/events: Implement irq distribution
xen/events: Reduce irq_info:: Spurious_cnt storage size
xen/events: Only force affinity mask for percpu interrupts
xen/events: Use immediate affinity setting
xen/events: Remove disfunct affinity spreading
xen/events: Remove unused bind_evtchn_to_irq_lateeoi()
net/mlx5: Use effective interrupt affinity
net/mlx5: Replace irq_to_desc() abuse
net/mlx4: Use effective interrupt affinity
net/mlx4: Replace irq_to_desc() abuse
PCI: mobiveil: Use irq_data_get_irq_chip_data()
PCI: xilinx-nwl: Use irq_data_get_irq_chip_data()
NTB/msi: Use irq_has_action()
mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc
pinctrl: nomadik: Use irq_has_action()
drm/i915/pmu: Replace open coded kstat_irqs() copy
drm/i915/lpe_audio: Remove pointless irq_to_desc() usage
s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt()
parisc/irq: Use irq_desc_kstat_cpu() in show_interrupts()
...
|