Age | Commit message (Collapse) | Author | Files | Lines |
|
Add test for matchall classifier with mirred egress mirror action.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows
to notify the userspace when the port lost the continuite of MRP frames.
This attribute is set by kernel whenever the SW or HW detects that the ring is
being open or closed.
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
System hangs or killed rcutorture guest OSes can result in truncated
"Reader Pipe:" lines, which can in turn result in false-positive
reader-batch near-miss warnings. This commit therefore adjusts the
reader-batch checks to account for possible line truncation.
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
This commit adds a TRACE02 scenario which enables preemption and RCU
Tasks Trace IPIs, more specifically, disabling heavyweight readers.
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
This commit adds the definitions required to torture the tracing flavor
of RCU tasks.
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
This commit adds the definitions required to torture the rude flavor of
RCU tasks.
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Linux 5.7-rc3
|
|
bpf_object__load() has various return code, when it failed to load
object, it must return err instead of -EINVAL.
Signed-off-by: Mao Wenan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
cls_redirect is a TC clsact based replacement for the glb-redirect iptables
module available at [1]. It enables what GitHub calls "second chance"
flows [2], similarly proposed by the Beamer paper [3]. In contrast to
glb-redirect, it also supports migrating UDP flows as long as connected
sockets are used. cls_redirect is in production at Cloudflare, as part of
our own L4 load balancer.
We have modified the encapsulation format slightly from glb-redirect:
glbgue_chained_routing.private_data_type has been repurposed to form a
version field and several flags. Both have been arranged in a way that
a private_data_type value of zero matches the current glb-redirect
behaviour. This means that cls_redirect will understand packets in
glb-redirect format, but not vice versa.
The test suite only covers basic features. For example, cls_redirect will
correctly forward path MTU discovery packets, but this is not exercised.
It is also possible to switch the encapsulation format to GRE on the last
hop, which is also not tested.
There are two major distinctions from glb-redirect: first, cls_redirect
relies on receiving encapsulated packets directly from a router. This is
because we don't have access to the neighbour tables from BPF, yet. See
forward_to_next_hop for details. Second, cls_redirect performs decapsulation
instead of using separate ipip and sit tunnel devices. This
avoids issues with the sit tunnel [4] and makes deploying the classifier
easier: decapsulated packets appear on the same interface, so existing
firewall rules continue to work as expected.
The code base started it's life on v4.19, so there are most likely still
hold overs from old workarounds. In no particular order:
- The function buf_off is required to defeat a clang optimization
that leads to the verifier rejecting the program due to pointer
arithmetic in the wrong order.
- The function pkt_parse_ipv6 is force inlined, because it would
otherwise be rejected due to returning a pointer to stack memory.
- The functions fill_tuple and classify_tcp contain kludges, because
we've run out of function arguments.
- The logic in general is rather nested, due to verifier restrictions.
I think this is either because the verifier loses track of constants
on the stack, or because it can't track enum like variables.
1: https://github.com/github/glb-director/tree/master/src/glb-redirect
2: https://github.com/github/glb-director/blob/master/docs/development/second-chance-design.md
3: https://www.usenix.org/conference/nsdi18/presentation/olteanu
4: https://github.com/github/glb-director/issues/64
Signed-off-by: Lorenz Bauer <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
To make BPF verifier verbose log more releavant and easier to use to debug
verification failures, "pop" parts of log that were successfully verified.
This has effect of leaving only verifier logs that correspond to code branches
that lead to verification failure, which in practice should result in much
shorter and more relevant verifier log dumps. This behavior is made the
default behavior and can be overriden to do exhaustive logging by specifying
BPF_LOG_LEVEL2 log level.
Using BPF_LOG_LEVEL2 to disable this behavior is not ideal, because in some
cases it's good to have BPF_LOG_LEVEL2 per-instruction register dump
verbosity, but still have only relevant verifier branches logged. But for this
patch, I didn't want to add any new flags. It might be worth-while to just
rethink how BPF verifier logging is performed and requested and streamline it
a bit. But this trimming of successfully verified branches seems to be useful
and a good default behavior.
To test this, I modified runqslower slightly to introduce read of
uninitialized stack variable. Log (**truncated in the middle** to save many
lines out of this commit message) BEFORE this change:
; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
[ ... instruction dump from insn #7 through #50 are cut out ... ]
51: (b7) r2 = 16
52: (85) call bpf_get_current_comm#16
last_idx 52 first_idx 42
regs=4 stack=0 before 51: (b7) r2 = 16
; bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
53: (bf) r1 = r6
54: (18) r2 = 0xffff8881f3868800
56: (18) r3 = 0xffffffff
58: (bf) r4 = r7
59: (b7) r5 = 32
60: (85) call bpf_perf_event_output#25
last_idx 60 first_idx 53
regs=20 stack=0 before 59: (b7) r5 = 32
61: (bf) r2 = r10
; event.pid = pid;
62: (07) r2 += -16
; bpf_map_delete_elem(&start, &pid);
63: (18) r1 = 0xffff8881f3868000
65: (85) call bpf_map_delete_elem#3
; }
66: (b7) r0 = 0
67: (95) exit
from 44 to 66: safe
from 34 to 66: safe
from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881f3868000
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
Notice how there is a successful code path from instruction 0 through 67, few
successfully verified jumps (44->66, 34->66), and only after that 11->28 jump
plus error on instruction #32.
AFTER this change (full verifier log, **no truncation**):
; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
7: (79) r2 = *(u64 *)(r1 +16)
; if (prev->state == TASK_RUNNING)
8: (55) if r2 != 0x0 goto pc+19
R1_w=ptr_task_struct(id=0,off=0,imm=0) R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; trace_enqueue(prev->tgid, prev->pid);
9: (61) r1 = *(u32 *)(r1 +1184)
10: (63) *(u32 *)(r10 -4) = r1
; if (!pid || (targ_pid && targ_pid != pid))
11: (15) if r1 == 0x0 goto pc+16
from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881db3ce800
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
Notice how in this case, there are 0-11 instructions + jump from 11 to
28 is recorded + 28-32 instructions with error on insn #32.
test_verifier test runner was updated to specify BPF_LOG_LEVEL2 for
VERBOSE_ACCEPT expected result due to potentially "incomplete" success verbose
log at BPF_LOG_LEVEL1.
On success, verbose log will only have a summary of number of processed
instructions, etc, but no branch tracing log. Having just a last succesful
branch tracing seemed weird and confusing. Having small and clean summary log
in success case seems quite logical and nice, though.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
On a device like a cellphone which is constantly suspending
and resuming CLOCK_MONOTONIC is not particularly useful for
keeping track of or reacting to external network events.
Instead you want to use CLOCK_BOOTTIME.
Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns()
based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC.
Signed-off-by: Maciej Żenczykowski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The following error was shown when a bpf program was compiled without
vmlinux.h auto-generated from BTF:
# clang -I./linux/tools/lib/ -I/lib/modules/$(uname -r)/build/include/ \
-O2 -Wall -target bpf -emit-llvm -c bpf_prog.c -o bpf_prog.bc
...
In file included from linux/tools/lib/bpf/bpf_helpers.h:5:
linux/tools/lib/bpf/bpf_helper_defs.h:56:82: error: unknown type name '__u64'
...
It seems that bpf programs are intended for being built together with
the vmlinux.h (which will have all the __u64 and other typedefs). But
users may mistakenly think "include <linux/types.h>" is missing
because the vmlinux.h is not common for non-bpf developers. IMO, an
explicit comment therefore should be added to bpf_helpers.h as this
patch shows.
Signed-off-by: Yoshiki Komachi <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently the following prog types don't fall back to bpf_base_func_proto()
(instead they have cgroup_base_func_proto which has a limited set of
helpers from bpf_base_func_proto):
* BPF_PROG_TYPE_CGROUP_DEVICE
* BPF_PROG_TYPE_CGROUP_SYSCTL
* BPF_PROG_TYPE_CGROUP_SOCKOPT
I don't see any specific reason why we shouldn't use bpf_base_func_proto(),
every other type of program (except bpf-lirc and, understandably, tracing)
use it, so let's fall back to bpf_base_func_proto for those prog types
as well.
This basically boils down to adding access to the following helpers:
* BPF_FUNC_get_prandom_u32
* BPF_FUNC_get_smp_processor_id
* BPF_FUNC_get_numa_node_id
* BPF_FUNC_tail_call
* BPF_FUNC_ktime_get_ns
* BPF_FUNC_spin_lock (CAP_SYS_ADMIN)
* BPF_FUNC_spin_unlock (CAP_SYS_ADMIN)
* BPF_FUNC_jiffies64 (CAP_SYS_ADMIN)
I've also added bpf_perf_event_output() because it's really handy for
logging and debugging.
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Code cleanup: Remove duplicate headers which are included twice.
Signed-off-by: Jagadeesh Pagadala <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Randy reported that objtool got stuck in an infinite loop when
processing drivers/i2c/busses/i2c-parport.o. It was caused by the
following code:
00000000000001fd <line_set>:
1fd: 48 b8 00 00 00 00 00 movabs $0x0,%rax
204: 00 00 00
1ff: R_X86_64_64 .rodata-0x8
207: 41 55 push %r13
209: 41 89 f5 mov %esi,%r13d
20c: 41 54 push %r12
20e: 49 89 fc mov %rdi,%r12
211: 55 push %rbp
212: 48 89 d5 mov %rdx,%rbp
215: 53 push %rbx
216: 0f b6 5a 01 movzbl 0x1(%rdx),%ebx
21a: 48 8d 34 dd 00 00 00 lea 0x0(,%rbx,8),%rsi
221: 00
21e: R_X86_64_32S .rodata
222: 48 89 f1 mov %rsi,%rcx
225: 48 29 c1 sub %rax,%rcx
find_jump_table() saw the .rodata reference and tried to find a jump
table associated with it (though there wasn't one). The -0x8 rela
addend is unusual. It caused find_jump_table() to send a negative
table_offset (unsigned 0xfffffffffffffff8) to find_rela_by_dest().
The negative offset should have been harmless, but it actually threw
for_offset_range() for a loop... literally. When the mask value got
incremented past the end value, it also wrapped to zero, causing the
loop exit condition to remain true forever.
Prevent this scenario from happening by ensuring the incremented value
is always >= the starting value.
Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: Randy Dunlap <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/02b719674b031800b61e33c30b2e823183627c19.1587842122.git.jpoimboe@redhat.com
|
|
Simple overlapping changes to linux/vermagic.h
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Two fixes: fix an off-by-one bug, and fix 32-bit builds on 64-bit
systems"
* tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix off-by-one in symbol_by_offset()
objtool: Fix 32bit cross builds
|
|
When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops. This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.
This fixes the following unwinder warning:
WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0
Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Steven Rostedt <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Joe Mario <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
|
|
Pull networking fixes from David Miller:
1) Fix memory leak in netfilter flowtable, from Roi Dayan.
2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
3) Fix warning when mptcp socket is never accepted before close, from
Florian Westphal.
4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
5) Fix large delays during PTP synchornization in cxgb4, from Rahul
Lakkireddy.
6) team_mode_get() can hang, from Taehee Yoo.
7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
from Niklas Schnelle.
8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
10) Missing queue memory release in iwlwifi pcie code, from Johannes
Berg.
11) Fix NULL deref in macvlan device event, from Taehee Yoo.
12) Initialize lan87xx phy correctly, from Yuiko Oshino.
13) Fix looping between VRF and XFRM lookups, from David Ahern.
14) etf packet scheduler assumes all sockets are full sockets, which is
not necessarily true. From Eric Dumazet.
15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
16) fib_select_default() needs to handle nexthop objects, from David
Ahern.
17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
selftests/bpf: Fix a couple of broken test_btf cases
tools/runqslower: Ensure own vmlinux.h is picked up first
bpf: Make bpf_link_fops static
bpftool: Respect the -d option in struct_ops cmd
selftests/bpf: Add test for freplace program with expected_attach_type
bpf: Propagate expected_attach_type when verifying freplace programs
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
bpf, x86_32: Fix clobbering of dst for BPF_JSET
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
bpf: Fix reStructuredText markup
net: systemport: suppress warnings on failed Rx SKB allocations
net: bcmgenet: suppress warnings on failed Rx SKB allocations
macsec: avoid to set wrong mtu
mac80211: sta_info: Add lockdep condition for RCU list usage
mac80211: populate debugfs only after cfg80211 init
net: bcmgenet: correct per TX/RX ring statistics
net: meth: remove spurious copyright text
net: phy: bcm84881: clear settings on link down
chcr: Fix CPU hard lockup
...
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2020-04-24
The following pull-request contains BPF updates for your *net* tree.
We've added 17 non-merge commits during the last 5 day(s) which contain
a total of 19 files changed, 203 insertions(+), 85 deletions(-).
The main changes are:
1) link_update fix, from Andrii.
2) libbpf get_xdp_id fix, from David.
3) xadd verifier fix, from Jann.
4) x86-32 JIT fixes, from Luke and Wang.
5) test_btf fix, from Stanislav.
6) freplace verifier fix, from Toke.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
introduced function linkage flag and changed the error message from
"vlen != 0" to "Invalid func linkage" and broke some fake BPF programs.
Adjust the test accordingly.
AFACT, the programs don't really need any arguments and only look
at BTF for maps, so let's drop the args altogether.
Before:
BTF raw test[103] (func (Non zero vlen)): do_test_raw:3703:FAIL expected
err_str:vlen != 0
magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 72
str_off: 72
str_len: 10
btf_total_size: 106
[1] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=(none)
[3] FUNC_PROTO (anon) return=0 args=(1 a, 2 b)
[4] FUNC func type_id=3 Invalid func linkage
BTF libbpf test[1] (test_btf_haskv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_haskv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[2] (test_btf_newkv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_newkv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[3] (test_btf_nokv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_nokv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Reorder include paths to ensure that runqslower sources are picking up
vmlinux.h, generated by runqslower's own Makefile. When runqslower is built
from selftests/bpf, due to current -I$(BPF_INCLUDE) -I$(OUTPUT) ordering, it
might pick up not-yet-complete vmlinux.h, generated by selftests Makefile,
which could lead to compilation errors like [0]. So ensure that -I$(OUTPUT)
goes first and rely on runqslower's Makefile own dependency chain to ensure
vmlinux.h is properly completed before source code relying on it is compiled.
[0] https://travis-ci.org/github/libbpf/libbpf/jobs/677905925
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
In the prog cmd, the "-d" option turns on the verifier log.
This is missed in the "struct_ops" cmd and this patch fixes it.
Fixes: 65c93628599d ("bpftool: Add struct_ops support")
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This adds a new selftest that tests the ability to attach an freplace
program to a program type that relies on the expected_attach_type of the
target program to pass verification.
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The patch fixes:
$ scripts/bpf_helpers_doc.py > bpf-helpers.rst
$ rst2man bpf-helpers.rst > bpf-helpers.7
bpf-helpers.rst:1105: (WARNING/2) Inline strong start-string without end-string.
Signed-off-by: Jakub Wilk <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Restore an optimization related to asynchronous suspend and resume of
devices during system-wide power transitions that was disabled by
mistake (Kai-Heng Feng) and update the pm-graph suite of power
management utilities (Todd Brandt)"
* tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Switch back to async_schedule_dev()
pm-graph v5.6
|
|
It is possible to get multiple records from trace during test and then more
than 4 arguments are assigned to ARGS. This situation results in the failure
of kprobe_args_type.tc. For example:
-----------------------------------------------------------
grep testprobe trace
ftracetest-5902 [001] d... 111195.682227: testprobe: (_do_fork+0x0/0x460) arg1=334823024 arg2=334823024 arg3=0x13f4fe70 arg4=7
pmlogger-5949 [000] d... 111195.709898: testprobe: (_do_fork+0x0/0x460) arg1=345308784 arg2=345308784 arg3=0x1494fe70 arg4=7
grep testprobe trace
sed -e 's/.* arg1=\(.*\) arg2=\(.*\) arg3=\(.*\) arg4=\(.*\)/\1 \2 \3 \4/'
ARGS='334823024 334823024 0x13f4fe70 7
345308784 345308784 0x1494fe70 7'
-----------------------------------------------------------
We don't care which process calls do_fork so just check the first record to
fix the issue.
Signed-off-by: Xiao Yang <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Add build/cross-build dependency check script kselftest_deps.sh
This script does the following:
Usage: ./kselftest_deps.sh -[p] <compiler> [test_name]
kselftest_deps.sh [-p] gcc
kselftest_deps.sh [-p] gcc vm
kselftest_deps.sh [-p] aarch64-linux-gnu-gcc
kselftest_deps.sh [-p] aarch64-linux-gnu-gcc vm
- Should be run in selftests directory in the kernel repo.
- Checks if Kselftests can be built/cross-built on a system.
- Parses all test/sub-test Makefile to find library dependencies.
- Runs compile test on a trivial C file with LDLIBS specified
in the test Makefiles to identify missing library dependencies.
- Prints suggested target list for a system filtering out tests
failed the build dependency check from the TARGETS in Selftests
the main Makefile when optional -p is specified.
- Prints pass/fail dependency check for each tests/sub-test.
- Prints pass/fail targets and libraries.
- Default: runs dependency checks on all tests.
- Optional test name can be specified to check dependencies for it.
To make LDLIBS parsing easier
- change gpio and memfd Makefiles to use the same temporary variable used
to find and add libraries to LDLIBS.
- simlify LDLIBS append logic in intel_pstate/Makefile.
Results from run on x86_64 system (trimmed detailed pass/fail list):
========================================================
Kselftest Dependency Check for [./kselftest_deps.sh gcc ] results...
========================================================
Checked tests defining LDLIBS dependencies
--------------------------------------------------------
Total tests with Dependencies:
55 Pass: 53 Fail: 2
--------------------------------------------------------
Targets passed build dependency check on system:
bpf capabilities filesystems futex gpio intel_pstate membarrier memfd
mqueue net powerpc ptp rseq rtc safesetid timens timers vDSO vm
--------------------------------------------------------
FAIL: netfilter/Makefile dependency check: -lmnl
FAIL: gpio/Makefile dependency check: -lmount
--------------------------------------------------------
Targets failed build dependency check on system:
gpio netfilter
--------------------------------------------------------
Missing libraries system
-lmnl -lmount
--------------------------------------------------------
========================================================
Results from run on x86_64 system with aarch64-linux-gnu-gcc:
(trimmed detailed pass/fail list):
========================================================
Kselftest Dependency Check for [./kselftest_deps.sh aarch64-linux-gnu-gcc ]
results...
========================================================
Checked tests defining LDLIBS dependencies
--------------------------------------------------------
Total tests with Dependencies:
55 Pass: 41 Fail: 14
--------------------------------------------------------
Targets failed build dependency check on system:
bpf capabilities filesystems futex gpio intel_pstate membarrier memfd
mqueue net powerpc ptp rseq rtc timens timers vDSO vm
--------------------------------------------------------
--------------------------------------------------------
Targets failed build dependency check on system:
bpf capabilities gpio memfd mqueue net netfilter safesetid vm
--------------------------------------------------------
Missing libraries system
-lcap -lcap-ng -lelf -lfuse -lmnl -lmount -lnuma -lpopt -lz
--------------------------------------------------------
========================================================
Signed-off-by: Shuah Khan <[email protected]>
|
|
Without CONFIG_DYNAMIC_FTRACE, some tests get failure because required
filter files(set_ftrace_filter/available_filter_functions/stack_trace_filter)
are missing. So implement check_filter_file() and make all related tests
check required filter files by it.
BTW: set_ftrace_filter and available_filter_functions are introduced together
so just check either of them.
Signed-off-by: Xiao Yang <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
To control degree of parallelism of the synthesize_mmap() code which
is scanning /proc/PID/task/PID/maps and can be time consuming.
Mimic perf top way of handling the option.
If not specified will default to 1 thread, i.e. default behavior before
this option.
On a desktop computer the processing of /proc/PID/task/PID/maps isn't
slow enough to warrant parallel processing and the thread creation has
some cost - hence the default of 1. On a loaded server with
>100 cores it is possible to see synthesis times in the order of
seconds and in this case having the option is desirable.
As the processing is a synchronization point, it is legitimate to worry if
Amdahl's law will apply to this patch. Profiling with this patch in
place:
https://lore.kernel.org/lkml/[email protected]/
shows:
...
- 32.59% __perf_event__synthesize_threads
- 32.54% __event__synthesize_thread
+ 22.13% perf_event__synthesize_mmap_events
+ 6.68% perf_event__get_comm_ids.constprop.0
+ 1.49% process_synthesized_event
+ 1.29% __GI___readdir64
+ 0.60% __opendir
...
That is the processing is 1.49% of execution time and there is plenty to
make parallel. This is shown in the benchmark in this patch:
https://lore.kernel.org/lkml/[email protected]/
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 1
Average synthesis took: 127729.000 usec (+- 3372.880 usec)
Average num. events: 21548.600 (+- 0.306)
Average time per event 5.927 usec
Number of synthesis threads: 2
Average synthesis took: 88863.500 usec (+- 385.168 usec)
Average num. events: 21552.800 (+- 0.327)
Average time per event 4.123 usec
Number of synthesis threads: 3
Average synthesis took: 83257.400 usec (+- 348.617 usec)
Average num. events: 21553.200 (+- 0.327)
Average time per event 3.863 usec
Number of synthesis threads: 4
Average synthesis took: 75093.000 usec (+- 422.978 usec)
Average num. events: 21554.200 (+- 0.200)
Average time per event 3.484 usec
Number of synthesis threads: 5
Average synthesis took: 64896.600 usec (+- 353.348 usec)
Average num. events: 21558.000 (+- 0.000)
Average time per event 3.010 usec
Number of synthesis threads: 6
Average synthesis took: 59210.200 usec (+- 342.890 usec)
Average num. events: 21560.000 (+- 0.000)
Average time per event 2.746 usec
Number of synthesis threads: 7
Average synthesis took: 54093.900 usec (+- 306.247 usec)
Average num. events: 21562.000 (+- 0.000)
Average time per event 2.509 usec
Number of synthesis threads: 8
Average synthesis took: 48938.700 usec (+- 341.732 usec)
Average num. events: 21564.000 (+- 0.000)
Average time per event 2.269 usec
Where average time per synthesized event goes from 5.927 usec with 1
thread to 2.269 usec with 8. This isn't a linear speed up as not all of
synthesize code has been made parallel. If the synthesis time was about
10 seconds then using 8 threads may bring this down to less than 4.
Signed-off-by: Stephane Eranian <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tony Jones <[email protected]>
Cc: yuzhoujian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Commit 2d4f27999b88 ("perf data: Add global path holder") missed path
conversion in tests/topology.c, causing the "Session topology" testcase
to "hang" (waits forever for input from stdin) when doing "ssh $VM perf
test".
Can be reproduced by running "cat | perf test topo", and crashed by
replacing cat with true:
$ true | perf test -v topo
40: Session topology :
--- start ---
test child forked, pid 3638
templ file: /tmp/perf-test-QPvAch
incompatible file format
incompatible file format (rerun with -v to learn more)
free(): invalid pointer
test child interrupted
---- end ----
Session topology: FAILED!
Committer testing:
Reproduced the above result before the patch and after it is back
working:
# true | perf test -v topo
41: Session topology :
--- start ---
test child forked, pid 19374
templ file: /tmp/perf-test-YOTEQg
CPU 0, core 0, socket 0
CPU 1, core 1, socket 0
CPU 2, core 2, socket 0
CPU 3, core 3, socket 0
CPU 4, core 0, socket 0
CPU 5, core 1, socket 0
CPU 6, core 2, socket 0
CPU 7, core 3, socket 0
test child finished with 0
---- end ----
Session topology: Ok
#
Fixes: 2d4f27999b88 ("perf data: Add global path holder")
Signed-off-by: Tommi Rantala <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mamatha Inamdar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For interval mode, the metric is printed after the '#' character if it
exists. But it's not calculated by the counts generated in this
interval.
See the following examples:
root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
# time counts unit events
1.000422803 764,809 inst_retired.any # 2.9 CPI
1.000422803 2,234,932 cycles
2.001464585 1,960,061 inst_retired.any # 1.6 CPI
2.001464585 4,022,591 cycles
The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1)
root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000429493 2,869,311 cycles
1.000429493 816,875 instructions # 0.28 insn per cycle
2.001516426 9,260,973 cycles
2.001516426 5,250,634 instructions # 0.87 insn per cycle
The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is
0.57).
The current code uses a global variable 'rt_stat' for tracking and
updating the std dev of runtime stat. Unlike the counts, 'rt_stat' is not
reset for interval. While the counts are reset for interval.
perf_stat_process_counter()
{
if (config->interval)
init_stats(ps->res_stats);
}
So for interval mode, the 'rt_stat' variable should be reset too.
This patch resets 'rt_stat' before read_counters(), so the runtime stat
is only calculated by the counts generated in this interval.
With this patch:
root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
# time counts unit events
1.000420924 2,408,818 inst_retired.any # 2.1 CPI
1.000420924 5,010,111 cycles
2.001448579 2,798,407 inst_retired.any # 1.6 CPI
2.001448579 4,599,861 cycles
root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000428555 2,769,714 cycles
1.000428555 774,462 instructions # 0.28 insn per cycle
2.001471562 3,595,904 cycles
2.001471562 1,243,703 instructions # 0.35 insn per cycle
Now the second 'insn per cycle' and CPI are calculated by the counts
generated in this interval.
Signed-off-by: Jin Yao <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-By: Kajol Jain <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Mostly straightforward constification, except that WARN_FUNC()
needs a writable pointer while we have read-only pointers,
so deflect this to WARN().
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
'struct elf *' handling is an open/close paradigm, make sure the naming
matches that:
elf_open_read()
elf_write()
elf_close()
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In preparation to parallelize certain parts of objtool, map out which uses
of various data structures are read-only vs. read-write.
As a first step constify 'struct elf' pointer passing, most of the secondary
uses of it in find_symbol_*() methods are read-only.
Also, while at it, better group the 'struct elf' handling methods in elf.h.
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The commit in the Fixes tag changed get_xdp_id to only return prog_id
if flags is 0, but there are other XDP flags than the modes - e.g.,
XDP_FLAGS_UPDATE_IF_NOEXIST. Since the intention was only to look at
MODE flags, clear other ones before checking if flags is 0.
Fixes: f07cbad29741 ("libbpf: Fix bpf_get_link_xdp_id flags handling")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrey Ignatov <[email protected]>
|
|
Add nodad when adding IPv6 addresses and remove the sleep.
A recent change to iproute2 moved the 'pref medium' to the prefix
(where it belongs). Change the expected route check to strip
'pref medium' to be compatible with old and new iproute2.
Add IPv4 runtime test with an IPv6 address as the gateway in
the default route.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
A user reported [0] hitting the WARN_ON in fib_info_nh:
[ 8633.839816] ------------[ cut here ]------------
[ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
...
[ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
...
[ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
[ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
[ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
[ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
[ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
[ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
[ 8633.839857] FS: 00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
[ 8633.839858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
[ 8633.839867] Call Trace:
[ 8633.839871] ip_route_output_key_hash_rcu+0x421/0x890
[ 8633.839873] ip_route_output_key_hash+0x5e/0x80
[ 8633.839876] ip_route_output_flow+0x1a/0x50
[ 8633.839878] __ip4_datagram_connect+0x154/0x310
[ 8633.839880] ip4_datagram_connect+0x28/0x40
[ 8633.839882] __sys_connect+0xd6/0x100
...
The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.
Add test case that covers the affected code path.
[0] https://github.com/FRRouting/frr/issues/6089
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a self-test for the IPv6 dsfield munge that iproute2 will support.
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex
munge ip6 dsfield set".
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add tests for vrf and xfrms with a second round after adding a
qdisc. There are a few known problems documented with the test
cases that fail. The fix is non-trivial; will come back to it
when time allows.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Sometimes, WARN_FUNC() and other users of symbol_by_offset() will
associate the first instruction of a symbol with the symbol preceding
it. This is because symbol->offset + symbol->len is already outside of
the symbol's range.
Fixes: 2a362ecc3ec9 ("objtool: Optimize find_symbol_*() and read_symbols()")
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
|
|
Apparently there's people doing 64bit builds on 32bit machines.
Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
|
|
fib_tests is spewing errors:
...
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
ping: connect: Network is unreachable
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
...
Each test entry in fib_tests is supposed to do its own setup and
cleanup. Right now the $IP commands in fib_suppress_test are
failing because there is no ns1. Add the setup/cleanup and logging
expected for each test.
Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Signed-off-by: David Ahern <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As the code comments in perf_stat_process_counter() say, we calculate
counter's data every interval, and the display code shows ps->res_stats
avg value. We need to zero the stats for interval mode.
But the current code only zeros the res_stats[0], it doesn't zero the
res_stats[1] and res_stats[2], which are for ena and run of counter.
This patch zeros the whole res_stats[] for interval mode.
Fixes: 51fd2df1e882 ("perf stat: Fix interval output values")
Signed-off-by: Jin Yao <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This consists of fixes to runner scripts and individual test run-time
bugs. Includes fixes to tpm2 and memfd test run-time regressions"
* tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ipc: Fix test failure seen after initial test run
Revert "Kernel selftests: tpm2: check for tpm support"
selftests/ftrace: Add CONFIG_SAMPLE_FTRACE_DIRECT=m kconfig
selftests/seccomp: allow clock_nanosleep instead of nanosleep
kselftest/runner: allow to properly deliver signals to tests
selftests/harness: fix spelling mistake "SIGARLM" -> "SIGALRM"
selftests: Fix memfd test run-time regression
selftests: vm: Fix 64-bit test builds for powerpc64le
selftests: vm: Do not override definition of ARCH
|
|
The hidepid parameter values are becoming more and more and it becomes
difficult to remember what each new magic number means.
Backward compatibility is preserved since it is possible to specify
numerical value for the hidepid parameter. This does not break the
fsconfig since it is not possible to specify a numerical value through
it. All numeric values are converted to a string. The type
FSCONFIG_SET_BINARY cannot be used to indicate a numerical value.
Selftest has been added to verify this behavior.
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Reviewed-by: Alexey Dobriyan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
This patch allows to have multiple procfs instances inside the
same pid namespace. The aim here is lightweight sandboxes, and to allow
that we have to modernize procfs internals.
1) The main aim of this work is to have on embedded systems one
supervisor for apps. Right now we have some lightweight sandbox support,
however if we create pid namespacess we have to manages all the
processes inside too, where our goal is to be able to run a bunch of
apps each one inside its own mount namespace without being able to
notice each other. We only want to use mount namespaces, and we want
procfs to behave more like a real mount point.
2) Linux Security Modules have multiple ptrace paths inside some
subsystems, however inside procfs, the implementation does not guarantee
that the ptrace() check which triggers the security_ptrace_check() hook
will always run. We have the 'hidepid' mount option that can be used to
force the ptrace_may_access() check inside has_pid_permissions() to run.
The problem is that 'hidepid' is per pid namespace and not attached to
the mount point, any remount or modification of 'hidepid' will propagate
to all other procfs mounts.
This also does not allow to support Yama LSM easily in desktop and user
sessions. Yama ptrace scope which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork(),
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() on
/proc/<pids>/ to always run inside that new procfs instances. This will
allow to specifiy on user sessions if we should populate procfs with
pids that the user can ptrace or not.
By using Yama ptrace scope, some restricted users will only be able to see
inferiors inside /proc, they won't even be able to see their other
processes. Some software like Chromium, Firefox's crash handler, Wine
and others are already using Yama to restrict which processes can be
ptracable. With this change this will give the possibility to restrict
/proc/<pids>/ but more importantly this will give desktop users a
generic and usuable way to specifiy which users should see all processes
and which users can not.
Side notes:
* This covers the lack of seccomp where it is not able to parse
arguments, it is easy to install a seccomp filter on direct syscalls
that operate on pids, however /proc/<pid>/ is a Linux ABI using
filesystem syscalls. With this change LSMs should be able to analyze
open/read/write/close...
In the new patch set version I removed the 'newinstance' option
as suggested by Eric W. Biederman.
Selftest has been added to verify new behavior.
Signed-off-by: Alexey Gladkov <[email protected]>
Reviewed-by: Alexey Dobriyan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
al->sym may be NULL given current if conditions and may cause a segv.
Fixes: d2bedb7863e9 ("perf script: Allow --symbol to accept hexadecimal addresses")
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Code cleanup: Remove duplicate headers which are included twice.
Signed-off-by: Jagadeesh Pagadala <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|