Age | Commit message (Collapse) | Author | Files | Lines |
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-06-15
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix stack layout of JITed x64 bpf code, from Alexei.
2) fix out of bounds memory access in bpf_sk_storage, from Arthur.
3) fix lpm trie walk, from Jonathan.
4) fix nested bpf_perf_event_output, from Matt.
5) and several other fixes.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as
they do not increment bpf_prog_active while executing.
This enables three levels of nesting, to support
- a kprobe or raw tp or perf event,
- another one of the above that irq context happens to call, and
- another one in nmi context
(at most one of which may be a kprobe or perf event).
Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Signed-off-by: Matt Mullins <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Out of range read of stack trace output
- Fix for NULL pointer dereference in trace_uprobe_create()
- Fix to a livepatching / ftrace permission race in the module code
- Fix for NULL pointer dereference in free_ftrace_func_mapper()
- A couple of build warning clean ups
* tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
module: Fix livepatch/ftrace module text permissions race
tracing/uprobe: Fix obsolete comment on trace_uprobe_create()
tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()
tracing: Make two symbols static
tracing: avoid build warning with HAVE_NOP_MCOUNT
tracing: Fix out-of-range read in trace_stack_print()
|
|
stop_machine is the only user left of cpu_relax_yield. Given that it
now has special semantics which are tied to stop_machine introduce a
weak stop_machine_yield function which architectures can override, and
get rid of the generic cpu_relax_yield implementation.
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check-in calls cpu_relax_yield in a tight loop until
the last missing CPUs acknowledged the state transition.
On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check-in. With the
current definition of cpu_relax_yield a diagnose 0x44 is done which
tells the hypervisor to schedule *some* other CPU. That can be any
CPU and not necessarily one of the CPUs that need to run in order to
advance the state machine. This can lead to a pretty bad diagnose 0x44
storm until the last missing CPU finally checked-in.
Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
directed yield. Each CPU in the wait loop will pick up the next CPU
in the cpumask of stop_machine. The diagnose 0x9c is used to tell the
hypervisor to run this next CPU instead of the current one. If there
is only a limited number of real CPUs backing the virtual CPUs we
end up with the real CPUs passed around in a round-robin fashion.
[[email protected]]:
Use cpumask_next_wrap as suggested by Peter Zijlstra.
Signed-off-by: Martin Schwidefsky <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This has an unusually high density of tricky fixes:
- task_get_css() could deadlock when it races against a dying cgroup.
- cgroup.procs didn't list thread group leaders with live threads.
This could mislead readers to think that a cgroup is empty when
it's not. Fixed by making PROCS iterator include dead tasks. I made
a couple mistakes making this change and this pull request contains
a couple follow-up patches.
- When cpusets run out of online cpus, it updates cpusmasks of member
tasks in bizarre ways. Joel improved the behavior significantly"
* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: restore sanity to cpuset_cpus_allowed_fallback()
cgroup: Fix css_task_iter_advance_css_set() cset skip condition
cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
cgroup: Include dying leaders with live threads in PROCS iterations
cgroup: Implement css_task_iter_skip()
cgroup: Call cgroup_release() before __exit_signal()
docs cgroups: add another example size for hugetlb
cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
|
|
Convert proc_dointvec_minmax_bpf_stats() into a more generic
helper, since we are going to use jump labels more often.
Note that sysctl_bpf_stats_enabled is removed, since
it is no longer needed/used.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net
uses RCU to detect it has setup the resources for tx. The assumption
accidentally broke when introducing bulk queue in devmap.
Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
Reported-by: David Ahern <[email protected]>
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
dev_map_free() forgot to free bulk queue when freeing its entries.
Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
dev_map_free() waits for flush_needed bitmap to be empty in order to
ensure all flush operations have completed before freeing its entries.
However the corresponding clear_bit() was called before using the
entries, so the entries could be used after free.
All access to the entries needs to be done before clearing the bit.
It seems commit a5e2da6e9787 ("bpf: netdev is never null in
__dev_map_flush") accidentally changed the clear_bit() and memory access
order.
Note that the problem happens only in __dev_map_flush(), not in
dev_map_flush_old(). dev_map_flush_old() is called only after nulling
out the corresponding netdev_map entry, so dev_map_free() never frees
the entry thus no such race happens there.
Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush")
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
The mapper may be NULL when called from register_ftrace_function_probe()
with probe->data == NULL.
This issue can be reproduced as follow (it may be covered by compiler
optimization sometime):
/ # cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
/ # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
[ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 206.952402] Mem abort info:
[ 206.952819] ESR = 0x96000006
[ 206.955326] Exception class = DABT (current EL), IL = 32 bits
[ 206.955844] SET = 0, FnV = 0
[ 206.956272] EA = 0, S1PTW = 0
[ 206.956652] Data abort info:
[ 206.957320] ISV = 0, ISS = 0x00000006
[ 206.959271] CM = 0, WnR = 0
[ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
[ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
[ 206.964953] Internal error: Oops: 96000006 [#1] SMP
[ 206.971122] Dumping ftrace buffer:
[ 206.973677] (ftrace buffer empty)
[ 206.975258] Modules linked in:
[ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
[ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
[ 206.978955] Hardware name: linux,dummy-virt (DT)
[ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
[ 206.980874] lr : ftrace_count_free+0x68/0x80
[ 206.982539] sp : ffff0000182f3ab0
[ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
[ 206.983632] x27: ffff000013054b40 x26: 0000000000000001
[ 206.984000] x25: ffff00001385f000 x24: 0000000000000000
[ 206.984394] x23: ffff000013453000 x22: ffff000013054000
[ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28
[ 206.986575] x19: ffff000013872c30 x18: 0000000000000000
[ 206.987111] x17: 0000000000000000 x16: 0000000000000000
[ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
[ 206.987850] x13: 000000000017430e x12: 0000000000000580
[ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc
[ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550
[ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
[ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
[ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
[ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000
[ 206.992557] Call trace:
[ 206.993101] free_ftrace_func_mapper+0x2c/0x118
[ 206.994827] ftrace_count_free+0x68/0x80
[ 206.995238] release_probe+0xfc/0x1d0
[ 206.995555] register_ftrace_function_probe+0x4a8/0x868
[ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180
[ 206.996330] ftrace_dump_callback+0x50/0x70
[ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8
[ 206.997157] ftrace_filter_write+0x44/0x60
[ 206.998971] __vfs_write+0x64/0xf0
[ 206.999285] vfs_write+0x14c/0x2f0
[ 206.999591] ksys_write+0xbc/0x1b0
[ 206.999888] __arm64_sys_write+0x3c/0x58
[ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0
[ 207.000607] el0_svc_handler+0x144/0x1c8
[ 207.000916] el0_svc+0x8/0xc
[ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
[ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
[ 207.010126] Kernel panic - not syncing: Fatal exception
[ 207.011322] SMP: stopping secondary CPUs
[ 207.013956] Dumping ftrace buffer:
[ 207.014595] (ftrace buffer empty)
[ 207.015632] Kernel Offset: disabled
[ 207.017187] CPU features: 0x002,20006008
[ 207.017985] Memory Limit: none
[ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Li <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Convert the PM documents to ReST, in order to allow them to
build with Sphinx.
The conversion is actually:
- add blank lines and indentation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Mark Brown <[email protected]>
Acked-by: Srivatsa S. Bhat (VMware) <[email protected]>
|
|
It's possible for livepatch and ftrace to be toggling a module's text
permissions at the same time, resulting in the following panic:
BUG: unable to handle page fault for address: ffffffffc005b1d9
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:apply_relocate_add+0xbe/0x14c
Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
klp_init_object_loaded+0x10f/0x219
? preempt_latency_start+0x21/0x57
klp_enable_patch+0x662/0x809
? virt_to_head_page+0x3a/0x3c
? kfree+0x8c/0x126
patch_init+0x2ed/0x1000 [livepatch_test02]
? 0xffffffffc0060000
do_one_initcall+0x9f/0x1c5
? kmem_cache_alloc_trace+0xc4/0xd4
? do_init_module+0x27/0x210
do_init_module+0x5f/0x210
load_module+0x1c41/0x2290
? fsnotify_path+0x3b/0x42
? strstarts+0x2b/0x2b
? kernel_read+0x58/0x65
__do_sys_finit_module+0x9f/0xc3
? __do_sys_finit_module+0x9f/0xc3
__x64_sys_finit_module+0x1a/0x1c
do_syscall_64+0x52/0x61
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The above panic occurs when loading two modules at the same time with
ftrace enabled, where at least one of the modules is a livepatch module:
CPU0 CPU1
klp_enable_patch()
klp_init_object_loaded()
module_disable_ro()
ftrace_module_enable()
ftrace_arch_code_modify_post_process()
set_all_modules_text_ro()
klp_write_object_relocations()
apply_relocate_add()
*patches read-only code* - BOOM
A similar race exists when toggling ftrace while loading a livepatch
module.
Fix it by ensuring that the livepatch and ftrace code patching
operations -- and their respective permissions changes -- are protected
by the text_mutex.
Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com
Reported-by: Johannes Erdfelt <[email protected]>
Fixes: 444d13ff10fb ("modules: add ro_after_init support")
Acked-by: Jessica Yu <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Commit 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for
uprobe events") cleaned up the usage of trace_uprobe_create(), and the
function has been no longer used for removing uprobe/uretprobe.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Just like the case of commit 8b05a3a7503c ("tracing/kprobes: Fix NULL
pointer dereference in trace_kprobe_create()"), writing an incorrectly
formatted string to uprobe_events can trigger NULL pointer dereference.
Reporeducer:
# echo r > /sys/kernel/debug/tracing/uprobe_events
dmesg:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
RIP: 0010:strchr+0x0/0x30
Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c
RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293
RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625
RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000
RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0
Call Trace:
trace_uprobe_create+0xe6/0xb10
? __kmalloc_track_caller+0xe6/0x1c0
? __kmalloc+0xf0/0x1d0
? trace_uprobe_create+0xb10/0xb10
create_or_delete_trace_uprobe+0x35/0x90
? trace_uprobe_create+0xb10/0xb10
trace_run_command+0x9c/0xb0
trace_parse_run_command+0xf9/0x1eb
? probes_open+0x80/0x80
__vfs_write+0x43/0x90
vfs_write+0x14a/0x2a0
ksys_write+0xa2/0x170
do_syscall_64+0x7f/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for uprobe events")
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Fix sparse warnings:
kernel/trace/trace.c:6927:24: warning:
symbol 'get_tracing_log_err' was not declared. Should it be static?
kernel/trace/trace.c:8196:15: warning:
symbol 'trace_instance_dir' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Tom Zanussi <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
testing CC_USING_* defines) to avoid conditional compilation and fix
the following gcc 9 warning on s390:
kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
but not used [-Wunused-function]
Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours
Fixes: 2f4df0017baed ("tracing: Add -mcount-nop option support")
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
In order to prepare to add them to the Kernel API book,
convert the files to ReST format.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Convert the cgroup-v1 files to ReST format, in order to
allow a later addition to the admin-guide.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Puts range check before dereferencing the pointer.
Reproducer:
# echo stacktrace > trace_options
# echo 1 > events/enable
# cat trace > /dev/null
KASAN report:
==================================================================
BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0
Read of size 8 at addr ffff888069d20000 by task cat/1953
CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0x8a/0xce
print_address_description+0x60/0x224
? trace_stack_print+0x26b/0x2c0
? trace_stack_print+0x26b/0x2c0
__kasan_report.cold+0x1a/0x3e
? trace_stack_print+0x26b/0x2c0
kasan_report+0xe/0x20
trace_stack_print+0x26b/0x2c0
print_trace_line+0x6ea/0x14d0
? tracing_buffers_read+0x700/0x700
? trace_find_next_entry_inc+0x158/0x1d0
s_show+0xea/0x310
seq_read+0xaa7/0x10e0
? seq_escape+0x230/0x230
__vfs_read+0x7c/0x100
vfs_read+0x16c/0x3a0
ksys_read+0x121/0x240
? kernel_write+0x110/0x110
? perf_trace_sys_enter+0x8a0/0x8a0
? syscall_slow_exit_work+0xa9/0x410
do_syscall_64+0xb7/0x390
? prepare_exit_to_usermode+0x165/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f867681f910
Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04
RSP: 002b:00007ffdabf23488 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f867681f910
RDX: 0000000000020000 RSI: 00007f8676cde000 RDI: 0000000000000003
RBP: 00007f8676cde000 R08: ffffffffffffffff R09: 0000000000000000
R10: 0000000000000871 R11: 0000000000000246 R12: 00007f8676cde000
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000ec0
Allocated by task 1214:
save_stack+0x1b/0x80
__kasan_kmalloc.constprop.0+0xc2/0xd0
kmem_cache_alloc+0xaf/0x1a0
getname_flags+0xd2/0x5b0
do_sys_open+0x277/0x5a0
do_syscall_64+0xb7/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 1214:
save_stack+0x1b/0x80
__kasan_slab_free+0x12c/0x170
kmem_cache_free+0x8a/0x1c0
putname+0xe1/0x120
do_sys_open+0x2c5/0x5a0
do_syscall_64+0xb7/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff888069d20000
which belongs to the cache names_cache of size 4096
The buggy address is located 0 bytes inside of
4096-byte region [ffff888069d20000, ffff888069d21000)
The buggy address belongs to the page:
page:ffffea0001a74800 refcount:1 mapcount:0 mapping:ffff88806ccd1380 index:0x0 compound_mapcount: 0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery")
Signed-off-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
a5e112e6424a ("cgroup: add cgroup_parse_float()") accidentally added
cgroup_parse_float() inside CONFIG_SYSFS block. Move it outside so
that it doesn't cause failures on !CONFIG_SYSFS builds.
Signed-off-by: Tejun Heo <[email protected]>
Fixes: a5e112e6424a ("cgroup: add cgroup_parse_float()")
|
|
This brings the kernel doc in line with the function signature.
Signed-off-by: Yangtao Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The inline keyword was not at the beginning of the function declarations.
Fix the following warnings triggered when using W=1:
kernel/time/clocksource.c:108:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
kernel/time/clocksource.c:113:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: John Stultz <[email protected]>
Cc: Stephen Boyd <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
With architectures allowing the kernel to be placed almost arbitrarily
in memory (e.g.: ARM64), it is possible to have the kernel resides at
physical addresses above 4GB, resulting in neither the default CMA area,
nor the atomic pool from successfully allocating. This does not prevent
specific peripherals from working though, one example is XHCI, which
still operates correctly.
Trouble comes when the XHCI driver gets suspended and resumed, since we
can now trigger the following NPD:
[ 12.664170] usb usb1: root hub lost power or was reset
[ 12.669387] usb usb2: root hub lost power or was reset
[ 12.674662] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 12.682896] pgd = ffffffc1365a7000
[ 12.686386] [00000008] *pgd=0000000136500003, *pud=0000000136500003, *pmd=0000000000000000
[ 12.694897] Internal error: Oops: 96000006 [#1] SMP
[ 12.699843] Modules linked in:
[ 12.702980] CPU: 0 PID: 1499 Comm: pml Not tainted 4.9.135-1.13pre #51
[ 12.709577] Hardware name: BCM97268DV (DT)
[ 12.713736] task: ffffffc136bb6540 task.stack: ffffffc1366cc000
[ 12.719740] PC is at addr_in_gen_pool+0x4/0x48
[ 12.724253] LR is at __dma_free+0x64/0xbc
[ 12.728325] pc : [<ffffff80083c0df8>] lr : [<ffffff80080979e0>] pstate: 60000145
[ 12.735825] sp : ffffffc1366cf990
[ 12.739196] x29: ffffffc1366cf990 x28: ffffffc1366cc000
[ 12.744608] x27: 0000000000000000 x26: ffffffc13a8568c8
[ 12.750020] x25: 0000000000000000 x24: ffffff80098f9000
[ 12.755433] x23: 000000013a5ff000 x22: ffffff8009c57000
[ 12.760844] x21: ffffffc13a856810 x20: 0000000000000000
[ 12.766255] x19: 0000000000001000 x18: 000000000000000a
[ 12.771667] x17: 0000007f917553e0 x16: 0000000000001002
[ 12.777078] x15: 00000000000a36cb x14: ffffff80898feb77
[ 12.782490] x13: ffffffffffffffff x12: 0000000000000030
[ 12.787899] x11: 00000000fffffffe x10: ffffff80098feb7f
[ 12.793311] x9 : 0000000005f5e0ff x8 : 65776f702074736f
[ 12.798723] x7 : 6c2062756820746f x6 : ffffff80098febb1
[ 12.804134] x5 : ffffff800809797c x4 : 0000000000000000
[ 12.809545] x3 : 000000013a5ff000 x2 : 0000000000000fff
[ 12.814955] x1 : ffffff8009c57000 x0 : 0000000000000000
[ 12.820363]
[ 12.821907] Process pml (pid: 1499, stack limit = 0xffffffc1366cc020)
[ 12.828421] Stack: (0xffffffc1366cf990 to 0xffffffc1366d0000)
[ 12.834240] f980: ffffffc1366cf9e0 ffffff80086004d0
[ 12.842186] f9a0: ffffffc13ab08238 0000000000000010 ffffff80097c2218 ffffffc13a856810
[ 12.850131] f9c0: ffffff8009c57000 000000013a5ff000 0000000000000008 000000013a5ff000
[ 12.858076] f9e0: ffffffc1366cfa50 ffffff80085f9250 ffffffc13ab08238 0000000000000004
[ 12.866021] fa00: ffffffc13ab08000 ffffff80097b6000 ffffffc13ab08130 0000000000000001
[ 12.873966] fa20: 0000000000000008 ffffffc13a8568c8 0000000000000000 ffffffc1366cc000
[ 12.881911] fa40: ffffffc13ab08130 0000000000000001 ffffffc1366cfa90 ffffff80085e3de8
[ 12.889856] fa60: ffffffc13ab08238 0000000000000000 ffffffc136b75b00 0000000000000000
[ 12.897801] fa80: 0000000000000010 ffffff80089ccb92 ffffffc1366cfac0 ffffff80084ad040
[ 12.905746] faa0: ffffffc13a856810 0000000000000000 ffffff80084ad004 ffffff80084b91a8
[ 12.913691] fac0: ffffffc1366cfae0 ffffff80084b91b4 ffffffc13a856810 ffffff80080db5cc
[ 12.921636] fae0: ffffffc1366cfb20 ffffff80084b96bc ffffffc13a856810 0000000000000010
[ 12.929581] fb00: ffffffc13a856870 0000000000000000 ffffffc13a856810 ffffff800984d2b8
[ 12.937526] fb20: ffffffc1366cfb50 ffffff80084baa70 ffffff8009932ad0 ffffff800984d260
[ 12.945471] fb40: 0000000000000010 00000002eff0a065 ffffffc1366cfbb0 ffffff80084bafbc
[ 12.953415] fb60: 0000000000000010 0000000000000003 ffffff80098fe000 0000000000000000
[ 12.961360] fb80: ffffff80097b6000 ffffff80097b6dc8 ffffff80098c12b8 ffffff80098c12f8
[ 12.969306] fba0: ffffff8008842000 ffffff80097b6dc8 ffffffc1366cfbd0 ffffff80080e0d88
[ 12.977251] fbc0: 00000000fffffffb ffffff80080e10bc ffffffc1366cfc60 ffffff80080e16a8
[ 12.985196] fbe0: 0000000000000000 0000000000000003 ffffff80097b6000 ffffff80098fe9f0
[ 12.993140] fc00: ffffff80097d4000 ffffff8008983802 0000000000000123 0000000000000040
[ 13.001085] fc20: ffffff8008842000 ffffffc1366cc000 ffffff80089803c2 00000000ffffffff
[ 13.009029] fc40: 0000000000000000 0000000000000000 ffffffc1366cfc60 0000000000040987
[ 13.016974] fc60: ffffffc1366cfcc0 ffffff80080dfd08 0000000000000003 0000000000000004
[ 13.024919] fc80: 0000000000000003 ffffff80098fea08 ffffffc136577ec0 ffffff80089803c2
[ 13.032864] fca0: 0000000000000123 0000000000000001 0000000500000002 0000000000040987
[ 13.040809] fcc0: ffffffc1366cfd00 ffffff80083a89d4 0000000000000004 ffffffc136577ec0
[ 13.048754] fce0: ffffffc136610cc0 ffffffffffffffea ffffffc1366cfeb0 ffffffc136610cd8
[ 13.056700] fd00: ffffffc1366cfd10 ffffff800822a614 ffffffc1366cfd40 ffffff80082295d4
[ 13.064645] fd20: 0000000000000004 ffffffc136577ec0 ffffffc136610cc0 0000000021670570
[ 13.072590] fd40: ffffffc1366cfd80 ffffff80081b5d10 ffffff80097b6000 ffffffc13aae4200
[ 13.080536] fd60: ffffffc1366cfeb0 0000000000000004 0000000021670570 0000000000000004
[ 13.088481] fd80: ffffffc1366cfe30 ffffff80081b6b20 ffffffc13aae4200 0000000000000000
[ 13.096427] fda0: 0000000000000004 0000000021670570 ffffffc1366cfeb0 ffffffc13a838200
[ 13.104371] fdc0: 0000000000000000 000000000000000a ffffff80097b6000 0000000000040987
[ 13.112316] fde0: ffffffc1366cfe20 ffffff80081b3af0 ffffffc13a838200 0000000000000000
[ 13.120261] fe00: ffffffc1366cfe30 ffffff80081b6b0c ffffffc13aae4200 0000000000000000
[ 13.128206] fe20: 0000000000000004 0000000000040987 ffffffc1366cfe70 ffffff80081b7dd8
[ 13.136151] fe40: ffffff80097b6000 ffffffc13aae4200 ffffffc13aae4200 fffffffffffffff7
[ 13.144096] fe60: 0000000021670570 ffffffc13a8c63c0 0000000000000000 ffffff8008083180
[ 13.152042] fe80: ffffffffffffff1d 0000000021670570 ffffffffffffffff 0000007f917ad9b8
[ 13.159986] fea0: 0000000020000000 0000000000000015 0000000000000000 0000000000040987
[ 13.167930] fec0: 0000000000000001 0000000021670570 0000000000000004 0000000000000000
[ 13.175874] fee0: 0000000000000888 0000440110000000 000000000000006d 0000000000000003
[ 13.183819] ff00: 0000000000000040 ffffff80ffffffc8 0000000000000000 0000000000000020
[ 13.191762] ff20: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
[ 13.199707] ff40: 0000000000000000 0000007f917553e0 0000000000000000 0000000000000004
[ 13.207651] ff60: 0000000021670570 0000007f91835480 0000000000000004 0000007f91831638
[ 13.215595] ff80: 0000000000000004 00000000004b0de0 00000000004b0000 0000000000000000
[ 13.223539] ffa0: 0000000000000000 0000007fc92ac8c0 0000007f9175d178 0000007fc92ac8c0
[ 13.231483] ffc0: 0000007f917ad9b8 0000000020000000 0000000000000001 0000000000000040
[ 13.239427] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 13.247360] Call trace:
[ 13.249866] Exception stack(0xffffffc1366cf7a0 to 0xffffffc1366cf8d0)
[ 13.256386] f7a0: 0000000000001000 0000007fffffffff ffffffc1366cf990 ffffff80083c0df8
[ 13.264331] f7c0: 0000000060000145 ffffff80089b5001 ffffffc13ab08130 0000000000000001
[ 13.272275] f7e0: 0000000000000008 ffffffc13a8568c8 0000000000000000 0000000000000000
[ 13.280220] f800: ffffffc1366cf960 ffffffc1366cf960 ffffffc1366cf930 00000000ffffffd8
[ 13.288165] f820: ffffff8009931ac0 4554535953425553 4544006273753d4d 3831633d45434956
[ 13.296110] f840: ffff003832313a39 ffffff800845926c ffffffc1366cf880 0000000000040987
[ 13.304054] f860: 0000000000000000 ffffff8009c57000 0000000000000fff 000000013a5ff000
[ 13.311999] f880: 0000000000000000 ffffff800809797c ffffff80098febb1 6c2062756820746f
[ 13.319944] f8a0: 65776f702074736f 0000000005f5e0ff ffffff80098feb7f 00000000fffffffe
[ 13.327884] f8c0: 0000000000000030 ffffffffffffffff
[ 13.332835] [<ffffff80083c0df8>] addr_in_gen_pool+0x4/0x48
[ 13.338398] [<ffffff80086004d0>] xhci_mem_cleanup+0xc8/0x51c
[ 13.344137] [<ffffff80085f9250>] xhci_resume+0x308/0x65c
[ 13.349524] [<ffffff80085e3de8>] xhci_brcm_resume+0x84/0x8c
[ 13.355174] [<ffffff80084ad040>] platform_pm_resume+0x3c/0x64
[ 13.360997] [<ffffff80084b91b4>] dpm_run_callback+0x5c/0x15c
[ 13.366732] [<ffffff80084b96bc>] device_resume+0xc0/0x190
[ 13.372205] [<ffffff80084baa70>] dpm_resume+0x144/0x2cc
[ 13.377504] [<ffffff80084bafbc>] dpm_resume_end+0x20/0x34
[ 13.382980] [<ffffff80080e0d88>] suspend_devices_and_enter+0x104/0x704
[ 13.389585] [<ffffff80080e16a8>] pm_suspend+0x320/0x53c
[ 13.394881] [<ffffff80080dfd08>] state_store+0xbc/0xe0
[ 13.400094] [<ffffff80083a89d4>] kobj_attr_store+0x14/0x24
[ 13.405655] [<ffffff800822a614>] sysfs_kf_write+0x60/0x70
[ 13.411128] [<ffffff80082295d4>] kernfs_fop_write+0x130/0x194
[ 13.416954] [<ffffff80081b5d10>] __vfs_write+0x60/0x150
[ 13.422254] [<ffffff80081b6b20>] vfs_write+0xc8/0x164
[ 13.427376] [<ffffff80081b7dd8>] SyS_write+0x70/0xc8
[ 13.432412] [<ffffff8008083180>] el0_svc_naked+0x34/0x38
[ 13.437800] Code: 92800173 97f6fb9e 17fffff5 d1000442 (f8408c03)
[ 13.444033] ---[ end trace 2effe12f909ce205 ]---
The call path leading to this problem is xhci_mem_cleanup() ->
dma_free_coherent() -> dma_free_from_pool() -> addr_in_gen_pool. If the
atomic_pool is NULL, we can't possibly have the address in the atomic
pool anyway, so guard against that.
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Jason reported that the coarse ktime based time getters advance only once
per second and not once per tick as advertised.
The code reads only the monotonic base time, which advances once per
second. The nanoseconds are accumulated on every tick in xtime_nsec up to
a second and the regular time getters take this nanoseconds offset into
account, but the ktime_get_coarse*() implementation fails to do so.
Add the accumulated xtime_nsec value to the monotonic base time to get the
proper per tick advancing coarse tinme.
Fixes: b9ff604cff11 ("timekeeping: Add ktime_get_coarse_with_offset")
Reported-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Jason A. Donenfeld <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: Sultan Alsawaf <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The declaration for pfn_is_nosave is only available in
kernel/power/power.h. Since this function can be override in arch,
expose it globally. Having a prototype will make sure to avoid warning
(sometime treated as error with W=1) such as:
arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
This moves the declaration into a globally visible header file and add
missing include to avoid a warning on powerpc.
Also remove the duplicated prototypes since not required anymore.
Signed-off-by: Mathieu Malaterre <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
In module_add_modinfo_attrs if sysfs_create_file
fails, we forget to free allocated modinfo_attrs
and roll back the sysfs files.
Fixes: 03e88ae1b13d ("[PATCH] fix module sysfs files reference counting")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: Jessica Yu <[email protected]>
|
|
Logan noticed that devm_memremap_pages_release() kills the percpu_ref
drops all the page references that were acquired at init and then
immediately proceeds to unplug, arch_remove_memory(), the backing pages
for the pagemap. If for some reason device shutdown actually collides
with a busy / elevated-ref-count page then arch_remove_memory() should
be deferred until after that reference is dropped.
As it stands the "wait for last page ref drop" happens *after*
devm_memremap_pages_release() returns, which is obviously too late and
can lead to crashes.
Fix this situation by assigning the responsibility to wait for the
percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
callback. Implement the new cleanup callback for all
devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.
Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 41e94a851304 ("add devm_memremap_pages")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the new devm_release_action() facility to allow
devm_memremap_pages_release() to be manually triggered.
Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The sync_exp_work_done() function uses smp_mb__before_atomic(), but
there is no obvious atomic in the ensuing code. The ordering is
absolutely required for grace periods to work correctly, so this
commit upgrades the smp_mb__before_atomic() to smp_mb().
Fixes: 6fba2b3767ea ("rcu: Remove deprecated RCU debugfs tracing code")
Reported-by: Andrea Parri <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
In the case that a process is constrained by taskset(1) (i.e.
sched_setaffinity(2)) to a subset of available cpus, and all of those are
subsequently offlined, the scheduler will set tsk->cpus_allowed to
the current value of task_cs(tsk)->effective_cpus.
This is done via a call to do_set_cpus_allowed() in the context of
cpuset_cpus_allowed_fallback() made by the scheduler when this case is
detected. This is the only call made to cpuset_cpus_allowed_fallback()
in the latest mainline kernel.
However, this is not sane behavior.
I will demonstrate this on a system running the latest upstream kernel
with the following initial configuration:
# grep -i cpu /proc/$$/status
Cpus_allowed: ffffffff,fffffff
Cpus_allowed_list: 0-63
(Where cpus 32-63 are provided via smt.)
If we limit our current shell process to cpu2 only and then offline it
and reonline it:
# taskset -p 4 $$
pid 2272's current affinity mask: ffffffffffffffff
pid 2272's new affinity mask: 4
# echo off > /sys/devices/system/cpu/cpu2/online
# dmesg | tail -3
[ 2195.866089] process 2272 (bash) no longer affine to cpu2
[ 2195.872700] IRQ 114: no longer affine to CPU2
[ 2195.879128] smpboot: CPU 2 is now offline
# echo on > /sys/devices/system/cpu/cpu2/online
# dmesg | tail -1
[ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4
We see that our current process now has an affinity mask containing
every cpu available on the system _except_ the one we originally
constrained it to:
# grep -i cpu /proc/$$/status
Cpus_allowed: ffffffff,fffffffb
Cpus_allowed_list: 0-1,3-63
This is not sane behavior, as the scheduler can now not only place the
process on previously forbidden cpus, it can't even schedule it on
the cpu it was originally constrained to!
Other cases result in even more exotic affinity masks. Take for instance
a process with an affinity mask containing only cpus provided by smt at
the moment that smt is toggled, in a configuration such as the following:
# taskset -p f000000000 $$
# grep -i cpu /proc/$$/status
Cpus_allowed: 000000f0,00000000
Cpus_allowed_list: 36-39
A double toggle of smt results in the following behavior:
# echo off > /sys/devices/system/cpu/smt/control
# echo on > /sys/devices/system/cpu/smt/control
# grep -i cpus /proc/$$/status
Cpus_allowed: ffffff00,ffffffff
Cpus_allowed_list: 0-31,40-63
This is even less sane than the previous case, as the new affinity mask
excludes all smt-provided cpus with ids less than those that were
previously in the affinity mask, as well as those that were actually in
the mask.
With this patch applied, both of these cases end in the following state:
# grep -i cpu /proc/$$/status
Cpus_allowed: ffffffff,ffffffff
Cpus_allowed_list: 0-63
The original policy is discarded. Though not ideal, it is the simplest way
to restore sanity to this fallback case without reinventing the cpuset
wheel that rolls down the kernel just fine in cgroup v2. A user who wishes
for the previous affinity mask to be restored in this fallback case can use
that mechanism instead.
This patch modifies scheduler behavior by instead resetting the mask to
task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy
mode. I tested the cases above on both modes.
Note that the scheduler uses this fallback mechanism if and only if
_every_ other valid avenue has been traveled, and it is the last resort
before calling BUG().
Suggested-by: Waiman Long <[email protected]>
Suggested-by: Phil Auld <[email protected]>
Signed-off-by: Joel Savitz <[email protected]>
Acked-by: Phil Auld <[email protected]>
Acked-by: Waiman Long <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Compiling kernel/bpf/core.c with W=1 causes a flood of warnings:
kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init]
1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
| ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
1087 | INSN_3(ALU, ADD, X), \
| ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
1202 | BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
| ^~~~~~~~~~~~
kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]')
1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
| ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
1087 | INSN_3(ALU, ADD, X), \
| ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
1202 | BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
| ^~~~~~~~~~~~
98 copies of the above.
The attached patch silences the warnings, because we *know* we're overwriting
the default initializer. That leaves bpf/core.c with only 6 other warnings,
which become more visible in comparison.
Signed-off-by: Valdis Kletnieks <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
When "deep" suspend is enabled, all CPUs except the primary CPU are frozen
via CPU hotplug one by one. After all secondary CPUs are unplugged the
wakeup pending condition is evaluated and if pending the suspend operation
is aborted and the secondary CPUs are brought up again.
CPU hotplug is a slow operation, so it makes sense to check for wakeup
pending in the freezer loop before bringing down the next CPU. This
improves the system suspend abort latency significantly.
[ tglx: Massaged changelog and improved printk message ]
Signed-off-by: Pavankumar Kondeti <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: iri Kosina <[email protected]>
Cc: Mukesh Ojha <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The *affd argument is neither used in irq_build_affinity_masks() nor
__irq_build_affinity_masks(). Remove it.
Signed-off-by: Minwoo Im <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Cc: Minwoo Im <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The circular buffers are now validated with selftests. The next interrupt
index algorithm which is the hardest part to validate needs extra coverage.
Add a selftest which uses the intervals stored in the arrays and insert all
the values except the last one. The next event computation must return the
same value as the last element which was not inserted.
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
After testing the per cpu interrupt circular event, make sure the per
interrupt circular buffer usage is correct.
Add tests to validate the interrupt circular buffer.
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Due to the complexity of the code and the difficulty to debug it, add some
selftests to the framework in order to spot issues or regression at boot
time when the runtime testing is enabled for this subsystem.
This tests the circular buffer at the limits and validates:
- the encoding / decoding of the values
- the macro to browse the irq timings circular buffer
- the function to push data in the circular buffer
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
For the next patches providing the selftest, it is required to insert
interval values directly in the buffer in order to check the correctness of
the code. Encapsulate the code doing that in a always inline function in
order to reuse it in the test code.
No functional changes.
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
For the next patches providing the selftest, it is required to artificially
insert timings value in the circular buffer in order to check the
correctness of the code. Encapsulate the common code between the future
test code and the current code with an always-inline tag.
No functional change.
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
With a minimal period and if there is a period which is a multiple of it
but lesser than the max period then it will be detected before and the
minimal period will be never reached.
1 2 1 2 1 2 1 2 1 2 1 2
<-----> <-----> <----->
<-> <-> <-> <-> <-> <->
In that case, the minimum period is 2 and the maximum period is 5. That
means all repeating pattern of 2 will be detected as repeating pattern of
4, it is pointless to go up to 2 when searching for the period as it will
always fail.
Remove one loop iteration by increasing the minimal period to 3.
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
It appears the index beginning computation is not correct, the current
code does:
i = (irqts->count & IRQ_TIMINGS_MASK) - 1
If irqts->count is equal to zero, we end up with an index equal to -1,
but that does not happen because the function checks against zero
before and returns in such case.
However, if irqts->count is a multiple of IRQ_TIMINGS_SIZE, the
resulting & bit op will be zero and leads also to a -1 index.
Re-introduce the iteration loop belonging to the previous variance
code which was correct.
Fixes: bbba0e7c5cda "genirq/timings: Add array suffix computation code"
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The current code is luckily working with most of the interval samples
testing but actually it fails to correctly detect pattern repetition
breaking at the end of the buffer.
Narrowing down the bug has been a real pain because of the pointers,
so the routine is rewrittne by using indexes instead.
Fixes: bbba0e7c5cda "genirq/timings: Add array suffix computation code"
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
seq_file.h does not need to be included, so remove it.
Signed-off-by: Yangtao Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace fixes from Eric Biederman:
"This is just two very minor fixes:
- prevent ptrace from reading unitialized kernel memory found twice
by syzkaller
- restore a missing smp_rmb in ptrace_may_access and add comment tp
it so it is not removed by accident again.
Apologies for being a little slow about getting this to you, I am
still figuring out how to develop with a little baby in the house"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: restore smp_rmb() in __ptrace_may_access()
signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
|
|
Restore the read memory barrier in __ptrace_may_access() that was deleted
a couple years ago. Also add comments on this barrier and the one it pairs
with to explain why they're there (as far as I understand).
Fixes: bfedb589252c ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks")
Cc: [email protected]
Acked-by: Kees Cook <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
With a specifically contrived memory layout where there is no physical
memory available to the kernel below the 4GB boundary, we will fail to
perform the initial swiotlb_init() call and set no_iotlb_memory to true.
There are drivers out there that call into swiotlb_nr_tbl() to determine
whether they can use the SWIOTLB. With the right DMA_BIT_MASK() value
for these drivers (say 64-bit), they won't ever need to hit
swiotlb_tbl_map_single() so this can go unoticed and we would be
possibly lying about those drivers.
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
|
|
Avoid repeating the zeroing of global swiotlb variables in two locations
and introduce swiotlb_cleanup() to do that.
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
|
|
If the leftmost parent node of the tree has does not have a child
on the left side, then trie_get_next_key (and bpftool map dump) will
not look at the child on the right. This leads to the traversal
missing elements.
Lookup is not affected.
Update selftest to handle this case.
Reproducer:
bpftool map create /sys/fs/bpf/lpm type lpm_trie key 6 \
value 1 entries 256 name test_lpm flags 1
bpftool map update pinned /sys/fs/bpf/lpm key 8 0 0 0 0 0 value 1
bpftool map update pinned /sys/fs/bpf/lpm key 16 0 0 0 0 128 value 2
bpftool map dump pinned /sys/fs/bpf/lpm
Returns only 1 element. (2 expected)
Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE")
Signed-off-by: Jonathan Lemon <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
Currently, the AF_XDP code uses a separate map in order to
determine if an xsk is bound to a queue. Instead of doing this,
have bpf_map_lookup_elem() return a xdp_sock.
Rearrange some xdp_sock members to eliminate structure holes.
Remove selftest - will be added back in later patch.
Signed-off-by: Jonathan Lemon <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
|