aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-20/+98
Alexei Starovoitov says: ==================== pull-request: bpf 2019-06-15 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix stack layout of JITed x64 bpf code, from Alexei. 2) fix out of bounds memory access in bpf_sk_storage, from Arthur. 3) fix lpm trie walk, from Jonathan. 4) fix nested bpf_perf_event_output, from Matt. 5) and several other fixes. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-06-15bpf: fix nested bpf tracepoints with per-cpu dataMatt Mullins1-16/+84
BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as they do not increment bpf_prog_active while executing. This enables three levels of nesting, to support - a kprobe or raw tp or perf event, - another one of the above that irq context happens to call, and - another one in nmi context (at most one of which may be a kprobe or perf event). Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") Signed-off-by: Matt Mullins <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-06-15Merge tag 'trace-v5.2-rc4' of ↵Linus Torvalds5-14/+35
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Out of range read of stack trace output - Fix for NULL pointer dereference in trace_uprobe_create() - Fix to a livepatching / ftrace permission race in the module code - Fix for NULL pointer dereference in free_ftrace_func_mapper() - A couple of build warning clean ups * tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() module: Fix livepatch/ftrace module text permissions race tracing/uprobe: Fix obsolete comment on trace_uprobe_create() tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create() tracing: Make two symbols static tracing: avoid build warning with HAVE_NOP_MCOUNT tracing: Fix out-of-range read in trace_stack_print()
2019-06-15processor: get rid of cpu_relax_yieldHeiko Carstens1-1/+6
stop_machine is the only user left of cpu_relax_yield. Given that it now has special semantics which are tied to stop_machine introduce a weak stop_machine_yield function which architectures can override, and get rid of the generic cpu_relax_yield implementation. Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2019-06-15s390: improve wait logic of stop_machineMartin Schwidefsky1-5/+9
The stop_machine loop to advance the state machine and to wait for all affected CPUs to check-in calls cpu_relax_yield in a tight loop until the last missing CPUs acknowledged the state transition. On a virtual system where not all logical CPUs are backed by real CPUs all the time it can take a while for all CPUs to check-in. With the current definition of cpu_relax_yield a diagnose 0x44 is done which tells the hypervisor to schedule *some* other CPU. That can be any CPU and not necessarily one of the CPUs that need to run in order to advance the state machine. This can lead to a pretty bad diagnose 0x44 storm until the last missing CPU finally checked-in. Replace the undirected cpu_relax_yield based on diagnose 0x44 with a directed yield. Each CPU in the wait loop will pick up the next CPU in the cpumask of stop_machine. The diagnose 0x9c is used to tell the hypervisor to run this next CPU instead of the current one. If there is only a limited number of real CPUs backing the virtual CPUs we end up with the real CPUs passed around in a round-robin fashion. [[email protected]]: Use cpumask_next_wrap as suggested by Peter Zijlstra. Signed-off-by: Martin Schwidefsky <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2019-06-14Merge branch 'for-5.2-fixes' of ↵Linus Torvalds3-32/+91
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This has an unusually high density of tricky fixes: - task_get_css() could deadlock when it races against a dying cgroup. - cgroup.procs didn't list thread group leaders with live threads. This could mislead readers to think that a cgroup is empty when it's not. Fixed by making PROCS iterator include dead tasks. I made a couple mistakes making this change and this pull request contains a couple follow-up patches. - When cpusets run out of online cpus, it updates cpusmasks of member tasks in bizarre ways. Joel improved the behavior significantly" * 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: restore sanity to cpuset_cpus_allowed_fallback() cgroup: Fix css_task_iter_advance_css_set() cset skip condition cgroup: css_task_iter_skip()'d iterators must be advanced before accessed cgroup: Include dying leaders with live threads in PROCS iterations cgroup: Implement css_task_iter_skip() cgroup: Call cgroup_release() before __exit_signal() docs cgroups: add another example size for hugetlb cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
2019-06-14sysctl: define proc_do_static_key()Eric Dumazet2-22/+23
Convert proc_dointvec_minmax_bpf_stats() into a more generic helper, since we are going to use jump labels more often. Note that sysctl_bpf_stats_enabled is removed, since it is no longer needed/used. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-06-15bpf, devmap: Add missing RCU read lock on flushToshiaki Makita1-0/+4
.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net uses RCU to detect it has setup the resources for tx. The assumption accidentally broke when introducing bulk queue in devmap. Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking") Reported-by: David Ahern <[email protected]> Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-15bpf, devmap: Add missing bulk queue freeToshiaki Makita1-0/+1
dev_map_free() forgot to free bulk queue when freeing its entries. Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking") Signed-off-by: Toshiaki Makita <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-15bpf, devmap: Fix premature entry free on destroying mapToshiaki Makita1-2/+2
dev_map_free() waits for flush_needed bitmap to be empty in order to ensure all flush operations have completed before freeing its entries. However the corresponding clear_bit() was called before using the entries, so the entries could be used after free. All access to the entries needs to be done before clearing the bit. It seems commit a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush") accidentally changed the clear_bit() and memory access order. Note that the problem happens only in __dev_map_flush(), not in dev_map_flush_old(). dev_map_flush_old() is called only after nulling out the corresponding netdev_map entry, so dev_map_free() never frees the entry thus no such race happens there. Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush") Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-14ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()Wei Li1-2/+5
The mapper may be NULL when called from register_ftrace_function_probe() with probe->data == NULL. This issue can be reproduced as follow (it may be covered by compiler optimization sometime): / # cat /sys/kernel/debug/tracing/set_ftrace_filter #### all functions enabled #### / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter [ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 206.952402] Mem abort info: [ 206.952819] ESR = 0x96000006 [ 206.955326] Exception class = DABT (current EL), IL = 32 bits [ 206.955844] SET = 0, FnV = 0 [ 206.956272] EA = 0, S1PTW = 0 [ 206.956652] Data abort info: [ 206.957320] ISV = 0, ISS = 0x00000006 [ 206.959271] CM = 0, WnR = 0 [ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000 [ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000 [ 206.964953] Internal error: Oops: 96000006 [#1] SMP [ 206.971122] Dumping ftrace buffer: [ 206.973677] (ftrace buffer empty) [ 206.975258] Modules linked in: [ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____)) [ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17 [ 206.978955] Hardware name: linux,dummy-virt (DT) [ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO) [ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118 [ 206.980874] lr : ftrace_count_free+0x68/0x80 [ 206.982539] sp : ffff0000182f3ab0 [ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700 [ 206.983632] x27: ffff000013054b40 x26: 0000000000000001 [ 206.984000] x25: ffff00001385f000 x24: 0000000000000000 [ 206.984394] x23: ffff000013453000 x22: ffff000013054000 [ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28 [ 206.986575] x19: ffff000013872c30 x18: 0000000000000000 [ 206.987111] x17: 0000000000000000 x16: 0000000000000000 [ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000 [ 206.987850] x13: 000000000017430e x12: 0000000000000580 [ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc [ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550 [ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000 [ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001 [ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28 [ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000 [ 206.992557] Call trace: [ 206.993101] free_ftrace_func_mapper+0x2c/0x118 [ 206.994827] ftrace_count_free+0x68/0x80 [ 206.995238] release_probe+0xfc/0x1d0 [ 206.995555] register_ftrace_function_probe+0x4a8/0x868 [ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180 [ 206.996330] ftrace_dump_callback+0x50/0x70 [ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8 [ 206.997157] ftrace_filter_write+0x44/0x60 [ 206.998971] __vfs_write+0x64/0xf0 [ 206.999285] vfs_write+0x14c/0x2f0 [ 206.999591] ksys_write+0xbc/0x1b0 [ 206.999888] __arm64_sys_write+0x3c/0x58 [ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0 [ 207.000607] el0_svc_handler+0x144/0x1c8 [ 207.000916] el0_svc+0x8/0xc [ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303) [ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]--- [ 207.010126] Kernel panic - not syncing: Fatal exception [ 207.011322] SMP: stopping secondary CPUs [ 207.013956] Dumping ftrace buffer: [ 207.014595] (ftrace buffer empty) [ 207.015632] Kernel Offset: disabled [ 207.017187] CPU features: 0x002,20006008 [ 207.017985] Memory Limit: none [ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]--- Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Li <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14docs: power: convert docs to ReST and rename to *.rstMauro Carvalho Chehab1-3/+3
Convert the PM documents to ReST, in order to allow them to build with Sphinx. The conversion is actually: - add blank lines and indentation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Mark Brown <[email protected]> Acked-by: Srivatsa S. Bhat (VMware) <[email protected]>
2019-06-14module: Fix livepatch/ftrace module text permissions raceJosh Poimboeuf2-1/+15
It's possible for livepatch and ftrace to be toggling a module's text permissions at the same time, resulting in the following panic: BUG: unable to handle page fault for address: ffffffffc005b1d9 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061 Oops: 0003 [#1] PREEMPT SMP PTI CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:apply_relocate_add+0xbe/0x14c Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08 RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246 RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060 RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000 RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0 R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018 R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74 FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: klp_init_object_loaded+0x10f/0x219 ? preempt_latency_start+0x21/0x57 klp_enable_patch+0x662/0x809 ? virt_to_head_page+0x3a/0x3c ? kfree+0x8c/0x126 patch_init+0x2ed/0x1000 [livepatch_test02] ? 0xffffffffc0060000 do_one_initcall+0x9f/0x1c5 ? kmem_cache_alloc_trace+0xc4/0xd4 ? do_init_module+0x27/0x210 do_init_module+0x5f/0x210 load_module+0x1c41/0x2290 ? fsnotify_path+0x3b/0x42 ? strstarts+0x2b/0x2b ? kernel_read+0x58/0x65 __do_sys_finit_module+0x9f/0xc3 ? __do_sys_finit_module+0x9f/0xc3 __x64_sys_finit_module+0x1a/0x1c do_syscall_64+0x52/0x61 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The above panic occurs when loading two modules at the same time with ftrace enabled, where at least one of the modules is a livepatch module: CPU0 CPU1 klp_enable_patch() klp_init_object_loaded() module_disable_ro() ftrace_module_enable() ftrace_arch_code_modify_post_process() set_all_modules_text_ro() klp_write_object_relocations() apply_relocate_add() *patches read-only code* - BOOM A similar race exists when toggling ftrace while loading a livepatch module. Fix it by ensuring that the livepatch and ftrace code patching operations -- and their respective permissions changes -- are protected by the text_mutex. Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com Reported-by: Johannes Erdfelt <[email protected]> Fixes: 444d13ff10fb ("modules: add ro_after_init support") Acked-by: Jessica Yu <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14tracing/uprobe: Fix obsolete comment on trace_uprobe_create()Eiichi Tsukata1-2/+0
Commit 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for uprobe events") cleaned up the usage of trace_uprobe_create(), and the function has been no longer used for removing uprobe/uretprobe. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Eiichi Tsukata <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()Eiichi Tsukata1-3/+10
Just like the case of commit 8b05a3a7503c ("tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()"), writing an incorrectly formatted string to uprobe_events can trigger NULL pointer dereference. Reporeducer: # echo r > /sys/kernel/debug/tracing/uprobe_events dmesg: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:strchr+0x0/0x30 Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293 RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625 RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000 RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0 Call Trace: trace_uprobe_create+0xe6/0xb10 ? __kmalloc_track_caller+0xe6/0x1c0 ? __kmalloc+0xf0/0x1d0 ? trace_uprobe_create+0xb10/0xb10 create_or_delete_trace_uprobe+0x35/0x90 ? trace_uprobe_create+0xb10/0xb10 trace_run_command+0x9c/0xb0 trace_parse_run_command+0xf9/0x1eb ? probes_open+0x80/0x80 __vfs_write+0x43/0x90 vfs_write+0x14a/0x2a0 ksys_write+0xa2/0x170 do_syscall_64+0x7f/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for uprobe events") Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Eiichi Tsukata <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14tracing: Make two symbols staticYueHaibing1-2/+2
Fix sparse warnings: kernel/trace/trace.c:6927:24: warning: symbol 'get_tracing_log_err' was not declared. Should it be static? kernel/trace/trace.c:8196:15: warning: symbol 'trace_instance_dir' was not declared. Should it be static? Link: http://lkml.kernel.org/r/[email protected] Acked-by: Tom Zanussi <[email protected]> Reported-by: Hulk Robot <[email protected]> Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14tracing: avoid build warning with HAVE_NOP_MCOUNTVasily Gorbik1-3/+2
Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it) and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for testing CC_USING_* defines) to avoid conditional compilation and fix the following gcc 9 warning on s390: kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined but not used [-Wunused-function] Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours Fixes: 2f4df0017baed ("tracing: Add -mcount-nop option support") Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14docs: scheduler: convert docs to ReST and rename to *.rstMauro Carvalho Chehab1-1/+1
In order to prepare to add them to the Kernel API book, convert the files to ReST format. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2019-06-14docs: cgroup-v1: convert docs to ReST and rename to *.rstMauro Carvalho Chehab1-1/+1
Convert the cgroup-v1 files to ReST format, in order to allow a later addition to the admin-guide. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2019-06-14tracing: Fix out-of-range read in trace_stack_print()Eiichi Tsukata1-1/+1
Puts range check before dereferencing the pointer. Reproducer: # echo stacktrace > trace_options # echo 1 > events/enable # cat trace > /dev/null KASAN report: ================================================================== BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0 Read of size 8 at addr ffff888069d20000 by task cat/1953 CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 Call Trace: dump_stack+0x8a/0xce print_address_description+0x60/0x224 ? trace_stack_print+0x26b/0x2c0 ? trace_stack_print+0x26b/0x2c0 __kasan_report.cold+0x1a/0x3e ? trace_stack_print+0x26b/0x2c0 kasan_report+0xe/0x20 trace_stack_print+0x26b/0x2c0 print_trace_line+0x6ea/0x14d0 ? tracing_buffers_read+0x700/0x700 ? trace_find_next_entry_inc+0x158/0x1d0 s_show+0xea/0x310 seq_read+0xaa7/0x10e0 ? seq_escape+0x230/0x230 __vfs_read+0x7c/0x100 vfs_read+0x16c/0x3a0 ksys_read+0x121/0x240 ? kernel_write+0x110/0x110 ? perf_trace_sys_enter+0x8a0/0x8a0 ? syscall_slow_exit_work+0xa9/0x410 do_syscall_64+0xb7/0x390 ? prepare_exit_to_usermode+0x165/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f867681f910 Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04 RSP: 002b:00007ffdabf23488 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f867681f910 RDX: 0000000000020000 RSI: 00007f8676cde000 RDI: 0000000000000003 RBP: 00007f8676cde000 R08: ffffffffffffffff R09: 0000000000000000 R10: 0000000000000871 R11: 0000000000000246 R12: 00007f8676cde000 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000ec0 Allocated by task 1214: save_stack+0x1b/0x80 __kasan_kmalloc.constprop.0+0xc2/0xd0 kmem_cache_alloc+0xaf/0x1a0 getname_flags+0xd2/0x5b0 do_sys_open+0x277/0x5a0 do_syscall_64+0xb7/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 1214: save_stack+0x1b/0x80 __kasan_slab_free+0x12c/0x170 kmem_cache_free+0x8a/0x1c0 putname+0xe1/0x120 do_sys_open+0x2c5/0x5a0 do_syscall_64+0xb7/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff888069d20000 which belongs to the cache names_cache of size 4096 The buggy address is located 0 bytes inside of 4096-byte region [ffff888069d20000, ffff888069d21000) The buggy address belongs to the page: page:ffffea0001a74800 refcount:1 mapcount:0 mapping:ffff88806ccd1380 index:0x0 compound_mapcount: 0 flags: 0x100000000010200(slab|head) raw: 0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380 raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Link: http://lkml.kernel.org/r/[email protected] Fixes: 4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery") Signed-off-by: Eiichi Tsukata <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-06-14cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFSTejun Heo1-42/+42
a5e112e6424a ("cgroup: add cgroup_parse_float()") accidentally added cgroup_parse_float() inside CONFIG_SYSFS block. Move it outside so that it doesn't cause failures on !CONFIG_SYSFS builds. Signed-off-by: Tejun Heo <[email protected]> Fixes: a5e112e6424a ("cgroup: add cgroup_parse_float()")
2019-06-14alarmtimer: Fix kerneldoc comment for alarmtimer_suspend()Yangtao Li1-1/+0
This brings the kernel doc in line with the function signature. Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-14clocksource: Move inline keyword to the beginning of function declarationsMathieu Malaterre1-2/+2
The inline keyword was not at the beginning of the function declarations. Fix the following warnings triggered when using W=1: kernel/time/clocksource.c:108:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] kernel/time/clocksource.c:113:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Mathieu Malaterre <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: John Stultz <[email protected]> Cc: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-14dma-remap: Avoid de-referencing NULL atomic_poolFlorian Fainelli1-0/+3
With architectures allowing the kernel to be placed almost arbitrarily in memory (e.g.: ARM64), it is possible to have the kernel resides at physical addresses above 4GB, resulting in neither the default CMA area, nor the atomic pool from successfully allocating. This does not prevent specific peripherals from working though, one example is XHCI, which still operates correctly. Trouble comes when the XHCI driver gets suspended and resumed, since we can now trigger the following NPD: [ 12.664170] usb usb1: root hub lost power or was reset [ 12.669387] usb usb2: root hub lost power or was reset [ 12.674662] Unable to handle kernel NULL pointer dereference at virtual address 00000008 [ 12.682896] pgd = ffffffc1365a7000 [ 12.686386] [00000008] *pgd=0000000136500003, *pud=0000000136500003, *pmd=0000000000000000 [ 12.694897] Internal error: Oops: 96000006 [#1] SMP [ 12.699843] Modules linked in: [ 12.702980] CPU: 0 PID: 1499 Comm: pml Not tainted 4.9.135-1.13pre #51 [ 12.709577] Hardware name: BCM97268DV (DT) [ 12.713736] task: ffffffc136bb6540 task.stack: ffffffc1366cc000 [ 12.719740] PC is at addr_in_gen_pool+0x4/0x48 [ 12.724253] LR is at __dma_free+0x64/0xbc [ 12.728325] pc : [<ffffff80083c0df8>] lr : [<ffffff80080979e0>] pstate: 60000145 [ 12.735825] sp : ffffffc1366cf990 [ 12.739196] x29: ffffffc1366cf990 x28: ffffffc1366cc000 [ 12.744608] x27: 0000000000000000 x26: ffffffc13a8568c8 [ 12.750020] x25: 0000000000000000 x24: ffffff80098f9000 [ 12.755433] x23: 000000013a5ff000 x22: ffffff8009c57000 [ 12.760844] x21: ffffffc13a856810 x20: 0000000000000000 [ 12.766255] x19: 0000000000001000 x18: 000000000000000a [ 12.771667] x17: 0000007f917553e0 x16: 0000000000001002 [ 12.777078] x15: 00000000000a36cb x14: ffffff80898feb77 [ 12.782490] x13: ffffffffffffffff x12: 0000000000000030 [ 12.787899] x11: 00000000fffffffe x10: ffffff80098feb7f [ 12.793311] x9 : 0000000005f5e0ff x8 : 65776f702074736f [ 12.798723] x7 : 6c2062756820746f x6 : ffffff80098febb1 [ 12.804134] x5 : ffffff800809797c x4 : 0000000000000000 [ 12.809545] x3 : 000000013a5ff000 x2 : 0000000000000fff [ 12.814955] x1 : ffffff8009c57000 x0 : 0000000000000000 [ 12.820363] [ 12.821907] Process pml (pid: 1499, stack limit = 0xffffffc1366cc020) [ 12.828421] Stack: (0xffffffc1366cf990 to 0xffffffc1366d0000) [ 12.834240] f980: ffffffc1366cf9e0 ffffff80086004d0 [ 12.842186] f9a0: ffffffc13ab08238 0000000000000010 ffffff80097c2218 ffffffc13a856810 [ 12.850131] f9c0: ffffff8009c57000 000000013a5ff000 0000000000000008 000000013a5ff000 [ 12.858076] f9e0: ffffffc1366cfa50 ffffff80085f9250 ffffffc13ab08238 0000000000000004 [ 12.866021] fa00: ffffffc13ab08000 ffffff80097b6000 ffffffc13ab08130 0000000000000001 [ 12.873966] fa20: 0000000000000008 ffffffc13a8568c8 0000000000000000 ffffffc1366cc000 [ 12.881911] fa40: ffffffc13ab08130 0000000000000001 ffffffc1366cfa90 ffffff80085e3de8 [ 12.889856] fa60: ffffffc13ab08238 0000000000000000 ffffffc136b75b00 0000000000000000 [ 12.897801] fa80: 0000000000000010 ffffff80089ccb92 ffffffc1366cfac0 ffffff80084ad040 [ 12.905746] faa0: ffffffc13a856810 0000000000000000 ffffff80084ad004 ffffff80084b91a8 [ 12.913691] fac0: ffffffc1366cfae0 ffffff80084b91b4 ffffffc13a856810 ffffff80080db5cc [ 12.921636] fae0: ffffffc1366cfb20 ffffff80084b96bc ffffffc13a856810 0000000000000010 [ 12.929581] fb00: ffffffc13a856870 0000000000000000 ffffffc13a856810 ffffff800984d2b8 [ 12.937526] fb20: ffffffc1366cfb50 ffffff80084baa70 ffffff8009932ad0 ffffff800984d260 [ 12.945471] fb40: 0000000000000010 00000002eff0a065 ffffffc1366cfbb0 ffffff80084bafbc [ 12.953415] fb60: 0000000000000010 0000000000000003 ffffff80098fe000 0000000000000000 [ 12.961360] fb80: ffffff80097b6000 ffffff80097b6dc8 ffffff80098c12b8 ffffff80098c12f8 [ 12.969306] fba0: ffffff8008842000 ffffff80097b6dc8 ffffffc1366cfbd0 ffffff80080e0d88 [ 12.977251] fbc0: 00000000fffffffb ffffff80080e10bc ffffffc1366cfc60 ffffff80080e16a8 [ 12.985196] fbe0: 0000000000000000 0000000000000003 ffffff80097b6000 ffffff80098fe9f0 [ 12.993140] fc00: ffffff80097d4000 ffffff8008983802 0000000000000123 0000000000000040 [ 13.001085] fc20: ffffff8008842000 ffffffc1366cc000 ffffff80089803c2 00000000ffffffff [ 13.009029] fc40: 0000000000000000 0000000000000000 ffffffc1366cfc60 0000000000040987 [ 13.016974] fc60: ffffffc1366cfcc0 ffffff80080dfd08 0000000000000003 0000000000000004 [ 13.024919] fc80: 0000000000000003 ffffff80098fea08 ffffffc136577ec0 ffffff80089803c2 [ 13.032864] fca0: 0000000000000123 0000000000000001 0000000500000002 0000000000040987 [ 13.040809] fcc0: ffffffc1366cfd00 ffffff80083a89d4 0000000000000004 ffffffc136577ec0 [ 13.048754] fce0: ffffffc136610cc0 ffffffffffffffea ffffffc1366cfeb0 ffffffc136610cd8 [ 13.056700] fd00: ffffffc1366cfd10 ffffff800822a614 ffffffc1366cfd40 ffffff80082295d4 [ 13.064645] fd20: 0000000000000004 ffffffc136577ec0 ffffffc136610cc0 0000000021670570 [ 13.072590] fd40: ffffffc1366cfd80 ffffff80081b5d10 ffffff80097b6000 ffffffc13aae4200 [ 13.080536] fd60: ffffffc1366cfeb0 0000000000000004 0000000021670570 0000000000000004 [ 13.088481] fd80: ffffffc1366cfe30 ffffff80081b6b20 ffffffc13aae4200 0000000000000000 [ 13.096427] fda0: 0000000000000004 0000000021670570 ffffffc1366cfeb0 ffffffc13a838200 [ 13.104371] fdc0: 0000000000000000 000000000000000a ffffff80097b6000 0000000000040987 [ 13.112316] fde0: ffffffc1366cfe20 ffffff80081b3af0 ffffffc13a838200 0000000000000000 [ 13.120261] fe00: ffffffc1366cfe30 ffffff80081b6b0c ffffffc13aae4200 0000000000000000 [ 13.128206] fe20: 0000000000000004 0000000000040987 ffffffc1366cfe70 ffffff80081b7dd8 [ 13.136151] fe40: ffffff80097b6000 ffffffc13aae4200 ffffffc13aae4200 fffffffffffffff7 [ 13.144096] fe60: 0000000021670570 ffffffc13a8c63c0 0000000000000000 ffffff8008083180 [ 13.152042] fe80: ffffffffffffff1d 0000000021670570 ffffffffffffffff 0000007f917ad9b8 [ 13.159986] fea0: 0000000020000000 0000000000000015 0000000000000000 0000000000040987 [ 13.167930] fec0: 0000000000000001 0000000021670570 0000000000000004 0000000000000000 [ 13.175874] fee0: 0000000000000888 0000440110000000 000000000000006d 0000000000000003 [ 13.183819] ff00: 0000000000000040 ffffff80ffffffc8 0000000000000000 0000000000000020 [ 13.191762] ff20: 0000000000000000 0000000000000000 0000000000000001 0000000000000000 [ 13.199707] ff40: 0000000000000000 0000007f917553e0 0000000000000000 0000000000000004 [ 13.207651] ff60: 0000000021670570 0000007f91835480 0000000000000004 0000007f91831638 [ 13.215595] ff80: 0000000000000004 00000000004b0de0 00000000004b0000 0000000000000000 [ 13.223539] ffa0: 0000000000000000 0000007fc92ac8c0 0000007f9175d178 0000007fc92ac8c0 [ 13.231483] ffc0: 0000007f917ad9b8 0000000020000000 0000000000000001 0000000000000040 [ 13.239427] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 13.247360] Call trace: [ 13.249866] Exception stack(0xffffffc1366cf7a0 to 0xffffffc1366cf8d0) [ 13.256386] f7a0: 0000000000001000 0000007fffffffff ffffffc1366cf990 ffffff80083c0df8 [ 13.264331] f7c0: 0000000060000145 ffffff80089b5001 ffffffc13ab08130 0000000000000001 [ 13.272275] f7e0: 0000000000000008 ffffffc13a8568c8 0000000000000000 0000000000000000 [ 13.280220] f800: ffffffc1366cf960 ffffffc1366cf960 ffffffc1366cf930 00000000ffffffd8 [ 13.288165] f820: ffffff8009931ac0 4554535953425553 4544006273753d4d 3831633d45434956 [ 13.296110] f840: ffff003832313a39 ffffff800845926c ffffffc1366cf880 0000000000040987 [ 13.304054] f860: 0000000000000000 ffffff8009c57000 0000000000000fff 000000013a5ff000 [ 13.311999] f880: 0000000000000000 ffffff800809797c ffffff80098febb1 6c2062756820746f [ 13.319944] f8a0: 65776f702074736f 0000000005f5e0ff ffffff80098feb7f 00000000fffffffe [ 13.327884] f8c0: 0000000000000030 ffffffffffffffff [ 13.332835] [<ffffff80083c0df8>] addr_in_gen_pool+0x4/0x48 [ 13.338398] [<ffffff80086004d0>] xhci_mem_cleanup+0xc8/0x51c [ 13.344137] [<ffffff80085f9250>] xhci_resume+0x308/0x65c [ 13.349524] [<ffffff80085e3de8>] xhci_brcm_resume+0x84/0x8c [ 13.355174] [<ffffff80084ad040>] platform_pm_resume+0x3c/0x64 [ 13.360997] [<ffffff80084b91b4>] dpm_run_callback+0x5c/0x15c [ 13.366732] [<ffffff80084b96bc>] device_resume+0xc0/0x190 [ 13.372205] [<ffffff80084baa70>] dpm_resume+0x144/0x2cc [ 13.377504] [<ffffff80084bafbc>] dpm_resume_end+0x20/0x34 [ 13.382980] [<ffffff80080e0d88>] suspend_devices_and_enter+0x104/0x704 [ 13.389585] [<ffffff80080e16a8>] pm_suspend+0x320/0x53c [ 13.394881] [<ffffff80080dfd08>] state_store+0xbc/0xe0 [ 13.400094] [<ffffff80083a89d4>] kobj_attr_store+0x14/0x24 [ 13.405655] [<ffffff800822a614>] sysfs_kf_write+0x60/0x70 [ 13.411128] [<ffffff80082295d4>] kernfs_fop_write+0x130/0x194 [ 13.416954] [<ffffff80081b5d10>] __vfs_write+0x60/0x150 [ 13.422254] [<ffffff80081b6b20>] vfs_write+0xc8/0x164 [ 13.427376] [<ffffff80081b7dd8>] SyS_write+0x70/0xc8 [ 13.432412] [<ffffff8008083180>] el0_svc_naked+0x34/0x38 [ 13.437800] Code: 92800173 97f6fb9e 17fffff5 d1000442 (f8408c03) [ 13.444033] ---[ end trace 2effe12f909ce205 ]--- The call path leading to this problem is xhci_mem_cleanup() -> dma_free_coherent() -> dma_free_from_pool() -> addr_in_gen_pool. If the atomic_pool is NULL, we can't possibly have the address in the atomic pool anyway, so guard against that. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-06-14timekeeping: Repair ktime_get_coarse*() granularityThomas Gleixner1-2/+3
Jason reported that the coarse ktime based time getters advance only once per second and not once per tick as advertised. The code reads only the monotonic base time, which advances once per second. The nanoseconds are accumulated on every tick in xtime_nsec up to a second and the regular time getters take this nanoseconds offset into account, but the ktime_get_coarse*() implementation fails to do so. Add the accumulated xtime_nsec value to the monotonic base time to get the proper per tick advancing coarse tinme. Fixes: b9ff604cff11 ("timekeeping: Add ktime_get_coarse_with_offset") Reported-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Jason A. Donenfeld <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: Sultan Alsawaf <[email protected]> Cc: Waiman Long <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-14PM: hibernate: powerpc: Expose pfn_is_nosave() prototypeMathieu Malaterre1-2/+0
The declaration for pfn_is_nosave is only available in kernel/power/power.h. Since this function can be override in arch, expose it globally. Having a prototype will make sure to avoid warning (sometime treated as error with W=1) such as: arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes] This moves the declaration into a globally visible header file and add missing include to avoid a warning on powerpc. Also remove the duplicated prototypes since not required anymore. Signed-off-by: Mathieu Malaterre <[email protected]> Acked-by: Michael Ellerman <[email protected]> (powerpc) Signed-off-by: Rafael J. Wysocki <[email protected]>
2019-06-14kernel/module: Fix mem leak in module_add_modinfo_attrsYueHaibing1-5/+17
In module_add_modinfo_attrs if sysfs_create_file fails, we forget to free allocated modinfo_attrs and roll back the sysfs files. Fixes: 03e88ae1b13d ("[PATCH] fix module sysfs files reference counting") Reviewed-by: Miroslav Benes <[email protected]> Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2019-06-13mm/devm_memremap_pages: fix final page put raceDan Williams1-5/+12
Logan noticed that devm_memremap_pages_release() kills the percpu_ref drops all the page references that were acquired at init and then immediately proceeds to unplug, arch_remove_memory(), the backing pages for the pagemap. If for some reason device shutdown actually collides with a busy / elevated-ref-count page then arch_remove_memory() should be deferred until after that reference is dropped. As it stands the "wait for last page ref drop" happens *after* devm_memremap_pages_release() returns, which is obviously too late and can lead to crashes. Fix this situation by assigning the responsibility to wait for the percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup() callback. Implement the new cleanup callback for all devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma. Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 41e94a851304 ("add devm_memremap_pages") Signed-off-by: Dan Williams <[email protected]> Reported-by: Logan Gunthorpe <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Logan Gunthorpe <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: "Jérôme Glisse" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-06-13mm/devm_memremap_pages: introduce devm_memunmap_pagesDan Williams1-0/+6
Use the new devm_release_action() facility to allow devm_memremap_pages_release() to be manually triggered. Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Logan Gunthorpe <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Jérôme Glisse" <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-06-13rcu: Upgrade sync_exp_work_done() to smp_mb()Paul E. McKenney1-2/+1
The sync_exp_work_done() function uses smp_mb__before_atomic(), but there is no obvious atomic in the ensuing code. The ordering is absolutely required for grace periods to work correctly, so this commit upgrades the smp_mb__before_atomic() to smp_mb(). Fixes: 6fba2b3767ea ("rcu: Remove deprecated RCU debugfs tracing code") Reported-by: Andrea Parri <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2019-06-12cpuset: restore sanity to cpuset_cpus_allowed_fallback()Joel Savitz1-1/+14
In the case that a process is constrained by taskset(1) (i.e. sched_setaffinity(2)) to a subset of available cpus, and all of those are subsequently offlined, the scheduler will set tsk->cpus_allowed to the current value of task_cs(tsk)->effective_cpus. This is done via a call to do_set_cpus_allowed() in the context of cpuset_cpus_allowed_fallback() made by the scheduler when this case is detected. This is the only call made to cpuset_cpus_allowed_fallback() in the latest mainline kernel. However, this is not sane behavior. I will demonstrate this on a system running the latest upstream kernel with the following initial configuration: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffff Cpus_allowed_list: 0-63 (Where cpus 32-63 are provided via smt.) If we limit our current shell process to cpu2 only and then offline it and reonline it: # taskset -p 4 $$ pid 2272's current affinity mask: ffffffffffffffff pid 2272's new affinity mask: 4 # echo off > /sys/devices/system/cpu/cpu2/online # dmesg | tail -3 [ 2195.866089] process 2272 (bash) no longer affine to cpu2 [ 2195.872700] IRQ 114: no longer affine to CPU2 [ 2195.879128] smpboot: CPU 2 is now offline # echo on > /sys/devices/system/cpu/cpu2/online # dmesg | tail -1 [ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4 We see that our current process now has an affinity mask containing every cpu available on the system _except_ the one we originally constrained it to: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffffb Cpus_allowed_list: 0-1,3-63 This is not sane behavior, as the scheduler can now not only place the process on previously forbidden cpus, it can't even schedule it on the cpu it was originally constrained to! Other cases result in even more exotic affinity masks. Take for instance a process with an affinity mask containing only cpus provided by smt at the moment that smt is toggled, in a configuration such as the following: # taskset -p f000000000 $$ # grep -i cpu /proc/$$/status Cpus_allowed: 000000f0,00000000 Cpus_allowed_list: 36-39 A double toggle of smt results in the following behavior: # echo off > /sys/devices/system/cpu/smt/control # echo on > /sys/devices/system/cpu/smt/control # grep -i cpus /proc/$$/status Cpus_allowed: ffffff00,ffffffff Cpus_allowed_list: 0-31,40-63 This is even less sane than the previous case, as the new affinity mask excludes all smt-provided cpus with ids less than those that were previously in the affinity mask, as well as those that were actually in the mask. With this patch applied, both of these cases end in the following state: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,ffffffff Cpus_allowed_list: 0-63 The original policy is discarded. Though not ideal, it is the simplest way to restore sanity to this fallback case without reinventing the cpuset wheel that rolls down the kernel just fine in cgroup v2. A user who wishes for the previous affinity mask to be restored in this fallback case can use that mechanism instead. This patch modifies scheduler behavior by instead resetting the mask to task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy mode. I tested the cases above on both modes. Note that the scheduler uses this fallback mechanism if and only if _every_ other valid avenue has been traveled, and it is the last resort before calling BUG(). Suggested-by: Waiman Long <[email protected]> Suggested-by: Phil Auld <[email protected]> Signed-off-by: Joel Savitz <[email protected]> Acked-by: Phil Auld <[email protected]> Acked-by: Waiman Long <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2019-06-12bpf: silence warning messages in coreValdis Klētnieks1-0/+1
Compiling kernel/bpf/core.c with W=1 causes a flood of warnings: kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init] 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true | ^~~~ kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL' 1087 | INSN_3(ALU, ADD, X), \ | ^~~~~~ kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP' 1202 | BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL), | ^~~~~~~~~~~~ kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]') 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true | ^~~~ kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL' 1087 | INSN_3(ALU, ADD, X), \ | ^~~~~~ kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP' 1202 | BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL), | ^~~~~~~~~~~~ 98 copies of the above. The attached patch silences the warnings, because we *know* we're overwriting the default initializer. That leaves bpf/core.c with only 6 other warnings, which become more visible in comparison. Signed-off-by: Valdis Kletnieks <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-12cpu/hotplug: Abort disabling secondary CPUs if wakeup is pendingPavankumar Kondeti1-0/+7
When "deep" suspend is enabled, all CPUs except the primary CPU are frozen via CPU hotplug one by one. After all secondary CPUs are unplugged the wakeup pending condition is evaluated and if pending the suspend operation is aborted and the secondary CPUs are brought up again. CPU hotplug is a slow operation, so it makes sense to check for wakeup pending in the freezer loop before bringing down the next CPU. This improves the system suspend abort latency significantly. [ tglx: Massaged changelog and improved printk message ] Signed-off-by: Pavankumar Kondeti <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Len Brown <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: iri Kosina <[email protected]> Cc: Mukesh Ojha <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/affinity: Remove unused argument from [__]irq_build_affinity_masks()Minwoo Im1-7/+5
The *affd argument is neither used in irq_build_affinity_masks() nor __irq_build_affinity_masks(). Remove it. Signed-off-by: Minwoo Im <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ming Lei <[email protected]> Cc: Minwoo Im <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Add selftest for next event computationDaniel Lezcano1-0/+66
The circular buffers are now validated with selftests. The next interrupt index algorithm which is the hardest part to validate needs extra coverage. Add a selftest which uses the intervals stored in the arrays and insert all the values except the last one. The next event computation must return the same value as the last element which was not inserted. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Add selftest for irqs circular bufferDaniel Lezcano1-0/+139
After testing the per cpu interrupt circular event, make sure the per interrupt circular buffer usage is correct. Add tests to validate the interrupt circular buffer. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Add selftest for circular arrayDaniel Lezcano2-0/+122
Due to the complexity of the code and the difficulty to debug it, add some selftests to the framework in order to spot issues or regression at boot time when the runtime testing is enabled for this subsystem. This tests the circular buffer at the limits and validates: - the encoding / decoding of the values - the macro to browse the irq timings circular buffer - the function to push data in the circular buffer Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Encapsulate storing functionDaniel Lezcano1-19/+34
For the next patches providing the selftest, it is required to insert interval values directly in the buffer in order to check the correctness of the code. Encapsulate the code doing that in a always inline function in order to reuse it in the test code. No functional changes. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Encapsulate timings pushDaniel Lezcano1-9/+12
For the next patches providing the selftest, it is required to artificially insert timings value in the circular buffer in order to check the correctness of the code. Encapsulate the common code between the future test code and the current code with an always-inline tag. No functional change. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Optimize the period detection speedDaniel Lezcano1-1/+1
With a minimal period and if there is a period which is a multiple of it but lesser than the max period then it will be detected before and the minimal period will be never reached. 1 2 1 2 1 2 1 2 1 2 1 2 <-----> <-----> <-----> <-> <-> <-> <-> <-> <-> In that case, the minimum period is 2 and the maximum period is 5. That means all repeating pattern of 2 will be detected as repeating pattern of 4, it is pointless to go up to 2 when searching for the period as it will always fail. Remove one loop iteration by increasing the minimal period to 3. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Fix timings buffer inspectionDaniel Lezcano1-5/+18
It appears the index beginning computation is not correct, the current code does: i = (irqts->count & IRQ_TIMINGS_MASK) - 1 If irqts->count is equal to zero, we end up with an index equal to -1, but that does not happen because the function checks against zero before and returns in such case. However, if irqts->count is a multiple of IRQ_TIMINGS_SIZE, the resulting & bit op will be zero and leads also to a -1 index. Re-introduce the iteration loop belonging to the previous variance code which was correct. Fixes: bbba0e7c5cda "genirq/timings: Add array suffix computation code" Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12genirq/timings: Fix next event index functionDaniel Lezcano1-9/+42
The current code is luckily working with most of the interval samples testing but actually it fails to correctly detect pattern repetition breaking at the end of the buffer. Narrowing down the bug has been a real pain because of the pointers, so the routine is rewrittne by using indexes instead. Fixes: bbba0e7c5cda "genirq/timings: Add array suffix computation code" Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-12hrtimer: Remove unused header includeYangtao Li1-1/+0
seq_file.h does not need to be included, so remove it. Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-11Merge branch 'for-linus' of ↵Linus Torvalds2-2/+27
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace fixes from Eric Biederman: "This is just two very minor fixes: - prevent ptrace from reading unitialized kernel memory found twice by syzkaller - restore a missing smp_rmb in ptrace_may_access and add comment tp it so it is not removed by accident again. Apologies for being a little slow about getting this to you, I am still figuring out how to develop with a little baby in the house" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: restore smp_rmb() in __ptrace_may_access() signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
2019-06-11ptrace: restore smp_rmb() in __ptrace_may_access()Jann Horn2-0/+19
Restore the read memory barrier in __ptrace_may_access() that was deleted a couple years ago. Also add comments on this barrier and the one it pairs with to explain why they're there (as far as I understand). Fixes: bfedb589252c ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks") Cc: [email protected] Acked-by: Kees Cook <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2019-06-11swiotlb: Return consistent SWIOTLB segments/nr_tblFlorian Fainelli1-4/+4
With a specifically contrived memory layout where there is no physical memory available to the kernel below the 4GB boundary, we will fail to perform the initial swiotlb_init() call and set no_iotlb_memory to true. There are drivers out there that call into swiotlb_nr_tbl() to determine whether they can use the SWIOTLB. With the right DMA_BIT_MASK() value for these drivers (say 64-bit), they won't ever need to hit swiotlb_tbl_map_single() so this can go unoticed and we would be possibly lying about those drivers. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2019-06-11swiotlb: Group identical cleanup in swiotlb_cleanup()Florian Fainelli1-8/+10
Avoid repeating the zeroing of global swiotlb variables in two locations and introduce swiotlb_cleanup() to do that. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2019-06-11bpf: lpm_trie: check left child of last leftmost node for NULLJonathan Lemon1-2/+7
If the leftmost parent node of the tree has does not have a child on the left side, then trie_get_next_key (and bpftool map dump) will not look at the child on the right. This leads to the traversal missing elements. Lookup is not affected. Update selftest to handle this case. Reproducer: bpftool map create /sys/fs/bpf/lpm type lpm_trie key 6 \ value 1 entries 256 name test_lpm flags 1 bpftool map update pinned /sys/fs/bpf/lpm key 8 0 0 0 0 0 value 1 bpftool map update pinned /sys/fs/bpf/lpm key 16 0 0 0 0 128 value 2 bpftool map dump pinned /sys/fs/bpf/lpm Returns only 1 element. (2 expected) Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE") Signed-off-by: Jonathan Lemon <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-10bpf: Allow bpf_map_lookup_elem() on an xskmapJonathan Lemon2-2/+31
Currently, the AF_XDP code uses a separate map in order to determine if an xsk is bound to a queue. Instead of doing this, have bpf_map_lookup_elem() return a xdp_sock. Rearrange some xdp_sock members to eliminate structure holes. Remove selftest - will be added back in later patch. Signed-off-by: Jonathan Lemon <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-06-10Merge branch 'for-5.2-fixes' into for-5.3Tejun Heo1-1/+5