Age | Commit message (Collapse) | Author | Files | Lines |
|
To pick the changes in:
344c6c804703 ("KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH")
dee04eee9182 ("KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface")
These trigger the rebuild of this object:
CC /tmp/build/perf/trace/beauty/ioctl.o
But do not result in any change in tooling, as the additions are not
being used in any table generatator.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Tianyu Lan <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick the changes in:
0cb8410b90e7 ("kvm: svm: Intercept RDPRU")
That trigger a rebuild in too in tooling:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
But this time around no changes in tooling results, as SVM_EXIT_RDPRU
wasn't added to SVM_EXIT_REASONS, that is used in kvm-stat.c.
And addresses this perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
Cc: Adrian Hunter <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick the changes in:
bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit")
That trigger these changes in tooling:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
INSTALL GTK UI
DESCEND plugins
make[3]: Nothing to be done for '/tmp/build/perf/plugins/libtraceevent-dynamic-list'.
INSTALL trace_plugins
LD /tmp/build/perf/arch/x86/util/perf-in.o
LD /tmp/build/perf/arch/x86/perf-in.o
LD /tmp/build/perf/arch/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
And this is not just because that header is included, kvm-stat.c
uses the VMX_EXIT_REASONS define and it got changed by the above cset.
And addresses this perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Tao Xu <[email protected]>
Cc: Wang Nan <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
There is a memory leak problem in the failure paths of
build_cl_output(), so fix it.
Signed-off-by: Yunfeng Ye <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Feilong Lin <[email protected]>
Cc: Hu Shiyuan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
slow_copyfile() opens the file by name, so "write" permissions must not
be removed in copyfile_mode_ns() before calling slow_copyfile().
Example:
Before:
$ sudo chmod +r /proc/kcore
$ sudo setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_sys_rawio=ep" tools/perf/perf
$ tools/perf/perf buildid-cache -k /proc/kcore
Couldn't add /proc/kcore
After:
$ sudo chmod +r /proc/kcore
$ sudo setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_sys_rawio=ep" tools/perf/perf
$ tools/perf/perf buildid-cache -v -k /proc/kcore
kcore added to build-id cache directory /home/ahunter/.debug/[kernel.kcore]/37e340b1b5a7cf4f57ba8de2bc777359588a957f/2019100709562289
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Store SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF in variable *ret*, instead
of returning in the middle of the function and leaking multiple
resources: prog_linfo, btf, s and bfdf.
Addresses-Coverity-ID: 1454832 ("Structurally dead code")
Fixes: 11aad897f6d1 ("perf annotate: Don't return -1 for error when doing BPF disassembly")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/20191014171047.GA30850@embeddedor
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now we recalculate the sequence of timeout with 'req->sequence =
ctx->cached_sq_head + count - 1', judge the right place to insert
for timeout_list by compare the number of request we still expected for
completion. But we have not consider about the situation of overflow:
1. ctx->cached_sq_head + count - 1 may overflow. And a bigger count for
the new timeout req can have a small req->sequence.
2. cached_sq_head of now may overflow compare with before req. And it
will lead the timeout req with small req->sequence.
This overflow will lead to the misorder of timeout_list, which can lead
to the wrong order of the completion of timeout_list. Fix it by reuse
req->submit.sequence to store the count, and change the logic of
inserting sort in io_timeout.
Signed-off-by: yangerkun <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Both build_mem_topology() and rm_rf_depth_pat() have resource leaks of
closedir() on the error paths.
Fix this by calling closedir() before function returns.
Fixes: e2091cedd51b ("perf tools: Add MEM_TOPOLOGY feature to perf data file")
Fixes: cdb6b0235f17 ("perf tools: Add pattern name checking to rm_rf")
Signed-off-by: Yunfeng Ye <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Feilong Lin <[email protected]>
Cc: Hu Shiyuan <[email protected]>
Cc: Igor Lubashev <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In the earlier fix for the memory overrun of id arrays I managed to typo
the wrong event in the fix.
Of course we need to close the current event in the loop, not the
original failing event.
The same test case as in the original patch still passes.
Fixes: 7834fa948beb ("perf evlist: Fix access of freed id arrays")
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The build of file libperf-jvmti.so succeeds but the resulting
object fails to load:
# ~/linux/tools/perf/perf record -k mono -- java \
-XX:+PreserveFramePointer \
-agentpath:/root/linux/tools/perf/libperf-jvmti.so \
hog 100000 123450
Error occurred during initialization of VM
Could not find agent library /root/linux/tools/perf/libperf-jvmti.so
in absolute path, with error:
/root/linux/tools/perf/libperf-jvmti.so: undefined symbol: _ctype
Add the missing _ctype symbol into the build script.
Fixes: 79743bc927f6 ("perf jvmti: Link against tools/lib/string.o to have weak strlcpy()")
Signed-off-by: Thomas Richter <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix bashism reported by checkbashisms by using only one '=':
possible bashism in scripts/setlocalversion line 96 (should be 'b = a'):
if [ "`hg log -r . --template '{latesttagdistance}'`" == "1" ]; then
Fixes: 38b3439d84f4 ("setlocalversion: update mercurial tag parsing")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Mike Crowe <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Commit 000ec95fbe75 ("kbuild: pkg: rename scripts/package/Makefile to
scripts/Makefile.package") missed to update this comment.
Fixes: 000ec95fbe75 ("kbuild: pkg: rename scripts/package/Makefile to scripts/Makefile.package")
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd
since it's symmetric to nvme_setup_cmd.
Signed-off-by: Max Gurtovoy <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since
it's symmetric to nvme_setup_cmd.
Signed-off-by: Max Gurtovoy <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
The patch 32b593bfcb58 ("Btrfs: remove no longer used function to run
delayed refs asynchronously") removed the async delayed refs but the
thread has been created, without any use. Remove it to avoid resource
consumption.
Fixes: 32b593bfcb58 ("Btrfs: remove no longer used function to run delayed refs asynchronously")
CC: [email protected] # 5.2+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
IOMMU Event Log encodes 20-bit PASID for events:
ILLEGAL_DEV_TABLE_ENTRY
IO_PAGE_FAULT
PAGE_TAB_HARDWARE_ERROR
INVALID_DEVICE_REQUEST
as:
PASID[15:0] = bit 47:32
PASID[19:16] = bit 19:16
Note that INVALID_PPR_REQUEST event has different encoding
from the rest of the events as the following:
PASID[15:0] = bit 31:16
PASID[19:16] = bit 45:42
So, fixes the decoding logic.
Fixes: d64c0486ed50 ("iommu/amd: Update the PASID information printed to the system log")
Cc: Joerg Roedel <[email protected]>
Cc: Gary R Hook <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
|
|
As platform_get_irq() now prints an error when the interrupt does not
exist, calling it gratuitously causes scary messages like:
ipmmu-vmsa e6740000.mmu: IRQ index 0 not found
Fix this by moving the call to platform_get_irq() down, where the
existence of the interrupt is mandatory.
Fixes: 7723f4c5ecdb8d83 ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Yoshihiro Shimoda <[email protected]>
Tested-by: Yoshihiro Shimoda <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
|
|
Till now the Rockchip iommu driver walked through the irq list via
platform_get_irq() until it encountered an ENXIO error. With the
recent change to add a central error message, this always results
in such an error for each iommu on probe and shutdown.
To not confuse people, switch to platform_count_irqs() to get the
actual number of interrupts before walking through them.
Fixes: 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Heiko Stuebner <[email protected]>
Tested-by: Enric Balletbo i Serra <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
|
|
Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit. The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE. If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.
smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:x2apic_dead_cpu+0x1a/0x30
Call Trace:
cpuhp_invoke_callback+0x9a/0x580
_cpu_up+0x10d/0x140
do_cpu_up+0x69/0xb0
smp_init+0x63/0xa9
kernel_init_freeable+0xd7/0x229
? rest_init+0xa0/0xa0
kernel_init+0xa/0x100
ret_from_fork+0x35/0x40
Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.
However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.
As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.
According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors. The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.
Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Roman Kagan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Connecting a vCPU to a XIVE KVM device means establishing a 1:1
association between a vCPU id and the offset (VP id) of a VP
structure within a fixed size block of VPs. We currently try to
enforce the 1:1 relationship by checking that a vCPU with the
same id isn't already connected. This is good but unfortunately
not enough because we don't map VP ids to raw vCPU ids but to
packed vCPU ids, and the packing function kvmppc_pack_vcpu_id()
isn't bijective by design. We got away with it because QEMU passes
vCPU ids that fit well in the packing pattern. But nothing prevents
userspace to come up with a forged vCPU id resulting in a packed id
collision which causes the KVM device to associate two vCPUs to the
same VP. This greatly confuses the irq layer and ultimately crashes
the kernel, as shown below.
Example: a guest with 1 guest thread per core, a core stride of
8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to
inject at some point an invalid vCPU id 348, which is the packed
version of itself and 2392, we get:
genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348)
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
Call Trace:
[c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable)
[c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0
[c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260
[c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348'
WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP: c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0
REGS: c000003f7f9934b0 TRAP: 0700 Not tainted (5.3.0-xive-nr-servers-5.3-gku+)
MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48228222 XER: 20040000
CFAR: c000000000131a50 IRQMASK: 0
GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057
GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf
GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033
GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006
GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48
GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0
NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200
LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200
Call Trace:
[c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable)
[c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150
[c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0
[c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100
[c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80
[c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
Instruction dump:
2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378
3884f080 3863b948 4bbf6925 60000000 <0fe00000> 4bffff7c fba10088 4bbf6e41
---[ end trace b925b67a74a1d8d1 ]---
BUG: Kernel NULL pointer dereference at 0x00000010
Faulting instruction address: 0xc00800000d44fc04
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G W 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP: c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970
REGS: c000003f7f9938e0 TRAP: 0300 Tainted: G W (5.3.0-xive-nr-servers-5.3-gku+)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24228882 XER: 20040000
CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000
GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8
GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd
GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8
GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000
GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90
NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm]
LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm]
Call Trace:
[c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable)
[c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm]
[c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm]
[c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320
[c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0
[c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440
[c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c
2f890000 419effb8 4800821d e8410018 <e9230010> e9490008 9b2a0039 7c0004ac
---[ end trace b925b67a74a1d8d2 ]---
Kernel panic - not syncing: Fatal exception
This affects both XIVE and XICS-on-XIVE devices since the beginning.
Check the VP id instead of the vCPU id when a new vCPU is connected.
The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu()
is moved after the check to avoid the need for rollback.
Cc: [email protected] # v4.12+
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
There is a bug in create_safe_exec_page(), when page table is allocated
it is not checked that table is allocated successfully:
But it is dereferenced in: pgd_none(READ_ONCE(*pgdp)). Check that
allocation was successful.
Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Reviewed-by: James Morse <[email protected]>
Signed-off-by: Pavel Tatashin <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
If CONFIG_ARM64_SVE=n then we fail to report ID_AA64ZFR0_EL1 as 0 when
read by userspace, despite being required by the architecture. Although
this is theoretically a change in ABI, userspace will first check for
the presence of SVE via the HWCAP or the ID_AA64PFR0_EL1.SVE field
before probing the ID_AA64ZFR0_EL1 register. Given that these are
reported correctly for this configuration, we can safely tighten up the
current behaviour.
Ensure ID_AA64ZFR0_EL1 is treated as RAZ when CONFIG_ARM64_SVE=n.
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Reviewed-by: Dave Martin <[email protected]>
Fixes: 06a916feca2b ("arm64: Expose SVE2 features for userspace")
Signed-off-by: Will Deacon <[email protected]>
|
|
Igor Russkikh says:
====================
Aquantia/Marvell AQtion atlantic driver fixes 10/2019
Here is a set of various bugfixes, to be considered for stable as well.
V2: double space removed
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
macvlan and multicast handling is now mixed up.
The explicit issue is that macvlan interface gets broken (no traffic)
after clearing MULTICAST flag on the real interface.
We now do separate logic and consider both ALLMULTI and MULTICAST
flags on the device.
Fixes: 11ba961c9161 ("net: aquantia: Fix IFF_ALLMULTI flag functionality")
Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Individual descriptors on LRO TCP session should be checked
for CRC errors. It was discovered that HW recalculates
L4 checksums on LRO session and does not break it up on bad L4
csum.
Thus, driver should aggregate HW LRO L4 statuses from all individual
buffers of LRO session and drop packet if one of the buffers has bad
L4 checksum.
Fixes: f38f1ee8aeb2 ("net: aquantia: check rx csum for all packets in LRO session")
Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
>From HW specification to correctly reset HW caches (this is a required
workaround when stopping the device), register bit should actually
be toggled.
It was previosly always just set. Due to the way driver stops HW this
never actually caused any issues, but it still may, so cleaning this up.
Fixes: 7a1bb49461b1 ("net: aquantia: fix potential IOMMU fault after driver unbind")
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Chip temperature is a two byte word, colocated internally with cable
length data. We do all readouts from HW memory by dwords, thus
we should clear extra high bytes, otherwise temperature output
gets weird as soon as we attach a cable to the NIC.
Fixes: 8f8940118654 ("net: aquantia: add infrastructure to readout chip temperature")
Tested-by: Holger Hoffstätte <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Merge more fixes from Andrew Morton:
"The usual shower of hotfixes and some followups to the recently merged
page_owner enhancements"
* emailed patches from Andrew Morton <[email protected]>:
mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
mm/slab.c: fix kernel-doc warning for __ksize()
xarray.h: fix kernel-doc warning
bitmap.h: fix kernel-doc warning and typo
fs/fs-writeback.c: fix kernel-doc warning
fs/libfs.c: fix kernel-doc warning
fs/direct-io.c: fix kernel-doc warning
mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
mm, hugetlb: allow hugepage allocations to reclaim as needed
lib/test_meminit: add a kmem_cache_alloc_bulk() test
mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
lib/generic-radix-tree.c: add kmemleak annotations
mm/slub: fix a deadlock in show_slab_objects()
mm, page_owner: rename flag indicating that page is allocated
mm, page_owner: decouple freeing stack trace from debug_pagealloc
mm, page_owner: fix off-by-one error in __set_page_owner_handle()
|
|
We switch the default handler to be handle_bad_irq() instead of
handle_simple_irq() (which was not correct anyway).
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|
|
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.
Fixes: 8f86a5b4ad67 ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|
|
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.
Fixes: 7b1e889436a1 ("gpio: lynxpoint: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|
|
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.
Fixes: 8069e69a9792 ("gpio: intel-mid: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|
|
After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.
Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.
This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.
Cc: Hans de Goede <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|
|
During conversion to internal IRQ chip initialization the commit
8f86a5b4ad67 ("gpio: merrifield: Pass irqchip when adding gpiochip")
lost the irq_base assignment.
drivers/gpio/gpio-merrifield.c: In function ‘mrfld_gpio_probe’:
drivers/gpio/gpio-merrifield.c:405:17: warning: variable ‘irq_base’ set but not used [-Wunused-but-set-variable]
Assign the girq->first to it.
Fixes: 8f86a5b4ad67 ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|
|
Custom outs*/ins* implementations are long gone from the xtensa port,
remove matching EXPORT_SYMBOLs.
This fixes the following build warnings issued by modpost since commit
15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions"):
WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL
Cc: [email protected]
Fixes: d38efc1f150f ("xtensa: adopt generic io routines")
Signed-off-by: Max Filippov <[email protected]>
|
|
more than once
Mmap /dev/dax more than once, then read the poison location using
address from one of the mappings. The other mappings due to not having
the page mapped in will cause SIGKILLs delivered to the process.
SIGKILL succeeds over SIGBUS, so user process loses the opportunity to
handle the UE.
Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.
Details -
ndctl inject-error --block=10 --count=1 namespace6.0
./read_poison -x dax6.0 -o 5120 -m 2
mmaped address 0x7f5bb6600000
mmaped address 0x7f3cf3600000
doing local read at address 0x7f3cf3601400
Killed
Console messages in instrumented kernel -
mce: Uncorrected hardware memory error in user-access at edbe201400
Memory failure: tk->addr = 7f5bb6601000
Memory failure: address edbe201: call dev_pagemap_mapping_shift
dev_pagemap_mapping_shift: page edbe201: no PUD
Memory failure: tk->size_shift == 0
Memory failure: Unable to find user space address edbe201 in read_poison
Memory failure: tk->addr = 7f3cf3601000
Memory failure: address edbe201: call dev_pagemap_mapping_shift
Memory failure: tk->size_shift = 21
Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
=> to deliver SIGKILL
Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
=> to deliver SIGBUS
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jane Chu <[email protected]>
Suggested-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix kernel-doc warning in mm/slab.c:
mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize'
Also add Return: documentation section for this function.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 10d1f8cb3965 ("mm/slab: refactor common ksize KASAN logic into slab_common.c")
Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Marco Elver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:
include/linux/xarray.h:232: WARNING: Unexpected indentation.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix kernel-doc warning in <linux/bitmap.h>:
include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'
Also fix small typo (bitnaps).
Link: http://lkml.kernel.org/r/[email protected]
Fixes: b9fa6442f704 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix kernel-doc warning in fs/fs-writeback.c:
fs/fs-writeback.c:913: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'
Link: http://lkml.kernel.org/r/[email protected]
Fixes: d62241c7a406 ("writeback, memcg: Implement cgroup_writeback_by_id()")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix kernel-doc warning in fs/libfs.c:
fs/libfs.c:496: warning: Excess function parameter 'available' description in 'simple_write_end'
Link: http://lkml.kernel.org/r/[email protected]
Fixes: ad2a722f196d ("libfs: Open code simple_commit_write into only user")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Boaz Harrosh <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix kernel-doc warning in fs/direct-io.c:
fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'
Also, don't mark this function as having kernel-doc notation since it is
not exported.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 6d544bb4d901 ("dio: centralize completion in dio_complete()")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Zach Brown <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn(). While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.
One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page. This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.
The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock. The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.
[1] https://lore.kernel.org/linux-xfs/[email protected]/
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Florian Weimer <[email protected]>
Reported-by: Dave Chinner <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when
compaction may not succeed") has chnaged the allocator to bail out from
the allocator early to prevent from a potentially excessive memory
reclaim. __GFP_RETRY_MAYFAIL is designed to retry the allocation,
reclaim and compaction loop as long as there is a reasonable chance to
make forward progress. Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at
the INIT_COMPACT_PRIORITY compaction attempt gives this feedback.
The most obvious affected subsystem is hugetlbfs which allocates huge
pages based on an admin request (or via admin configured overcommit). I
have done a simple test which tries to allocate half of the memory for
hugetlb pages while the memory is full of a clean page cache. This is
not an unusual situation because we try to cache as much of the memory
as possible and sysctl/sysfs interface to allocate huge pages is there
for flexibility to allocate hugetlb pages at any time.
System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages
after the memory is prefilled by a clean page cache:
root@test1:~# cat hugetlb_test.sh
set -x
echo 0 > /proc/sys/vm/nr_hugepages
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
TS=$(date +%s)
echo 256 > /proc/sys/vm/nr_hugepages
cat /proc/sys/vm/nr_hugepages
The results for 2 consecutive runs on clean 5.3
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
+ date +%s
+ TS=1569905284
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
256
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
+ date +%s
+ TS=1569905311
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
256
Now with b39d0ee2632d applied
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
+ date +%s
+ TS=1569905516
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
11
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
+ date +%s
+ TS=1569905541
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
12
The success rate went down by factor of 20!
Although hugetlb allocation requests might fail and it is reasonable to
expect them to under extremely fragmented memory or when the memory is
under a heavy pressure but the above situation is not that case.
Fix the regression by reverting back to the previous behavior for
__GFP_RETRY_MAYFAIL requests and disable the beail out heuristic for
those requests.
Mike said:
: hugetlbfs allocations are commonly done via sysctl/sysfs shortly after
: boot where this may not be as much of an issue. However, I am aware of at
: least three use cases where allocations are made after the system has been
: up and running for quite some time:
:
: - DB reconfiguration. If sysctl/sysfs fails to get required number of
: huge pages, system is rebooted to perform allocation after boot.
:
: - VM provisioning. If unable get required number of huge pages, fall
: back to base pages.
:
: - An application that does not preallocate pool, but rather allocates
: pages at fault time for optimal NUMA locality.
:
: In all cases, I would expect b39d0ee2632d to cause regressions and
: noticable behavior changes.
:
: My quick/limited testing in
: https://lkml.kernel.org/r/[email protected]
: was insufficient. It was also mentioned that if something like
: b39d0ee2632d went forward, I would like exemptions for __GFP_RETRY_MAYFAIL
: requests as in this patch.
[[email protected]: reworded changelog]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make sure allocations from kmem_cache_alloc_bulk() and
kmem_cache_free_bulk() are properly initialized.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Thibaut Sautereau <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
slab_alloc_node() already zeroed out the freelist pointer if
init_on_free was on. Thibaut Sautereau noticed that the same needs to
be done for kmem_cache_alloc_bulk(), which performs the allocations
separately.
kmem_cache_alloc_bulk() is currently used in two places in the kernel,
so this change is unlikely to have a major performance impact.
SLAB doesn't require a similar change, as auto-initialization makes the
allocator store the freelist pointers off-slab.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Alexander Potapenko <[email protected]>
Reported-by: Thibaut Sautereau <[email protected]>
Reported-by: Kees Cook <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Kmemleak is falsely reporting a leak of the slab allocation in
sctp_stream_init_ext():
BUG: memory leak
unreferenced object 0xffff8881114f5d80 (size 96):
comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000ce7a1326>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
[<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline]
[<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline]
[<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline]
[<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0 net/sctp/stream.c:157
[<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00 net/sctp/socket.c:1882
[<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102
[...]
But it's freed later. Kmemleak misses the allocation because its
pointer is stored in the generic radix tree sctp_stream::out, and the
generic radix tree uses raw pages which aren't tracked by kmemleak.
Fix this by adding the kmemleak hooks to the generic radix tree code.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
Reported-by: <[email protected]>
Reviewed-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Vlad Yasevich <[email protected]>
Cc: Xin Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bcba63 ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e25f7f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.
Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.
WARNING: possible circular locking dependency detected
------------------------------------------------------
cat/5224 is trying to acquire lock:
ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
show_slab_objects+0x94/0x3a8
but task is already holding lock:
b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (kn->count#45){++++}:
lock_acquire+0x31c/0x360
__kernfs_remove+0x290/0x490
kernfs_remove+0x30/0x44
sysfs_remove_dir+0x70/0x88
kobject_del+0x50/0xb0
sysfs_slab_unlink+0x2c/0x38
shutdown_cache+0xa0/0xf0
kmemcg_cache_shutdown_fn+0x1c/0x34
kmemcg_workfn+0x44/0x64
process_one_work+0x4f4/0x950
worker_thread+0x390/0x4bc
kthread+0x1cc/0x1e8
ret_from_fork+0x10/0x18
-> #1 (slab_mutex){+.+.}:
lock_acquire+0x31c/0x360
__mutex_lock_common+0x16c/0xf78
mutex_lock_nested+0x40/0x50
memcg_create_kmem_cache+0x38/0x16c
memcg_kmem_cache_create_func+0x3c/0x70
process_one_work+0x4f4/0x950
worker_thread+0x390/0x4bc
kthread+0x1cc/0x1e8
ret_from_fork+0x10/0x18
-> #0 (mem_hotplug_lock.rw_sem){++++}:
validate_chain+0xd10/0x2bcc
__lock_acquire+0x7f4/0xb8c
lock_acquire+0x31c/0x360
get_online_mems+0x54/0x150
show_slab_objects+0x94/0x3a8
total_objects_show+0x28/0x34
slab_attr_show+0x38/0x54
sysfs_kf_seq_show+0x198/0x2d4
kernfs_seq_show+0xa4/0xcc
seq_read+0x30c/0x8a8
kernfs_fop_read+0xa8/0x314
__vfs_read+0x88/0x20c
vfs_read+0xd8/0x10c
ksys_read+0xb0/0x120
__arm64_sys_read+0x54/0x88
el0_svc_handler+0x170/0x240
el0_svc+0x8/0xc
other info that might help us debug this:
Chain exists of:
mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(kn->count#45);
lock(slab_mutex);
lock(kn->count#45);
lock(mem_hotplug_lock.rw_sem);
*** DEADLOCK ***
3 locks held by cat/5224:
#0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
#1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
#2: b8ff009693eee398 (kn->count#45){++++}, at:
kernfs_seq_start+0x44/0xf0
stack backtrace:
Call trace:
dump_backtrace+0x0/0x248
show_stack+0x20/0x2c
dump_stack+0xd0/0x140
print_circular_bug+0x368/0x380
check_noncircular+0x248/0x250
validate_chain+0xd10/0x2bcc
__lock_acquire+0x7f4/0xb8c
lock_acquire+0x31c/0x360
get_online_mems+0x54/0x150
show_slab_objects+0x94/0x3a8
total_objects_show+0x28/0x34
slab_attr_show+0x38/0x54
sysfs_kf_seq_show+0x198/0x2d4
kernfs_seq_show+0xa4/0xcc
seq_read+0x30c/0x8a8
kernfs_fop_read+0xa8/0x314
__vfs_read+0x88/0x20c
vfs_read+0xd8/0x10c
ksys_read+0xb0/0x120
__arm64_sys_read+0x54/0x88
el0_svc_handler+0x170/0x240
el0_svc+0x8/0xc
I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free. There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.
[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html
[[email protected]: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the
page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
is tracked as being allocated. Kirril suggested naming it
PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
loaded term for a page".
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Suggested-by: Kirill A. Shutemov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|