aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-07-08s390/linkage: increase asm symbols alignment to 16Vasily Gorbik1-1/+1
Both clang and gcc (for -march=z13 and later) align functions to 16 bytes at -O2 to benefit branch prediction. Make asm symbols alignment consistent with that. This also benefits potential ftrace code patching, which is currently able to patch 8 aligned bytes at once. With defconfig this currently increases .text size by 4104 bytes. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()Heiko Carstens3-3/+3
Lower case matches the call_on_stack() macro and is easier to read. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390: add type checking to CALL_ON_STACK_NORETURN() macroHeiko Carstens1-1/+3
Make sure the to be called function takes no arguments (and returns void). Otherwise usage of CALL_ON_STACK_NORETURN() would generate broken code. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390: remove old CALL_ON_STACK() macroHeiko Carstens1-37/+0
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/softirq: use call_on_stack() macroHeiko Carstens1-1/+1
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/lib: use call_on_stack() macroHeiko Carstens1-2/+3
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/smp: use call_on_stack() macroHeiko Carstens1-4/+8
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/kexec: use call_on_stack() macroHeiko Carstens1-1/+2
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/irq: use call_on_stack() macroHeiko Carstens1-3/+5
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/mm: use call_on_stack() macroHeiko Carstens1-4/+9
Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390: introduce proper type handling call_on_stack() macroHeiko Carstens1-0/+97
The existing CALL_ON_STACK() macro allows for subtle bugs: - There is no type checking of the function that is being called. That is: missing or too many arguments do not cause any compile error or warning. The same is true if the return type of the called function changes. This can lead to quite random bugs. - Sign and zero extension of arguments is missing. Given that the s390 C ABI requires that the caller of a function performs proper sign and zero extension this can also lead to subtle bugs. - If arguments to the CALL_ON_STACK() macros contain functions calls register corruption can happen due to register asm constructs being used. Therefore introduce a new call_on_stack() macro which is supposed to fix all these problems. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/irq: simplify on_async_stack()Heiko Carstens1-1/+1
Make on_async_stack() a bit more readable, even though as usual it depends if one considers "!!!" readable or not. At least the new construct to check if the async stack is in use or not is a bit shorter and generates slightly better code. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/irq: inline do_softirq_own_stack()Heiko Carstens2-8/+13
Move do_softirq_own_stack() to proper header file so it can be inlined; saving a few cycles. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/irq: simplify do_softirq_own_stack()Heiko Carstens1-11/+1
do_softirq_own_stack() is always called from task context and therefore it is not necessary to check if the async stack is currently used. Remove the check and directly switch to async stack. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/ap: get rid of register asm in ap_dqap()Harald Freudenberger1-18/+24
This is the second part of the cleanup for the header file ap.h to remove the register asm statements. This patch deals with the inline ap_dqap() function where within the assembler code an odd register of an register pair is to be addressed. [[email protected]: this intentionally breaks compilation with any clang compilers prior to llvm-project commit 458eac257377 ("[SystemZ] Support the 'N' code for the odd register in inline-asm."). This is hopefully the last clang kernel compile breakage caused by incompatibilities between gcc and clang.] Signed-off-by: Harald Freudenberger <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTARTSven Schnelle3-11/+12
PIF_SYSCALL_RESTART is now only used to restart execve when loading PGSTE binaries. Rename the flag to reflect that, and avoid people thinking that this bit has anything to do with generic syscall restarting. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390: move restart of execve() syscallSven Schnelle2-15/+20
On s390, execve might have to be restarted for PGSTE binaries like kvm. In the past this was done via the PIF_SYSCALL_RESTART bit. However, with the recent changes, syscalls are now restarted differently. Now that execve() is the only call that might get restarted via PIF_SYSCALL_RESTART, move the loop to do_syscall(). This also has the advantage that the restart is no longer visible to userspace. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/signal: remove sigreturn on stackSven Schnelle4-36/+2
{rt_}sigreturn is now called from the vdso, so we no longer need the svc on the stack, and therefore no hack to support that mechanism on machines with non-executable stack. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/signal: switch to using vdso for sigreturn and syscall restartSven Schnelle4-31/+31
with generic entry, there's a bug when it comes to restarting of signals. The failing sequence is: a) a signal is coming in, and no handler is registered, so the lower part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c sets PIF_SYSCALL_RESTART. b) a second signal gets pending while the kernel is still in the exit loop, and for that one, a handler exists. c) The first part of arch_do_signal_or_restart() is called. That part calls handle_signal(), which sets up stack + registers for handling the signal. d) __do_syscall() in arch/s390/kernel/syscall.c checks for PIF_SYSCALL_RESTART right before leaving to userspace. If it is set, it restart's the syscall. However, the registers are already setup for handling a signal from c). The syscall is now restarted with the wrong arguments. Change the code to: - use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because we cannot rewind and go back to userspace on s390 because the system call number might be encoded in the svc instruction. - for all other syscalls we rewind the PSW and return to userspace. Cc: <[email protected]> # v5.12+ d57778feb987: s390/vdso: always enable vdso Cc: <[email protected]> # v5.12+ 686341f2548b: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall Cc: <[email protected]> # v5.12+ 43e1f76b0b69: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE Cc: <[email protected]> # v5.12+ 779df2248739: s390/vdso: add minimal compat vdso Cc: <[email protected]> # v5.12+ Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08io_uring: mitigate unlikely iopoll lagPavel Begunkov1-1/+5
We have requests like IORING_OP_FILES_UPDATE that don't go through ->iopoll_list but get completed in place under ->uring_lock, and so after dropping the lock io_iopoll_check() should expect that some CQEs might have get completed in a meanwhile. Currently such events won't be accounted in @nr_events, and the loop will continue to poll even if there is enough of CQEs. It shouldn't be a problem as it's not likely to happen and so, but not nice either. Just return earlier in this case, it should be enough. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-07-08Merge tag 'drm-next-2021-07-08-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds53-504/+1048
Pull drm fixes from Dave Airlie: "Some fixes for rc1 that came in the past weeks, mainly a bunch of amdgpu fixes, some i915 and the rest are misc around the place. I'm sending this a bit early so some more stuff may show up, but I'll probably take tomorrow off. dma-buf: - doc fixes amdgpu: - Misc Navi fixes - Powergating fix - Yellow Carp updates - Beige Goby updates - S0ix fix - Revert overlay validation fix - GPU reset fix for DC - PPC64 fix - Add new dimgrey cavefish DID - RAS fix - TTM fixes amdkfd: - SVM fixes radeon: - Fix missing drm_gem_object_put in error path - NULL ptr deref fix i915: - display DP VSC fix - DG1 display fix - IRQ fixes - IRQ demidlayering gma500: - bo leaks in error paths fixed" * tag 'drm-next-2021-07-08-1' of git://anongit.freedesktop.org/drm/drm: (52 commits) drm/i915: Drop all references to DRM IRQ midlayer drm/i915: Use the correct IRQ during resume drm/i915/display/dg1: Correctly map DPLLs during state readout drm/i915/display: Do not zero past infoframes.vsc drm/amdgpu: Conditionally reset SDMA RAS error counts drm/amdkfd: Maintain svm_bo reference in page->zone_device_data drm/amdkfd: add invalid pages debug at vram migration drm/amdkfd: skip migration for pages already in VRAM drm/amdkfd: skip invalid pages during migrations drm/amdkfd: classify and map mixed svm range pages in GPU drm/amdkfd: use hmm range fault to get both domain pfns drm/amdgpu: get owner ref in validate and map drm/amdkfd: set owner ref to svm range prefault drm/amdkfd: add owner ref param to get hmm pages drm/amdkfd: device pgmap owner at the svm migrate init drm/amdkfd: inc counter on child ranges with xnack off drm/amd/display: Extend DMUB diagnostic logging to DCN3.1 drm/amdgpu: Update NV SIMD-per-CU to 2 drm/amdgpu: add new dimgrey cavefish DID drm/amd/pm: skip PrepareMp1ForUnload message in s0ix ...
2021-07-08Merge tag 'pwm/for-5.14-rc1' of ↵Linus Torvalds50-642/+710
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This contains mostly various fixes, cleanups and some conversions to the atomic API. One noteworthy change is that PWM consumers can now pass a hint to the PWM core about the PWM usage, enabling PWM providers to implement various optimizations. There's also a fair bit of simplification here with the addition of some device-managed helpers as well as unification between the DT and ACPI firmware interfaces" * tag 'pwm/for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (50 commits) pwm: Remove redundant assignment to pointer pwm pwm: ep93xx: Fix read of uninitialized variable ret pwm: ep93xx: Prepare clock before using it pwm: ep93xx: Unfold legacy callbacks into ep93xx_pwm_apply() pwm: ep93xx: Implement .apply callback pwm: vt8500: Only unprepare the clock after the pwmchip was removed pwm: vt8500: Drop if with an always false condition pwm: tegra: Assert reset only after the PWM was unregistered pwm: tegra: Don't needlessly enable and disable the clock in .remove() pwm: tegra: Don't modify HW state in .remove callback pwm: tegra: Drop an if block with an always false condition pwm: core: Simplify some devm_*pwm*() functions pwm: core: Remove unused devm_pwm_put() pwm: core: Unify fwnode checks in the module pwm: core: Reuse fwnode_to_pwmchip() in ACPI case pwm: core: Convert to use fwnode for matching docs: firmware-guide: ACPI: Add a PWM example dt-bindings: pwm: pwm-tiecap: Add compatible string for AM64 SoC dt-bindings: pwm: pwm-tiecap: Convert to json schema pwm: sprd: Don't check the return code of pwmchip_remove() ...
2021-07-08Merge tag 'clk-for-linus' of ↵Linus Torvalds7-54/+107
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: - A handful of fixes for lmk04832 driver - Migrate the basic clk divider to use determine rate ops - Fix modpost build for hisilicon hi3559a driver - Actually set the parent in k210_clk_set_parent() * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: Revert "clk: divider: Switch from .round_rate to .determine_rate by default" clk: hisilicon: hi3559a: Drop __init markings everywhere clk: meson: regmap: switch to determine_rate for the dividers clk: divider: Switch from .round_rate to .determine_rate by default clk: divider: Add re-usable determine_rate implementations clk: k210: Fix k210_clk_set_parent() clk: lmk04832: Fix spelling mistakes in dev_err messages and comments clk: lmk04832: fix return value check in lmk04832_probe() clk: stm32mp1: fix missing spin_lock_init()
2021-07-08Merge tag 'pci-v5.14-changes' of ↵Linus Torvalds52-429/+722
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Fix dsm_label_utf16s_to_utf8s() buffer overrun (Krzysztof Wilczyński) - Rely on lengths from scnprintf(), dsm_label_utf16s_to_utf8s() (Krzysztof Wilczyński) - Use sysfs_emit() and sysfs_emit_at() in "show" functions (Krzysztof Wilczyński) - Fix 'resource_alignment' newline issues (Krzysztof Wilczyński) - Add 'devspec' newline (Krzysztof Wilczyński) - Dynamically map ECAM regions (Russell King) Resource management: - Coalesce host bridge contiguous apertures (Kai-Heng Feng) PCIe native device hotplug: - Ignore Link Down/Up caused by DPC (Lukas Wunner) Power management: - Leave Apple Thunderbolt controllers on for s2idle or standby (Konstantin Kharlamov) Virtualization: - Work around Huawei Intelligent NIC VF FLR erratum (Chiqijun) - Clarify error message for unbound IOV devices (Moritz Fischer) - Add pci_reset_bus_function() Secondary Bus Reset interface (Raphael Norwitz) Peer-to-peer DMA: - Simplify distance calculation (Christoph Hellwig) - Finish RCU conversion of pdev->p2pdma (Eric Dumazet) - Rename upstream_bridge_distance() and rework doc (Logan Gunthorpe) - Collect acs list in stack buffer to avoid sleeping (Logan Gunthorpe) - Use correct calc_map_type_and_dist() return type (Logan Gunthorpe) - Warn if host bridge not in whitelist (Logan Gunthorpe) - Refactor pci_p2pdma_map_type() (Logan Gunthorpe) - Avoid pci_get_slot(), which may sleep (Logan Gunthorpe) Altera PCIe controller driver: - Add Joyce Ooi as Altera PCIe maintainer (Joyce Ooi) Broadcom iProc PCIe controller driver: - Fix multi-MSI base vector number allocation (Sandor Bodo-Merle) - Support multi-MSI only on uniprocessor kernel (Sandor Bodo-Merle) Freescale i.MX6 PCIe controller driver: - Limit DBI register length for imx6qp PCIe (Richard Zhu) - Add "vph-supply" for PHY supply voltage (Richard Zhu) - Enable PHY internal regulator when supplied >3V (Richard Zhu) - Remove imx6_pcie_probe() redundant error message (Zhen Lei) Intel Gateway PCIe controller driver: - Fix INTx enable (Martin Blumenstingl) Marvell Aardvark PCIe controller driver: - Fix checking for PIO Non-posted Request (Pali Rohár) - Implement workaround for the readback value of VEND_ID (Pali Rohár) MediaTek PCIe controller driver: - Remove redundant error printing in mtk_pcie_subsys_powerup() (Zhen Lei) MediaTek PCIe Gen3 controller driver: - Add missing MODULE_DEVICE_TABLE (Zou Wei) Microchip PolarFlare PCIe controller driver: - Make struct event_descs static (Krzysztof Wilczyński) Microsoft Hyper-V host bridge driver: - Fix race condition when removing the device (Long Li) - Remove bus device removal unused refcount/functions (Long Li) Mobiveil PCIe controller driver: - Remove unused readl and writel functions (Krzysztof Wilczyński) NVIDIA Tegra PCIe controller driver: - Add missing MODULE_DEVICE_TABLE (Zou Wei) NVIDIA Tegra194 PCIe controller driver: - Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift (Jon Hunter) - Fix host initialization during resume (Vidya Sagar) Rockchip PCIe controller driver: - Register IRQ handlers after device and data are ready (Javier Martinez Canillas)" * tag 'pci-v5.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits) PCI/P2PDMA: Finish RCU conversion of pdev->p2pdma PCI: xgene: Annotate __iomem pointer PCI: Fix kernel-doc formatting PCI: cpcihp: Declare cpci_debug in header file MAINTAINERS: Add Joyce Ooi as Altera PCIe maintainer PCI: rockchip: Register IRQ handlers after device and data are ready PCI: tegra194: Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift PCI: aardvark: Implement workaround for the readback value of VEND_ID PCI: aardvark: Fix checking for PIO Non-posted Request PCI: tegra194: Fix host initialization during resume PCI: tegra: Add missing MODULE_DEVICE_TABLE PCI: imx6: Enable PHY internal regulator when supplied >3V dt-bindings: imx6q-pcie: Add "vph-supply" for PHY supply voltage PCI: imx6: Limit DBI register length for imx6qp PCIe PCI: imx6: Remove imx6_pcie_probe() redundant error message PCI: intel-gw: Fix INTx enable PCI: iproc: Support multi-MSI only on uniprocessor kernel PCI: iproc: Fix multi-MSI base vector number allocation PCI: mediatek-gen3: Add missing MODULE_DEVICE_TABLE PCI: Dynamically map ECAM regions ...
2021-07-09scripts: add generic syscallnr.shMasahiro Yamada1-0/+74
Like syscallhdr.sh and syscalltbl.sh, add a simple script to generate the __NR_syscalls, which should not be exported to userspace. This script is useful to replace arch/mips/kernel/syscalls/syscallnr.sh, refactor arch/s390/kernel/syscalls/syscalltbl, and eliminate the code surrounded by #ifdef __KERNEL__ / #endif from exported uapi/asm/unistd_*.h files. Signed-off-by: Masahiro Yamada <[email protected]>
2021-07-09scripts: check duplicated syscall number in syscall tableMasahiro Yamada2-2/+7
Currently, syscall{hdr,tbl}.sh sorts the entire syscall table, but you can assume it is already sorted by the syscall number. The generated syscall table does not work if the same syscall number appears twice. Check it in the script. Signed-off-by: Masahiro Yamada <[email protected]>
2021-07-08powerpc/mm: enable HAVE_MOVE_PMD supportAneesh Kumar K.V1-0/+2
mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region: 1GB mremap - Source PTE-aligned, Destination PTE-aligned mremap time: 2292772ns 1GB mremap - Source PMD-aligned, Destination PMD-aligned mremap time: 1158928ns 1GB mremap - Source PUD-aligned, Destination PUD-aligned mremap time: 63886ns Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08powerpc/book3s64/mm: update flush_tlb_range to flush page walk cacheAneesh Kumar K.V3-18/+36
flush_tlb_range is special in that we don't specify the page size used for the translation. Hence when flushing TLB we flush the translation cache for all possible page sizes. The kernel also uses the same interface when moving page tables around. Such a move requires us to flush the page walk cache. Instead of adding another interface to force page walk cache flush, update flush_tlb_range to flush page walk cache if the range flushed is more than the PMD range. A page table move will always involve an invalidate range more than PMD_SIZE. Running microbenchmark with mprotect and parallel memory access didn't show any observable performance impact. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm/mremap: allow arch runtime overrideAneesh Kumar K.V2-1/+20
Patch series "Speedup mremap on ppc64", v8. This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires the platform to support updating higher-level page tables without updating page table entries. This also needs to invalidate the Page Walk Cache on architecture supporting the same. This patch (of 3): Architectures like ppc64 support faster mremap only with radix translation. Hence allow a runtime check w.r.t support for fast mremap. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm/mremap: hold the rmap lock in write mode when moving page table entries.Aneesh Kumar K.V1-2/+2
To avoid a race between rmap walk and mremap, mremap does take_rmap_locks(). The lock was taken to ensure that rmap walk don't miss a page table entry due to PTE moves via move_pagetables(). The kernel does further optimization of this lock such that if we are going to find the newly added vma after the old vma, the rmap lock is not taken. This is because rmap walk would find the vmas in the same order and if we don't find the page table attached to older vma we would find it with the new vma which we would iterate later. As explained in commit eb66ae030829 ("mremap: properly flush TLB before releasing the page") mremap is special in that it doesn't take ownership of the page. The optimized version for PUD/PMD aligned mremap also doesn't hold the ptl lock. This can result in stale TLB entries as show below. This patch updates the rmap locking requirement in mremap to handle the race condition explained below with optimized mremap:: Optmized PMD move CPU 1 CPU 2 CPU 3 mremap(old_addr, new_addr) page_shrinker/try_to_unmap_one mmap_write_lock_killable() addr = old_addr lock(pte_ptl) lock(pmd_ptl) pmd = *old_pmd pmd_clear(old_pmd) flush_tlb_range(old_addr) *new_pmd = pmd *new_addr = 10; and fills TLB with new addr and old pfn unlock(pmd_ptl) ptep_clear_flush() old pfn is free. Stale TLB entry Optimized PUD move also suffers from a similar race. Both the above race condition can be fixed if we force mremap path to take rmap lock. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions") Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions") Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm/mremap: use pmd/pud_poplulate to update page table entriesAneesh Kumar K.V1-4/+3
pmd/pud_populate is the right interface to be used to set the respective page table entries. Some architectures like ppc64 do assume that set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not setting up a hugepage PTE here, use the pmd/pud_populate interface. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm/mremap: don't enable optimized PUD move if page table levels is 2Aneesh Kumar K.V1-1/+1
With two level page table don't enable move_normal_pud. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm/mremap: convert huge PUD move to separate helperAneesh Kumar K.V1-7/+73
With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD entries. Add a helper to move huge PUD entries on mremap(). This will be used by a later patch to optimize mremap of PUD_SIZE aligned level 4 PTE mapped address This also make sure we support mremap on huge PUD entries even with CONFIG_HAVE_MOVE_PUD disabled. [[email protected]: fix build failure with clang-10] Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161 Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08selftest/mremap_test: avoid crash with static buildAneesh Kumar K.V1-2/+3
With a large mmap map size, we can overlap with the text area and using MAP_FIXED results in unmapping that area. Switch to MAP_FIXED_NOREPLACE and handle the EEXIST error. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Kalesh Singh <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08selftest/mremap_test: update the test to handle pagesize other than 4KAneesh Kumar K.V1-52/+61
Patch series "mrermap fixes", v2. This patch (of 6): Instead of hardcoding 4K page size fetch it using sysconf(). For the performance measurements test still assume 2M and 1G are hugepage sizes. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Kalesh Singh <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *Aneesh Kumar K.V13-18/+25
No functional change in this patch. [[email protected]: m68k build error reported by kernel robot] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *Aneesh Kumar K.V22-36/+46
No functional change in this patch. [[email protected]: fix] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: another fix] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08kdump: use vmlinux_build_id to simplifyStephen Boyd4-56/+10
We can use the vmlinux_build_id array here now instead of open coding it. This mostly consolidates code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08buildid: fix kernel-doc notationStephen Boyd1-1/+1
Kernel doc should use "Return:" instead of "Returns" to properly reflect the return values. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08buildid: mark some arguments constStephen Boyd1-4/+4
These arguments are never modified so they can be marked const to indicate as such. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08scripts/decode_stacktrace.sh: indicate 'auto' can be used for base pathStephen Boyd1-1/+1
Add "auto" to the usage message so that it's a little clearer that you can pass "auto" as the second argument. When passing "auto" the script tries to find the base path automatically instead of requiring it be passed on the commandline. Also use [<variable>] to indicate the variable argument and that it is optional so that we can differentiate from the literal "auto" that should be passed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nmStephen Boyd1-3/+3
Sometimes if you're using tools that have linked things improperly or have new features/sections that older tools don't expect you'll see warnings printed to stderr. We don't really care about these warnings, so let's just silence these messages to cleanup output of this script. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08scripts/decode_stacktrace.sh: support debuginfodStephen Boyd1-11/+70
Now that stacktraces contain the build ID information we can update this script to use debuginfod-find to locate the debuginfo for the vmlinux and modules automatically. This can replace the existing code that requires specifying a path to vmlinux or tries to find the vmlinux and modules automatically by using the release number. Work it into the script as a fallback option if the vmlinux isn't specified on the commandline. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08x86/dumpstack: use %pSb/%pBb for backtrace printingStephen Boyd1-1/+1
Let's use the new printk formats to print the stacktrace entries when printing a backtrace to the kernel logs. This will include any module's build ID[1] in it so that offline/crash debugging can easily locate the debuginfo for a module via something like debuginfod[2]. Link: https://lkml.kernel.org/r/[email protected] Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Baoquan He <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08arm64: stacktrace: use %pSb for backtrace printingStephen Boyd1-1/+1
Let's use the new printk format to print the stacktrace entry when printing a backtrace to the kernel logs. This will include any module's build ID[1] in it so that offline/crash debugging can easily locate the debuginfo for a module via something like debuginfod[2]. Link: https://lkml.kernel.org/r/[email protected] Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08module: add printk formats to add module build ID to stacktracesStephen Boyd6-29/+166
Let's make kernel stacktraces easier to identify by including the build ID[1] of a module if the stacktrace is printing a symbol from a module. This makes it simpler for developers to locate a kernel module's full debuginfo for a particular stacktrace. Combined with scripts/decode_stracktrace.sh, a developer can download the matching debuginfo from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace that match the module. This is especially useful for pstore crash debugging where the kernel crashes are recorded in something like console-ramoops and the recovery kernel/modules are different or the debuginfo doesn't exist on the device due to space concerns (the debuginfo can be too large for space limited devices). Originally, I put this on the %pS format, but that was quickly rejected given that %pS is used in other places such as ftrace where build IDs aren't meaningful. There was some discussions on the list to put every module build ID into the "Modules linked in:" section of the stacktrace message but that quickly becomes very hard to read once you have more than three or four modules linked in. It also provides too much information when we don't expect each module to be traversed in a stacktrace. Having the build ID for modules that aren't important just makes things messy. Splitting it to multiple lines for each module quickly explodes the number of lines printed in an oops too, possibly wrapping the warning off the console. And finally, trying to stash away each module used in a callstack to provide the ID of each symbol printed is cumbersome and would require changes to each architecture to stash away modules and return their build IDs once unwinding has completed. Instead, we opt for the simpler approach of introducing new printk formats '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb' for "pointer backtrace with module build ID" and then updating the few places in the architecture layer where the stacktrace is printed to use this new format. Before: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm] direct_entry+0x16c/0x1b4 [lkdtm] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 After: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 [[email protected]: fix build with CONFIG_MODULES=n, tweak code layout] [[email protected]: fix build when CONFIG_MODULES is not set] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: make kallsyms_lookup_buildid() static] [[email protected]: fix build error when CONFIG_SYSFS is disabled] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Bixuan Cui <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08dump_stack: add vmlinux build ID to stack tracesStephen Boyd4-2/+28
Add the running kernel's build ID[1] to the stacktrace information header. This makes it simpler for developers to locate the vmlinux with full debuginfo for a particular kernel stacktrace. Combined with scripts/decode_stracktrace.sh, a developer can download the correct vmlinux from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace. This is especially useful for pstore crash debugging where the kernel crashes are recorded in the pstore logs and the recovery kernel is different or the debuginfo doesn't exist on the device due to space concerns (the data can be large and a security concern). The stacktrace can be analyzed after the crash by using the build ID to find the matching vmlinux and understand where in the function something went wrong. Example stacktrace from lkdtm: WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm] Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1 Hardware name: Google Lazor (rev3+) with KB Backlight (DT) pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--) pc : lkdtm_WARNING+0x28/0x30 [lkdtm] The hex string aa23f7a1231c229de205662d5a9e0d4c580f19a1 is the build ID, following the kernel version number. Put it all behind a config option, STACKTRACE_BUILD_ID, so that kernel developers can remove this information if they decide it is too much. Link: https://lkml.kernel.org/r/[email protected] Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08buildid: stash away kernels build ID on initStephen Boyd3-0/+20
Parse the kernel's build ID at initialization so that other code can print a hex format string representation of the running kernel's build ID. This will be used in the kdump and dump_stack code so that developers can easily locate the vmlinux debug symbols for a crash/stacktrace. [[email protected]: fix implicit declaration of init_vmlinux_build_id()] Link: https://lkml.kernel.org/r/CAE-0n51UjTbay8N9FXAyE7_aR2+ePrQnKSRJ0gbmRsXtcLBVaw@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08buildid: add API to parse build ID out of bufferStephen Boyd2-13/+38
Add an API that can parse the build ID out of a buffer, instead of a vma, to support printing a kernel module's build ID for stack traces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08buildid: only consider GNU notes for build ID parsingStephen Boyd1-0/+1
Patch series "Add build ID to stacktraces", v6. This series adds the kernel's build ID[1] to the stacktrace header printed in oops messages, warnings, etc. and the build ID for any module that appears in the stacktrace after the module name. The goal is to make the stacktrace more self-contained and descriptive by including the relevant build IDs in the kernel logs when something goes wrong. This can be used by post processing tools like script/decode_stacktrace.sh and kernel developers to easily locate the debug info associated with a kernel crash and line up what line and file things started falling apart at. To show how this can be used I've included a patch to decode_stacktrace.sh that downloads the debuginfo from a debuginfod server. This also includes some patches to make the buildid.c file use more const arguments and consolidate logic into buildid.c from kdump. These are left to the end as they were mostly cleanup patches. Here's an example lkdtm stacktrace on arm64. WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm] Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1 Hardware name: Google Lazor (rev3+) with KB Backlight (DT) pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--) pc : lkdtm_WARNING+0x28/0x30 [lkdtm] lr : lkdtm_do_action+0x24/0x40 [lkdtm] sp : ffffffc0134fbca0 x29: ffffffc0134fbca0 x28: ffffff92d53ba240 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000000 x24: ffffffe3622352c0 x23: 0000000000000020 x22: ffffffe362233366 x21: ffffffe3622352e0 x20: ffffffc0134fbde0 x19: 0000000000000008 x18: 0000000000000000 x17: ffffff929b6536fc x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000012 x13: ffffffe380ed892c x12: ffffffe381d05068 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000001 x8 : ffffffe362237000 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000008 x2 : ffffff93fef25a70 x1 : ffffff93fef15788 x0 : ffffffe3622352e0 Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e] direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 ksys_write+0x84/0xf0 __arm64_sys_write+0x24/0x30 el0_svc_common+0xf4/0x1c0 do_el0_svc_compat+0x28/0x3c el0_svc_compat+0x10/0x1c el0_sync_compat_handler+0xa8/0xcc el0_sync_compat+0x178/0x180 ---[ end trace 3d95032303e59e68 ]--- This patch (of 13): Some kernel elf files have various notes that also happen to have an elf note type of '3', which matches NT_GNU_BUILD_ID but the note name isn't "GNU". For example, this note trips up the existing logic: Owner Data size Description Xen 0x00000008 Unknown note type: (0x00000003) description data: 00 00 00 ffffff80 ffffffff ffffffff ffffffff ffffffff Let's make sure that it is a GNU note when parsing the build ID so that we can use this function to parse a vmlinux's build ID too. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib") Signed-off-by: Stephen Boyd <[email protected]> Reported-by: Petr Mladek <[email protected]> Tested-by: Petr Mladek <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Evan Green <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>