aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-06-20fgraph: Add declaration of "struct fgraph_ret_regs"Steven Rostedt (Google)1-0/+3
In final testing of: https://patchwork.kernel.org/project/linux-trace-kernel/patch/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn/ "function_graph: Support recording and printing the return value of function" The test failed due to a new warning found in the build: kernel/trace/fgraph.c:243:56: warning: ‘struct fgraph_ret_regs’ declared inside parameter list will not be visible outside of this definition or declaration Instead of asking to send another patch series, just add it and then apply the updates. Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-06-20Merge tag 'trace-v6.4-rc6' of ↵Linus Torvalds2-84/+208
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix MAINTAINERS file to point to proper mailing list for rtla and rv The mailing list pointed to linux-trace-devel instead of linux-trace-kernel. The former is for the tracing libraries and the latter is for anything in the Linux kernel tree. The wrong mailing list was used because linux-trace-kernel did not exist when rtla and rv were created. - User events: - Fix matching of dynamic events to their user events When user writes to dynamic_events file, a lookup of the registered dynamic events is made, but there were some cases that a match could be incorrectly made. - Add auto cleanup of user events Have the user events automatically get removed when the last reference (file descriptor) is closed. This was asked for to prevent leaks of user events hanging around needing admins to clean them up. - Add persistent logic (but not let user space use it yet) In some cases, having a persistent user event (one that does not get cleaned up automatically) is useful. But there's still debates about how to expose this to user space. The infrastructure is added, but the API is not. - Update the selftests Update the user event selftests to reflect the above changes" * tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/user_events: Document auto-cleanup and remove dyn_event refs selftests/user_events: Adapt dyn_test to non-persist events selftests/user_events: Ensure auto cleanup works as expected tracing/user_events: Add auto cleanup and future persist flag tracing/user_events: Track refcount consistently via put/get tracing/user_events: Store register flags on events tracing/user_events: Remove user_ns walk for groups selftests/user_events: Add perf self-test for empty arguments events selftests/user_events: Clear the events after perf self-test selftests/user_events: Add ftrace self-test for empty arguments events tracing/user_events: Fix the incorrect trace record for empty arguments events tracing: Modify print_fields() for fields output order tracing/user_events: Handle matching arguments that is null from dyn_events tracing/user_events: Prevent same name but different args event tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
2023-06-19watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64Petr Mladek1-1/+1
The HAVE_ prefix means that the code could be enabled. Add another variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix. It will be set when it should be built. It will make it compatible with the other hardlockup detectors. Before, it is far from obvious that the SPARC64 variant is actually used: $> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y After, it is more clear: $> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specificPetr Mladek1-1/+1
There are several hardlockup detector implementations and several Kconfig values which allow selection and build of the preferred one. CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb ("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36. It was a preparation step for introducing the new generic perf hardlockup detector. The existing arch-specific variants did not support the to-be-created generic build configurations, sysctl interface, etc. This distinction was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR") in v2.6.38. CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105 ("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used by three architectures, namely blackfin, mn10300, and sparc. The support for blackfin and mn10300 architectures has been completely dropped some time ago. And sparc is the only architecture with the historic NMI watchdog at the moment. And the old sparc implementation is really special. It is always built on sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b ("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added in v4.10-rc1. There are only few locations where the sparc64 NMI watchdog interacts with the generic hardlockup detectors code: + implements arch_touch_nmi_watchdog() which is called from the generic touch_nmi_watchdog() + implements watchdog_hardlockup_enable()/disable() to support /proc/sys/kernel/nmi_watchdog + is always preferred over other generic watchdogs, see CONFIG_HARDLOCKUP_DETECTOR + includes asm/nmi.h into linux/nmi.h because some sparc-specific functions are needed in sparc-specific code which includes only linux/nmi.h. The situation became more complicated after the commit 05a4a95279311c3 ("kernel/watchdog: split up config options") and commit 2104180a53698df5 ("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1. They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc specific hardlockup detector. It was compatible with the perf one regarding the general boot, sysctl, and programming interfaces. HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of HAVE_NMI_WATCHDOG. It made some sense because all arch-specific detectors had some common requirements, namely: + implemented arch_touch_nmi_watchdog() + included asm/nmi.h into linux/nmi.h + defined the default value for /proc/sys/kernel/nmi_watchdog But it actually has made things pretty complicated when the generic buddy hardlockup detector was added. Before the generic perf detector was newer supported together with an arch-specific one. But the buddy detector could work on any SMP system. It means that an architecture could support both the arch-specific and buddy detector. As a result, there are few tricky dependencies. For example, CONFIG_HARDLOCKUP_DETECTOR depends on: ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH The problem is that the very special sparc implementation is defined as: HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear without reading understanding the history. Make the logic less tricky and more self-explanatory by making HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to HAVE_HARDLOCKUP_DETECTOR_SPARC64. Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF, and HARDLOCKUP_DETECTOR_BUDDY may conflict only with HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR and it is not longer enabled when HAVE_NMI_WATCHDOG is set. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/hardlockup: move SMP barriers from common code to buddy codeDouglas Anderson2-6/+21
It's been suggested that since the SMP barriers are only potentially useful for the buddy hardlockup detector, not the perf hardlockup detector, that the barriers belong in the buddy code. Let's move them and add clearer comments about why they're needed. Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid Signed-off-by: Douglas Anderson <[email protected]> Suggested-by: Petr Mladek <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()Douglas Anderson1-3/+2
There's no reason to make a copy of the "watchdog_cpus" locally in watchdog_next_cpu(). Making a copy wouldn't make things any more race free and we're just reading the value so there's no need for a copy. Link: https://lkml.kernel.org/r/20230526184139.7.If466f9a2b50884cbf6a1d8ad05525a2c17069407@changeid Signed-off-by: Douglas Anderson <[email protected]> Suggested-by: Petr Mladek <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is calledDouglas Anderson2-9/+8
In the patch ("watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs"), we added a call from the common watchdog.c file into the buddy. That call could be done more cleanly. Specifically: 1. If we move the call into watchdog_hardlockup_kick() then it keeps watchdog_timer_fn() simpler. 2. We don't need to pass an "unsigned long" to the buddy for the timer count. In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") the count was changed to "atomic_t" which is backed by an int, so we should match types. Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid Signed-off-by: Douglas Anderson <[email protected]> Suggested-by: Petr Mladek <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()Douglas Anderson1-1/+3
In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started using a cpumask to keep track of which CPUs to backtrace. When setting up this cpumask, it's better to use cpumask_copy() than to just copy the structure directly. Fix this. Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid Signed-off-by: Douglas Anderson <[email protected]> Suggested-by: Petr Mladek <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()Douglas Anderson1-1/+1
In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr(). Using this_cpu_ptr() works fine. Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid Signed-off-by: Douglas Anderson <[email protected]> Suggested-by: Petr Mladek <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement ↵Douglas Anderson1-13/+0
watchdog_hardlockup_probe() Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one architecture, we have some special case code in the watchdog core to handle the fact that watchdog_hardlockup_probe() isn't implemented. Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the special case. As a side effect of doing this, code inspection tells us that we could fix a minor bug where the system won't properly realize that NMI watchdogs are disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then nothing will override the "weak" watchdog_hardlockup_probe() and we'll fallback to looking at CONFIG_HAVE_NMI_WATCHDOG. Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid Signed-off-by: Douglas Anderson <[email protected]> Suggested-by: Petr Mladek <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe failsDouglas Anderson1-10/+20
Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews". This patch series attempts to finish resolving the feedback received from Petr Mladek on the v5 series I posted. Probably the only thing that wasn't fully as clean as Petr requested was the Kconfig stuff. I couldn't find a better way to express it without a more major overhaul. In the very least, I renamed "NON_ARCH" to "PERF_OR_BUDDY" in the hopes that will make it marginally better. Nothing in this series is terribly critical and even the bugfixes are small. However, it does cleanup a few things that were pointed out in review. This patch (of 10): The permissions for the kernel.nmi_watchdog sysctl have always been set at compile time despite the fact that a watchdog can fail to probe. Let's fix this and set the permissions based on whether the hardlockup detector actually probed. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/20230526184139.1.I0d75971cc52a7283f495aac0bd5c3041aadc734e@changeid Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf") Signed-off-by: Douglas Anderson <[email protected]> Reported-by: Petr Mladek <[email protected]> Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley Reviewed-by: Petr Mladek <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19kernel: pid_namespace: remove unused set_memfd_noexec_scope()YueHaibing1-1/+0
This inline function is unused, remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: YueHaibing <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm: ptep_get() conversionRyan Roberts1-1/+1
Convert all instances of direct pte_t* dereferencing to instead use ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle: ---- // $ make coccicheck \ // COCCI=ptepget.cocci \ // SPFLAGS="--include-headers" \ // MODE=patch virtual patch @ depends on patch @ pte_t *v; @@ - *v + ptep_get(v) ---- Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Ryan Roberts <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Al Viro <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dimitri Sivanich <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oleksandr Tyshchenko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19dma-mapping: force bouncing if the kmalloc() size is not cache-line-alignedCatalin Marinas2-1/+6
For direct DMA, if the size is small enough to have originated from a kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against dma_get_cache_alignment() and bounce if necessary. For larger sizes, it is the responsibility of the DMA API caller to ensure proper alignment. At this point, the kmalloc() caches are properly aligned but this will change in a subsequent patch. Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Tested-by: Isaac J. Manjarres <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Jerry Snitselaar <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Lars-Peter Clausen <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Saravana Kannan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19dma-mapping: name SG DMA flag helpers consistentlyRobin Murphy1-1/+1
sg_is_dma_bus_address() is inconsistent with the naming pattern of its corresponding setters and its own kerneldoc, so take the majority vote and rename it sg_dma_is_bus_address() (and fix up the missing underscores in the kerneldoc too). This gives us a nice clear pattern where SG DMA flags are SG_DMA_<NAME>, and the helpers for acting on them are sg_dma_<action>_<name>(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Robin Murphy <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jerry Snitselaar <[email protected]> Reviewed-by: Logan Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com Tested-by: Isaac J. Manjarres <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Lars-Peter Clausen <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Saravana Kannan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19scatterlist: add dedicated config for DMA flagsRobin Murphy1-0/+3
The DMA flags field will be useful for users beyond PCI P2P, so upgrade to its own dedicated config option. [[email protected]: use #ifdef CONFIG_NEED_SG_DMA_FLAGS in scatterlist.h] [[email protected]: update PCI_P2PDMA dma_flags comment in scatterlist.h] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Robin Murphy <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Isaac J. Manjarres <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Jerry Snitselaar <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Lars-Peter Clausen <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Saravana Kannan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19perf/core: allow pte_offset_map() to failHugh Dickins1-0/+4
In rare transient cases, not yet made possible, pte_offset_map() and pte_offet_map_lock() may not find a page table: handle appropriately. [[email protected]: __wp_page_copy_user(): don't call update_mmu_tlb() with NULL] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qi Zheng <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Steven Price <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Zack Rusin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19bpf: Keep BPF_PROG_LOAD permission checks clear of validationsAndrii Nakryiko1-12/+9
Move out flags validation and license checks out of the permission checks. They were intermingled, which makes subsequent changes harder. Clean this up: perform straightforward flag validation upfront, and fetch and check license later, right where we use it. Also consolidate capabilities check in one block, right after basic attribute sanity checks. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-19bpf: Centralize permissions checks for all BPF map typesAndrii Nakryiko11-35/+47
This allows to do more centralized decisions later on, and generally makes it very explicit which maps are privileged and which are not (e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants, as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit and easy to verify). Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-19bpf: Inline map creation logic in map_create() functionAndrii Nakryiko1-33/+24
Currently find_and_alloc_map() performs two separate functions: some argument sanity checking and partial map creation workflow hanling. Neither of those functions are self-sufficient and are augmented by further checks and initialization logic in the caller (map_create() function). So unify all the sanity checks, permission checks, and creation and initialization logic in one linear piece of code in map_create() instead. This also make it easier to further enhance permission checks and keep them located in one place. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-19bpf: Move unprivileged checks into map_create() and bpf_prog_load()Andrii Nakryiko1-15/+19
Make each bpf() syscall command a bit more self-contained, making it easier to further enhance it. We move sysctl_unprivileged_bpf_disabled handling down to map_create() and bpf_prog_load(), two special commands in this regard. Also swap the order of checks, calling bpf_capable() only if sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit messages. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-18tick/rcu: Fix bogus ratelimit conditionWen Yang1-1/+1
The ratelimit logic in report_idle_softirq() is broken because the exit condition is always true: static int ratelimit; if (ratelimit < 10) return false; ---> always returns here ratelimit++; ---> no chance to run Make it check for >= 10 instead. Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") Signed-off-by: Wen Yang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18alarmtimer: Remove unnecessary (void *) castLi zeming1-1/+1
Pointers of type void * do not require a type cast when they are assigned to a real pointer. Signed-off-by: Li zeming <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18alarmtimer: Remove unnecessary initialization of variable 'ret'Li zeming1-1/+1
ret is assigned before checked, so it does not need to initialize the variable Signed-off-by: Li zeming <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERSLukas Bulwahn1-1/+1
Commit c78f261e5dcb ("posix-timers: Clarify posix_timer_fn() comments") turns an ifdef CONFIG_HIGH_RES_TIMERS into an conditional on "IS_ENABLED(CONFIG_HIGHRES_TIMERS)"; note that the new conditional refers to "HIGHRES_TIMERS" not "HIGH_RES_TIMERS" as before. Fix this typo introduced in that refactoring. Fixes: c78f261e5dcb ("posix-timers: Clarify posix_timer_fn() comments") Signed-off-by: Lukas Bulwahn <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Polish coding style in a few placesThomas Gleixner1-7/+7
Make it consistent with the TIP tree documentation. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Remove pointless commentsThomas Gleixner1-25/+0
Documenting the obvious is just consuming space for no value. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Clarify posix_timer_fn() commentsThomas Gleixner1-30/+32
Make the issues vs. SIG_IGN understandable and remove the 15 years old promise that a proper solution is already on the horizon. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/874jnrdmrq.ffs@tglx
2023-06-18posix-timers: Clarify posix_timer_rearm() commentThomas Gleixner1-9/+3
Yet another incomprehensible piece of art. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Comment SIGEV_THREAD_ID properlyThomas Gleixner1-5/+2
Replace the word salad. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Add proper comments in do_timer_create()Thomas Gleixner1-9/+11
The comment about timer lifetime at the end of the function is misplaced and uncomprehensible. Make it understandable and put it at the right place. Add a new comment about the visibility of the new timer ID to user space. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Document nanosleep() detailsThomas Gleixner1-2/+7
The descriptions for common_nsleep() is wrong and common_nsleep_timens() lacks any form of comment. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Document sys_clock_settime() permissions in placeThomas Gleixner1-7/+4
The documentation of sys_clock_settime() permissions is at a random place and mostly word salad. Remove it and add a concise comment into sys_clock_settime(). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Document sys_clock_getoverrun()Thomas Gleixner1-8/+17
Document the syscall in detail and with coherent sentences. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Document common_clock_get() correctlyThomas Gleixner1-20/+30
Replace another confusing and inaccurate set of comments. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Document sys_clock_getres() correctlyThomas Gleixner1-8/+73
The decades old comment about Posix clock resolution is confusing at best. Remove it and add a proper explanation to sys_clock_getres(). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Split release_posix_timers()Thomas Gleixner1-16/+15
release_posix_timers() is called for cleaning up both hashed and unhashed timers. The cases are differentiated by an argument and the usage is hideous. Seperate the actual free path out and use it for unhashed timers. Provide a function for hashed timers. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Remove pointless irqsafe from hash_lockThomas Gleixner1-3/+2
All usage of hash_lock is in thread context. No point in using spin_lock_irqsave()/irqrestore() for a single usage site. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Set k_itimer:: It_signal to NULL on exit()Thomas Gleixner1-0/+8
Technically it's not required to set k_itimer::it_signal to NULL on exit() because there is no other thread anymore which could lookup the timer concurrently. Set it to NULL for consistency sake and add a comment to that effect. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Annotate concurrent access to k_itimer:: It_signalThomas Gleixner1-7/+7
k_itimer::it_signal is read lockless in the RCU protected hash lookup, but it can be written concurrently in the timer_create() and timer_delete() path. Annotate these places with READ_ONCE() and WRITE_ONCE() Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Add comments about timer lookupThomas Gleixner1-7/+32
Document how the timer ID validation in the hash table works. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Cleanup comments about timer ID trackingThomas Gleixner1-20/+8
Describe the hash table properly and remove the IDR leftover comments. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Clarify timer_wait_running() commentThomas Gleixner1-4/+12
Explain it better and add the CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y aspect for completeness. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Ensure timer ID search-loop limit is validThomas Gleixner1-13/+18
posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation. This is done in a loop searching the ID space for a free slot one by one. The loop has to terminate when the search wrapped around to the starting point. But that's racy vs. establishing the starting point. That is read out lockless, which leads to the following problem: CPU0 CPU1 posix_timer_add() start = sig->posix_timer_id; lock(hash_lock); ... posix_timer_add() if (++sig->posix_timer_id < 0) start = sig->posix_timer_id; sig->posix_timer_id = 0; So CPU1 can observe a negative start value, i.e. -1, and the loop break never happens because the condition can never be true: if (sig->posix_timer_id == start) break; While this is unlikely to ever turn into an endless loop as the ID space is huge (INT_MAX), the racy read of the start value caught the attention of KCSAN and Dmitry unearthed that incorrectness. Rewrite it so that all id operations are under the hash lock. Reported-by: [email protected] Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx
2023-06-18posix-timers: Prevent RT livelock in itimer_delete()Thomas Gleixner1-8/+35
itimer_delete() has a retry loop when the timer is concurrently expired. On non-RT kernels this just spin-waits until the timer callback has completed, except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK enabled. In that case and on RT kernels the existing task could live lock when preempting the task which does the timer delivery. Replace spin_unlock() with an invocation of timer_wait_running() to handle it the same way as the other retry loops in the posix timer code. Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/87v8g7c50d.ffs@tglx
2023-06-17irqdomain: Include internals.h for function prototypesArnd Bergmann1-0/+2
irq_domain_debugfs_init() is defined in irqdomain.c, but the declaration is in a header that is not included here: kernel/irq/irqdomain.c:1965:13: error: no previous prototype for 'irq_domain_debugfs_init' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-16sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()Hao Jia2-0/+40
After commit 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth"), we may update the rq clock multiple times in the loop of __cfsb_csd_unthrottle(). A prior (although less common) instance of this problem exists in unthrottle_offline_cfs_rqs(). Cure both by ensuring update_rq_clock() is called before the loop and setting RQCF_ACT_SKIP during the loop, to supress further updates. The alternative would be pulling update_rq_clock() out of unthrottle_cfs_rq(), but that gives an even bigger mess. Fixes: 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth") Reviewed-By: Ben Segall <[email protected]> Suggested-by: Vincent Guittot <[email protected]> Signed-off-by: Hao Jia <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-06-16sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()Hao Jia1-3/+4
There is a double update_rq_clock() invocation: __balance_push_cpu_stop() update_rq_clock() __migrate_task() update_rq_clock() Sadly select_fallback_rq() also needs update_rq_clock() for __do_set_cpus_allowed(), it is not possible to remove the update from __balance_push_cpu_stop(). So remove it from __migrate_task() and ensure all callers of this function call update_rq_clock() prior to calling it. Signed-off-by: Hao Jia <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-06-16sched/core: Fixed missing rq clock update before calling set_rq_offline()Hao Jia2-4/+4
When using a cpufreq governor that uses cpufreq_add_update_util_hook(), it is possible to trigger a missing update_rq_clock() warning for the CPU hotplug path: rq_attach_root() set_rq_offline() rq_offline_rt() __disable_runtime() sched_rt_rq_enqueue() enqueue_top_rt_rq() cpufreq_update_util() data->func(data, rq_clock(rq), flags) Move update_rq_clock() from sched_cpu_deactivate() (one of it's callers) into set_rq_offline() such that it covers all set_rq_offline() usage. Additionally change rq_attach_root() to use rq_lock_irqsave() so that it will properly manage the runqueue clock flags. Suggested-by: Ben Segall <[email protected]> Signed-off-by: Hao Jia <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-06-16sched/deadline: Fix bandwidth reclaim equation in GRUBVineeth Pillai2-27/+29
According to the GRUB[1] rule, the runtime is depreciated as: "dq = -max{u, (1 - Uinact - Uextra)} dt" (1) To guarantee that deadline tasks doesn't starve lower class tasks, we do not allocate the full bandwidth of the cpu to deadline tasks. Maximum bandwidth usable by deadline tasks is denoted by "Umax". Considering Umax, equation (1) becomes: "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2) Current implementation has a minor bug in equation (2), which this patch fixes. The reclamation logic is verified by a sample program which creates multiple deadline threads and observing their utilization. The tests were run on an isolated cpu(isolcpus=3) on a 4 cpu system. Tests on 6.3.0 ============== RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95% TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33 TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35 RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95% TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69 TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69 RUN 3: 2 tasks Task 1: runtime=1ms, deadline=period=10ms Task 2: runtime=1ms, deadline=period=100ms TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67 TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37 TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38 TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23 As seen above, the reclamation doesn't reclaim the maximum allowed bandwidth and as the bandwidth of tasks gets smaller, the reclaimed bandwidth also comes down. Tests with this patch applied ============================= RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95% TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19 TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16 RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95% TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27 TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21 RUN 3: 2 tasks Task 1: runtime=1ms, deadline=period=10ms Task 2: runtime=1ms, deadline=period=100ms TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64 TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66 TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45 TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73 Running tasks on all cpus allowing for migration also showed that the utilization is reclaimed to the maximum. Running 10 tasks on 3 cpus SCHED_FLAG_RECLAIM - top shows: %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng. (2015). Parallel and sequential reclaiming in multicore real-time global scheduling. Signed-off-by: Vineeth Pillai (Google) <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Daniel Bristot de Oliveira <[email protected]> Acked-by: Juri Lelli <[email protected]> Link: https://lore.kernel.org/r/[email protected]