aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-04mm, slub: call deactivate_slab() without disabling irqsVlastimil Babka1-5/+19
The function is now safe to be called with irqs enabled, so move the calls outside of irq disabled sections. When called from ___slab_alloc() -> flush_slab() we have irqs disabled, so to reenable them before deactivate_slab() we need to open-code flush_slab() in ___slab_alloc() and reenable irqs after modifying the kmem_cache_cpu fields. But that means a IRQ handler meanwhile might have assigned a new page to kmem_cache_cpu.page so we have to retry the whole check. The remaining callers of flush_slab() are the IPI handler which has disabled irqs anyway, and slub_cpu_dead() which will be dealt with in the following patch. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: make locking in deactivate_slab() irq-safeVlastimil Babka1-4/+5
dectivate_slab() now no longer touches the kmem_cache_cpu structure, so it will be possible to call it with irqs enabled. Just convert the spin_lock calls to their irq saving/restoring variants to make it irq-safe. Note we now have to use cmpxchg_double_slab() for irq-safe slab_lock(), because in some situations we don't take the list_lock, which would disable irqs. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: move reset of c->page and freelist out of deactivate_slab()Vlastimil Babka1-13/+18
deactivate_slab() removes the cpu slab by merging the cpu freelist with slab's freelist and putting the slab on the proper node's list. It also sets the respective kmem_cache_cpu pointers to NULL. By extracting the kmem_cache_cpu operations from the function, we can make it not dependent on disabled irqs. Also if we return a single free pointer from ___slab_alloc, we no longer have to assign kmem_cache_cpu.page before deactivation or care if somebody preempted us and assigned a different page to our kmem_cache_cpu in the process. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: stop disabling irqs around get_partial()Vlastimil Babka1-14/+8
The function get_partial() does not need to have irqs disabled as a whole. It's sufficient to convert spin_lock operations to their irq saving/restoring versions. As a result, it's now possible to reach the page allocator from the slab allocator without disabling and re-enabling interrupts on the way. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: check new pages with restored irqsVlastimil Babka1-5/+3
Building on top of the previous patch, re-enable irqs before checking new pages. alloc_debug_processing() is now called with enabled irqs so we need to remove VM_BUG_ON(!irqs_disabled()); in check_slab() - there doesn't seem to be a need for it anyway. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: validate slab from partial list or page allocator before making it ↵Vlastimil Babka1-8/+9
cpu slab When we obtain a new slab page from node partial list or page allocator, we assign it to kmem_cache_cpu, perform some checks, and if they fail, we undo the assignment. In order to allow doing the checks without irq disabled, restructure the code so that the checks are done first, and kmem_cache_cpu.page assignment only after they pass. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: restore irqs around calling new_slab()Vlastimil Babka1-6/+2
allocate_slab() currently re-enables irqs before calling to the page allocator. It depends on gfpflags_allow_blocking() to determine if it's safe to do so. Now we can instead simply restore irq before calling it through new_slab(). The other caller early_kmem_cache_node_alloc() is unaffected by this. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc()Vlastimil Babka1-9/+25
Continue reducing the irq disabled scope. Check for per-cpu partial slabs with first with irqs enabled and then recheck with irqs disabled before grabbing the slab page. Mostly preparatory for the following patches. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: do initial checks in ___slab_alloc() with irqs enabledVlastimil Babka2-9/+54
As another step of shortening irq disabled sections in ___slab_alloc(), delay disabling irqs until we pass the initial checks if there is a cached percpu slab and it's suitable for our allocation. Now we have to recheck c->page after actually disabling irqs as an allocation in irq handler might have replaced it. Because we call pfmemalloc_match() as one of the checks, we might hit VM_BUG_ON_PAGE(!PageSlab(page)) in PageSlabPfmemalloc in case we get interrupted and the page is freed. Thus introduce a pfmemalloc_match_unsafe() variant that lacks the PageSlab check. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-04mm, slub: move disabling/enabling irqs to ___slab_alloc()Vlastimil Babka1-12/+24
Currently __slab_alloc() disables irqs around the whole ___slab_alloc(). This includes cases where this is not needed, such as when the allocation ends up in the page allocator and has to awkwardly enable irqs back based on gfp flags. Also the whole kmem_cache_alloc_bulk() is executed with irqs disabled even when it hits the __slab_alloc() slow path, and long periods with disabled interrupts are undesirable. As a first step towards reducing irq disabled periods, move irq handling into ___slab_alloc(). Callers will instead prevent the s->cpu_slab percpu pointer from becoming invalid via get_cpu_ptr(), thus preempt_disable(). This does not protect against modification by an irq handler, which is still done by disabled irq for most of ___slab_alloc(). As a small immediate benefit, slab_out_of_memory() from ___slab_alloc() is now called with irqs enabled. kmem_cache_alloc_bulk() disables irqs for its fastpath and then re-enables them before calling ___slab_alloc(), which then disables them at its discretion. The whole kmem_cache_alloc_bulk() operation also disables preemption. When ___slab_alloc() calls new_slab() to allocate a new page, re-enable preemption, because new_slab() will re-enable interrupts in contexts that allow blocking (this will be improved by later patches). The patch itself will thus increase overhead a bit due to disabled preemption (on configs where it matters) and increased disabling/enabling irqs in kmem_cache_alloc_bulk(), but that will be gradually improved in the following patches. Note in __slab_alloc() we need to change the #ifdef CONFIG_PREEMPT guard to CONFIG_PREEMPT_COUNT to make sure preempt disable/enable is properly paired in all configurations. On configs without involuntary preemption and debugging the re-read of kmem_cache_cpu pointer is still compiled out as it was before. [ Mike Galbraith <[email protected]>: Fix kmem_cache_alloc_bulk() error path ] Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: simplify kmem_cache_cpu and tid setupVlastimil Babka1-13/+9
In slab_alloc_node() and do_slab_free() fastpaths we need to guarantee that our kmem_cache_cpu pointer is from the same cpu as the tid value. Currently that's done by reading the tid first using this_cpu_read(), then the kmem_cache_cpu pointer and verifying we read the same tid using the pointer and plain READ_ONCE(). This can be simplified to just fetching kmem_cache_cpu pointer and then reading tid using the pointer. That guarantees they are from the same cpu. We don't need to read the tid using this_cpu_read() because the value will be validated by this_cpu_cmpxchg_double(), making sure we are on the correct cpu and the freelist didn't change by anyone preempting us since reading the tid. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-04mm, slub: restructure new page checks in ___slab_alloc()Vlastimil Babka1-6/+22
When we allocate slab object from a newly acquired page (from node's partial list or page allocator), we usually also retain the page as a new percpu slab. There are two exceptions - when pfmemalloc status of the page doesn't match our gfp flags, or when the cache has debugging enabled. The current code for these decisions is not easy to follow, so restructure it and add comments. The new structure will also help with the following changes. No functional change. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-04mm, slub: return slab page from get_partial() and set c->page afterwardsVlastimil Babka1-10/+11
The function get_partial() finds a suitable page on a partial list, acquires and returns its freelist and assigns the page pointer to kmem_cache_cpu. In later patch we will need more control over the kmem_cache_cpu.page assignment, so instead of passing a kmem_cache_cpu pointer, pass a pointer to a pointer to a page that get_partial() can fill and the caller can assign the kmem_cache_cpu.page pointer. No functional change as all of this still happens with disabled IRQs. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-04mm, slub: dissolve new_slab_objects() into ___slab_alloc()Vlastimil Babka1-32/+18
The later patches will need more fine grained control over individual actions in ___slab_alloc(), the only caller of new_slab_objects(), so dissolve it there. This is a preparatory step with no functional change. The only minor change is moving WARN_ON_ONCE() for using a constructor together with __GFP_ZERO to new_slab(), which makes it somewhat less frequent, but still able to catch a development change introducing a systematic misuse. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-04mm, slub: extract get_partial() from new_slab_objects()Vlastimil Babka1-6/+6
The later patches will need more fine grained control over individual actions in ___slab_alloc(), the only caller of new_slab_objects(), so this is a first preparatory step with no functional change. This adds a goto label that appears unnecessary at this point, but will be useful for later changes. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Christoph Lameter <[email protected]>
2021-09-03io_uring: io_uring_complete() trace should take an integerJens Axboe1-3/+3
It currently takes a long, and while that's normally OK, the io_uring limit is an int. Internally in io_uring it's an int, but sometimes it's passed as a long. That can yield confusing results where a completions seems to generate a huge result: ou-sqp-1297-1298 [001] ...1 788.056371: io_uring_complete: ring 000000000e98e046, user_data 0x0, result 4294967171, cflags 0 which is due to -ECANCELED being stored in an unsigned, and then passed in as a long. Using the right int type, the trace looks correct: iou-sqp-338-339 [002] ...1 15.633098: io_uring_complete: ring 00000000e0ac60cf, user_data 0x0, result -125, cflags 0 Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-03Merge tag 'linux-kselftest-next-5.15-rc1' of ↵Linus Torvalds9-17/+16
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest updates from Shuah Khan: "Fixes to build and test failures: - openat2 test failure for O_LARGEFILE flag on ARM64 - x86 test build failures related to glibc 2.34 adding support for variable sized MINSIGSTKSZ and SIGSTKSZ - removing obsolete configs in sync and cpufreq config files - minor spelling and duplicate header include cleanups" * tag 'linux-kselftest-next-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/cpufreq: Rename DEBUG_PI_LIST to DEBUG_PLIST selftests/sync: Remove the deprecated config SYNC selftests: safesetid: Fix spelling mistake "cant" -> "can't" selftests/x86: Fix error: variably modified 'altstack_data' at file scope kselftest:sched: remove duplicate include in cs_prctl_test.c selftests: openat2: Fix testing failure for O_LARGEFILE flag
2021-09-03Merge tag 'kbuild-v5.15' of ↵Linus Torvalds101-349/+305
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add -s option (strict mode) to merge_config.sh to make it fail when any symbol is redefined. - Show a warning if a different compiler is used for building external modules. - Infer --target from ARCH for CC=clang to let you cross-compile the kernel without CROSS_COMPILE. - Make the integrated assembler default (LLVM_IAS=1) for CC=clang. - Add <linux/stdarg.h> to the kernel source instead of borrowing <stdarg.h> from the compiler. - Add Nick Desaulniers as a Kbuild reviewer. - Drop stale cc-option tests. - Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG to handle symbols in inline assembly. - Show a warning if 'FORCE' is missing for if_changed rules. - Various cleanups * tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits) kbuild: redo fake deps at include/ksym/*.h kbuild: clean up objtool_args slightly modpost: get the *.mod file path more simply checkkconfigsymbols.py: Fix the '--ignore' option kbuild: merge vmlinux_link() between ARCH=um and other architectures kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh kbuild: merge vmlinux_link() between the ordinary link and Clang LTO kbuild: remove stale *.symversions kbuild: remove unused quiet_cmd_update_lto_symversions gen_compile_commands: extract compiler command from a series of commands x86: remove cc-option-yn test for -mtune= arc: replace cc-option-yn uses with cc-option s390: replace cc-option-yn uses with cc-option ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild sparc: move the install rule to arch/sparc/Makefile security: remove unneeded subdir-$(CONFIG_...) kbuild: sh: remove unused install script kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y kbuild: Switch to 'f' variants of integrated assembler flag kbuild: Shuffle blank line to improve comment meaning ...
2021-09-03Merge branch 'stable/for-linus-5.15-rc0' of ↵Linus Torvalds1-8/+16
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft Pull ibft fix from Konrad Rzeszutek Wilk: "An arm64 compile fix for the new code that fixed the iBFT KASLR handling. I missed the original 0-day build email report" * 'stable/for-linus-5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft: iscsi_ibft: Fix isa_bus_to_virt not working under ARM
2021-09-03mm, slub: remove redundant unfreeze_partials() from put_cpu_partial()Vlastimil Babka1-7/+0
Commit d6e0b7fa1186 ("slub: make dead caches discard free slabs immediately") introduced cpu partial flushing for kmemcg caches, based on setting the target cpu_partial to 0 and adding a flushing check in put_cpu_partial(). This code that sets cpu_partial to 0 was later moved by c9fc586403e7 ("slab: introduce __kmemcg_cache_deactivate()") and ultimately removed by 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all accounted allocations"). However the check and flush in put_cpu_partial() was never removed, although it's effectively a dead code. So this patch removes it. Note that d6e0b7fa1186 also added preempt_disable()/enable() to unfreeze_partials() which could be thus also considered unnecessary. But further patches will rely on it, so keep it. Signed-off-by: Vlastimil Babka <[email protected]>
2021-09-03mm, slub: don't disable irq for debug_check_no_locks_freed()Vlastimil Babka1-13/+1
In slab_free_hook() we disable irqs around the debug_check_no_locks_freed() call, which is unnecessary, as irqs are already being disabled inside the call. This seems to be leftover from the past where there were more calls inside the irq disabled sections. Remove the irq disable/enable operations. Mel noted: > Looks like it was needed for kmemcheck which went away back in 4.15 Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-03mm, slub: allocate private object map for validate_slab_cache()Vlastimil Babka1-9/+15
validate_slab_cache() is called either to handle a sysfs write, or from a self-test context. In both situations it's straightforward to preallocate a private object bitmap instead of grabbing the shared static one meant for critical sections, so let's do that. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-03mm, slub: allocate private object map for debugfs listingsVlastimil Babka1-15/+29
Slub has a static spinlock protected bitmap for marking which objects are on freelist when it wants to list them, for situations where dynamically allocating such map can lead to recursion or locking issues, and on-stack bitmap would be too large. The handlers of debugfs files alloc_traces and free_traces also currently use this shared bitmap, but their syscall context makes it straightforward to allocate a private map before entering locked sections, so switch these processing paths to use a private bitmap. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Mel Gorman <[email protected]>
2021-09-03mm, slub: don't call flush_all() from slab_debug_trace_open()Vlastimil Babka1-3/+0
slab_debug_trace_open() can only be called on caches with SLAB_STORE_USER flag and as with all slub debugging flags, such caches avoid cpu or percpu partial slabs altogether, so there's nothing to flush. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Christoph Lameter <[email protected]>
2021-09-03docs: kernel-hacking: Remove inappropriate textAlyssa Rosenzweig1-7/+1
Remove inappropriate sexual (and ableist) text from the locking documentation, aligning it with the kernel code-of-conduct. As the text was unrelated to locking, this change streamlines the document and improves readability. Signed-off-by: Alyssa Rosenzweig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2021-09-03futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic()Thomas Gleixner1-2/+1
The recent bug fix left the variable 'vpid' and an assignment to it around, but the variable is otherwise unused. clang dose not complain even with W=1, but gcc exposed this. Fixes: 4f07ec0d76f2 ("futex: Prevent inconsistent state and exit race") Signed-off-by: Thomas Gleixner <[email protected]>
2021-09-03Merge tag 'powerpc-5.15-1' of ↵Linus Torvalds212-2616/+2814
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Convert pseries & powernv to use MSI IRQ domains. - Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are given a CPU number on the same node as previously, when possible. - Add support for a new more flexible device-tree format for specifying NUMA distances. - Convert powerpc to GENERIC_PTDUMP. - Retire sbc8548 and sbc8641d board support. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R. Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, and Zheng Yongjun. * tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits) powerpc/bug: Cast to unsigned long before passing to inline asm powerpc/ptdump: Fix generic ptdump for 64-bit KVM: PPC: Fix clearing never mapped TCEs in realmode powerpc/pseries/iommu: Rename "direct window" to "dma window" powerpc/pseries/iommu: Make use of DDW for indirect mapping powerpc/pseries/iommu: Find existing DDW with given property name powerpc/pseries/iommu: Update remove_dma_window() to accept property name powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() powerpc/pseries/iommu: Allow DDW windows starting at 0x00 powerpc/pseries/iommu: Add ddw_list_new_entry() helper powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper powerpc/kernel/iommu: Add new iommu_table_in_use() helper powerpc/pseries/iommu: Replace hard-coded page shift powerpc/numa: Update cpu_cpu_map on CPU online/offline powerpc/numa: Print debug statements only when required powerpc/numa: convert printk to pr_xxx powerpc/numa: Drop dbg in favour of pr_debug powerpc/smp: Enable CACHE domain for shared processor powerpc/smp: Update cpu_core_map on all PowerPc systems ...
2021-09-03Merge tag 'for-5.15/parisc-2' of ↵Linus Torvalds2-69/+1
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: "Fix an unaligned-access crash in the bootloader and drop asm/swab.h" * tag 'for-5.15/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix unaligned-access crash in bootloader parisc: Drop __arch_swab16(), arch_swab24(), _arch_swab32() and __arch_swab64() functions
2021-09-03clk: qcom: gcc-sm6350: Remove unused variableKonrad Dybcio1-4/+0
In the commit "clk: qcom: Add SM6350 GCC driver" (no hash yet) an unused variable has been overlooked. Remove it. Signed-off-by: Konrad Dybcio <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2021-09-03Merge tag 'mips_5.15' of ↵Linus Torvalds53-783/+346
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - converted Pistachio platform to use MIPS generic kernel - fixes and cleanups * tag 'mips_5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (29 commits) MIPS: Malta: fix alignment of the devicetree buffer MIPS: ingenic: Unconditionally enable clock of CPU #0 MIPS: mscc: ocelot: mark the phy-mode for internal PHY ports MIPS: mscc: ocelot: disable all switch ports by default MAINTAINERS: adjust PISTACHIO SOC SUPPORT after its retirement MIPS: Return true/false (not 1/0) from bool functions MIPS: generic: Return true/false (not 1/0) from bool functions MIPS: Make a alias for pistachio_defconfig MIPS: Retire MACH_PISTACHIO MIPS: config: generic: Add config for Marduk board pinctrl: pistachio: Make it as an option phy: pistachio-usb: Depend on MIPS || COMPILE_TEST clocksource/drivers/pistachio: Make it selectable for MIPS clk: pistachio: Make it selectable for generic MIPS kernel MIPS: DTS: Pistachio add missing cpc and cdmm MIPS: generic: Allow generating FIT image for Marduk board MIPS: locking/atomic: Fix atomic{_64,}_sub_if_positive MIPS: loongson2ef: don't build serial.o unconditionally MIPS: Replace deprecated CPU-hotplug functions. MIPS: Alchemy: Fix spelling contraction "cant" -> "can't" ...
2021-09-03Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds10-40/+58
Pull OpenRISC updates from Stafford Horne: "A few cleanups and compiler warning fixes for OpenRISC. Also, this includes dts and defconfig updates to enable Ethernet on OpenRISC/Litex FPGA SoC's now that the LiteEth driver has gone upstream" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc/litex: Update defconfig openrisc/litex: Add ethernet device openrisc/litex: Update uart address openrisc: Fix compiler warnings in setup openrisc: rename or32 code & comments to or1k openrisc: don't printk() unconditionally
2021-09-03Merge tag 'livepatching-for-5.15' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching update from Petr Mladek. * tag 'livepatching-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Replace deprecated CPU-hotplug functions.
2021-09-03Merge tag 'iommu-updates-v5.15' of ↵Linus Torvalds43-607/+2034
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - New DART IOMMU driver for Apple Silicon M1 chips - Optimizations for iommu_[map/unmap] performance - Selective TLB flush support for the AMD IOMMU driver to make it more efficient on emulated IOMMUs - Rework IOVA setup and default domain type setting to move more code out of IOMMU drivers and to support runtime switching between certain types of default domains - VT-d Updates from Lu Baolu: - Update the virtual command related registers - Enable Intel IOMMU scalable mode by default - Preset A/D bits for user space DMA usage - Allow devices to have more than 32 outstanding PRs - Various cleanups - ARM SMMU Updates from Will Deacon: SMMUv3: - Minor optimisation to avoid zeroing struct members on CMD submission - Increased use of batched commands to reduce submission latency - Refactoring in preparation for ECMDQ support SMMUv2: - Fix races when probing devices with identical StreamIDs - Optimise walk cache flushing for Qualcomm implementations - Allow deep sleep states for some Qualcomm SoCs with shared clocks - Various smaller optimizations, cleanups, and fixes * tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (85 commits) iommu/io-pgtable: Abstract iommu_iotlb_gather access iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group() iommu/vt-d: Add present bit check in pasid entry setup helpers iommu/vt-d: Use pasid_pte_is_present() helper function iommu/vt-d: Drop the kernel doc annotation iommu/vt-d: Allow devices to have more than 32 outstanding PRs iommu/vt-d: Preset A/D bits for user space DMA usage iommu/vt-d: Enable Intel IOMMU scalable mode by default iommu/vt-d: Refactor Kconfig a bit iommu/vt-d: Remove unnecessary oom message iommu/vt-d: Update the virtual command related registers iommu: Allow enabling non-strict mode dynamically iommu: Merge strictness and domain type configs iommu: Only log strictness for DMA domains iommu: Expose DMA domain strictness via sysfs iommu: Express DMA strictness via the domain type iommu/vt-d: Prepare for multiple DMA domain types iommu/arm-smmu: Prepare for multiple DMA domain types iommu/amd: Prepare for multiple DMA domain types iommu: Introduce explicit type for non-strict DMA domains ...
2021-09-03SUNRPC: improve error response to over-size gss credentialNeilBrown2-1/+3
When the NFS server receives a large gss (kerberos) credential and tries to pass it up to rpc.svcgssd (which is deprecated), it triggers an infinite loop in cache_read(). cache_request() always returns -EAGAIN, and this causes a "goto again". This patch: - changes the error to -E2BIG to avoid the infinite loop, and - generates a WARN_ONCE when rsi_request first sees an over-sized credential. The warning suggests switching to gssproxy. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196583 Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-09-03Merge branch 'stable/for-linus-5.15' of ↵Linus Torvalds16-136/+469
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb updates from Konrad Rzeszutek Wilk: "A new feature called restricted DMA pools. It allows SWIOTLB to utilize per-device (or per-platform) allocated memory pools instead of using the global one. The first big user of this is ARM Confidential Computing where the memory for DMA operations can be set per platform" * 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits) swiotlb: use depends on for DMA_RESTRICTED_POOL of: restricted dma: Don't fail device probe on rmem init failure of: Move of_dma_set_restricted_buffer() into device.c powerpc/svm: Don't issue ultracalls if !mem_encrypt_active() s390/pv: fix the forcing of the swiotlb swiotlb: Free tbl memory in swiotlb_exit() swiotlb: Emit diagnostic in swiotlb_exit() swiotlb: Convert io_default_tlb_mem to static allocation of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS swiotlb: add overflow checks to swiotlb_bounce swiotlb: fix implicit debugfs declarations of: Add plumbing for restricted DMA pool dt-bindings: of: Add restricted DMA pool swiotlb: Add restricted DMA pool initialization swiotlb: Add restricted DMA alloc/free support swiotlb: Refactor swiotlb_tbl_unmap_single swiotlb: Move alloc_size to swiotlb_find_slots swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing swiotlb: Update is_swiotlb_active to add a struct device argument swiotlb: Update is_swiotlb_buffer to add a struct device argument ...
2021-09-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds171-1729/+3525
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <[email protected]>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03mm/madvise: add MADV_WILLNEED to process_madvise()zhangkui1-0/+1
There is a usecase in Android that an app process's memory is swapped out by process_madvise() with MADV_PAGEOUT, such as the memory is swapped to zram or a backing device. When the process is scheduled to running, like switch to foreground, multiple page faults may cause the app dropped frames. To reduce the problem, System Management Software can read-ahead memory of the process immediately when the app switches to forground. Calling process_madvise() with MADV_WILLNEED can meet this need. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: zhangkui <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm/vmstat: remove unneeded return valueMiaohe Lin1-6/+2
The return value of pagetypeinfo_showfree and pagetypeinfo_showblockcount are unused now. Remove them. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm/vmstat: simplify the array size calculationMiaohe Lin1-5/+3
We can replace the array_num * sizeof(array[0]) with sizeof(array) to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm/vmstat: correct some wrong commentsMiaohe Lin1-6/+1
Patch series "Cleanup for vmstat". This series contains cleanups to remove unneeded return value, correct wrong comment and simplify the array size calculation. More details can be found in the respective changelogs. This patch (of 3): Correct wrong fls(mem+1) to fls(mem)+1 and remove the duplicated comment with quiet_vmstat(). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()Jing Xiangfeng1-3/+0
Commit b239f7daf553 ("percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE") removed the parameter 'for_alloc', so remove this comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jing Xiangfeng <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03selftests: vm: add COW time test for KSM pagesZhansaya Bagdauletkyzy1-3/+83
Since merged pages are copied every time they need to be modified, the write access time is different between shared and non-shared pages. Add ksm_cow_time() function which evaluates latency of these COW breaks. First, 4000 pages are allocated and the time, required to modify 1 byte in every other page, is measured. After this, the pages are merged into 2000 pairs and in each pair, 1 page is modified (i.e. they are decoupled) to detect COW breaks. The time needed to break COW of merged pages is then compared with performance of non-shared pages. The test is run as follows: ./ksm_tests -C The output: Total size: 15 MiB Not merged pages: Total time: 0.002185489 s Average speed: 3202.945 MiB/s Merged pages: Total time: 0.004386872 s Average speed: 1595.670 MiB/s Link: https://lkml.kernel.org/r/1d03ee0d1b341959d4b61672c6401d498bff5652.1629386192.git.zhansayabagdaulet@gmail.com Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03selftests: vm: add KSM merging time testZhansaya Bagdauletkyzy1-4/+70
Patch series "add KSM performance tests", v3. Extend KSM self tests with a performance benchmark. These tests are not part of regular regression testing, as they are mainly intended to be used by developers making changes to the memory management subsystem. This patch (of 2): Add ksm_merge_time() function to determine speed and time needed for merging. The total spent time is shown in seconds while speed is in MiB/s. User must specify the size of duplicated memory area (in MiB) before running the test. The test is run as follows: ./ksm_tests -P -s 100 The output: Total size: 100 MiB Total time: 0.201106786 s Average speed: 497.248 MiB/s Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/318b946ac80cc9205c89d0962048378f7ce0705b.1629386192.git.zhansayabagdaulet@gmail.com Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm: KSM: fix data typeZhansaya Bagdauletkyzy1-4/+4
ksm_stable_node_chains_prune_millisecs is declared as int, but in stable__node_chains_prune_millisecs_store(), it can store values up to UINT_MAX. Change its type to unsigned int. Link: https://lkml.kernel.org/r/20210806111351.GA71845@asus Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03selftests: vm: add KSM merging across nodes testZhansaya Bagdauletkyzy3-3/+119
Add check_ksm_numa_merge() function to test that pages in different NUMA nodes are being handled properly. First, two duplicate pages are allocated in two separate NUMA nodes using the libnuma library. Since there is one unique page in each node, with merge_across_nodes = 0, there won't be any shared pages. If merge_across_nodes is set to 1, the pages will be treated as usual duplicate pages and will be merged. If NUMA config is not enabled or the number of NUMA nodes is less than two, then the test is skipped. The test is run as follows: ./ksm_tests -N Link: https://lkml.kernel.org/r/071c17b5b04ebb0dfeba137acc495e5dd9d2a719.1626252248.git.zhansayabagdaulet@gmail.com Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03selftests: vm: add KSM zero page merging testZhansaya Bagdauletkyzy2-3/+99
Add check_ksm_zero_page_merge() function to test that empty pages are being handled properly. For this, several zero pages are allocated and merged using madvise. If use_zero_pages is enabled, the pages must be shared with the special kernel zero pages; otherwise, they are merged as usual duplicate pages. The test is run as follows: ./ksm_tests -Z Link: https://lkml.kernel.org/r/6d0caab00d4bdccf5e3791cb95cf6dfd5eb85e45.1626252248.git.zhansayabagdaulet@gmail.com Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03selftests: vm: add KSM unmerge testZhansaya Bagdauletkyzy2-5/+85
Add check_ksm_unmerge() function to verify that KSM is properly unmerging shared pages. For this, two duplicate pages are merged first and then their contents are modified. Since they are not identical anymore, the pages must be unmerged and the number of merged pages has to be 0. The test is run as follows: ./ksm_tests -U Link: https://lkml.kernel.org/r/c0f55420440d704d5b094275b4365aa1b2ad46b5.1626252248.git.zhansayabagdaulet@gmail.com Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03selftests: vm: add KSM merge testZhansaya Bagdauletkyzy4-0/+324
Patch series "add KSM selftests". Introduce selftests to validate the functionality of KSM. The tests are run on private anonymous pages. Since some KSM tunables are modified, their starting values are saved and restored after testing. At the start, run is set to 2 to ensure that only test pages will be merged (we assume that no applications make madvise syscalls in the background). If KSM config not enabled, all tests will be skipped. This patch (of 4): Add check_ksm_merge() function to check the basic merging feature of KSM. First, some number of identical pages are allocated and the MADV_MERGEABLE advice is given to merge these pages. Then, pages_shared and pages_sharing values are compared with the expected numbers using assert_ksm_pages_count() function. The number of pages can be changed using -p option. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/90287685c13300972ea84de93d1f3f900373f9fe.1626252248.git.zhansayabagdaulet@gmail.com Signed-off-by: Zhansaya Bagdauletkyzy <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm/migrate: correct kernel-doc notationRandy Dunlap1-1/+1
Use the expected "Return:" format to prevent a kernel-doc warning. mm/migrate.c:1157: warning: Excess function parameter 'returns' description in 'next_demotion_node' Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03mm: wire up syscall process_mreleaseSuren Baghdasaryan21-2/+38
Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Jan Engelhardt <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>