aboutsummaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)AuthorFilesLines
2022-07-11s390/airq: allow for airq structure that uses an input vectorMatthew Rosato2-5/+7
When doing device passthrough where interrupts are being forwarded from host to guest, we wish to use a pinned section of guest memory as the vector (the same memory used by the guest as the vector). To accomplish this, add a new parameter for airq_iv_create which allows passing an existing vector to be used instead of allocating a new one. The caller is responsible for ensuring the vector is pinned in memory as well as for unpinning the memory when the vector is no longer needed. A subsequent patch will use this new parameter for zPCI interpretation. Reviewed-by: Thomas Huth <[email protected]> Reviewed-by: Pierre Morel <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Borntraeger <[email protected]>
2022-07-11s390/airq: pass more TPI info to airq handlersMatthew Rosato3-4/+12
A subsequent patch will introduce an airq handler that requires additional TPI information beyond directed vs floating, so pass the entire tpi_info structure via the handler. Only pci actually uses this information today, for the other airq handlers this is effectively a no-op. Reviewed-by: Eric Farman <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Pierre Morel <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Acked-by: Christian Borntraeger <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Borntraeger <[email protected]>
2022-07-11s390/sclp: detect the AISI facilityMatthew Rosato1-0/+1
Detect the Adapter Interruption Suppression Interpretation facility. Reviewed-by: Eric Farman <[email protected]> Reviewed-by: Christian Borntraeger <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Borntraeger <[email protected]>
2022-07-11s390/sclp: detect the AENI facilityMatthew Rosato1-0/+1
Detect the Adapter Event Notification Interpretation facility. Reviewed-by: Eric Farman <[email protected]> Reviewed-by: Christian Borntraeger <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Borntraeger <[email protected]>
2022-07-11s390/sclp: detect the AISII facilityMatthew Rosato1-0/+1
Detect the Adapter Interruption Source ID Interpretation facility. Reviewed-by: Eric Farman <[email protected]> Reviewed-by: Christian Borntraeger <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Borntraeger <[email protected]>
2022-07-11s390/sclp: detect the zPCI load/store interpretation facilityMatthew Rosato1-0/+1
Detect the zPCI Load/Store Interpretation facility. Reviewed-by: Eric Farman <[email protected]> Reviewed-by: Christian Borntraeger <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-30bitops: unify non-atomic bitops prototypes across architecturesAlexander Lobakin1-30/+31
Currently, there is a mess with the prototypes of the non-atomic bitops across the different architectures: ret bool, int, unsigned long nr int, long, unsigned int, unsigned long addr volatile unsigned long *, volatile void * Thankfully, it doesn't provoke any bugs, but can sometimes make the compiler angry when it's not handy at all. Adjust all the prototypes to the following standard: ret bool retval can be only 0 or 1 nr unsigned long native; signed makes no sense addr volatile unsigned long * bitmaps are arrays of ulongs Next, some architectures don't define 'arch_' versions as they don't support instrumentation, others do. To make sure there is always the same set of callables present and to ease any potential future changes, make them all follow the rule: * architecture-specific files define only 'arch_' versions; * non-prefixed versions can be defined only in asm-generic files; and place the non-prefixed definitions into a new file in asm-generic to be included by non-instrumented architectures. Finally, add some static assertions in order to prevent people from making a mess in this room again. I also used the %__always_inline attribute consistently, so that they always get resolved to the actual operations. Suggested-by: Andy Shevchenko <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Acked-by: Mark Rutland <[email protected]> Reviewed-by: Yury Norov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Yury Norov <[email protected]>
2022-06-30s390/qdio: Fix spelling mistakeZhang Jiaming1-3/+3
Change 'defineable' to 'definable'. Change 'paramater' to 'parameter'. Signed-off-by: Zhang Jiaming <[email protected]> Reviewed-by: Benjamin Block <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-30s390/archrandom: simplify back to earlier design and initialize earlierJason A. Donenfeld3-224/+12
s390x appears to present two RNG interfaces: - a "TRNG" that gathers entropy using some hardware function; and - a "DRBG" that takes in a seed and expands it. Previously, the TRNG was wired up to arch_get_random_{long,int}(), but it was observed that this was being called really frequently, resulting in high overhead. So it was changed to be wired up to arch_get_random_ seed_{long,int}(), which was a reasonable decision. Later on, the DRBG was then wired up to arch_get_random_{long,int}(), with a complicated buffer filling thread, to control overhead and rate. Fortunately, none of the performance issues matter much now. The RNG always attempts to use arch_get_random_seed_{long,int}() first, which means a complicated implementation of arch_get_random_{long,int}() isn't really valuable or useful to have around. And it's only used when reseeding, which means it won't hit the high throughput complications that were faced before. So this commit returns to an earlier design of just calling the TRNG in arch_get_random_seed_{long,int}(), and returning false in arch_get_ random_{long,int}(). Part of what makes the simplification possible is that the RNG now seeds itself using the TRNG at bootup. But this only works if the TRNG is detected early in boot, before random_init() is called. So this commit also causes that check to happen in setup_arch(). Cc: [email protected] Cc: Harald Freudenberger <[email protected]> Cc: Ingo Franzki <[email protected]> Cc: Juergen Christ <[email protected]> Cc: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Harald Freudenberger <[email protected]> Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-30s390/purgatory: remove duplicated build rule of kexec-purgatory.oMasahiro Yamada1-2/+1
This is equivalent to the pattern rule in scripts/Makefile.build. Having the dependency on $(obj)/purgatory.ro is enough. Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-30s390/purgatory: hard-code obj-y in MakefileMasahiro Yamada1-1/+1
The purgatory/ directory is entirely guarded in arch/s390/Kbuild. CONFIG_ARCH_HAS_KEXEC_PURGATORY is bool type. $(CONFIG_ARCH_HAS_KEXEC_PURGATORY) is always 'y' when Kbuild visits this Makefile for building. Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-30s390: remove unneeded 'select BUILD_BIN2C'Masahiro Yamada1-1/+0
Since commit 4c0f032d4963 ("s390/purgatory: Omit use of bin2c"), s390 builds the purgatory without using bin2c. Remove 'select BUILD_BIN2C' to avoid the unneeded build of bin2c. Fixes: 4c0f032d4963 ("s390/purgatory: Omit use of bin2c") Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-28treewide: uapi: Replace zero-length arrays with flexible-array membersGustavo A. R. Silva1-3/+3
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; -fstrict-flex-arrays=3 is coming and we need to land these changes to prevent issues like these in the short future: ../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^ Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Build-tested-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%[email protected]/ Acked-by: Dan Williams <[email protected]> # For ndctl.h Signed-off-by: Gustavo A. R. Silva <[email protected]>
2022-06-27Merge branch 'master' into mm-stableakpm3-7/+45
2022-06-24jump_label: make initial NOP patching the special caseArd Biesheuvel1-5/+0
Instead of defaulting to patching NOP opcodes at init time, and leaving it to the architectures to override this if this is not needed, switch to a model where doing nothing is the default. This is the common case by far, as only MIPS requires NOP patching at init time. On all other architectures, the correct encodings are emitted by the compiler and so no initial patching is needed. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mark Rutland <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-24jump_label: mips: move module NOP patching into arch codeArd Biesheuvel1-1/+0
MIPS is the only remaining architecture that needs to patch jump label NOP encodings to initialize them at load time. So let's move the module patching part of that from generic code into arch/mips, and drop it from the others. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-24jump_label: s390: avoid pointless initial NOP patchingArd Biesheuvel2-21/+7
Patching NOPs into other NOPs at boot time serves no purpose, so let's use the same NOP encodings at compile time and runtime. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-23s390/pai: Fix multiple concurrent event installationThomas Richter1-3/+12
Two different events such as pai_crypto/KM_AES_128/ and pai_crypto/KM_AES_192/ can be installed multiple times on the same CPU and the events are executed concurrently: # perf stat -e pai_crypto/KM_AES_128/ -C0 -a -- sleep 5 & # sleep 2 # perf stat -e pai_crypto/KM_AES_192/ -C0 -a -- true This results in the first event being installed two times with two seconds delay. The kernel does install the second event after the first event has been deleted and re-added, as can be seen in the traces: 13:48:47.600350 paicrypt_start event 0x1007 (event KM_AES_128) 13:48:49.599359 paicrypt_stop event 0x1007 (event KM_AES_128) 13:48:49.599198 paicrypt_start event 0x1007 13:48:49.599199 paicrypt_start event 0x1008 13:48:49.599921 paicrypt_event_destroy event 0x1008 13:48:52.601507 paicrypt_event_destroy event 0x1007 This is caused by functions event_sched_in() and event_sched_out() which call the PMU's add() and start() functions on schedule_in and the PMU's stop() and del() functions on schedule_out. This is correct for events attached to processes. The pai_crypto events are system-wide events and not attached to processes. Since the kernel common code can not be changed easily, fix this issue and do not reset the event count value to zero each time the event is added and started. Instead use a flag and zero the event count value only when called immediately after the event has been initialized. Therefore only the first invocation of the the event's add() function initializes the event count value to zero. The following invocations of the event's add() function leave the current event count value untouched. Fixes: 39d62336f5c1 ("s390/pai: add support for cryptography counters") Reported-by: Sumanth Korikkar <[email protected]> Signed-off-by: Thomas Richter <[email protected]> Acked-by: Sumanth Korikkar <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-23s390/pai: Prevent invalid event number for pai_crypto PMUThomas Richter1-2/+3
The pai_crypto PMU has to check the event number. It has to be in the supported range. This is not the case, the lower limit is not checked. Fix this and obey the lower limit. Fixes: 39d62336f5c1 ("s390/pai: add support for cryptography counters") Signed-off-by: Thomas Richter <[email protected]> Suggested-by: Sumanth Korikkar <[email protected]> Reviewed-by: Sumanth Korikkar <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-23s390/cpumf: Handle events cycles and instructions identicalThomas Richter1-1/+21
Events CPU_CYCLES and INSTRUCTIONS can be submitted with two different perf_event attribute::type values: - PERF_TYPE_HARDWARE: when invoked via perf tool predefined events name cycles or cpu-cycles or instructions. - pmu->type: when invoked via perf tool event name cpu_cf/CPU_CYLCES/ or cpu_cf/INSTRUCTIONS/. This invocation also selects the PMU to which the event belongs. Handle both type of invocations identical for events CPU_CYLCES and INSTRUCTIONS. They address the same hardware. The result is different when event modifier exclude_kernel is also set. Invocation with event modifier for user space event counting fails. Output before: # perf stat -e cpum_cf/cpu_cycles/u -- true Performance counter stats for 'true': <not supported> cpum_cf/cpu_cycles/u 0.000761033 seconds time elapsed 0.000076000 seconds user 0.000725000 seconds sys # Output after: # perf stat -e cpum_cf/cpu_cycles/u -- true Performance counter stats for 'true': 349,613 cpum_cf/cpu_cycles/u 0.000844143 seconds time elapsed 0.000079000 seconds user 0.000800000 seconds sys # Fixes: 6a82e23f45fe ("s390/cpumf: Adjust registration of s390 PMU device drivers") Signed-off-by: Thomas Richter <[email protected]> Acked-by: Sumanth Korikkar <[email protected]> [[email protected] corrected commit ID of Fixes commit] Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-23s390/crash: make copy_oldmem_page() return number of bytes copiedAlexander Gordeev1-3/+4
Callback copy_oldmem_page() returns either error code or zero. Instead, it should return the error code or number of bytes copied. Fixes: df9694c7975f ("s390/dump: streamline oldmem copy functions") Reviewed-by: Alexander Egorenkov <[email protected]> Tested-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-23s390/crash: add missing iterator advance in copy_oldmem_page()Alexander Gordeev1-0/+7
In case old memory was successfully copied the passed iterator should be advanced as well. Currently copy_oldmem_page() is always called with single-segment iterator. Should that ever change - copy_oldmem_user and copy_oldmem_kernel() functions would need a rework to deal with multi-segment iterators. Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter") Reviewed-by: Alexander Egorenkov <[email protected]> Tested-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2022-06-16mm: avoid unnecessary page fault retires on shared memory typesPeter Xu1-0/+12
I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vineet Gupta <[email protected]> Acked-by: Guo Ren <[email protected]> Acked-by: Max Filippov <[email protected]> Acked-by: Christian Borntraeger <[email protected]> Acked-by: Michael Ellerman <[email protected]> (powerpc) Acked-by: Catalin Marinas <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Russell King (Oracle) <[email protected]> [arm part] Acked-by: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Stafford Horne <[email protected]> Cc: David S. Miller <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Brian Cain <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Janosch Frank <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: James Bottomley <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Simek <[email protected]> Cc: Matt Turner <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Rich Felker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Helge Deller <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-06-15arch/*: Disable softirq stacks on PREEMPT_RT.Sebastian Andrzej Siewior1-1/+2
PREEMPT_RT preempts softirqs and the current implementation avoids do_softirq_own_stack() and only uses __do_softirq(). Disable the unused softirqs stacks on PREEMPT_RT to save some memory and ensure that do_softirq_own_stack() is not used bwcause it is not expected. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2022-06-10Merge tag 'for-linus-5.19a-rc2-tag' of ↵Linus Torvalds2-11/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a small cleanup removing "export" of an __init function - a small series adding a new infrastructure for platform flags - a series adding generic virtio support for Xen guests (frontend side) * tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: unexport __init-annotated xen_xlate_map_ballooned_pages() arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices xen/grant-dma-iommu: Introduce stub IOMMU driver dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops xen/virtio: Enable restricted memory access using Xen grant mappings xen/grant-dma-ops: Add option to restrict memory access under Xen xen/grants: support allocating consecutive grants arm/xen: Introduce xen_setup_dma_ops() virtio: replace arch_has_restricted_virtio_memory_access() kernel: add platform_has() infrastructure
2022-06-09gcc-12: disable '-Warray-bounds' universally for nowLinus Torvalds2-9/+2
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people disabled the '-Warray-bounds' warning for gcc-12, because the new logic in gcc would cause warnings for their use of the S390_lowcore macro, which accesses absolute pointers. It turns out gcc-12 has many other issues in this area, so this takes that s390 warning disable logic, and turns it into a kernel build config entry instead. Part of the intent is that we can make this all much more targeted, and use this conflig flag to disable it in only particular configurations that cause problems, with the s390 case as an example: select GCC12_NO_ARRAY_BOUNDS and we could do that for other configuration cases that cause issues. Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more targeted way, and disable the warning only for particular uses: again the s390 case as an example: KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) but this ends up just doing it globally in the top-level Makefile, since the current issues are spread fairly widely all over: KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds We'll try to limit this later, since the gcc-12 problems are rare enough that *much* of the kernel can be built with it without disabling this warning. Cc: Kees Cook <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-06-08KVM: Move kvm_arch_vcpu_precreate() under kvm->lockZeng Guang1-2/+0
kvm_arch_vcpu_precreate() targets to handle arch specific VM resource to be prepared prior to the actual creation of vCPU. For example, x86 platform may need do per-VM allocation based on max_vcpu_ids at the first vCPU creation. It probably leads to concurrency control on this allocation as multiple vCPU creation could happen simultaneously. From the architectual point of view, it's necessary to execute kvm_arch_vcpu_precreate() under protect of kvm->lock. Currently only arm64, x86 and s390 have non-nop implementations at the stage of vCPU pre-creation. Remove the lock acquiring in s390's design and make sure all architecture can run kvm_arch_vcpu_precreate() safely under kvm->lock without recrusive lock issue. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Zeng Guang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-07No need of likely/unlikely on calls of check_copy_size()Al Viro1-2/+2
it's inline and unlikely() inside of it (including the implicit one in WARN_ON_ONCE()) suffice to convince the compiler that getting false from check_copy_size() is unlikely. Spotted-by: Jens Axboe <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Al Viro <[email protected]>
2022-06-07Merge tag 'kvm-s390-next-5.19-2' of ↵Paolo Bonzini7-1/+574
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: pvdump and selftest improvements - add an interface to provide a hypervisor dump for secure guests - improve selftests to show tests
2022-06-06virtio: replace arch_has_restricted_virtio_memory_access()Juergen Gross2-11/+3
Instead of using arch_has_restricted_virtio_memory_access() together with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those with platform_has() and a new platform feature PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Oleksandr Tyshchenko <[email protected]> Tested-by: Oleksandr Tyshchenko <[email protected]> # Arm64 only Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Borislav Petkov <[email protected]>
2022-06-04Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linuxLinus Torvalds1-7/+3
Pull bitmap updates from Yury Norov: - bitmap: optimize bitmap_weight() usage, from me - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro Carvalho Chehab - include/linux/find: Fix documentation, from Anna-Maria Behnsen - bitmap: fix conversion from/to fix-sized arrays, from me - bitmap: Fix return values to be unsigned, from Kees Cook It has been in linux-next for at least a week with no problems. * tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits) nodemask: Fix return values to be unsigned bitmap: Fix return values to be unsigned KVM: x86: hyper-v: replace bitmap_weight() with hweight64() KVM: x86: hyper-v: fix type of valid_bank_mask ia64: cleanup remove_siblinginfo() drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate lib/bitmap: add test for bitmap_{from,to}_arr64 lib: add bitmap_{from,to}_arr64 lib/bitmap: extend comment for bitmap_(from,to)_arr32() include/linux/find: Fix documentation lib/bitmap.c make bitmap_print_bitmask_to_buf parseable MAINTAINERS: add cpumask and nodemask files to BITMAP_API arch/x86: replace nodes_weight with nodes_empty where appropriate mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate clocksource: replace cpumask_weight with cpumask_empty in clocksource.c genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate irq: mips: replace cpumask_weight with cpumask_empty where appropriate drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate arch/x86: replace cpumask_weight with cpumask_empty where appropriate ...
2022-06-03Merge tag 'kthread-cleanups-for-v5.19' of ↵Linus Torvalds1-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kthread updates from Eric Biederman: "This updates init and user mode helper tasks to be ordinary user mode tasks. Commit 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") caused init and the user mode helper threads that call kernel_execve to have struct kthread allocated for them. This struct kthread going away during execve in turned made a use after free of struct kthread possible. Here, commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for init and umh") is enough to fix the use after free and is simple enough to be backportable. The rest of the changes pass struct kernel_clone_args to clean things up and cause the code to make sense. In making init and the user mode helpers tasks purely user mode tasks I ran into two complications. The function task_tick_numa was detecting tasks without an mm by testing for the presence of PF_KTHREAD. The initramfs code in populate_initrd_image was using flush_delayed_fput to ensuere the closing of all it's file descriptors was complete, and flush_delayed_fput does not work in a userspace thread. I have looked and looked and more complications and in my code review I have not found any, and neither has anyone else with the code sitting in linux-next" * tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched: Update task_tick_numa to ignore tasks without an mm fork: Stop allowing kthreads to call execve fork: Explicitly set PF_KTHREAD init: Deal with the init process being a user mode process fork: Generalize PF_IO_WORKER handling fork: Explicity test for idle tasks in copy_thread fork: Pass struct kernel_clone_args into copy_thread kthread: Don't allocate kthread_struct for init and umh
2022-06-03Merge tag 's390-5.19-2' of ↵Linus Torvalds18-211/+288
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: "Just a couple of small improvements, bug fixes and cleanups: - Add Eric Farman as maintainer for s390 virtio drivers. - Improve machine check handling, and avoid incorrectly injecting a machine check into a kvm guest. - Add cond_resched() call to gmap page table walker in order to avoid possible huge latencies. Also use non-quiesing sske instruction to speed up storage key handling. - Add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP so s390 behaves similar like common code. - Get sie control block address from correct stack slot in perf event code. This fixes potential random memory accesses. - Change uaccess code so that the exception handler sets the result of get_user() and __get_kernel_nofault() to zero in case of a fault. Until now this was done via input parameters for inline assemblies. Doing it via fault handling is what most or even all other architectures are doing. - Couple of other small cleanups and fixes" * tag 's390-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/stack: add union to reflect kvm stack slot usages s390/stack: merge empty stack frame slots s390/uaccess: whitespace cleanup s390/uaccess: use __noreturn instead of __attribute__((noreturn)) s390/uaccess: use exception handler to zero result on get_user() failure s390/uaccess: use symbolic names for inline assembler operands s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag s390/mm: use non-quiescing sske for KVM switch to keyed guest s390/gmap: voluntarily schedule during key setting MAINTAINERS: Update s390 virtio-ccw s390/kexec: add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP s390/Kconfig.debug: fix indentation s390/Kconfig: fix indentation s390/perf: obtain sie_block from the right address s390: generate register offsets into pt_regs automatically s390: simplify early program check handler s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
2022-06-03KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriateYury Norov1-7/+3
Copying bitmaps from/to 64-bit arrays with bitmap_copy is not safe on 32-bit BE machines. Use designated functions instead. CC: Alexander Gordeev <[email protected]> CC: Andy Shevchenko <[email protected]> CC: Christian Borntraeger <[email protected]> CC: Claudio Imbrenda <[email protected]> CC: David Hildenbrand <[email protected]> CC: Heiko Carstens <[email protected]> CC: Janosch Frank <[email protected]> CC: Rasmus Villemoes <[email protected]> CC: Sven Schnelle <[email protected]> CC: Vasily Gorbik <[email protected]> Signed-off-by: Yury Norov <[email protected]> Reviewed-by: David Hildenbrand <[email protected]>
2022-06-02Merge tag 'livepatching-for-5.19' of ↵Linus Torvalds1-22/+0
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching cleanup from Petr Mladek: - Remove duplicated livepatch code [Christophe] * tag 'livepatching-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Remove klp_arch_set_pc() and asm/livepatch.h
2022-06-01KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMPJanosch Frank1-0/+20
The capability indicates dump support for protected VMs. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01KVM: s390: Add CPU dump functionalityJanosch Frank3-0/+86
The previous patch introduced the per-VM dump functions now let's focus on dumping the VCPU state via the newly introduced KVM_S390_PV_CPU_COMMAND ioctl which mirrors the VM UV ioctl and can be extended with new commands later. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01KVM: s390: Add configuration dump functionalityJanosch Frank4-0/+280
Sometimes dumping inside of a VM fails, is unavailable or doesn't yield the required data. For these occasions we dump the VM from the outside, writing memory and cpu data to a file. Up to now PV guests only supported dumping from the inside of the guest through dumpers like KDUMP. A PV guest can be dumped from the hypervisor but the data will be stale and / or encrypted. To get the actual state of the PV VM we need the help of the Ultravisor who safeguards the VM state. New UV calls have been added to initialize the dump, dump storage state data, dump cpu data and complete the dump process. We expose these calls in this patch via a new UV ioctl command. The sensitive parts of the dump data are encrypted, the dump key is derived from the Customer Communication Key (CCK). This ensures that only the owner of the VM who has the CCK can decrypt the dump data. The memory is dumped / read via a normal export call and a re-import after the dump initialization is not needed (no re-encryption with a dump key). Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01KVM: s390: pv: Add query dump informationJanosch Frank1-0/+11
The dump API requires userspace to provide buffers into which we will store data. The dump information added in this patch tells userspace how big those buffers need to be. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Steffen Eiden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01KVM: s390: pv: Add dump support definitionsJanosch Frank1-0/+33
Let's add the constants and structure definitions needed for the dump support. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Steffen Eiden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01KVM: s390: pv: Add query interfaceJanosch Frank1-0/+76
Some of the query information is already available via sysfs but having a IOCTL makes the information easier to retrieve. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Steffen Eiden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01s390/uv: Add dump fields to queryJanosch Frank3-0/+40
The new dump feature requires us to know how much memory is needed for the "dump storage state" and "dump finalize" ultravisor call. These values are reported via the UV query call. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Steffen Eiden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01s390/uv: Add SE hdr query informationJanosch Frank3-1/+28
We have information about the supported se header version and pcf bits so let's expose it via the sysfs files. Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Steffen Eiden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2022-06-01s390/stack: add union to reflect kvm stack slot usagesHeiko Carstens3-6/+14
Add a union which describes how the empty stack slots are being used by kvm and perf. This should help to avoid another bug like the one which was fixed with commit c9bfb460c3e4 ("s390/perf: obtain sie_block from the right address"). Reviewed-by: Nico Boehr <[email protected]> Tested-by: Nico Boehr <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2022-06-01s390/stack: merge empty stack frame slotsHeiko Carstens3-8/+7
Merge empty1 and empty2 arrays within the stack frame to one single array. This is possible since with commit 42b01a553a56 ("s390: always use the packed stack layout") the alternative stack frame layout is gone. Reviewed-by: Nico Boehr <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2022-06-01s390/uaccess: whitespace cleanupHeiko Carstens1-66/+66
Whitespace cleanup to get rid if some checkpatch findings, but mainly to have consistent coding style within the header file again. Signed-off-by: Heiko Carstens <[email protected]>
2022-06-01s390/uaccess: use __noreturn instead of __attribute__((noreturn))Heiko Carstens1-3/+4
Signed-off-by: Heiko Carstens <[email protected]>
2022-06-01s390/uaccess: use exception handler to zero result on get_user() failureHeiko Carstens3-59/+143
Historically the uaccess code pre-initializes the result of get_user() (and now also __get_kernel_nofault()) to zero and uses the result as input parameter for inline assemblies. This is different to what most, if not all, other architectures are doing, which set the result to zero within the exception handler in case of a fault. Use the new extable mechanism and handle zeroing of the result within the exception handler in case of a fault. Signed-off-by: Heiko Carstens <[email protected]>
2022-06-01s390/uaccess: use symbolic names for inline assembler operandsHeiko Carstens1-8/+8
Make code easier to read by using symbolic names. Signed-off-by: Heiko Carstens <[email protected]>
2022-06-01s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flagAlexander Gordeev1-1/+5
Commit d768bd892fc8 ("s390: add options to change branch prediction behaviour for the kernel") introduced .Lsie_exit label - supposedly to fence off SIE instruction. However, the corresponding address range length .Lsie_crit_mcck_length was not updated, which led to BPON code potentionally marked with CIF_MCCK_GUEST flag. Both .Lsie_exit and .Lsie_crit_mcck_length were removed with commit 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S"), but the issue persisted - currently BPOFF and BPENTER macros might get wrongly considered by the machine check handler as a guest. Fixes: d768bd892fc8 ("s390: add options to change branch prediction behaviour for the kernel") Reviewed-by: Sven Schnelle <[email protected]> Reviewed-by: Christian Borntraeger <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>