Age | Commit message (Collapse) | Author | Files | Lines |
|
clang fails to handle ".if" statements in inline assembly which are heavily
used in the alternatives code.
To work around this remove this code, and enforce that users of
alternatives must specify original and alternative instruction sequences
which have identical sizes. Add a compile time check with two ".org"
statements similar to arm64.
In result not only clang can handle this, but also quite a lot of code can
be removed.
Acked-by: Vasily Gorbik <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/1356
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Explicitly provide identical sized original/alternative instruction
sequences. This way there is no need for the s390 specific alternatives
infrastructure to generate padding sequences.
The code which generates such sequences will be removed with a follow on
patch.
Acked-by: Vasily Gorbik <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Patch series "Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating", v4.
presently, migrating a hugetlb page or unmapping a poisoned hugetlb page,
we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page, which will cause potential data consistent issue. This patch set
will change to use hugetlb related APIs to fix this issue.
Note: Mike pointed out the huge_ptep_get() will only return the one
specific value, and it would not take into account the dirty or young bits
of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1]. This
inconsistent issue is not introduced by this patch set, and this issue
will be addressed in another thread [2]. Meanwhile the uffd for hugetlb
case [3] pointed out by Gerald also needs another patch to address.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/
This patch (of 3):
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
when unmapping or migrating a hugetlb page, and will change to use
huge_ptep_clear_flush() instead in the following patches.
So this is a preparation patch, which changes the huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.
[[email protected]: fix build in several more architectures]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fixup]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/20f77ddab90baa249bd24504c413189b82acde69.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/dcf065868cce35bceaf138613ad27f17bb7c0c19.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Acked-by: Mike Kravetz <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
S390x defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
They will be used in the follow up patches to either check/set/clear
uffd-wp bit of a huge pte.
So far it reuses all the small pte helpers. Archs can overwrite these
versions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in the
future.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "userfaultfd-wp: Support shmem and hugetlbfs", v8.
Overview
========
Userfaultfd-wp anonymous support was merged two years ago. There're quite
a few applications that started to leverage this capability either to take
snapshots for user-app memory, or use it for full user controled swapping.
This series tries to complete the feature for uffd-wp so as to cover all
the RAM-based memory types. So far uffd-wp is the only missing piece of
the rest features (uffd-missing & uffd-minor mode).
One major reason to do so is that anonymous pages are sometimes not
satisfying the need of applications, and there're growing users of either
shmem and hugetlbfs for either sharing purpose (e.g., sharing guest mem
between hypervisor process and device emulation process, shmem local live
migration for upgrades), or for performance on tlb hits.
All these mean that if a uffd-wp app wants to switch to any of the memory
types, it'll stop working. I think it's worthwhile to have the kernel to
cover all these aspects.
This series chose to protect pages in pte level not page level.
One major reason is safety. I have no idea how we could make it safe if
any of the uffd-privileged app can wr-protect a page that any other
application can use. It means this app can block any process potentially
for any time it wants.
The other reason is that it aligns very well with not only the anonymous
uffd-wp solution, but also uffd as a whole. For example, userfaultfd is
implemented fundamentally based on VMAs. We set flags to VMAs showing the
status of uffd tracking. For another per-page based protection solution,
it'll be crossing the fundation line on VMA-based, and it could simply be
too far away already from what's called userfaultfd.
PTE markers
===========
The patchset is based on the idea called PTE markers. It was discussed in
one of the mm alignment sessions, proposed starting from v6, and this is
the 2nd version of it using PTE marker idea.
PTE marker is a new type of swap entry that is ony applicable to file
backed memories like shmem and hugetlbfs. It's used to persist some
pte-level information even if the original present ptes in pgtable are
zapped.
Logically pte markers can store more than uffd-wp information, but so far
only one bit is used for uffd-wp purpose. When the pte marker is
installed with uffd-wp bit set, it means this pte is wr-protected by uffd.
It solves the problem on e.g. file-backed memory mapped ptes got zapped
due to any reason (e.g. thp split, or swapped out), we can still keep the
wr-protect information in the ptes. Then when the page fault triggers
again, we'll know this pte is wr-protected so we can treat the pte the
same as a normal uffd wr-protected pte.
The extra information is encoded into the swap entry, or swp_offset to be
explicit, with the swp_type being PTE_MARKER. So far uffd-wp only uses
one bit out of the swap entry, the rest bits of swp_offset are still
reserved for other purposes.
There're two configs to enable/disable PTE markers:
CONFIG_PTE_MARKER
CONFIG_PTE_MARKER_UFFD_WP
We can set !PTE_MARKER to completely disable all the PTE markers, along
with uffd-wp support. I made two config so we can also enable PTE marker
but disable uffd-wp file-backed for other purposes. At the end of current
series, I'll enable CONFIG_PTE_MARKER by default, but that patch is
standalone and if anyone worries about having it by default, we can also
consider turn it off by dropping that oneliner patch. So far I don't see
a huge risk of doing so, so I kept that patch.
In most cases, PTE markers should be treated as none ptes. It is because
that unlike most of the other swap entry types, there's no PFN or block
offset information encoded into PTE markers but some extra well-defined
bits showing the status of the pte. These bits should only be used as
extra data when servicing an upcoming page fault, and then we behave as if
it's a none pte.
I did spend a lot of time observing all the pte_none() users this time.
It is indeed a challenge because there're a lot, and I hope I didn't miss
a single of them when we should take care of pte markers. Luckily, I
don't think it'll need to be considered in many cases, for example: boot
code, arch code (especially non-x86), kernel-only page handlings (e.g.
CPA), or device driver codes when we're tackling with pure PFN mappings.
I introduced pte_none_mostly() in this series when we need to handle pte
markers the same as none pte, the "mostly" is the other way to write
"either none pte or a pte marker".
I didn't replace pte_none() to cover pte markers for below reasons:
- Very rare case of pte_none() callers will handle pte markers. E.g., all
the kernel pages do not require knowledge of pte markers. So we don't
pollute the major use cases.
- Unconditionally change pte_none() semantics could confuse people, because
pte_none() existed for so long a time.
- Unconditionally change pte_none() semantics could make pte_none() slower
even if in many cases pte markers do not exist.
- There're cases where we'd like to handle pte markers differntly from
pte_none(), so a full replace is also impossible. E.g. khugepaged should
still treat pte markers as normal swap ptes rather than none ptes, because
pte markers will always need a fault-in to merge the marker with a valid
pte. Or the smap code will need to parse PTE markers not none ptes.
Patch Layout
============
Introducing PTE marker and uffd-wp bit in PTE marker:
mm: Introduce PTE_MARKER swap entry
mm: Teach core mm about pte markers
mm: Check against orig_pte for finish_fault()
mm/uffd: PTE_MARKER_UFFD_WP
Adding support for shmem uffd-wp:
mm/shmem: Take care of UFFDIO_COPY_MODE_WP
mm/shmem: Handle uffd-wp special pte in page fault handler
mm/shmem: Persist uffd-wp bit across zapping for file-backed
mm/shmem: Allow uffd wr-protect none pte for file-backed mem
mm/shmem: Allows file-back mem to be uffd wr-protected on thps
mm/shmem: Handle uffd-wp during fork()
Adding support for hugetlbfs uffd-wp:
mm/hugetlb: Introduce huge pte version of uffd-wp helpers
mm/hugetlb: Hook page faults for uffd write protection
mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
mm/hugetlb: Handle UFFDIO_WRITEPROTECT
mm/hugetlb: Handle pte markers in page faults
mm/hugetlb: Allow uffd wr-protect none ptes
mm/hugetlb: Only drop uffd-wp special pte if required
mm/hugetlb: Handle uffd-wp during fork()
Misc handling on the rest mm for uffd-wp file-backed:
mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
Enabling of uffd-wp on file-backed memory:
mm/uffd: Enable write protection for shmem & hugetlbfs
mm: Enable PTE markers by default
selftests/uffd: Enable uffd-wp for shmem/hugetlbfs
Tests
=====
- Compile test on x86_64 and aarch64 on different configs
- Kernel selftests
- uffd-test [0]
- Umapsort [1,2] test for shmem/hugetlb, with swap on/off
[0] https://github.com/xzpeter/clibs/tree/master/uffd-test
[1] https://github.com/xzpeter/umap-apps/tree/peter
[2] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs
This patch (of 23):
Introduces a new swap entry type called PTE_MARKER. It can be installed
for any pte that maps a file-backed memory when the pte is temporarily
zapped, so as to maintain per-pte information.
The information that kept in the pte is called a "marker". Here we define
the marker as "unsigned long" just to match pgoff_t, however it will only
work if it still fits in swp_offset(), which is e.g. currently 58 bits on
x86_64.
A new config CONFIG_PTE_MARKER is introduced too; it's by default off. A
bunch of helpers are defined altogether to service the rest of the pte
marker code.
[[email protected]: fixup]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
PROFILE_ALL_BRANCHES
gcc 12 does not (always) optimize away code that should only be generated
if parameters are constant and within in a certain range. This depends on
various obscure kernel config options, however in particular
PROFILE_ALL_BRANCHES can trigger this compile error:
In function ‘__atomic_add_const’,
inlined from ‘__preempt_count_add.part.0’ at ./arch/s390/include/asm/preempt.h:50:3:
./arch/s390/include/asm/atomic_ops.h:80:9: error: impossible constraint in ‘asm’
80 | asm volatile( \
| ^~~
Workaround this by simply disabling the optimization for
PROFILE_ALL_BRANCHES, since the kernel will be so slow, that this
optimization won't matter at all.
Reported-by: Thomas Richter <[email protected]>
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
clock_delta is declared as unsigned long in various places. However,
the clock sync delta can be negative. This would add a huge positive
offset in clock_sync_global where clock_delta is added to clk.eitod
which is a 72 bit integer. Declare it as signed long to fix this.
Cc: [email protected]
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The size of the TOD offset field in the stp info response is 64 bits.
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Let's use bit 52, which is unused.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Liang Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pedro Demarchi Gomes <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Bit 52 and bit 55 don't have to be zero: they only trigger a
translation-specifiation exception if the PTE is marked as valid, which is
not the case for swap ptes.
Document which bits are used for what, and which ones are unused.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Liang Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pedro Demarchi Gomes <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
PMU device driver perf_pai_crypto supports Processor Activity
Instrumentation (PAI), available with IBM z16:
- maps a full page to lowcore address 0x1500.
- uses CR0 bit 13 to turn PAI crypto counting on and off.
- creates a sample with raw data on each context switch out when
at context switch some mapped counters have a value of nonzero.
This device driver only supports CPU wide context, no task context
is allowed.
Support for counting:
- one or more counters can be specified using
perf stat -e pai_crypto/xxx/
where xxx stands for the counter event name. Multiple invocation
of this command is possible. The counter names are listed in
/sys/devices/pai_crypto/events directory.
- one special counters can be specified using
perf stat -e pai_crypto/CRYPTO_ALL/
which returns the sum of all incremented crypto counters.
- one event pai_crypto/CRYPTO_ALL/ is reserved for sampling.
No multiple invocations are possible. The event collects data at
context switch out and saves them in the ring buffer.
Add qpaci assembly instruction to query supported memory mapped crypto
counters. It returns the number of counters (no holes allowed in that
range).
The PAI crypto counter events are system wide and can not be executed
in parallel. Therefore some restrictions documented in function
paicrypt_busy apply.
In particular event CRYPTO_ALL for sampling must run exclusive.
Only counting events can run in parallel.
PAI crypto counter events can not be created when a CPU hot plug
add is processed. This means a CPU hot plug add does not get
the necessary PAI event to record PAI cryptography counter increments
on the newly added CPU. CPU hot plug remove removes the event and
terminates the counting of PAI counters immediately.
Co-developed-by: Sven Schnelle <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Juergen Christ <[email protected]>
Signed-off-by: Thomas Richter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
|
|
arch_check_user_regs() is used at the moment to verify that struct pt_regs
contains valid values when entering the kernel from userspace. s390 needs
a place in the generic entry code to modify a cpu data structure when
switching from userspace to kernel mode. As arch_check_user_regs() is
exactly this, rename it to arch_enter_from_user_mode().
When entering the kernel from userspace, arch_check_user_regs() is
used to verify that struct pt_regs contains valid values. Note that
the NMI codepath doesn't call this function. s390 needs a place in the
generic entry code to modify a cpu data structure when switching from
userspace to kernel mode. As arch_check_user_regs() is exactly this,
rename it to arch_enter_from_user_mode().
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The short psw definitions are contained in compat header files, however
short psws are not compat specific. Therefore move the definitions to
ptrace header file. This also gets rid of a compat header include in kvm
code.
Acked-by: Vasily Gorbik <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
LLVM's integrated assembler does not like comments within macros:
<instantiation>:3:19: error: too many positional arguments
GR_NUM b2, 1 /* Base register */
^
Remove them, since they are obvious anyway.
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Use local labels in .set directives to avoid potential compile errors
with LTO + clang. See commit 334865b2915c ("x86/extable: Prefer local
labels in .set directives") for further details.
Since s390 doesn't support LTO currently this doesn't fix a real bug
for now, but helps to avoid problems as soon as required pieces have
been added to llvm.
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Use local labels in .set directives to avoid potential compile errors
with LTO + clang. See commit 334865b2915c ("x86/extable: Prefer local
labels in .set directives") for further details.
Since s390 doesn't support LTO currently this doesn't fix a real bug
for now, but helps to avoid problems as soon as required pieces have
been added to llvm.
Signed-off-by: Heiko Carstens <[email protected]>
|
|
There are 7 64bit architectures that support Linux COMPAT mode to
run 32bit applications. A lot of definitions are duplicate:
- COMPAT_USER_HZ
- COMPAT_RLIM_INFINITY
- COMPAT_OFF_T_MAX
- __compat_uid_t, __compat_uid_t
- compat_dev_t
- compat_ipc_pid_t
- struct compat_flock
- struct compat_flock64
- struct compat_statfs
- struct compat_ipc64_perm, compat_semid64_ds,
compat_msqid64_ds, compat_shmid64_ds
Cleanup duplicate definitions and merge them into asm-generic.
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Acked-by: Helge Deller <[email protected]> # parisc
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
RISC-V doesn't neeed compat_stat, so using __ARCH_WANT_COMPAT_STAT
to exclude unnecessary SYSCALL functions.
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Acked-by: Helge Deller <[email protected]> # parisc
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Provide a single common definition for the compat_flock and
compat_flock64 structures using the same tricks as for the native
variants. Another extra define is added for the packing required on
x86.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Acked-by: Helge Deller <[email protected]> # parisc
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The F_GETLK64/F_SETLK64/F_SETLKW64 fcntl opcodes are only implemented
for the 32-bit syscall APIs, but are also needed for compat handling
on 64-bit kernels.
Consolidate them in unistd.h instead of definining the internal compat
definitions in compat.h, which is rather error prone (e.g. parisc
gets the values wrong currently).
Note that before this change they were never visible to userspace due
to the fact that CONFIG_64BIT is only set for kernel builds.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
test_barrier fails on s390 because of the missing KCSAN instrumentation
for several synchronization primitives.
Add it to barriers by defining __mb(), __rmb(), __wmb(), __dma_rmb()
and __dma_wmb(), and letting the common code in asm-generic/barrier.h
do the rest.
Spinlocks require instrumentation only on the unlock path; notify KCSAN
that the CPU cannot move memory accesses outside of the spin lock. In
reality it also cannot move stores inside of it, but this is not
important and can be omitted.
Reported-by: Tobias Huschle <[email protected]>
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Currently it is not detectable from within Linux when PCI instructions
are retried because of a busy condition. Detecting such conditions and
especially how long they lasted can however be quite useful in problem
determination. This patch enables this by adding an s390dbf error log
when a CC 2 is first encountered as well as after the retried
instruction.
Despite being unlikely it may be possible that these added debug
messages drown out important other messages so allow setting the debug
level in zpci_err_insn*() and set their level to 1 so they can be
filtered out if need be.
Reviewed-by: Matthew Rosato <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
In the current code vdso is mapped below the stack. This is
problematic when programs mapped to the top of the address space
are allocating a lot of memory, because the heap will clash with
the vdso. To avoid this map the vdso above the stack and move
STACK_TOP so that it all fits into three level paging.
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
This is a preparation patch for adding vdso randomization to s390.
It adds a function vdso_size(), which will be used later in calculating
the STACK_TOP value. It also moves the vdso mapping into a new function
vdso_map(), to keep the code similar to other architectures.
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Fix the following coccicheck warnings:
./arch/s390/include/asm/scsw.h:695:47-49: WARNING
!A || A && B is equivalent to !A || B
I apply a readable version just to get rid of a warning.
Signed-off-by: Haowen Bai <[email protected]>
Reviewed-by: Peter Oberparleiter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
If the facility IPL-complete-control is present then the last diag308
call made by kexec shall set the end-of-ipl flag in the subcode register
to signal the hypervisor that this is the last diag308 call made by Linux.
Only the diag308 calls made during a regular kexec need to set
the end-of-ipl flag, in all other cases the hypervisor will ignore it.
Signed-off-by: Alexander Egorenkov <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The presence of the IPL-complete-control facility can be derived
from the hypervisor's SCLP info response.
Signed-off-by: Alexander Egorenkov <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
s390 defines current_stack_pointer as function while all other
architectures use 'register unsigned long asm("<stackptr reg>").
This make codes like the following from check_stack_object() fail:
if (IS_ENABLED(CONFIG_STACK_GROWSUP)) {
if ((void *)current_stack_pointer < obj + len)
return BAD_STACK;
} else {
if (obj < (void *)current_stack_pointer)
return BAD_STACK;
}
because this would compare the address of current_stack_pointer() and
not the stackpointer value.
Reported-by: Karsten Graul <[email protected]>
Fixes: 2792d84e6da5 ("usercopy: Check valid lifetime via stack depth")
Cc: Kees Cook <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"Assorted bits and pieces"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: drop needless assignment in aio_read()
clean overflow checks in count_mounts() a bit
seq_file: fix NULL pointer arithmetic warning
uml/x86: use x86 load_unaligned_zeropad()
asm/user.h: killed unused macros
constify struct path argument of finish_automount()/do_add_mount()
fs: Remove FIXME comment in generic_write_checks()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Add kretprobes framepointer verification and return address recovery
in stacktrace.
- Support control domain masks on custom zcrypt devices and filter
admin requests.
- Cleanup timer API usage.
- Rework absolute lowcore access helpers.
- Other various small improvements and fixes.
* tag 's390-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (26 commits)
s390/alternatives: avoid using jgnop mnemonic
s390/pci: rename get_zdev_by_bus() to zdev_from_bus()
s390/pci: improve zpci_dev reference counting
s390/smp: use physical address for SIGP_SET_PREFIX command
s390: cleanup timer API use
s390/zcrypt: fix using the correct variable for sizeof()
s390/vfio-ap: fix kernel doc and signature of group notifier functions
s390/maccess: rework absolute lowcore accessors
s390/smp: cleanup control register update routines
s390/smp: cleanup target CPU callback starting
s390/test_unwind: verify __kretprobe_trampoline is replaced
s390/unwind: avoid duplicated unwinding entries for kretprobes
s390/unwind: recover kretprobe modified return address in stacktrace
s390/kprobes: enable kretprobes framepointer verification
s390/test_unwind: extend kretprobe test
s390/ap: adjust whitespace
s390/ap: use insn format for new instructions
s390/alternatives: use insn format for new instructions
s390/alternatives: use instructions instead of byte patterns
s390/traps: improve panic message for translation-specification exception
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
|
|
jgnop mnemonic is only available since binutils 2.36,
kernel minimal required version is 2.23. Stick to brcl
to avoid build errors.
Reported-by: Nathan Chancellor <[email protected]>
Fixes: 4afeb670710e ("s390/alternatives: use instructions instead of byte patterns")
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Macro mem_assign_absolute() is able to access the whole memory, but
is only used and makes sense when updating the absolute lowcore.
Instead, introduce get_abs_lowcore() and put_abs_lowcore() macros
that limit access to absolute lowcore addresses only.
Suggested-by: Heiko Carstens <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Get rid of duplicate code and redundant data.
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Based on commit cd9bc2c92588 ("arm64: Recover kretprobe modified return
address in stacktrace").
"""
Since the kretprobe replaces the function return address with
the __kretprobe_trampoline on the stack, stack unwinder shows it
instead of the correct return address.
This checks whether the next return address is the
__kretprobe_trampoline(), and if so, try to find the correct
return address from the kretprobe instance list.
"""
Original patch series:
https://lore.kernel.org/all/163163030719.489837.2236069935502195491.stgit@devnote2/
Reviewed-by: Tobias Huschle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Adjust indentation of inline assemblies, so all comments
start at the same position.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Use insn format with instruction format specifier instead of plain
longs. This way it is also more obvious that code instead of data is
generated.
The generated code is identical.
Reviewed-by: Harald Freudenberger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Use insn format with instruction format specifier instead of plain
longs. This way it is also more obvious that code instead of data is
generated.
The generated code is identical.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Use readable nop instructions within the code which generates
the padding areas, instead of unreadable byte patterns.
The generated code is identical.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Looks like this endif comment was erroneously unchanged when copied over
from the x86 version.
Signed-off-by: Russell Currey <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Raise minimum supported machine generation to z10, which comes with
various cleanups and code simplifications (usercopy/spectre
mitigation/etc).
- Rework extables and get rid of anonymous out-of-line fixups.
- Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
Covert pte_val()/pXd_val() macros to functions.
- Optimize kretprobe handling by avoiding extra kprobe on
__kretprobe_trampoline.
- Add support for CEX8 crypto cards.
- Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
- Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
group sections which simplifies kpatch support.
- Always use the packed stack layout and extend kernel unwinder tests.
- Add sanity checks for ftrace code patching.
- Add s390dbf debug log for the vfio_ap device driver.
- Various virtual vs physical address confusion fixes.
- Various small fixes and improvements all over the code.
* tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
s390/test_unwind: add kretprobe tests
s390/kprobes: Avoid additional kprobe in kretprobe handling
s390: convert ".insn" encoding to instruction names
s390: assume stckf is always present
s390/nospec: move to single register thunks
s390: raise minimum supported machine generation to z10
s390/uaccess: Add copy_from/to_user_key functions
s390/nospec: align and size extern thunks
s390/nospec: add an option to use thunk-extern
s390/nospec: generate single register thunks if possible
s390/pci: make zpci_set_irq()/zpci_clear_irq() static
s390: remove unused expoline to BC instructions
s390/irq: use assignment instead of cast
s390/traps: get rid of magic cast for per code
s390/traps: get rid of magic cast for program interruption code
s390/signal: fix typo in comments
s390/asm-offsets: remove unused defines
s390/test_unwind: avoid build warning with W=1
s390: remove .fixup section
s390/bpf: encode register within extable entry
...
|
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical
sections that last too long for very large guests backed by 4
KiB SPTEs.
- Zap invalid and defunct roots asynchronously via
concurrency-managed work queue.
- Allowing yielding when zapping TDP MMU roots in response to the
root's last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the
paging structure now holds RCU as a proxy for all vCPUs running
in the guest, i.e. to prolongs the grace period on their behalf.
It then kicks the the vCPUs out of guest mode before doing
rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that need
memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
KVM: use kvcalloc for array allocations
KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
kvm: x86: Require const tsc for RT
KVM: x86: synthesize CPUID leaf 0x80000021h if useful
KVM: x86: add support for CPUID leaf 0x80000021
KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
KVM: arm64: fix typos in comments
KVM: arm64: Generalise VM features into a set of flags
KVM: s390: selftests: Add error memop tests
KVM: s390: selftests: Add more copy memop tests
KVM: s390: selftests: Add named stages for memop test
KVM: s390: selftests: Add macro as abstraction for MEM_OP
KVM: s390: selftests: Split memop tests
KVM: s390x: fix SCK locking
RISC-V: KVM: Implement SBI HSM suspend call
RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
RISC-V: Add SBI HSM suspend related defines
RISC-V: KVM: Implement SBI v0.3 SRST extension
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array transformations from Gustavo Silva:
"Treewide patch that replaces zero-length arrays with flexible-array
members.
This has been baking in linux-next for a whole development cycle"
* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: Replace zero-length arrays with flexible-array members
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix, test and feature for 5.18 part 2
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
|
|
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.
Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.
Reviewed-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
So far, s390 registered a krobe on __kretprobe_trampoline which is
called everytime a kretprobe fires. This kprobe would then determine
the correct return address and adjust the psw accordingly, such that
the kretprobe would branch to the appropriate address after completion.
Some other archs handle kretprobes without such an additional kprobe.
This approach is adopted to s390 with this patch.
Furthermore, the __kretprobe_trampoline now uses an assembler function
to correctly gather the register and psw content to be passed to the
registered kretprobe handler as struct pt_regs. After completion, the
register content and the psw are set based on the contents of said
pt_regs struct.
Note that a change to the psw address in struct pt_regs will not have
an impact, as the probe will still return to the original return
address of the probed function.
The return address is now recovered by using the appropriate function
arch_kretprobe_fixup_return.
The no longer needed kprobe is removed.
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Tobias Huschle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
With z10 as minimum supported machine generation many ".insn" encodings
could be now converted to instruction names. There are couple of exceptions
- stfle is used from the als code built for z900 and cannot be converted
- few ".insn" directives encode unsupported instruction formats
The generated code is identical before/after this change.
Acked-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Assembler generated expoline thunks were in a form
__s390_indirect_jump_rXuse_rX when exrl instruction has not been available.
Now with z10 as minimum supported machine generation there
is no need for 2 register thunks, always generate
__s390_indirect_jump_rX versions.
Acked-by: Heiko Carstens <[email protected]>
Acked-by: Sumanth Korikkar <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|