Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups all around the map"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Remove amd_get_topology_early()
x86/tsc: Remove redundant assignment
x86/crash: Use resource_size()
x86/cpu: Add a missing prototype for arch_smt_update()
x86/nospec: Remove unused RSB_FILL_LOOPS
x86/vdso: Provide missing include file
x86/Kconfig: Correct spelling and punctuation
Documentation/x86/boot: Fix typo
x86/boot: Fix a comment's incorrect file reference
x86/process: Remove set but not used variables prev and next
x86/Kconfig: Fix Kconfig indentation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"Misc updates:
- Remove last remaining calls to exception_enter/exception_exit() and
simplify the entry code some more.
- Remove force_iret()
- Add support for "Fast Short Rep Mov", which is available starting
with Ice Lake Intel CPUs - and make the x86 assembly version of
memmove() use REP MOV for all sizes when FSRM is available.
- Micro-optimize/simplify the 32-bit boot code a bit.
- Use a more future-proof SYSRET instruction mnemonic"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify calculation of output address
x86/entry/64: Add instruction suffix to SYSRET
x86: Remove force_iret()
x86/cpufeatures: Add support for fast short REP; MOVSB
x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault
x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
|
|
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
RESOLVE_NO_XDEV => LOOKUP_NO_XDEV
RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS
RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
RESOLVE_BENEATH => LOOKUP_BENEATH
RESOLVE_IN_ROOT => LOOKUP_IN_ROOT
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb834 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/[email protected]/
[6]: https://youtu.be/ggD-eb3yPVs
Suggested-by: Christian Brauner <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
The VVAR page layout depends on whether a task belongs to the root or
non-root time namespace. Whenever a task changes its namespace, the VVAR
page tables are cleared and then they will be re-faulted with a
corresponding layout.
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
As timens page has offsets to data on VVAR page VVAR is going
to be accessed shortly. Set it up with timens in one page fault
as optimization.
Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
To support time namespaces in the VDSO with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.
The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide VDSO data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.
The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.
If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.
Allocate the time namespace page among VVAR pages and place vdso_data on
it. Provide __arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
VDSO support for time namespaces needs to set up a page with the same
layout as VVAR. That timens page will be placed on position of VVAR page
inside namespace. That page has vdso_data->seq set to 1 to enforce
the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce
the time namespace handling path.
To prepare the time namespace page the kernel needs to know the vdso_data
offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR
page.
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the
amount of corner-cases to consider while working further on VDSO time
namespace support.
As the offset from timens to VVAR page is computed compile-time, the pages
in VVAR should stay together and not being partically mremap()'ed.
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This wires up the pidfd_getfd syscall for all architectures.
Signed-off-by: Sargun Dhillon <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
The vDSO makefile opts out of all sanitizers (and objtool validation);
however, vma.o is a normal kernel object file (and already has objtool
validation selectively enabled), so turn the sanitizers back on for that
file.
Signed-off-by: Jann Horn <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
ignore_sysret() contains an unsuffixed SYSRET instruction. gas correctly
interprets this as SYSRETL, but leaving it up to gas to guess when there
is no register operand that implies a size is bad practice, and upstream
gas is likely to warn about this in the future. Use SYSRETL explicitly.
This does not change the assembled output.
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Conflicts:
init/main.c
lib/Kconfig.debug
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When building with C=1, sparse issues a warning:
CHECK arch/x86/entry/vdso/vdso32-setup.c
arch/x86/entry/vdso/vdso32-setup.c:28:28: warning: symbol 'vdso32_enabled' was not declared. Should it be static?
Provide the missing header file.
Signed-off-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/36224.1575599767@turing-police
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Various fixes:
- Fix the PAT performance regression that downgraded write-combining
device memory regions to uncached.
- There's been a number of bugs in 32-bit double fault handling -
hopefully all fixed now.
- Fix an LDT crash
- Fix an FPU over-optimization that broke with GCC9 code
optimizations.
- Misc cleanups"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/pat: Fix off-by-one bugs in interval tree search
x86/ioperm: Save an indentation level in tss_update_io_bitmap()
x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
x86/entry/32: Remove unused 'restore_all_notrace' local label
x86/ptrace: Document FSBASE and GSBASE ABI oddities
x86/ptrace: Remove set_segment_reg() implementations for current
x86/traps: die() instead of panicking on a double fault
x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
x86/doublefault/32: Rename doublefault.c to doublefault_32.c
x86/traps: Disentangle the 32-bit and 64-bit doublefault code
lkdtm: Add a DOUBLE_FAULT crash type on x86
selftests/x86/single_step_syscall: Check SYSENTER directly
x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
|
|
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 cleanups from Arnd Bergmann:
"y2038 syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended for
namespace cleaning: the kernel defines the traditional time_t, timeval
and timespec types that often lead to y2038-unsafe code. Even though
the unsafe usage is mostly gone from the kernel, having the types and
associated functions around means that we can still grow new users,
and that we may be missing conversions to safe types that actually
matter.
There are still a number of driver specific patches needed to get the
last users of these types removed, those have been submitted to the
respective maintainers"
Link: https://lore.kernel.org/lkml/[email protected]/
* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
y2038: alarm: fix half-second cut-off
y2038: ipc: fix x32 ABI breakage
y2038: fix typo in powerpc vdso "LOPART"
y2038: allow disabling time32 system calls
y2038: itimer: change implementation to timespec64
y2038: move itimer reset into itimer.c
y2038: use compat_{get,set}_itimer on alpha
y2038: itimer: compat handling to itimer.c
y2038: time: avoid timespec usage in settimeofday()
y2038: timerfd: Use timespec64 internally
y2038: elfcore: Use __kernel_old_timeval for process times
y2038: make ns_to_compat_timeval use __kernel_old_timeval
y2038: socket: use __kernel_old_timespec instead of timespec
y2038: socket: remove timespec reference in timestamping
y2038: syscalls: change remaining timeval to __kernel_old_timeval
y2038: rusage: use __kernel_old_timeval
y2038: uapi: change __kernel_time_t to __kernel_old_time_t
y2038: stat: avoid 'time_t' in 'struct stat'
y2038: ipc: remove __kernel_time_t reference from headers
y2038: vdso: powerpc: avoid timespec references
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE,
but there are cleanups as well.
- implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner)
- fixes to selftests (Christian Brauner)
- remove secure_computing() argument (Christian Brauner)"
* tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test
seccomp: simplify secure_computing()
seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: avoid overflow in implicit constant conversion
|
|
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The biggest changes in this cycle were:
- Make kcpustat vtime aware (Frederic Weisbecker)
- Rework the CFS load_balance() logic (Vincent Guittot)
- Misc cleanups, smaller enhancements, fixes.
The load-balancing rework is the most intrusive change: it replaces
the old heuristics that have become less meaningful after the
introduction of the PELT metrics, with a grounds-up load-balancing
algorithm.
As such it's not really an iterative series, but replaces the old
load-balancing logic with the new one. We hope there are no
performance regressions left - but statistically it's highly probable
that there *is* going to be some workload that is hurting from these
chnages. If so then we'd prefer to have a look at that workload and
fix its scheduling, instead of reverting the changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
rackmeter: Use vtime aware kcpustat accessor
leds: Use all-in-one vtime aware kcpustat accessor
cpufreq: Use vtime aware kcpustat accessors for user time
procfs: Use all-in-one vtime aware kcpustat accessor
sched/vtime: Bring up complete kcpustat accessor
sched/cputime: Support other fields on kcpustat_field()
sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util()
sched/fair: Add comments for group_type and balancing at SD_NUMA level
sched/fair: Fix rework of find_idlest_group()
sched/uclamp: Fix overzealous type replacement
sched/Kconfig: Fix spelling mistake in user-visible help text
sched/core: Further clarify sched_class::set_next_task()
sched/fair: Use mul_u32_u32()
sched/core: Simplify sched_class::pick_next_task()
sched/core: Optimize pick_next_task()
sched/core: Make pick_next_task_idle() more consistent
sched/fair: Better document newidle_balance()
leds: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
...
|
|
The old x86_32 doublefault_fn() was old and crufty, and it did not
even try to recover. do_double_fault() is much nicer. Rewrite the
32-bit double fault code to sanitize CPU state and call
do_double_fault(). This is mostly an exercise i386 archaeology.
With this patch applied, 32-bit double faults get a real stack trace,
just like 64-bit double faults.
[ mingo: merged the patch to a later kernel base. ]
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 iopl updates from Ingo Molnar:
"This implements a nice simplification of the iopl and ioperm code that
Thomas Gleixner discovered: we can implement the IO privilege features
of the iopl system call by using the IO permission bitmap in
permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space
if they change the interrupt flag.
This implements that feature, with testing facilities and related
cleanups"
[ "Simplification" may be an over-statement. The main goal is to avoid
the cli/sti of iopl by effectively implementing the IO port access
parts of iopl in terms of ioperm.
This may end up not workign well in case people actually depend on
cli/sti being available, or if there are mixed uses of iopl and
ioperm. We will see.. - Linus ]
* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/ioperm: Fix use of deprecated config option
x86/entry/32: Clarify register saving in __switch_to_asm()
selftests/x86/iopl: Extend test to cover IOPL emulation
x86/ioperm: Extend IOPL config to control ioperm() as well
x86/iopl: Remove legacy IOPL option
x86/iopl: Restrict iopl() permission scope
x86/iopl: Fixup misleading comment
selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
x86/ioperm: Share I/O bitmap if identical
x86/ioperm: Remove bitmap if all permissions dropped
x86/ioperm: Move TSS bitmap update to exit to user work
x86/ioperm: Add bitmap sequence number
x86/ioperm: Move iobitmap data into a struct
x86/tss: Move I/O bitmap data into a seperate struct
x86/io: Speedup schedule out of I/O bitmap user
x86/ioperm: Avoid bitmap allocation if no permissions are set
x86/ioperm: Simplify first ioperm() invocation logic
x86/iopl: Cleanup include maze
x86/tss: Fix and move VMX BUILD_BUG_ON()
x86/cpu: Unify cpu_init()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"These are the fixes left over from the v5.4 cycle:
- Various low level 32-bit entry code fixes and improvements by Andy
Lutomirski, Peter Zijlstra and Thomas Gleixner.
- Fix 32-bit Xen PV breakage, by Jan Beulich"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel
selftests/x86/mov_ss_trap: Fix the SYSENTER test
x86/entry/32: Fix NMI vs ESPFIX
x86/entry/32: Unwind the ESPFIX stack earlier on exception entry
x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL
x86/entry/32: Use %ss segment where required
x86/entry/32: Fix IRET exception
x86/cpu_entry_area: Add guard page for entry stack on 32bit
x86/pti/32: Size initial_page_table correctly
x86/doublefault/32: Fix stack canaries in the double fault handler
x86/xen/32: Simplify ring check in xen_iret_crit_fixup()
x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout
x86/stackframe/32: Repair 32-bit Xen PV
|
|
UNWIND_ESPFIX_STACK needs to read the GDT, and the GDT mapping that
can be accessed via %fs is not mapped in the user pagetables. Use
SGDT to find the cpu_entry_area mapping and read the espfix offset
from that instead.
Reported-and-tested-by: Borislav Petkov <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When the NMI lands on an ESPFIX_SS, we are on the entry stack and must
swizzle, otherwise we'll run do_nmi() on the entry stack, which is
BAD.
Also, similar to the normal exception path, we need to correct the
ESPFIX magic before leaving the entry stack, otherwise pt_regs will
present a non-flat stack pointer.
Tested by running sigreturn_32 concurrent with perf-record.
Fixes: e5862d0515ad ("x86/entry/32: Leave the kernel via trampoline stack")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
|
|
Right now, we do some fancy parts of the exception entry path while SS
might have a nonzero base: we fill in regs->ss and regs->sp, and we
consider switching to the kernel stack. This results in regs->ss and
regs->sp referring to a non-flat stack and it may result in
overflowing the entry stack. The former issue means that we can try to
call iret_exc on a non-flat stack, which doesn't work.
Tested with selftests/x86/sigreturn_32.
Fixes: 45d7b255747c ("x86/entry/32: Enter the kernel via trampoline stack")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
|
|
This will allow us to get percpu access working before FIXUP_FRAME,
which will allow us to unwind ESPFIX earlier.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
|
|
When re-building the IRET frame we use %eax as an destination %esp,
make sure to then also match the segment for when there is a nonzero
SS base (ESPFIX).
[peterz: Changelog and minor edits]
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
|
|
As reported by Lai, the commit 3c88c692c287 ("x86/stackframe/32:
Provide consistent pt_regs") wrecked the IRET EXTABLE entry by making
.Lirq_return not point at IRET.
Fix this by placing IRET_FRAME in RESTORE_REGS, to mirror how
FIXUP_FRAME is part of SAVE_ALL.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Reported-by: Lai Jiangshan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
|
|
Now that SS:ESP always get saved by SAVE_ALL, this also needs to be
accounted for in xen_iret_crit_fixup(). Otherwise the old_ax value gets
interpreted as EFLAGS, and hence VM86 mode appears to be active all the
time, leading to random "vm86_32: no user_vm86: BAD" log messages alongside
processes randomly crashing.
Since following the previous model (sitting after SAVE_ALL) would further
complicate the code _and_ retain the dependency of xen_iret_crit_fixup() on
frame manipulations done by entry_32.S, switch things around and do the
adjustment ahead of SAVE_ALL.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Stable Team <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Once again RPL checks have been introduced which don't account for a 32-bit
kernel living in ring 1 when running in a PV Xen domain. The case in
FIXUP_FRAME has been preventing boot.
Adjust BUG_IF_WRONG_CR3 as well to guard against future uses of the macro
on a code path reachable when running in PV mode under Xen; I have to admit
that I stopped at a certain point trying to figure out whether there are
present ones.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Stable Team <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.
Signed-off-by: Marco Elver <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Jump directly to restore_regs_and_return_to_kernel instead of making
a pointless extra jump through .Lparanoid_exit_restore
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The C reimplementation of SYSENTER left that unused ENTRY() label
around. Remove it.
Fixes: 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path")
Originally-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
commit 6690e86be83a ("sched/x86: Save [ER]FLAGS on context switch")
re-introduced the flags saving on context switch to prevent AC leakage.
The pushf/popf instructions are right among the callee saved register
section, so the comment explaining the save/restore is not entirely
correct.
Add a seperate comment to pushf/popf explaining the reason.
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
There is no point to update the TSS bitmap for tasks which use I/O bitmaps
on every context switch. It's enough to update it right before exiting to
user space.
That reduces the context switch bitmap handling to invalidating the io
bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP
set. The invaldiation is done on purpose when a task with an IO bitmap
switches out to prevent any possible leakage of an activated IO bitmap.
It also removes the requirement to update the tasks bitmap atomically in
ioperm().
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Only x86 uses the 'time' syscall in vdso, so change that to
__kernel_old_time_t as a preparation for removing 'time_t' and
'__kernel_time_t' later.
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
The gettimeofday() function in vdso uses the traditional 'timeval'
structure layout, which will be incompatible with future versions of
glibc on 32-bit architectures that use a 64-bit time_t.
This interface is problematic for y2038, when time_t overflows on 32-bit
architectures, but the plan so far is that a libc with 64-bit time_t
will not call into the gettimeofday() vdso helper at all, and only
have a method for entering clock_gettime(). This means we don't have
to fix it here, though we probably want to add a new clock_gettime()
entry point using a 64-bit version of 'struct timespec' at some point.
Changing the vdso code to use __kernel_old_timeval helps isolate
this usage from the other ones that still need to be fixed properly,
and it gets us closer to removing the 'timeval' definition from the
kernel sources.
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
CFLAGS_REMOVE_*.o syntax is used to drop particular flags when
building objects from C files. It has no effect for assembly files.
vdso-note.o is compiled from the assembly file, vdso-note.S, hence
CFLAGS_REMOVE_vdso-note.o is meaningless.
Neither vvar.c nor vvar.S is found in the vdso directory. Since there
is no source file to create vvar.o, CFLAGS_REMOVE_vvar.o is also
meaningless.
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
context_tracking_enabled()
Remove the superfluous "is" in the middle of the name. We want to
standardize the naming so that it can be expanded through suffixes:
context_tracking_enabled()
context_tracking_enabled_cpu()
context_tracking_enabled_this_cpu()
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Jacek Anaszewski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J . Wysocki <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Yauheni Kaliuta <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.
Now, ENTRY/ENDPROC can be forced to be undefined on X86, so do so.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Allison Randal <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Bill Metzenthen <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: linux-efi <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: [email protected]
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Change all assembly code which is marked using END (and not ENDPROC) to
appropriate new markings SYM_CODE_START and SYM_CODE_END.
And since the last user of END on X86 is gone now, make sure that END is
not defined there.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
All these are functions which are invoked from elsewhere but they are
not typical C functions. So annotate them using the new SYM_CODE_START.
All these were not balanced with any END, so mark their ends by
SYM_CODE_END, appropriately.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]> [xen bits]
Reviewed-by: Rafael J. Wysocki <[email protected]> [hibernate]
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Len Brown <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.
Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the
last users.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]> [hibernate]
Reviewed-by: Boris Ostrovsky <[email protected]> [xen bits]
Acked-by: Herbert Xu <[email protected]> [crypto]
Cc: Allison Randal <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Armijn Hemel <[email protected]>
Cc: Cao jin <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Enrico Weigelt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: kvm ML <[email protected]>
Cc: Len Brown <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: linux-efi <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: "Radim Krčmář" <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Wei Huang <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Cc: Xiaoyao Li <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Change all assembly code which is marked using END (and not ENDPROC).
Switch all these to the appropriate new annotation SYM_CODE_START and
SYM_CODE_END.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]> [xen bits]
Cc: Andy Lutomirski <[email protected]>
Cc: Cao jin <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: Maran Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
There are a couple of assembly functions which are invoked only locally
in the file they are defined. In C, they are marked "static". In
assembly, annotate them using SYM_{FUNC,CODE}_START_LOCAL (and switch
their ENDPROC to SYM_{FUNC,CODE}_END too). Whether FUNC or CODE is used,
depends on whether ENDPROC or END was used for a particular function
before.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: linux-efi <[email protected]>
Cc: [email protected]
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Convert the remaining 32bit users and remove the GLOBAL macro finally.
In particular, this means to use SYM_ENTRY for the singlestepping hack
region.
Exclude the global definition of GLOBAL from x86 too.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The GLOBAL macro had several meanings and is going away. Convert all the
inner function labels marked with GLOBAL to use SYM_INNER_LABEL instead.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
* annotate functions properly by SYM_CODE_START, SYM_CODE_START_LOCAL*
and SYM_CODE_END -- these are not C-like functions, so they have to
be annotated using CODE.
* use SYM_INNER_LABEL* for labels being in the middle of other functions
This prevents nested labels annotations.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|