Age | Commit message (Collapse) | Author | Files | Lines |
|
Add "facilities=" command line option which allows to override
facility bits returned by stfle. The main purpose of that is debugging
aids which allows to test specific kernel behaviour depending on
specific facilities presence. It also affects CPU alternatives.
"facilities=" command line option format is comma separated list of
integer values to be additionally set or cleared (if value is starting
with "!"). Values ranges are also supported. e.g.:
facilities=!130-160,159,167-169
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Facilities list in the lowcore is initially set up by verify_facilities
from als.c and later initializations are redundant, so cleaning them up.
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Reuse __stfle call instead of in-place implementation. __stfle is using
memcpy and memset functions but they are safe to use, since mem.S is
built with -march=z900.
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST &
KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE to the characteristics returned
from the H_GET_CPU_CHARACTERISTICS H-CALL, as queried from either the
hypervisor or the device tree.
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few more MIPS fixes:
- Fix 16b cmpxchg() operations which could erroneously fail if bits
15:8 of the old value are non-zero. In practice I'm not aware of
any actual users of 16b cmpxchg() on MIPS, but this fixes the
support for it was was introduced in v4.13.
- Provide a struct device to dma_alloc_coherent for Lantiq XWAY
systems with a "Voice MIPS Macro Core" (VMMC) device.
- Provide DMA masks for BCM63xx ethernet devices, fixing a regression
introduced in v4.19.
- Fix memblock reservation for the kernel when the system has a
non-zero PHYS_OFFSET, correcting the memblock conversion performed
in v4.20"
* tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: fix memory setup for platforms with PHYS_OFFSET != 0
MIPS: BCM63XX: provide DMA masks for ethernet devices
MIPS: lantiq: pass struct device to DMA API functions
MIPS: fix truncation in __cmpxchg_small for short values
|
|
Building a preprocessed source file for arm64 now always produces
a warning with clang because of the page_to_virt() macro assigning
a variable to itself.
Adding a new temporary variable avoids this issue.
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Reviewed-by: Andrey Konovalov <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
When ARCH_MXC get enabled, ARM64_ERRATUM_845719 will be selected and
this warning will happen when COMPAT isn't set.
WARNING: unmet direct dependencies detected for ARM64_ERRATUM_845719
Depends on [n]: COMPAT [=n]
Selected by [y]:
- ARCH_MXC [=y]
Rework to add 'if COMPAT' before ARM64_ERRATUM_845719 gets selected,
since ARM64_ERRATUM_845719 depends on COMPAT.
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Anders Roxell <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
Ensure that inX() provides the same ordering guarantees as readX()
by hooking up __io_par() so that it maps directly to __iormb().
Reported-by: Andrew Murray <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
The definitions of the __io_[p]ar() macros in asm-generic/io.h take the
value returned by the preceding I/O read as an argument so that
architectures can use this to create order with a subsequent delayX()
routine using a dependency.
Update the riscv barrier definitions to match, although the argument
is currently unused.
Suggested-by: Arnd Bergmann <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a compiler warning introduced by a previous fix, as well as
two crash bugs on ARM"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha512/arm - fix crash bug in Thumb2 build
crypto: sha256/arm - fix crash bug in Thumb2 build
crypto: ccree - add missing inline qualifier
|
|
On the Fujitsu-A64FX cores ver(1.0, 1.1), memory access may cause
an undefined fault (Data abort, DFSC=0b111111). This fault occurs under
a specific hardware condition when a load/store instruction performs an
address translation. Any load/store instruction, except non-fault access
including Armv8 and SVE might cause this undefined fault.
The TCR_ELx.NFD1 bit is used by the kernel when CONFIG_RANDOMIZE_BASE
is enabled to mitigate timing attacks against KASLR where the kernel
address space could be probed using the FFR and suppressed fault on
SVE loads.
Since this erratum causes spurious exceptions, which may corrupt
the exception registers, we clear the TCR_ELx.NFDx=1 bits when
booting on an affected CPU.
Signed-off-by: Zhang Lei <[email protected]>
[Generated MIDR value/mask for __cpu_setup(), removed spurious-fault handler
and always disabled the NFDx bits on affected CPUs]
Signed-off-by: James Morse <[email protected]>
Tested-by: zhang.lei <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
If we have fixed user buffers, we can map them into the kernel when we
setup the io_uring. That avoids the need to do get_user_pages() for
each and every IO.
To utilize this feature, the application must call io_uring_register()
after having setup an io_uring instance, passing in
IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer to
an iovec array, and the nr_args should contain how many iovecs the
application wishes to map.
If successful, these buffers are now mapped into the kernel, eligible
for IO. To use these fixed buffers, the application must use the
IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
set sqe->index to the desired buffer index. sqe->addr..sqe->addr+seq->len
must point to somewhere inside the indexed buffer.
The application may register buffers throughout the lifetime of the
io_uring instance. It can call io_uring_register() with
IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
buffers, and then register a new set. The application need not
unregister buffers explicitly before shutting down the io_uring
instance.
It's perfectly valid to setup a larger buffer, and then sometimes only
use parts of it for an IO. As long as the range is within the originally
mapped region, it will work just fine.
For now, buffers must not be file backed. If file backed buffers are
passed in, the registration will fail with -1/EOPNOTSUPP. This
restriction may be relaxed in the future.
RLIMIT_MEMLOCK is used to check how much memory we can pin. A somewhat
arbitrary 1G per buffer size is also imposed.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.
IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.
Two new system calls are added for this:
io_uring_setup(entries, params)
Sets up an io_uring instance for doing async IO. On success,
returns a file descriptor that the application can mmap to
gain access to the SQ ring, CQ ring, and io_uring_sqes.
io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
Initiates IO against the rings mapped to this fd, or waits for
them to complete, or both. The behavior is controlled by the
parameters passed in. If 'to_submit' is non-zero, then we'll
try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
kernel will wait for 'min_complete' events, if they aren't
already available. It's valid to set IORING_ENTER_GETEVENTS
and 'min_complete' == 0 at the same time, this allows the
kernel to return already completed events without waiting
for them. This is useful only for polling, as for IRQ
driven IO, the application can just check the CQ ring
without entering the kernel.
With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.
For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.
Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.
Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: More changes for v5.1
Another batch of changes for ASoC, no big core changes - it's mainly
small fixes and improvements for individual drivers.
- A big refresh and cleanup of the Samsung drivers, fixing a number of
issues which allow the driver to be used with a wider range of
userspaces.
- Fixes for the Intel drivers to make them more standard so less likely
to get bitten by core issues.
- New driver for Cirrus Logic CS35L26.
|
|
EFI systems do not necessarily provide a legacy ROM. If the ROM is missing
the memory is not mapped at all.
Trying to dereference values in the legacy ROM area leads to a crash on
Macbook Pro.
Only look for values in the legacy ROM area for non-EFI system.
Fixes: 3548e131ec6a ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
Reported-by: Pitam Mitra <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Bockjoo Kim <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202351
|
|
From networking side, there are numerous attempts to get rid of indirect
calls in fast-path wherever feasible in order to avoid the cost of
retpolines, for example, just to name a few:
* 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
* aaa5d90b395a ("net: use indirect call wrappers at GRO network layer")
* 028e0a476684 ("net: use indirect call wrappers at GRO transport layer")
* 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
* 09772d92cd5a ("bpf: avoid retpoline for lookup/update/delete calls on maps")
* 10870dd89e95 ("netfilter: nf_tables: add direct calls for all builtin expressions")
[...]
Recent work on XDP from Björn and Magnus additionally found that manually
transforming the XDP return code switch statement with more than 5 cases
into if-else combination would result in a considerable speedup in XDP
layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
builds. On i40e driver with XDP prog attached, a 20-26% speedup has been
observed [0]. Aside from XDP, there are many other places later in the
networking stack's critical path with similar switch-case
processing. Rather than fixing every XDP-enabled driver and locations in
stack by hand, it would be good to instead raise the limit where gcc would
emit expensive indirect calls from the switch under retpolines and stick
with the default as-is in case of !retpoline configured kernels. This would
also have the advantage that for archs where this is not necessary, we let
compiler select the underlying target optimization for these constructs and
avoid potential slow-downs by if-else hand-rewrite.
In case of gcc, this setting is controlled by case-values-threshold which
has an architecture global default that selects 4 or 5 (latter if target
does not have a case insn that compares the bounds) where some arch back
ends like arm64 or s390 override it with their own target hooks, for
example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect
branches") the threshold pretty much disables jump tables by limit of 20
under retpoline builds. Comparing gcc's and clang's default code
generation on x86-64 under O2 level with retpoline build results in the
following outcome for 5 switch cases:
* gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
# gdb -batch -ex 'disassemble dispatch' ./c-switch
Dump of assembler code for function dispatch:
0x0000000000400be0 <+0>: cmp $0x4,%edi
0x0000000000400be3 <+3>: ja 0x400c35 <dispatch+85>
0x0000000000400be5 <+5>: lea 0x915f8(%rip),%rdx # 0x4921e4
0x0000000000400bec <+12>: mov %edi,%edi
0x0000000000400bee <+14>: movslq (%rdx,%rdi,4),%rax
0x0000000000400bf2 <+18>: add %rdx,%rax
0x0000000000400bf5 <+21>: callq 0x400c01 <dispatch+33>
0x0000000000400bfa <+26>: pause
0x0000000000400bfc <+28>: lfence
0x0000000000400bff <+31>: jmp 0x400bfa <dispatch+26>
0x0000000000400c01 <+33>: mov %rax,(%rsp)
0x0000000000400c05 <+37>: retq
0x0000000000400c06 <+38>: nopw %cs:0x0(%rax,%rax,1)
0x0000000000400c10 <+48>: jmpq 0x400c90 <fn_3>
0x0000000000400c15 <+53>: nopl (%rax)
0x0000000000400c18 <+56>: jmpq 0x400c70 <fn_2>
0x0000000000400c1d <+61>: nopl (%rax)
0x0000000000400c20 <+64>: jmpq 0x400c50 <fn_1>
0x0000000000400c25 <+69>: nopl (%rax)
0x0000000000400c28 <+72>: jmpq 0x400c40 <fn_0>
0x0000000000400c2d <+77>: nopl (%rax)
0x0000000000400c30 <+80>: jmpq 0x400cb0 <fn_4>
0x0000000000400c35 <+85>: push %rax
0x0000000000400c36 <+86>: callq 0x40dd80 <abort>
End of assembler dump.
* clang with -mretpoline emitting search tree:
# gdb -batch -ex 'disassemble dispatch' ./c-switch
Dump of assembler code for function dispatch:
0x0000000000400b30 <+0>: cmp $0x1,%edi
0x0000000000400b33 <+3>: jle 0x400b44 <dispatch+20>
0x0000000000400b35 <+5>: cmp $0x2,%edi
0x0000000000400b38 <+8>: je 0x400b4d <dispatch+29>
0x0000000000400b3a <+10>: cmp $0x3,%edi
0x0000000000400b3d <+13>: jne 0x400b52 <dispatch+34>
0x0000000000400b3f <+15>: jmpq 0x400c50 <fn_3>
0x0000000000400b44 <+20>: test %edi,%edi
0x0000000000400b46 <+22>: jne 0x400b5c <dispatch+44>
0x0000000000400b48 <+24>: jmpq 0x400c20 <fn_0>
0x0000000000400b4d <+29>: jmpq 0x400c40 <fn_2>
0x0000000000400b52 <+34>: cmp $0x4,%edi
0x0000000000400b55 <+37>: jne 0x400b66 <dispatch+54>
0x0000000000400b57 <+39>: jmpq 0x400c60 <fn_4>
0x0000000000400b5c <+44>: cmp $0x1,%edi
0x0000000000400b5f <+47>: jne 0x400b66 <dispatch+54>
0x0000000000400b61 <+49>: jmpq 0x400c30 <fn_1>
0x0000000000400b66 <+54>: push %rax
0x0000000000400b67 <+55>: callq 0x40dd20 <abort>
End of assembler dump.
For sake of comparison, clang without -mretpoline:
# gdb -batch -ex 'disassemble dispatch' ./c-switch
Dump of assembler code for function dispatch:
0x0000000000400b30 <+0>: cmp $0x4,%edi
0x0000000000400b33 <+3>: ja 0x400b57 <dispatch+39>
0x0000000000400b35 <+5>: mov %edi,%eax
0x0000000000400b37 <+7>: jmpq *0x492148(,%rax,8)
0x0000000000400b3e <+14>: jmpq 0x400bf0 <fn_0>
0x0000000000400b43 <+19>: jmpq 0x400c30 <fn_4>
0x0000000000400b48 <+24>: jmpq 0x400c10 <fn_2>
0x0000000000400b4d <+29>: jmpq 0x400c20 <fn_3>
0x0000000000400b52 <+34>: jmpq 0x400c00 <fn_1>
0x0000000000400b57 <+39>: push %rax
0x0000000000400b58 <+40>: callq 0x40dcf0 <abort>
End of assembler dump.
Raising the cases to a high number (e.g. 100) will still result in similar
code generation pattern with clang and gcc as above, in other words clang
generally turns off jump table emission by having an extra expansion pass
under retpoline build to turn indirectbr instructions from their IR into
switch instructions as a built-in -mno-jump-table lowering of a switch (in
this case, even if IR input already contained an indirect branch).
For gcc, adding --param=case-values-threshold=20 as in similar fashion as
s390 in order to raise the limit for x86 retpoline enabled builds results
in a small vmlinux size increase of only 0.13% (before=18,027,528
after=18,051,192). For clang this option is ignored due to i) not being
needed as mentioned and ii) not having above cmdline
parameter. Non-retpoline-enabled builds with gcc continue to use the
default case-values-threshold setting, so nothing changes here.
[0] https://lore.kernel.org/netdev/[email protected]/
and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
- http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
- http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: David S. Miller <[email protected]>
Cc: Magnus Karlsson <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The max flush rep count of HvFlushGuestPhysicalAddressList hypercall is
equal with how many entries of union hv_gpa_page_range can be populated
into the input parameter page.
The code lacks parenthesis around PAGE_SIZE - 2 * sizeof(u64) which results
in bogus computations. Add them.
Fixes: cc4edae4b924 ("x86/hyper-v: Add HvFlushGuestAddressList hypercall support")
Signed-off-by: Lan Tianyu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
set x2apic destination mode to physcial mode when x2apic is available
and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have
8-bit APIC id.
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Lan Tianyu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
|
|
Make kernfs support superblock creation/mount/remount with fs_context.
This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
be made to support fs_context also.
Notes:
(1) A kernfs_fs_context struct is created to wrap fs_context and the
kernfs mount parameters are moved in here (or are in fs_context).
(2) kernfs_mount{,_ns}() are made into kernfs_get_tree(). The extra
namespace tag parameter is passed in the context if desired
(3) kernfs_free_fs_context() is provided as a destructor for the
kernfs_fs_context struct, but for the moment it does nothing except
get called in the right places.
(4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
pass, but possibly this should be done anyway in case someone wants to
add a parameter in future.
(5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
the cgroup v1 and v2 mount parameters are all moved there.
(6) cgroup1 parameter parsing error messages are now handled by invalf(),
which allows userspace to collect them directly.
(7) cgroup1 parameter cleanup is now done in the context destructor rather
than in the mount/get_tree and remount functions.
Weirdies:
(*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
but then uses the resulting pointer after dropping the locks. I'm
told this is okay and needs commenting.
(*) The cgroup refcount web. This really needs documenting.
(*) cgroup2 only has one root?
Add a suggestion from Thomas Gleixner in which the RDT enablement code is
placed into its own function.
[folded a leak fix from Andrey Vagin]
Signed-off-by: David Howells <[email protected]>
cc: Greg Kroah-Hartman <[email protected]>
cc: Tejun Heo <[email protected]>
cc: Li Zefan <[email protected]>
cc: Johannes Weiner <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf annotate:
Wei Li:
- Fix getting source line failure
perf script:
Andi Kleen:
- Handle missing fields with -F +...
perf data:
Jiri Olsa:
- Prep work to support per-cpu files in a directory.
Intel PT:
Adrian Hunter:
- Improve thread_stack__no_call_return()
- Hide x86 retpolines in thread stacks.
- exported SQL viewer refactorings, new 'top calls' report..
Alexander Shishkin:
- Copy parent's address filter offsets on clone
- Fix address filters for vmas with non-zero offset. Applies to
ARM's CoreSight as well.
python scripts:
Tony Jones:
- Python3 support for several 'perf script' python scripts.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
On big endian arm64 kernels, the xchacha20-neon and xchacha12-neon
self-tests fail because hchacha_block_neon() outputs little endian words
but the C code expects native endianness. Fix it to output the words in
native endianness (which also makes it match the arm32 version).
Fixes: cc7cf991e9eb ("crypto: arm64/chacha20 - add XChaCha20 support")
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The change to encrypt a fifth ChaCha block using scalar instructions
caused the chacha20-neon, xchacha20-neon, and xchacha12-neon self-tests
to start failing on big endian arm64 kernels. The bug is that the
keystream block produced in 32-bit scalar registers is directly XOR'd
with the data words, which are loaded and stored in native endianness.
Thus in big endian mode the data bytes end up XOR'd with the wrong
bytes. Fix it by byte-swapping the keystream words in big endian mode.
Fixes: 2fe55987b262 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed")
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
1-block SSE2 variant of poly1305 stores variables s1..s4 containing key
material on the stack. This commit adds missing zeroing of the stack
memory. Benchmarks show negligible performance hit (tested on i7-3770).
Signed-off-by: Tommi Hirvola <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
For platforms, which use a PHYS_OFFSET != 0, symbol _end also
contains that offset. So when calling memblock_reserve() for
reserving kernel the size argument needs to be adjusted.
Fixes: bcec54bf3118 ("mips: switch to NO_BOOTMEM")
Acked-by: Mike Rapoport <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: [email protected] # v4.20+
|
|
We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to decrement the locked_vm counter when we release the memory
used by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.
However the allocator does not add the userspace table size to
.it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are
releasing.
To make things worse, since we enabled on-demand allocation of
indirect levels, it_allocated_size contains only the amount of memory
actually allocated at the table creation time which can just be a
fraction. It is not a problem with incrementing locked_vm (as
get_table_size() value is used) but it is with decrementing.
As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.
This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
have a single place which calculates the maximum memory a table can
occupy. The original meaning of it_allocated_size is somewhat lost now
though.
We do not ditch it_allocated_size whatsoever here and we do not call
get_table_size() from vfio_iommu_spapr_tce.c when decrementing
locked_vm as we may have multiple IOMMU groups per container and even
though they all are supposed to have the same get_table_size()
implementation, there is a small chance for failure or confusion.
Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground into timers/2038
Pull additional syscall ABI cleanup for y2038 from Arnd Bergmann:
This is a follow-up to the y2038 syscall patches already merged in the tip
tree. As the final 32-bit RISC-V syscall ABI is still being decided on,
this is the last chance to make a few corrections to leave out interfaces
based on 32-bit time_t along with the old off_t and rlimit types.
The series achieves this in a few steps:
- A couple of bug fixes for minor regressions I introduced
in the original series
- A couple of older patches from Yury Norov that I had never
merged in the past, these fix up the openat/open_by_handle_at and
getrlimit/setrlimit syscalls to disallow the old versions of off_t
and rlimit.
- Hiding the deprecated system calls behind an #ifdef in
include/uapi/asm-generic/unistd.h
- Change arch/riscv to drop all these ABIs.
Originally, the plan was to also leave these out on C-Sky, but that now
has a glibc port that uses the older interfaces, so we need to leave
them in place.
|
|
The commit identified below adds MC_BTB_FLUSH macro only when
CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
on some configs (seen several times with kisskb randconfig_defconfig)
arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
make[1]: *** [Makefile:1043: arch/powerpc] Error 2
make: *** [Makefile:152: sub-make] Error 2
This patch adds a blank definition of MC_BTB_FLUSH for other cases.
Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
Cc: Diana Craciun <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Daniel Axtens <[email protected]>
Reviewed-by: Diana Craciun <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently the opal log is globally readable. It is kernel policy to
limit the visibility of physical addresses / kernel pointers to root.
Given this and the fact the opal log may contain this information it
would be better to limit the readability to root.
Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface")
Cc: [email protected] # v3.15+
Signed-off-by: Jordan Niethe <[email protected]>
Reviewed-by: Stewart Smith <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently, we don't coordinate BT USB activity with our handling of the
BT out-of-band wake pin, and instead just use gpio-keys. That causes
problems because we have no way of distinguishing wake activity due to a
BT device (e.g., mouse) vs. the BT controller (e.g., re-configuring wake
mask before suspend). This can cause spurious wake events just because
we, for instance, try to reconfigure the host controller's event mask
before suspending.
We can avoid these synchronization problems by handling the BT wake pin
directly in the btusb driver -- for all activity up until BT controller
suspend(), we simply listen to normal USB activity (e.g., to know the
difference between device and host activity); once we're really ready to
suspend the host controller, there should be no more host activity, and
only *then* do we unmask the GPIO interrupt.
This is already supported by btusb; we just need to describe the wake
pin in the right node.
We list 2 compatible properties, since both PID/VID pairs show up on
Scarlet devices, and they're both essentially identical QCA6174A-based
modules.
Also note that the polarity was wrong before: Qualcomm implemented WAKE
as active high, not active low. We only got away with this because
gpio-keys always reconfigured us as bi-directional edge-triggered.
Finally, we have an external pull-up and a level-shifter on this line
(we didn't notice Qualcomm's polarity in the initial design), so we
can't do pull-down. Switch to pull-none.
Signed-off-by: Brian Norris <[email protected]>
Reviewed-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
My console locks up as soon as Linux writes to [88800000,88f00000[
AFAIU, that memory area is reserved for trustzone.
Extend TZ reserved memory range, to prevent Linux from stepping on
trustzone's toes.
Cc: [email protected] # 4.20+
Reviewed-by: Sibi Sankar <[email protected]>
Fixes: c7833949564ec ("arm64: dts: qcom: msm8998: Add smem related nodes")
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: Andy Gross <[email protected]>
|
|
Compiling with CONFIG_PPC_POWERNV=y and KVM disabled currently gives
an error like this:
CC arch/powerpc/kernel/dbell.o
In file included from arch/powerpc/kernel/dbell.c:20:0:
arch/powerpc/include/asm/kvm_ppc.h: In function ‘xics_on_xive’:
arch/powerpc/include/asm/kvm_ppc.h:625:9: error: implicit declaration of function ‘xive_enabled’ [-Werror=implicit-function-declaration]
return xive_enabled() && cpu_has_feature(CPU_FTR_HVMODE);
^
cc1: all warnings being treated as errors
scripts/Makefile.build:276: recipe for target 'arch/powerpc/kernel/dbell.o' failed
make[3]: *** [arch/powerpc/kernel/dbell.o] Error 1
Fix this by making the xics_on_xive() definition conditional on the
same symbol (CONFIG_KVM_BOOK3S_64_HANDLER) that determines whether we
include <asm/xive.h> or not, since that's the header that defines
xive_enabled().
Fixes: 03f953329bd8 ("KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE")
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The assembly macro get_thread_info() actually returns a task_struct and is
analogous to the current/get_current macro/function.
While it could be argued that thread_info sits at the start of
task_struct and the intention could have been to return a thread_info,
instances of loads from/stores to the address obtained from
get_thread_info() use offsets that are generated with
offsetof(struct task_struct, [...]).
Rename get_thread_info() to state it returns a task_struct.
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Julien Thierry <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
TIF_USEDFPU is not defined as thread flags for Arm64. So drop it from
the documentation.
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Julien Grall <[email protected]>
Cc: [email protected]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
When building with -Wsometimes-uninitialized, Clang warns:
arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if (cpu_has_feature(CPU_FTRS_POWER9))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
if (opcode == NULL)
^~~~~~
arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
condition is always true
if (cpu_has_feature(CPU_FTRS_POWER9))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
'opcode' to silence this warning
const struct powerpc_opcode *opcode;
^
= NULL
1 warning generated.
This warning seems to make no sense on the surface because opcode is set
to NULL right below this statement. However, there is a comma instead of
semicolon to end the dialect assignment, meaning that the opcode
assignment only happens in the if statement. Properly terminate that
line so that Clang no longer warns.
Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)")
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The OPAL call wrapper gets interrupt disabling wrong. It disables
interrupts just by clearing MSR[EE], which has two problems:
- It doesn't call into the IRQ tracing subsystem, which means tracing
across OPAL calls does not always notice IRQs have been disabled.
- It doesn't go through the IRQ soft-mask code, which causes a minor
bug. MSR[EE] can not be restored by saving the MSR then clearing
MSR[EE], because a racing interrupt while soft-masked could clear
MSR[EE] between the two steps. This can cause MSR[EE] to be
incorrectly enabled when the OPAL call returns. Fortunately that
should only result in another masked interrupt being taken to
disable MSR[EE] again, but it's a bit sloppy.
The existing code also saves MSR to PACA, which is not re-entrant if
there is a nested OPAL call from different MSR contexts, which can
happen these days with SRESET interrupts on bare metal.
To fix these issues, move the tracing and IRQ handling code to C, and
call into asm just for the low level call when everything is ready to
go. Save the MSR on stack rather than PACA.
Performance cost is kept to a minimum with a few optimisations:
- The endian switch upon return is combined with the MSR restore,
which avoids an expensive context synchronizing operation for LE
kernels. This makes up for the additional mtmsrd to enable
interrupts with local_irq_enable().
- blr is now used to return from the opal_* functions that are called
as C functions, to avoid link stack corruption. This requires a
skiboot fix as well to keep the call stack balanced.
A NULL call is more costly after this, (410ns->430ns on POWER9), but
OPAL calls are generally not performance critical at this scale.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Handlers for interrupts that set DAR / DSISR, set MSR[RI] before those
SPRs are read. If a d-side machine check hits in this window, DAR /
DSISR will be clobbered silently, leading to random corruption.
Fix this by having handlers save those registers before setting MSR[RI].
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
A subsequent fix for data interrupts (those that set DAR / DSISR)
requires some interrupt macros to be open-coded, and also requires
the 0x300 interrupt handler to be moved out-of-line.
This patch does that without changing behaviour, which makes the later
fix a smaller change.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Code that uses HSRR registers is not required to clear MSR[RI] by
convention, however the system reset NMI itself may use HSRR
registers (e.g., to call OPAL) and clobber them.
Rather than introduce the requirement to clear RI in order to use
HSRRs, have system reset interrupt save and restore HSRRs.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
HV interrupts that use HSRR registers do not enter with MSR[RI] clear,
but their entry code is not recoverable vs NMI, due to shared use of
HSPRG1 as a scratch register to save r13.
This means that a system reset or machine check that hits in HSRR
interrupt entry can cause r13 to be silently corrupted.
Fix this by marking NMIs non-recoverable if they land in HV interrupt
ranges.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
To access PRBARn, where n is referenced as a binary number:
MRC p15, 0, <Rt>, c6, c8+n[3:1], 4*n[0] ; Read PRBARn into Rt
MCR p15, 0, <Rt>, c6, c8+n[3:1], 4*n[0] ; Write Rt into PRBARn
To access PRLARn, where n is referenced as a binary number:
MRC p15, 0, <Rt>, c6, c8+n[3:1], 4*n[0]+1 ; Read PRLARn into Rt
MCR p15, 0, <Rt>, c6, c8+n[3:1], 4*n[0]+1 ; Write Rt into PRLARn
For PR{B,L}AR4, n is 4, n[0] is 0, n[3:1] is 2, while current encoding
done with n[0] set to 1 which is wrong. Use proper encoding instead.
Fixes: 046835b4aa22b9ab6aa0bb274e3b71047c4b887d ("ARM: 8757/1: NOMMU: Support PMSAv8 MPU")
Signed-off-by: Vladimir Murzin <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
arm64 has got relaxation on GIC version check at early boot stage due
to update of the GIC architecture let's align ARM with that.
To help backports (even though the code was correct at the time of writing)
Fixes: e59941b9b381 ("ARM: 8527/1: virt: enable GICv3 system registers")
Signed-off-by: Vladimir Murzin <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
MCPM does a soft reset of the CPUs and uses common cpu_resume() routine to
perform low-level platform initialization. This results in a try to install
HYP stubs for the second time for each CPU and results in false HYP/SVC
mode mismatch detection. The HYP stubs are already installed at the
beginning of the kernel initialization on the boot CPU (head.S) or in the
secondary_startup() for other CPUs. To fix this issue MCPM code should use
a cpu_resume() routine without HYP stubs installation.
This change fixes HYP/SVC mode mismatch on Samsung Exynos5422-based Odroid
XU3/XU4/HC1 boards.
Fixes: 3721924c8154 ("ARM: 8081/1: MCPM: provide infrastructure to allow for MCPM loopback")
Signed-off-by: Marek Szyprowski <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Tested-by: Anand Moon <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
Use unified assembler syntax (UAL) in inline assembler. Divided
syntax is considered deprecated. This will also allow to build
the kernel using LLVM's integrated assembler.
When compiling non-Thumb2 GCC always emits a ".syntax divided"
at the beginning of the inline assembly which makes the
assembler fail. Since GCC 5 there is the -masm-syntax-unified
GCC option which make GCC assume unified syntax asm and hence
emits ".syntax unified" even in ARM mode. However, the option
is broken since GCC version 6 (see GCC PR88648 [1]). Work
around by adding ".syntax unified" as part of the inline
assembly.
[0] https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#index-masm-syntax-unified
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88648
Signed-off-by: Stefan Agner <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
Use unified assembler syntax (UAL) in assembly files. Divided
syntax is considered deprecated. This will also allow to build
the kernel using LLVM's integrated assembler.
Signed-off-by: Stefan Agner <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
Use unified assembler syntax (UAL) in headers. Divided syntax is
considered deprecated. This will also allow to build the kernel
using LLVM's integrated assembler.
Signed-off-by: Stefan Agner <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
Use unified assembler syntax (UAL) in macros. Divided syntax is
considered deprecated. This will also allow to build the kernel
using LLVM's integrated assembler.
Signed-off-by: Stefan Agner <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
Mostly unwind is done with irqs enabled however SLUB may call it with
irqs disabled while creating a new SLUB cache.
I had system freeze while loading a module which called
kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
interrupts and then
->new_slab_objects()
->new_slab()
->setup_object()
->setup_object_debug()
->init_tracking()
->set_track()
->save_stack_trace()
->save_stack_trace_tsk()
->walk_stackframe()
->unwind_frame()
->unwind_find_idx()
=>spin_lock_irqsave(&unwind_lock);
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
When running kprobe on -rt kernel, the below bug is caught:
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0
|Preemption disabled at:[<802f2b98>] cpu_stopper_thread+0xc0/0x140
|CPU: 0 PID: 14 Comm: migration/0 Tainted: G O 4.8.3-rt2 #1
|Hardware name: Freescale LS1021A
|[<8025a43c>] (___might_sleep)
|[<80b5b324>] (rt_spin_lock)
|[<80b5c31c>] (__patch_text_real)
|[<80b5c3ac>] (patch_text_stop_machine)
|[<802f2920>] (multi_cpu_stop)
Since patch_text_stop_machine() is called in stop_machine() which
disables IRQ, sleepable lock should be not used in this atomic context,
so replace patch_lock to raw lock.
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Russell King <[email protected]>
|