| Age | Commit message (Collapse) | Author | Files | Lines |
|
Since smc_inet6_prot does not initialize ipv6_pinfo_offset, inet6_create()
copies an incorrect address value, sk + 0 (offset), to inet_sk(sk)->pinet6.
In addition, since inet_sk(sk)->pinet6 and smc_sk(sk)->clcsock practically
point to the same address, when smc_create_clcsk() stores the newly
created clcsock in smc_sk(sk)->clcsock, inet_sk(sk)->pinet6 is corrupted
into clcsock. This causes NULL pointer dereference and various other
memory corruptions.
To solve this problem, you need to initialize ipv6_pinfo_offset, add a
smc6_sock structure, and then add ipv6_pinfo as the second member of
the smc_sock structure.
Reported-by: syzkaller <[email protected]>
Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Signed-off-by: Jeongjun Park <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Additional pipe tests where piped files are written to disk. This
means that spotting a file name of "-" isn't a sufficient "is pipe?"
test.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No longer used by `perf inject` the repipe_fd is always -1 and repipe
is always false. Remove the options and associated code knowing the
constant values of the removed variables.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Previously inject->is_pipe was set if the input or output were a
pipe. Determining the input was a pipe had to be done prior to
starting the session and opening the file. This was done by comparing
the input file name with '-' but it fails if the pipe file is written
to disk.
Opening a pipe file from disk will correctly set perf_data.is_pipe, but
this is too late for 'perf inject' and results in a broken file. A
workaround is 'cat pipe_perf|perf inject -i - ...'.
This change removes inject->is_pipe and changes the dependent
conditions to use the is_pipe flag on the input
(inject->session->data) and output files (inject->output). This
ensures the is_pipe condition reflects things like the header being
read.
The change removes the use of perf file header repiping, that is
writing the file header out while reading it in. The case of input
pipe and output file cannot repipe as the attributes for the file are
unknown. To resolve this, write the file header when writing to disk
and as the attributes may be unknown, write them after the data.
Update sessions repipe variable to be trace_event_repipe as those are
the only events now impacted by it. Update __perf_session__new as the
repipe_fd no longer needs passing. Fully removing repipe from session
header reading will be done in a later change.
Committer testing:
root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 | perf report -i -
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ...............................
#
100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5
|
---__libc_start_main@@GLIBC_2.34
__libc_start_call_main
0x562fc2560a9f
clock_nanosleep@GLIBC_2.2.5
#
# (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
#
root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 > pipe.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB - ]
root@number:~# perf report --stdio -i pipe.data
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ...............................
#
100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5
|
---__libc_start_main@@GLIBC_2.34
__libc_start_call_main
0x55f775975a9f
clock_nanosleep@GLIBC_2.2.5
#
# (Tip: To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...)
#
root@number:~#
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For upper filesystems which do not use strict ordering of persisting
metadata changes (e.g. ubifs), when overlayfs file is modified for
the first time, copy up will create a copy of the lower file and
its parent directories in the upper layer. Permission lost of the
new upper parent directory was observed during power-cut stress test.
Fix by moving the fsync call to after metadata copy to make sure that the
metadata copied up directory and files persists to disk before renaming
from tmp to final destination.
With metacopy enabled, this change will hurt performance of workloads
such as chown -R, so we keep the legacy behavior of fsync only on copyup
of data.
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxj-pOvmw1-uXR3qVdqtLjSkwcR9nVKcNU_vC10Zyf2miQ@mail.gmail.com/
Reported-and-tested-by: Fei Lv <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
|
|
Add power domain binding to the mediatek DPI controller
for MT8186.
Also, add power domain binding for other SoCs like
MT6795 and MT8173 that already had power domain property.
Signed-off-by: Rohit Agarwal <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: CK Hu <[email protected]>
Link: https://patchwork.kernel.org/project/dri-devel/patch/[email protected]/
Signed-off-by: Chun-Kuang Hu <[email protected]>
|
|
The test case for SME vector length changes via sigreturn use a bit too
much cut'n'paste and only actually changed the SVE vector length in the
test itself. Andre's recent factoring out of the initialisation code caused
this to be exposed and the test to start failing. Fix the test to actually
cover the thing it's supposed to test.
Fixes: 4963aeb35a9e ("kselftest/arm64: signal: Add SME signal handling tests")
Signed-off-by: Mark Brown <[email protected]>
Reviewed-by: Andre Przywara <[email protected]>
Tested-by: Andre Przywara <[email protected]>
Link: https://lore.kernel.org/r/20240829-arm64-sme-signal-vl-change-test-v1-1-42d7534cb818@kernel.org
Signed-off-by: Will Deacon <[email protected]>
|
|
In the powerpc-pseries specific implementation, the IO hotplug
event is handled in the user space (drmgr tool). For the DLPAR
IO ADD, the corresponding device tree nodes and properties will
be added to the device tree after the device enable. The user
space (drmgr tool) uses configure_connector RTAS call with the
DRC index to retrieve the device nodes and updates the device
tree by writing to /proc/ppc64/ofdt. Under system lockdown,
/dev/mem access to allocate buffers for configure_connector RTAS
call is restricted which means the user space can not issue this
RTAS call and also can not access to /proc/ppc64/ofdt. The
pseries implementation need user interaction to power-on and add
device to the slot during the ADD event handling. So adds
complexity if the complete hotplug ADD event handling moved to
the kernel.
To overcome /dev/mem access restriction, this patch extends the
/sys/kernel/dlpar interface and provides ‘dt add index <drc_index>’
to the user space. The drmgr tool uses this interface to update
the device tree whenever the device is added. This interface
retrieves device tree nodes for the corresponding DRC index using
the configure_connector RTAS call and adds new device nodes /
properties to the device tree.
Signed-off-by: Scott Cheloha <[email protected]>
Signed-off-by: Haren Myneni <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
In the powerpc-pseries specific implementation, the IO hotplug
event is handled in the user space (drmgr tool). But update the
device tree and /dev/mem access to allocate buffers for some
RTAS calls are restricted when the kernel lockdown feature is
enabled. For the DLPAR IO REMOVE, the corresponding device tree
nodes and properties have to be removed from the device tree
after the device disable. The user space removes the device tree
nodes by updating /proc/ppc64/ofdt which is not allowed under
system lockdown is enabled. This restriction can be resolved
by moving the complete IO hotplug handling in the kernel. But
the pseries implementation need user interaction to power off
and to remove device from the slot during hotplug event handling.
To overcome the /proc/ppc64/ofdt restriction, this patch extends
the /sys/kernel/dlpar interface and provides
‘dt remove index <drc_index>’ to the user space so that drmgr
tool can remove the corresponding device tree nodes based on DRC
index from the device tree.
Signed-off-by: Scott Cheloha <[email protected]>
Signed-off-by: Haren Myneni <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
_be32 type is defined for some elements in pseries_hp_errorlog
struct but also used them u32 after be32_to_cpu() conversion.
Example: In handle_dlpar_errorlog()
hp_elog->_drc_u.drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
And later assigned to u32 type
dlpar_cpu() - u32 drc_index = hp_elog->_drc_u.drc_index;
This incorrect usage is giving the following warnings and the
patch resolve these warnings with the correct assignment.
arch/powerpc/platforms/pseries/dlpar.c:398:53: sparse: sparse:
incorrect type in argument 1 (different base types) @@
expected unsigned int [usertype] drc_index @@
got restricted __be32 [usertype] drc_index @@
...
arch/powerpc/platforms/pseries/dlpar.c:418:43: sparse: sparse:
incorrect type in assignment (different base types) @@
expected restricted __be32 [usertype] drc_count @@
got unsigned int [usertype] @@
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Haren Myneni <[email protected]>
v3:
- Fix warnings from using incorrect data types in pseries_hp_errorlog
struct
v2:
- Remove pr_info() and TODO comments
- Update more information in the commit logs
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
During merge of commit 4e991e3c16a3 ("powerpc: add CFUNC assembly
label annotation") a fallback version of CFUNC macro was added at
the last minute, so it can be used inconditionally.
Fixes: 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/0fa863f2f69b2ca4094ae066fcf1430fb31110c9.1724313540.git.christophe.leroy@csgroup.eu
|
|
VMAP stack added an emergency stack on powerpc/32 for when there is
a stack overflow, but failed to add stack validation for that
emergency stack. That validation is required for show stack.
Implement it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/2439d50b019f758db4a6d7b238b06441ab109799.1724156805.git.christophe.leroy@csgroup.eu
|
|
At the time being, DATA TLB miss handlers use task PGDIR for user
addresses and swapper_pg_dir for kernel addresses.
Now that kernel part of swapper_pg_dir is copied into task PGDIR
at PGD allocation, it is possible to avoid the above logic and
always use task PGDIR.
But new kernel PGD entries can still be created after init, in
which case those PGD entries may miss in task PGDIR. This can be
handled in DATA TLB error handler.
However, it needs to be done in real mode because the missing
entry might be related to the stack.
So implement copy of missing PGD entry in DATA TLB miss handler
just after detection of invalid PGD entry.
Also replace comparison by same calculation as in previous patch
to know if an address belongs to a kernel or user segment.
Note that as mentioned in platforms/Kconfig.cputype, SMP is not
supported on 603 processors so there is no risk of the PGD entry
be populated during the fault.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/a2ba8eeb1c845eeb9e46b6fe3a5e9f841df9a033.1724173828.git.christophe.leroy@csgroup.eu
|
|
Now that modules exec page tables are preallocated, the instruction
TLBmiss handler can use task PGDIR inconditionally.
Also revise the identification of user vs kernel user space by doing
a calculation instead of a comparison: Get the segment number and
subtract the number of the first kernel segment. The result is
positive for kernel addresses and negative for user addresses,
which means that upper 2 bits are 0 for kernel and 3 for user.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/9a3242162ad2faab8019c698e501b326a126ee9e.1724173828.git.christophe.leroy@csgroup.eu
|
|
In preparation of next patch that will perform some additional
calculations to replace comparison, switch the use of r0 and r3
as r0 has some limitations in some instructions like 'addi/subi'.
Also remove outdated comments about the meaning of each register.
The registers are used for many things and it would be difficult
to accurately describe all things done with a given register. The
function is now small enough to get a global view without much
description.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/566af5e87685b1a85d3182549c0d520ce2d8877a.1724173828.git.christophe.leroy@csgroup.eu
|
|
page tables
For the same reason as 8xx, copy kernel PGD entries into all
PGDIRs in pgd_alloc() and preallocate execmem page tables before
creating new PGDs so that all PGD entries related to execmem are
copied by pgd_alloc().
This will help reduce the fast-path in TLBmiss handlers.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/1a0d1feee07c4cf955f6a43a704c203e5c90fa53.1724173828.git.christophe.leroy@csgroup.eu
|
|
book3s/32 platforms have usually more memory than 8xx, but it is still
not worth reserving a full segment (256 Mbytes) for module text.
64Mbytes should be far enough.
Also fix TASK_SIZE when EXECMEM is not selected, and add a build
verification for overlap of module execmem space with user segments.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/c1f6a4e47f177d919561c6e97d31af5564923cf6.1724173828.git.christophe.leroy@csgroup.eu
|
|
At the time being, DATA TLB miss handlers use task PGDIR for user
addresses and swapper_pg_dir for kernel addresses.
Now that kernel part of swapper_pg_dir is copied into task PGDIR
at PGD allocation, it is possible to avoid the above logic and
always use task PGDIR.
But new kernel PGD entries can still be created after init, in
which case those PGD entries may miss in task PGDIR. This can be
handled in DATA TLB error handler.
However, it needs to be done in real mode because the missing
entry might be related to the stack.
So implement copy of missing PGD entry in the prolog of DATA TLB
ERROR handler just after the fixup of DAR.
Note that this is feasible because 8xx doesn't implement vmap or
ioremap with 8Mbytes pages but only 512kbytes pages which are at
PTE level.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/7a76a923d2a111f1d843d8b20b4df0c65d2f4a7b.1724173828.git.christophe.leroy@csgroup.eu
|
|
Now that modules exec page tables are preallocated, the instruction
TLBmiss handler can use task PGDIR inconditionally.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/774fd766a8b9bcb9173b5e677d5dad0df2d3970f.1724173828.git.christophe.leroy@csgroup.eu
|
|
Preallocate execmem page tables before creating new PGDs so that
all PGD entries related to execmem can be copied in pgd_alloc().
On 8xx there are 32 Mbytes for execmem by default so this will use
32 kbytes.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/a7180cc1ba59dec4502af39b4e9f3ff91c57280d.1724173828.git.christophe.leroy@csgroup.eu
|
|
8xx boards don't have much memory, the two I know have respectively
32Mbytes and 128Mbytes, so there is no point in having 256 Mbytes of
memory for module text.
Reduce it to 32Mbytes for 8xx, that's more than enough.
Nevertheless, make it a configurable value so that it can be customised
if needed.
Also add a build verification for overlap of module execmem space
with user PMD.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/8db23b61e33a0d1913d814f94bfe71ba7ac78b0f.1724173828.git.christophe.leroy@csgroup.eu
|
|
It is now possible to not pin kernel text with a 8Mbytes TLB, so
the alignment for STRICT_KERNEL_RWX can be relaxed.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/d0d8b05012b392dd166cfd911f14ba2741ce7e1e.1724173828.git.christophe.leroy@csgroup.eu
|
|
This reverts commit bccc58986a2f98e3af349c85c5f49aac7fb19ef2.
When STRICT_KERNEL_RWX is selected, EXEC memory must stop where
RW memory start. When pinning iTLBs it means an 8M alignment for
RW data start. That may be acceptable on boards with a lot of
memory but one of my supported boards only has 32 Mbytes and this
forced alignment leads to a waste of almost 4 Mbytes with is more
than 10% of the total memory.
So revert commit bccc58986a2f ("powerpc/8xx: Always pin kernel text
TLB") but don't restore previous behaviour in ITLB miss handler
as now kernel PGD entries are copied into each process PGDIR.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/01b6780b860c8043b51a1ba9d83acfc6f2dde910.1724173828.git.christophe.leroy@csgroup.eu
|
|
In order to avoid having to select PGDIR at each TLB miss based on
fault address, copy kernel PGD entries into all PGDIRs in pgd_alloc().
At first it will be used for ITLB misses for kernel TEXT, then for
execmem then for kernel DATA.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/c6d2bf5af2ea909071a85bdca8b1f5dc2df134a8.1724173828.git.christophe.leroy@csgroup.eu
|
|
Since commit 9132a2e82adc ("powerpc/8xx: Define a MODULE area below
kernel text"), module exec space is below PAGE_OFFSET so not only
space above PAGE_OFFSET, but space above TASK_SIZE need to be seen
as kernel space.
Until now the problem went undetected because by default TASK_SIZE
is 0x8000000 which means address space is determined by just
checking upper address bit. But when TASK_SIZE is over 0x80000000,
PAGE_OFFSET is used for comparison, leading to thinking module
addresses are part of user space.
Fix it by using TASK_SIZE instead of PAGE_OFFSET for address
comparison.
Fixes: 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/3f574c9845ff0a023b46cb4f38d2c45aecd769bd.1724173828.git.christophe.leroy@csgroup.eu
|
|
Commit cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages")
introduced an initial mapping of kernel TEXT using PAGE_KERNEL_TEXT,
but the pages that contain kernel TEXT may also contain kernel RODATA,
and depending on selected debug options PAGE_KERNEL_TEXT may be either
RWX or ROX. RODATA must be writable during init because it also
contains ro_after_init data.
So use PAGE_KERNEL_X instead to be sure it is RWX.
Fixes: cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/dac7a828d8497c4548c91840575a706657baa4f1.1724173828.git.christophe.leroy@csgroup.eu
|
|
The header file linux/platform_device.h is included
twice. Remove the last one.
Signed-off-by: Hongbo Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
intel_spi_populate_chip() use devm_kasprintf() to set pdata->name.
This can return a NULL pointer on failure but this returned value
is not checked.
Fixes: e58db3bcd93b ("spi: intel: Add default partition and name to the second chip")
Signed-off-by: Charles Han <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
The pnv_pci_init_ioda_hub() have been removed since
commit 5ac129cdb50b ("powerpc/powernv/pci: Remove ioda1 support"),
and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The use_cop() and drop_cop() have been removed since
commit 6ff4d3e96652 ("powerpc: Remove old unused icswx based
coprocessor support"), now they are useless, so remove them.
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The pas_pci_irq_fixup() have been removed since
commit 771f7404a9de ("pasemi_mac: Move the IRQ mapping from the
PCI layer to the driver"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The maple_calibrate_decr() have been removed since
commit 10f7e7c15e6c ("[PATCH] ppc64: consolidate calibrate_decr
implementations"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Handling FEAT_ATS1A (which provides the AT S1E{1,2}A instructions)
is pretty easy, as it is just the usual AT without the permission
check.
This basically amounts to plumbing the instructions in the various
dispatch tables, and handling FEAT_ATS1A being disabled in the
ID registers.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Hooray, we're done. Plug the AT traps into the system instruction
table, and let it rip.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
FEAT_PAN3 added a check for executable permissions to FEAT_PAN2.
Add the required SCTLR_ELx.EPAN and descriptor checks to handle
this correctly.
Reviewed-by: Alexandru Elisei <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Ensure that SCTLR_EL1.EPAN is RES0 when FEAT_PAN3 isn't supported.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
In order to plug the brokenness of our current AT implementation,
we need a SW walker that is going to... err.. walk the S1 tables
and tell us what it finds.
Of course, it builds on top of our S2 walker, and share similar
concepts. The beauty of it is that since it uses kvm_read_guest(),
it is able to bring back pages that have been otherwise evicted.
This is then plugged in the two AT S1 emulation functions as
a "slow path" fallback. I'm not sure it is that slow, but hey.
Reviewed-by: Alexandru Elisei <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Make this helper visible to at.c, we are going to need it.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
On the face of it, AT S12E{0,1}{R,W} is pretty simple. It is the
combination of AT S1E{0,1}{R,W}, followed by an extra S2 walk.
However, there is a great deal of complexity coming from combining
the S1 and S2 attributes to report something consistent in PAR_EL1.
This is an absolute mine field, and I have a splitting headache.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Similar to our AT S1E{0,1} emulation, we implement the AT S1E2
handling.
This emulation of course suffers from the same problems, but is
somehow simpler due to the lack of PAN2 and the fact that we are
guaranteed to execute it from the correct context.
Co-developed-by: Jintack Lim <[email protected]>
Signed-off-by: Jintack Lim <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Building on top of our primitive AT S1E{0,1}{R,W} emulation,
add minimal support for the FEAT_PAN2 instructions, momentary
context-switching PSTATE.PAN so that it takes effect in the
context of the guest.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Emulating AT instructions is one the tasks devolved to the host
hypervisor when NV is on.
Here, we take the basic approach of emulating AT S1E{0,1}{R,W}
using the AT instructions themselves. While this mostly work,
it doesn't *always* work:
- S1 page tables can be swapped out
- shadow S2 can be incomplete and not contain mappings for
the S1 page tables
We are not trying to handle these case here, and defer it to
a later patch. Suitable comments indicate where we are in dire
need of better handling.
Co-developed-by: Jintack Lim <[email protected]>
Signed-off-by: Jintack Lim <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
If our guest has been configured without PAN2, make sure that
AT S1E1{R,W}P will generate an UNDEF.
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
The upper_attr attribute has been badly named, as it most of the
time carries the full "last walked descriptor".
Rename it to "desc" and make ti contain the full 64bit descriptor.
This will be used by the S1 PTW.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Despite KVM not using the contiguous bit for anything related to
TLBs, the spec does require that the alignment defined by the
contiguous bit for the page size and the level is enforced.
Add the required checks to offset the point where PA and VA merge.
Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic")
Reported-by: Alexandru Elisei <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Although we have helpers that encode the level of a given fault
type, the Address Size fault type is missing it.
While we're at it, fix the bracketting for ESR_ELx_FSC_ACCESS_L()
and ESR_ELx_FSC_PERM_L().
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Although we already have the primitives to set PSTATE.PAN with an
immediate, we don't have a way to read the current state nor set
it ot an arbitrary value (i.e. we can generally save/restore it).
Thankfully, all that is missing for this is the definition for
the PAN pseudo system register, here named SYS_PSTATE_PAN.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
As KVM is about to grow a full emulation for the AT instructions,
add the layout of the PAR_EL1 register in its non-D128 configuration.
Note that the constants are a bit ugly, as the register has two
layouts, based on the state of the F bit.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Although Linux doesn't make use of hierarchical permissions (TFFT!),
KVM needs to know where the various bits related to this feature
live in the TCR_ELx registers as well as in the page tables.
Add the missing bits.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
To allow using newer instructions that current assemblers don't know about,
replace the `at` instruction with the underlying SYS instruction.
Signed-off-by: Joey Gouly <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|