Age | Commit message (Collapse) | Author | Files | Lines |
|
PCI devices support two variants of the D3 power state: D3hot (main power
present) D3cold (main power removed). Previously struct pci_dev contained:
unsigned int d3_delay; /* D3->D0 transition time in ms */
unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */
"d3_delay" refers specifically to the D3hot state. Rename it to
"d3hot_delay" to avoid ambiguity and align with the ACPI "_DSM for
Specifying Device Readiness Durations" in the PCI Firmware spec r3.2,
sec 4.6.9.
There is no change to the functionality.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Wilczyński <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
The PR_SPEC_DISABLE_NOEXEC option to the PR_SPEC_STORE_BYPASS prctl()
allows the SSB mitigation to be enabled only until the next execve(),
at which point the state will revert back to PR_SPEC_ENABLE and the
mitigation will be disabled.
Add support for PR_SPEC_DISABLE_NOEXEC on arm64.
Reported-by: Anthony Steinhauser <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
The kbuild robot reports that we're relying on an implicit inclusion to
get a definition of task_stack_page() in the Spectre-v4 mitigation code,
which is not always in place for some configurations:
| arch/arm64/kernel/proton-pack.c:329:2: error: implicit declaration of function 'task_stack_page' [-Werror,-Wimplicit-function-declaration]
| task_pt_regs(task)->pstate |= val;
| ^
| arch/arm64/include/asm/processor.h:268:36: note: expanded from macro 'task_pt_regs'
| ((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
| ^
| arch/arm64/kernel/proton-pack.c:329:2: note: did you mean 'task_spread_page'?
Add the missing include to fix the build error.
Fixes: a44acf477220 ("arm64: Move SSBD prctl() handler alongside other spectre mitigation code")
Reported-by: Anthony Steinhauser <[email protected]>
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/202009260013.Ul7AD29w%[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Patching the EL2 exception vectors is integral to the Spectre-v2
workaround, where it can be necessary to execute CPU-specific sequences
to nobble the branch predictor before running the hypervisor text proper.
Remove the dependency on CONFIG_RANDOMIZE_BASE and allow the EL2 vectors
to be patched even when KASLR is not enabled.
Fixes: 7a132017e7a5 ("KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE")
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/202009221053.Jv1XsQUZ%[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Out with the old ghost, in with the new...
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
Convert the KVM WA2 code to using the Spectre infrastructure,
making the code much more readable. It also allows us to
take SSBS into account for the mitigation.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
kvm_arm_have_ssbd() is now completely unused, get rid of it.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
Owing to the fact that the host kernel is always mitigated, we can
drastically simplify the WA2 handling by keeping the mitigation
state ON when entering the guest. This means the guest is either
unaffected or not mitigated.
This results in a nice simplification of the mitigation space,
and the removal of a lot of code that was never really used anyway.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
Rewrite the Spectre-v4 mitigation handling code to follow the same
approach as that taken by Spectre-v2.
For now, report to KVM that the system is vulnerable (by forcing
'ssbd_state' to ARM64_SSBD_UNKNOWN), as this will be cleared up in
subsequent steps.
Signed-off-by: Will Deacon <[email protected]>
|
|
As part of the spectre consolidation effort to shift all of the ghosts
into their own proton pack, move all of the horrible SSBD prctl() code
out of its own 'ssbd.c' file.
Signed-off-by: Will Deacon <[email protected]>
|
|
In a similar manner to the renaming of ARM64_HARDEN_BRANCH_PREDICTOR
to ARM64_SPECTRE_V2, rename ARM64_SSBD to ARM64_SPECTRE_V4. This isn't
_entirely_ accurate, as we also need to take into account the interaction
with SSBS, but that will be taken care of in subsequent patches.
Signed-off-by: Will Deacon <[email protected]>
|
|
If all CPUs discovered during boot have SSBS, then spectre-v4 will be
considered to be "mitigated". However, we still allow late CPUs without
SSBS to be onlined, albeit with a "SANITY CHECK" warning. This is
problematic for userspace because it means that the system can quietly
transition to "Vulnerable" at runtime.
Avoid this by treating SSBS as a non-strict system feature: if all of
the CPUs discovered during boot have SSBS, then late arriving secondaries
better have it as well.
Signed-off-by: Will Deacon <[email protected]>
|
|
The is_ttbrX_addr() functions have somehow ended up in the middle of
the start_thread() functions, so move them out of the way to keep the
code readable.
Signed-off-by: Will Deacon <[email protected]>
|
|
If the system is not affected by Spectre-v2, then advertise to the KVM
guest that it is not affected, without the need for a safelist in the
guest.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
The Spectre-v2 mitigation code is pretty unwieldy and hard to maintain.
This is largely due to it being written hastily, without much clue as to
how things would pan out, and also because it ends up mixing policy and
state in such a way that it is very difficult to figure out what's going
on.
Rewrite the Spectre-v2 mitigation so that it clearly separates state from
policy and follows a more structured approach to handling the mitigation.
Signed-off-by: Will Deacon <[email protected]>
|
|
The spectre mitigation code is spread over a few different files, which
makes it both hard to follow, but also hard to remove it should we want
to do that in future.
Introduce a new file for housing the spectre mitigations, and populate
it with the spectre-v1 reporting code to start with.
Signed-off-by: Will Deacon <[email protected]>
|
|
For better or worse, the world knows about "Spectre" and not about
"Branch predictor hardening". Rename ARM64_HARDEN_BRANCH_PREDICTOR to
ARM64_SPECTRE_V2 as part of moving all of the Spectre mitigations into
their own little corner.
Signed-off-by: Will Deacon <[email protected]>
|
|
Use is_hyp_mode_available() to detect whether or not we need to patch
the KVM vectors for branch hardening, which avoids the need to take the
vector pointers as parameters.
Signed-off-by: Will Deacon <[email protected]>
|
|
The removal of CONFIG_HARDEN_BRANCH_PREDICTOR means that
CONFIG_KVM_INDIRECT_VECTORS is synonymous with CONFIG_RANDOMIZE_BASE,
so replace it.
Signed-off-by: Will Deacon <[email protected]>
|
|
The spectre mitigations are too configurable for their own good, leading
to confusing logic trying to figure out when we should mitigate and when
we shouldn't. Although the plethora of command-line options need to stick
around for backwards compatibility, the default-on CONFIG options that
depend on EXPERT can be dropped, as the mitigations only do anything if
the system is vulnerable, a mitigation is available and the command-line
hasn't disabled it.
Remove CONFIG_HARDEN_BRANCH_PREDICTOR and CONFIG_ARM64_SSBD in favour of
enabling this code unconditionally.
Signed-off-by: Will Deacon <[email protected]>
|
|
Commit 606f8e7b27bf ("arm64: capabilities: Use linear array for
detection and verification") changed the way we deal with per-CPU errata
by only calling the .matches() callback until one CPU is found to be
affected. At this point, .matches() stop being called, and .cpu_enable()
will be called on all CPUs.
This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be
mitigated.
In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.
Fixes: 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification")
Cc: <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
The user defined label following "fallthrough" is not considered by GCC
and causes build failure.
kernel-source/include/linux/compiler_attributes.h:208:41: error: attribute
'fallthrough' not preceding a case label or default label [-Werror]
208 define fallthrough _attribute((fallthrough_))
^~~~~~~~~~~~~
Fixes: df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Signed-off-by: Marc Zyngier <[email protected]>
|
|
As we can now hide events from the guest, let's also adjust its view of
PCMEID{0,1}_EL1 so that it can figure out why some common events are not
counting as they should.
The astute user can still look into the TRM for their CPU and find out
they've been cheated, though. Nobody's perfect.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.
For this, let's extend the PMUv3 device API, and offer a way to setup a
bitmap of the allowed events (the default being no bitmap, and thus no
filtering).
Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny if the
filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.
Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
The PMU code suffers from a small defect where we assume that the event
number provided by the guest is always 16 bit wide, even if the CPU only
implements the ARMv8.0 architecture. This isn't really problematic in
the sense that the event number ends up in a system register, cropping
it to the right width, but still this needs fixing.
In order to make it work, let's probe the version of the PMU that the
guest is going to use. This is done by temporarily creating a kernel
event and looking at the PMUVer field that has been saved at probe time
in the associated arm_pmu structure. This in turn gets saved in the kvm
structure, and subsequently used to compute the event mask that gets
used throughout the PMU code.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
The PMU emulation error handling is pretty messy when dealing with
attributes. Let's refactor it so that we have less duplication,
and that it is easy to extend later on.
A functional change is that kvm_arm_pmu_v3_init() used to return
-ENXIO when the PMU feature wasn't set. The error is now reported
as -ENODEV, matching the documentation. -ENXIO is still returned
when the interrupt isn't properly configured.
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Currently we overflow save_area_sync and write over
save_area_async. Although this is not a real problem make
startup_pgm_check_handler consistent with late pgm check handler and
store [%r0,%r7] directly into gpregs_save_area.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Since commit 394216275c7d ("s390: remove broken hibernate / power
management support") _swsusp_reset_dma is unused and could be safely
removed.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Currently there are several minor problems with randomization base
generation code:
1. It might misbehave in low memory conditions. In particular there
might be enough space for the kernel on [0, block_sum] but after
if (base < safe_addr)
base = safe_addr;
it might not be enough anymore.
2. It does not correctly handle minimal address constraint. In condition
if (base < safe_addr)
base = safe_addr;
a synthetic value is compared with an address. If we have a memory
setup with memory holes due to offline memory regions, and safe_addr is
close to the end of the first online memory block - we might position
the kernel in invalid memory.
3. block_sum calculation logic contains off-by-one error. Let's say we
have a memory block in which the kernel fits perfectly
(end - start == kernel_size). In this case:
if (end - start < kernel_size)
continue;
block_sum += end - start - kernel_size;
block_sum is not increased, while it is a valid kernel position.
So, address problems listed and explain algorithm used. Besides that
restructuring the code makes it possible to extend kernel positioning
algorithm further. Currently we pick position in between single
[min, max] range (min = safe_addr, max = memory_limit). In future we
can do that for multiple ranges as well (by calling
count_valid_kernel_positions for each range).
Reviewed-by: Philipp Rudo <[email protected]>
Reviewed-by: Alexander Egorenkov <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
0 is a valid random value. To avoid mixing it with error code 0 as an
return code make get_random() take extra argument to output random
value and return an error code.
Reviewed-by: Philipp Rudo <[email protected]>
Reviewed-by: Alexander Egorenkov <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Fix the leds subnode names to match (^led-[0-9a-f]$|led).
Similar change has been also done by commit 08dc0e5dd9aa ("arm64: dts:
meson: fix leds subnodes name").
The patch is fixing this warning:
avnet-ultra96-rev1.dt.yaml: leds: 'ds2', 'ds3', 'ds4', 'ds5' do not match
any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'
Signed-off-by: Michal Simek <[email protected]>
Link: https://lore.kernel.org/r/1a69c3fa0291f991ffcf113ea222c713ba4d4ff0.1598264917.git.michal.simek@xilinx.com
|
|
u-boot, DT properties are not documented anywhere in Linux DT binding
that's why remove them.
Signed-off-by: Michal Simek <[email protected]>
Link: https://lore.kernel.org/r/8ba339425b9c9f319bdedce7741367055a30713c.1598257720.git.michal.simek@xilinx.com
Reviewed-by: Krzysztof Kozlowski <[email protected]>
|
|
DT binding permits only one compatible string which was decribed in past by
commit 63cab195bf49 ("i2c: removed work arounds in i2c driver for Zynq
Ultrascale+ MPSoC").
The commit aea37006e183 ("dt-bindings: i2c: cadence: Migrate i2c-cadence
documentation to YAML") has converted binding to yaml and the following
issues is reported:
...: i2c@ff030000: compatible: Additional items are not allowed
('cdns,i2c-r1p10' was unexpected)
From schema:
.../Documentation/devicetree/bindings/i2c/cdns,i2c-r1p10.yaml fds
...: i2c@ff030000: compatible: ['cdns,i2c-r1p14', 'cdns,i2c-r1p10'] is too
long
The commit c415f9e8304a ("ARM64: zynqmp: Fix i2c node's compatible string")
has added the second compatible string but without removing origin one.
The patch is only keeping one compatible string "cdns,i2c-r1p14".
Fixes: c415f9e8304a ("ARM64: zynqmp: Fix i2c node's compatible string")
Signed-off-by: Michal Simek <[email protected]>
Link: https://lore.kernel.org/r/cc294ae1a79ef845af6809ddb4049f0c0f5bb87a.1598259551.git.michal.simek@xilinx.com
Reviewed-by: Krzysztof Kozlowski <[email protected]>
|
|
Rename amba-apu and amba to AXI. Based on Xilinx ZynqMP TRM (Chapter 15)
chip is "using the advanced eXtensible interface (AXI) point-to-point
channels for communicating addresses, data, and response transactions
between master and slave clients."
Issues are reported as:
...: amba: $nodename:0: 'amba' does not match
'^([a-z][a-z0-9\\-]+-bus|bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'
From schema: .../dt-schema/dtschema/schemas/simple-bus.yaml
...: amba-apu@0: $nodename:0: 'amba-apu@0' does not match
'^([a-z][a-z0-9\\-]+-bus|bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'
From schema: .../dt-schema/dtschema/schemas/simple-bus.yaml
Signed-off-by: Michal Simek <[email protected]>
Link: https://lore.kernel.org/r/68f20a2b2bb0feee80bc3348619c2ee98aa69963.1598263539.git.michal.simek@xilinx.com
Reviewed-by: Krzysztof Kozlowski <[email protected]>
|
|
The convention for node names is to use hyphens, not underscores.
dtschema for pca95xx expects GPIO hogs to end with 'hog' prefix.
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michal Simek <[email protected]>
|
|
Fixes: 14a61b642de9 ("KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list"")
Signed-off-by: kernel test robot <[email protected]>
Message-Id: <20200928153714.GA6285@a3a878002045>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
An error occues when sampling non-PEBS INST_RETIRED.PREC_DIST(0x01c0)
event.
perf record -e cpu/event=0xc0,umask=0x01/ -- sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cpu/event=0xc0,umask=0x01/).
/bin/dmesg | grep -i perf may provide additional information.
The idxmsk64 of the event is set to 0. The event never be successfully
scheduled.
The event should be limit to the fixed counter 0.
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Reported-by: Yi, Ammy <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The "MiB" result of the IMC free-running bandwidth events,
uncore_imc_free_running/read/ and uncore_imc_free_running/write/ are 16
times too small.
The "MiB" value equals the raw IMC free-running bandwidth counter value
times a "scale" which is inaccurate.
The IMC free-running bandwidth events should be incremented per 64B
cache line, not DWs (4 bytes). The "scale" should be 6.103515625e-5.
Fix the "scale" for both Snow Ridge and Ice Lake.
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Introduced early attributes /sys/devices/uncore_iio_<pmu_idx>/die* are
initialized by skx_iio_set_mapping(), however, for example, for multiple
segment platforms skx_iio_get_topology() returns -EPERM before a list of
attributes in skx_iio_mapping_group will have been initialized.
As a result the list is being NULL. Thus the warning
"sysfs: (bin_)attrs not set by subsystem for group: uncore_iio_*/" appears
and uncore_iio pmus are not available in sysfs. Clear IIO attr_update
to properly handle the cases when topology information cannot be
retrieved.
Fixes: bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")
Reported-by: Kyle Meyer <[email protected]>
Suggested-by: Kan Liang <[email protected]>
Signed-off-by: Alexander Antonov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Alexei Budankov <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The Jasper Lake processor is also a Tremont microarchitecture. From the
perspective of perf MSR, there is nothing changed compared with
Elkhart Lake.
Share the code path with Elkhart Lake.
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The Jasper Lake processor is also a Tremont microarchitecture. From the
perspective of Intel PMU, there is nothing changed compared with
Elkhart Lake.
Share the perf code with Elkhart Lake.
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
An oops is triggered by the fuzzy test.
[ 327.853081] unchecked MSR access error: RDMSR from 0x70c at rIP:
0xffffffffc082c820 (uncore_msr_read_counter+0x10/0x50 [intel_uncore])
[ 327.853083] Call Trace:
[ 327.853085] <IRQ>
[ 327.853089] uncore_pmu_event_start+0x85/0x170 [intel_uncore]
[ 327.853093] uncore_pmu_event_add+0x1a4/0x410 [intel_uncore]
[ 327.853097] ? event_sched_in.isra.118+0xca/0x240
There are 2 GP counters for each CBOX, but the current code claims 4
counters. Accessing the invalid registers triggers the oops.
Fixes: 6e394376ee89 ("perf/x86/intel/uncore: Add Intel Icelake uncore support")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
There are some updates for the Icelake model specific uncore performance
monitors. (The update can be found at 10th generation intel core
processors families specification update Revision 004, ICL068)
1) Counter 0 of ARB uncore unit is not available for software use
2) The global 'enable bit' (bit 29) and 'freeze bit' (bit 31) of
MSR_UNC_PERF_GLOBAL_CTRL cannot be used to control counter behavior.
Needs to use local enable in event select MSR.
Accessing the modified bit/registers will be ignored by HW. Users may
observe inaccurate results with the current code.
The changes of the MSR_UNC_PERF_GLOBAL_CTRL imply that groups cannot be
read atomically anymore. Although the error of the result for a group
becomes a bit bigger, it still far lower than not using a group. The
group support is still kept. Only Remove the *_box() related
implementation.
Since the counter 0 of ARB uncore unit is not available, update the MSR
address for the ARB uncore unit.
There is no change for IMC uncore unit, which only include free-running
counters.
Fixes: 6e394376ee89 ("perf/x86/intel/uncore: Add Intel Icelake uncore support")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Previously, the MSR uncore for the Ice Lake and Tiger Lake are
identical. The code path is shared. However, with recent update, the
global MSR_UNC_PERF_GLOBAL_CTRL register and ARB uncore unit are changed
for the Ice Lake. Split the Ice Lake and Tiger Lake MSR uncore support.
The changes only impact the MSR ops() and the ARB uncore unit. Other
codes can still be shared between the Ice Lake and the Tiger Lake.
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into for-joerg/arm-smmu/updates
Pull in core arm64 changes required to enable Shared Virtual Memory
(SVM) using SMMUv3. This brings us increasingly closer to being able to
share page-tables directly between user-space tasks running on the CPU
and their corresponding contexts on coherent devices performing DMA
through the SMMU.
Signed-off-by: Will Deacon <[email protected]>
|
|
Reading the 'prod' MMIO register in order to determine whether or not
there is valid data beyond 'cons' for a given queue does not provide
sufficient dependency ordering, as the resulting access is address
dependent only on 'cons' and can therefore be speculated ahead of time,
potentially allowing stale data to be read by the CPU.
Use readl() instead of readl_relaxed() when updating the shadow copy of
the 'prod' pointer, so that all speculated memory reads from the
corresponding queue can occur only from valid slots.
Signed-off-by: Zhou Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[will: Use readl() instead of explicit barrier. Update 'cons' side to match.]
Signed-off-by: Will Deacon <[email protected]>
|
|
The SMMUv3 driver would like to read the MMFR0 PARANGE field in order to
share CPU page tables with devices. Allow the driver to be built as
module by exporting the read_sanitized_ftr_reg() cpufeature symbol.
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Suzuki K Poulose <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
To enable address space sharing with the IOMMU, introduce
arm64_mm_context_get() and arm64_mm_context_put(), that pin down a
context and ensure that it will keep its ASID after a rollover. Export
the symbols to let the modular SMMUv3 driver use them.
Pinning is necessary because a device constantly needs a valid ASID,
unlike tasks that only require one when running. Without pinning, we would
need to notify the IOMMU when we're about to use a new ASID for a task,
and it would get complicated when a new task is assigned a shared ASID.
Consider the following scenario with no ASID pinned:
1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
We would now have to immediately generate a new ASID for t1, notify
the IOMMU, and finally enable task tn. We are holding the lock during
all that time, since we can't afford having another CPU trigger a
rollover. The IOMMU issues invalidation commands that can take tens of
milliseconds.
It gets needlessly complicated. All we wanted to do was schedule task tn,
that has no business with the IOMMU. By letting the IOMMU pin tasks when
needed, we avoid stalling the slow path, and let the pinning fail when
we're out of shareable ASIDs.
After a rollover, the allocator expects at least one ASID to be available
in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS -
1) is the maximum number of ASIDs that can be shared with the IOMMU.
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
NMI context, defer waking the vcpu to an irq_work queue.
A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
running the irq_work for a non-existent vcpu by calling irq_work_sync() on
the PMU destroy path.
[Alexandru E.: Added irq_work_sync()]
Signed-off-by: Julien Thierry <[email protected]>
Signed-off-by: Alexandru Elisei <[email protected]>
Tested-by: Sumit Garg <[email protected]> (Developerbox)
Cc: Julien Thierry <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Suzuki K Pouloze <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|