Age | Commit message (Collapse) | Author | Files | Lines |
|
Syncpoint IRQs are currently requested in a code path that runs
during resume. Due to this, we get multiple overlapping registered
interrupt handlers as host1x is suspended and resumed.
Rearrange interrupt code to only request IRQs during initialization.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Support sharded syncpoint interrupts on Tegra234+. This feature
allows specifying one of eight interrupt lines for each syncpoint
to lower processing latency of syncpoint threshold
interrupts.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Use the newly implemented tegra_dev_iommu_get_stream_id() helper to
encapsulate and centralize the IOMMU stream ID access.
Signed-off-by: Thierry Reding <[email protected]>
|
|
Currently all fences have a 30 second timeout to ensure they are
cleaned up if the fence never completes otherwise. However, this
one size fits all solution doesn't actually fit in every case,
such as syncpoint waiting where we want to be able to have timeouts
longer than 30 seconds. As such, we want to be able to give control
over fence cancellation to the caller (and maybe eventually get rid
of the internal timeout altogether).
Here we add this cancellation mechanism by essentially adding a
function for entering the timeout path by function call, and changing
the syncpoint wait function to use it.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Move from the old, complex intr handling code to a new implementation
based on dma_fences. While there is a fair bit of churn to get there,
the new implementation is much simpler and likely faster as well due
to allowing signaling directly from interrupt context.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
In anticipation of removal of the intr API, implement job tracking
using DMA fences instead. The main two things about this are
making cdma_update schedule the work since fence completion can
now be called from interrupt context, and some complication in
ensuring the callback is not running when we free the fence.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The code to write the syncpoint channel assignment register
incorrectly skips the write if hypervisor registers are not available.
The register, however, is within the guest aperture so remove the
check and assign syncpoints properly even on virtualized systems.
Fixes: c3f52220f276 ("gpu: host1x: Enable Tegra186 syncpoint protection")
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
On Tegra186+, the syncpoint ID has 10 bits of space. To allow
using more than 256 syncpoints, fix the mask.
Fixes: 9abdd497cd0a ("gpu: host1x: Tegra234 device data and headers")
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
During the refactoring of channel_submit(), assignment of syncval was
moved but it is also used in channel_submit(). Add this assignment back
to channel_submit() as well.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
With the full-featured opcode sequence using MLOCKs, we need to also
unlock those MLOCKs in the event of a timeout. However, it turns out
that on Tegra186/Tegra194, by default, we don't need to do this;
furthermore, on Tegra234 it is much simpler to do; so only implement
this on Tegra234 for the time being.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
For new (Tegra186+) SoCs, use a new ('full-featured') job opcode
sequence that is compatible with virtualization. In particular,
the Host1x hardware in Tegra234 is more strict regarding the sequence,
requiring ACQUIRE_MLOCK-SETCLASS-SETSTREAMID opcodes to occur in
that sequence without gaps (except for SETPAYLOAD), so let's do it
properly in one go now.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add device data and chip headers for Tegra234.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
On Tegra234, each Host1x VM has 8 interrupt lines. Each syncpoint
can be configured with which interrupt line should be used for
threshold interrupt, allowing for load balancing.
For now, to keep backwards compatibility, just set all syncpoints
to the first interrupt.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Host1x class information and opcodes are unchanged or backwards
compatible across SoCs so let's not duplicate them for each one
but have them in a shared header file.
At the same time, add opcode functions for acquire/release_mlock.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add code to do stream ID switching at the beginning of a job. The
stream ID is switched to the stream ID specified by the context
passed in the job structure.
Before switching the stream ID, an OP_DONE wait is done on the
channel's engine to ensure that there is no residual ongoing
work that might do DMA using the new stream ID.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add runtime PM and OPP support to the Host1x driver. For the starter we
will keep host1x always-on because dynamic power management require a major
refactoring of the driver code since lot's of code paths are missing the
RPM handling and we're going to remove some of these paths in the future.
Reviewed-by: Ulf Hansson <[email protected]>
Tested-by: Peter Geis <[email protected]> # Ouya T30
Tested-by: Paul Fertser <[email protected]> # PAZ00 T20
Tested-by: Nicolas Chauvet <[email protected]> # PAZ00 T20 and TK1 T124
Tested-by: Matt Merhar <[email protected]> # Ouya T30
Signed-off-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Show the values of the DMASTART and DMAEND registers when dumping status
to help with failure analysis.
Signed-off-by: Thierry Reding <[email protected]>
|
|
Dumping the full CDMA push buffer takes a long time and isn't very
useful since most of the contents are not relevant. Instead only show
the CDMA push buffer entries associated with current jobs.
While at it, tweak the indentation a bit to make the output more
readable.
Signed-off-by: Thierry Reding <[email protected]>
|
|
The host1x debug code uses a mix of phys_addr_t, dma_addr_t and u32 to
represent addresses. However, these addresses are always DMA addresses
so use the appropriate type.
This fixes some issues with how these addresses are displayed, because
they could be truncated in some cases and not show the full address.
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in HOST1X class, while gather submitted
by the application execute in engine class.
Support is added by converting the gather list of job into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.
Also supported are waits relative to the start of the job,
which are useful for jobs doing multiple things with an engine
that doesn't natively support pipelining.
While at it, use 32-bit waits on chips that support them.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add a new property for jobs to enable or disable recovery i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more solid model for hanged jobs, where userspace
doesn't need to guess if a syncpoint increment happened because
the job completed, or because job timeout was triggered.
On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.
The future jobs are NOPed, since because we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Syncpoint interrupts are not working as expected on Tegra194. The
problem is that the syncpoint interrupt threshold being used is the
global interrupt threshold and not the virtual interrupt threshold.
Fix this by using the virtual interrupt threshold which aligns with
downstream.
Signed-off-by: Jon Hunter <[email protected]>
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
When job hangs and there is a memory error pointing at channel's push
buffer, it is very handy to know the push buffer's state. This patch
makes the push buffer's state to be dumped into KMSG in addition to the
job's gathers.
Signed-off-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Based on 1 normalized pattern(s):
this software is licensed under the terms of the gnu general public
license version 2 as published by the free software foundation and
may be copied distributed and modified under those terms this
program is distributed in the hope that it will be useful but
without any warranty without even the implied warranty of
merchantability or fitness for a particular purpose see the gnu
general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 285 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexios Zavras <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 228 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Reviewed-by: Steve Winslow <[email protected]>
Reviewed-by: Richard Fontana <[email protected]>
Reviewed-by: Alexios Zavras <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
If SMMU support is not available, fall back to programming the bypass
stream ID (0x7f).
Fixes: de5469c21ff9 ("gpu: host1x: Program the channel stream ID")
Suggested-by: Mikko Perttunen <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Mikko Perttunen <[email protected]>
[[email protected]: rebase this on top of a later build fix]
Signed-off-by: Thierry Reding <[email protected]>
|
|
In case the IOMMU API is not available compiling host1x fails with
the following error:
In file included from drivers/gpu/host1x/hw/host1x06.c:27:
drivers/gpu/host1x/hw/channel_hw.c: In function ‘host1x_channel_set_streamid’:
drivers/gpu/host1x/hw/channel_hw.c:118:30: error: implicit declaration of function
‘dev_iommu_fwspec_get’; did you mean ‘iommu_fwspec_free’? [-Werror=implicit-function-declaration]
struct iommu_fwspec *spec = dev_iommu_fwspec_get(channel->dev->parent);
^~~~~~~~~~~~~~~~~~~~
iommu_fwspec_free
Fixes: de5469c21ff9 ("gpu: host1x: Program the channel stream ID")
Signed-off-by: Stefan Agner <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Currently gathers of a hung job are getting NOP'ed and a restarted CDMA
executes the NOP'ed gathers. There shouldn't be a reason to not restart
CDMA execution starting with a next job, avoiding the unnecessary churning
with gathers NOP'ing.
Signed-off-by: Dmitry Osipenko <[email protected]>
Reviewed-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
absolute address. This can cause SMMU faults if the CDMA fetches past a
pushbuffer's IOMMU mapping.
Properly setting the DMAEND prevents the CDMA from fetching beyond that
address and avoid such issues. This is currently not observed because a
whole (almost) page of essentially scratch space absorbs any excessive
prefetching by CDMA. However, changing the number of slots in the push
buffer can trigger these SMMU faults.
Signed-off-by: Thierry Reding <[email protected]>
|
|
Tegra186 and later support 40 bits of address space. Additional
registers need to be programmed to store the full 40 bits of push
buffer addresses.
Since command stream gathers can also reside in buffers in a 40-bit
address space, a new variant of the GATHER opcode is also introduced.
It takes two parameters: the first parameter contains the lower 32
bits of the address and the second parameter contains bits 32 to 39.
Signed-off-by: Thierry Reding <[email protected]>
|
|
When processing command streams, make sure the host1x's stream ID is
programmed for the channel so that addresses are properly translated
through the SMMU.
Signed-off-by: Thierry Reding <[email protected]>
|
|
The host1x hardware found on Tegra194 is mostly backwards compatible
with the version found on Tegra186, with the notable exceptions of the
increased number of syncpoints and mlocks. In addition, some rarely
used features such as syncpoint wait bases were dropped and some
registers had to move around to accomodate the increased number of
syncpoints.
Signed-off-by: Thierry Reding <[email protected]>
|
|
The number of syncpoints on Tegra186 is 576 and therefore no longer fits
into 8 bits. Increase the size of the syncpoint ID field to 10 in order
to accomodate all syncpoints.
Reviewed-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The register region allocated per channel was decreased from 16384 bytes
to 256 bytes on Tegra186 and later. Resize the region to make sure every
channel (instead of only the first) is properly programmed.
Suggested-by: Mikko Perttunen <[email protected]>
Reviewed-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Functions taking a pointer to a host1x syncpoint as an argument don't
need to specify a pointer to a host1x instance because it can be
obtained from the syncpoint.
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The job submission userspace ABI doesn't support this and there are no
plans to implement it, so all of this code is dead and can be removed.
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The disassembler for debug dumps was missing some newer host1x opcodes.
Add disassembly support for these.
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The host1x driver prints out "disassembly" dumps of the command FIFO
and gather contents on submission timeouts. However, the output has
been quite difficult to read with unnecessary newlines and occasional
missing parentheses.
Fix these problems by using pr_cont to remove unnecessary newlines
and by fixing other small issues.
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
The gather filter is a feature present on Tegra124 and newer where the
hardware prevents GATHERed command buffers from executing commands
normally reserved for the CDMA pushbuffer which is maintained by the
kernel driver.
This commit enables the gather filter on all supporting hardware.
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Since Tegra186 the Host1x hardware allows syncpoints to be assigned to
specific channels, preventing any other channels from incrementing
them.
Enable this feature where available and assign syncpoints to channels
when submitting a job. Syncpoints are currently never unassigned from
channels since that would require extra work and is unnecessary with
the current channel allocation model.
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add support for the implementation of Host1x present on the Tegra186.
The register space has been shuffled around a little bit, requiring
addition of some chip-specific code sections. Tegra186 also adds
several new features, most importantly the hypervisor, but those are
not yet supported with this commit.
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Some parts of Host1x uses BIT_WORD/BIT_MASK/BITS_PER_LONG to calculate
register or field offsets. This worked fine on ARMv7, but now that
BITS_PER_LONG is 64 but our registers are still 32-bit things are
broken.
Fix by replacing..
- BIT_WORD with (x / 32)
- BIT_MASK with BIT(x % 32)
- BITS_PER_LONG with 32
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
This is largely a rewrite of the Host1x channel allocation code, bringing
several changes:
- The previous code could deadlock due to an interaction
between the 'reflock' mutex and CDMA timeout handling.
This gets rid of the mutex.
- Support for more than 32 channels, required for Tegra186
- General refactoring, including better encapsulation
of channel ownership handling into channel.c
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add support for the Host1x unit to be located behind
an IOMMU. This is required when gather buffers may be
allocated non-contiguously in physical memory, as can
be the case when TegraDRM is also using the IOMMU.
Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
There's no need to wrap the BIT() macro into an extra set of parentheses
because it's already implemented to use its own set.
Signed-off-by: Thierry Reding <[email protected]>
|
|
Insert a number of blank lines in places where they increase readability
of the code. Also collapse various variable declarations to shorten some
functions and finally rewrite some code for readability.
Signed-off-by: Thierry Reding <[email protected]>
|
|
Fix a couple of occurrences where no blank line was used to separate
variable declarations from code or where block comments were wrongly
formatted.
Signed-off-by: Thierry Reding <[email protected]>
|
|
IDs can never be negative so use unsigned int. In some instances an
explicitly sized type (such as u32) was used for no particular reason,
so turn those into unsigned int as well for consistency.
Signed-off-by: Thierry Reding <[email protected]>
|
|
The number of channels, syncpoints, bases and mlocks can never be
negative, so use unsigned int instead of int. Also make loop variables
the same type for consistency.
Signed-off-by: Thierry Reding <[email protected]>
|