Age | Commit message (Collapse) | Author | Files | Lines |
|
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O
if necessary") in drm-misc, so start the backmerge cascade.
Signed-off-by: Simona Vetter <[email protected]>
|
|
We forgot to disable preemption around the write_seqcount_begin/end() pair
while updating GPU stats:
[ ] WARNING: CPU: 2 PID: 12 at include/linux/seqlock.h:221 __seqprop_assert.isra.0+0x128/0x150 [v3d]
[ ] Workqueue: v3d_bin drm_sched_run_job_work [gpu_sched]
<...snip...>
[ ] Call trace:
[ ] __seqprop_assert.isra.0+0x128/0x150 [v3d]
[ ] v3d_job_start_stats.isra.0+0x90/0x218 [v3d]
[ ] v3d_bin_job_run+0x23c/0x388 [v3d]
[ ] drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
[ ] process_one_work+0x62c/0xb48
[ ] worker_thread+0x468/0x5b0
[ ] kthread+0x1c4/0x1e0
[ ] ret_from_fork+0x10/0x20
Fix it.
Cc: Maíra Canal <[email protected]>
Cc: [email protected] # v6.10+
Fixes: 6abe93b621ab ("drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler")
Signed-off-by: Tvrtko Ursulin <[email protected]>
Acked-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might
as well catch up with a backmerge and handle them all. Plus both misc
and intel maintainers asked for a backmerge anyway.
Signed-off-by: Daniel Vetter <[email protected]>
|
|
When enabling UBSAN on Raspberry Pi 5, we get the following warning:
[ 387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3
[ 387.903868] index 7 is out of range for type '__u32 [7]'
[ 387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G WC 6.10.3-v8-16k-numa #151
[ 387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[ 387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched]
[ 387.932525] Call trace:
[ 387.935296] dump_backtrace+0x170/0x1b8
[ 387.939403] show_stack+0x20/0x38
[ 387.942907] dump_stack_lvl+0x90/0xd0
[ 387.946785] dump_stack+0x18/0x28
[ 387.950301] __ubsan_handle_out_of_bounds+0x98/0xd0
[ 387.955383] v3d_csd_job_run+0x3a8/0x438 [v3d]
[ 387.960707] drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
[ 387.966862] process_one_work+0x62c/0xb48
[ 387.971296] worker_thread+0x468/0x5b0
[ 387.975317] kthread+0x1c4/0x1e0
[ 387.978818] ret_from_fork+0x10/0x20
[ 387.983014] ---[ end trace ]---
This happens because the UAPI provides only seven configuration
registers and we are reading the eighth position of this u32 array.
Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by
accessing only seven positions on the '__u32 [7]' array. The eighth
register exists indeed on V3D 7.1, but it isn't currently used. That
being so, let's guarantee that it remains unused and add a note that it
could be set in a future patch.
Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D 7.x")
Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.12:
UAPI Changes:
virtio:
- Define DRM capset
Cross-subsystem Changes:
dma-buf:
- heaps: Clean up documentation
printk:
- Pass description to kmsg_dump()
Core Changes:
CI:
- Update IGT tests
- Point upstream repo to GitLab instance
modesetting:
- Introduce Power Saving Policy property for connectors
- Add might_fault() to drm_modeset_lock priming
- Add dynamic per-crtc vblank configuration support
panic:
- Avoid build-time interference with framebuffer console
docs:
- Document Colorspace property
scheduler:
- Remove full_recover from drm_sched_start
TTM:
- Make LRU walk restartable after dropping locks
- Allow direct reclaim to allocate local memory
Driver Changes:
amdgpu:
- Support Power Saving Policy connector property
ast:
- astdp: Support AST2600 with VGA; Clean up HPD
bridge:
- Silence error message on -EPROBE_DEFER
- analogix: Clean aup
- bridge-connector: Fix double free
- lt6505: Disable interrupt when powered off
- tc358767: Make default DP port preemphasis configurable
gma500:
- Update i2c terminology
ivpu:
- Add MODULE_FIRMWARE()
lcdif:
- Fix pixel clock
loongson:
- Use GEM refcount over TTM's
mgag200:
- Improve BMC handling
- Support VBLANK intterupts
nouveau:
- Refactor and clean up internals
- Use GEM refcount over TTM's
panel:
- Shutdown fixes plus documentation
- Refactor several drivers for better code sharing
- boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
DT; Fix porch parameter
- edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1,
BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
CMN N116BCP-EA2, CSW MNB601LS1-4
- himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
- ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
- jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
for code sharing
sti:
- Fix module owner
stm:
- Avoid UAF wih managed plane and CRTC helpers
- Fix module owner
- Fix error handling in probe
- Depend on COMMON_CLK
- ltdc: Fix transparency after disabling plane; Remove unused interrupt
tegra:
- Call drm_atomic_helper_shutdown()
v3d:
- Clean up perfmon
vkms:
- Clean up
Signed-off-by: Daniel Vetter <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Let's start the new drm-misc-fixes cycle by bringing in 6.11-rc1.
Signed-off-by: Maxime Ripard <[email protected]>
|
|
Backmerging to get a late RC of v6.10 before moving into v6.11.
Signed-off-by: Thomas Zimmermann <[email protected]>
|
|
This was basically just another one of amdgpus hacks. The parameter
allowed to restart the scheduler without turning fence signaling on
again.
That this is absolutely not a good idea should be obvious by now since
the fences will then just sit there and never signal.
While at it cleanup the code a bit.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.
Fix it by exporting and using a common cleanup helper.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal <[email protected]>
Cc: Iago Toral Quiroga <[email protected]>
Cc: [email protected] # v6.8+
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 484de39fa5f5b7bd0c5f2e2c5265167250ef7501)
Signed-off-by: Thomas Zimmermann <[email protected]>
|
|
If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.
Fix it by exporting and using a common cleanup helper.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal <[email protected]>
Cc: Iago Toral Quiroga <[email protected]>
Cc: [email protected] # v6.8+
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 753ce4fea62182c77e1691ab4f9022008f25b62e)
Signed-off-by: Thomas Zimmermann <[email protected]>
|
|
`args->cfg[4]` is configured in Indirect Dispatch using the number of
batches. Currently, for all V3D tech versions, `args->cfg[4]` equals the
number of batches subtracted by 1. But, for V3D 7.1.6 and later, we must not
subtract 1 from the number of batches.
Implement the fix by checking the V3D tech version and revision.
Fixes several `dEQP-VK.synchronization*` CTS tests related to Indirect Dispatch.
Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.
While at it pull the 32 vs 64 bit copying decision outside the loop in
order to reduce the number of conditional instructions.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate the exactly required
amount of space.
Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro whose removal will benefit further driver
cleanup.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.
Fix it by exporting and using a common cleanup helper.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal <[email protected]>
Cc: Iago Toral Quiroga <[email protected]>
Cc: [email protected] # v6.8+
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.
Fix it by exporting and using a common cleanup helper.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal <[email protected]>
Cc: Iago Toral Quiroga <[email protected]>
Cc: [email protected] # v6.8+
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
V3D_PERFCNT_NUM represents the maximum number of performance counters
for V3D 4.2, but not for V3D 7.1. This means that, if we use
V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.
Therefore, use the number of performance counters on V3D 7.1 as the
maximum number of counters. This will allow us to create arrays on the
stack with reasonable size. Note that userspace must use the value
provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
In V3D, the conclusion of a job is indicated by a IRQ. When a job
finishes, then we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.
For example, on `gpu_stats_show()`, we could think about a scenario where
`v3d->queue[queue].start_ns != 0`, then an interrupt happens, we update
the value of `v3d->queue[queue].start_ns` to 0, we come back to
`gpu_stats_show()` to calculate `active_runtime` and now,
`active_runtime = timestamp`.
In this simple example, the user would see a spike in the queue usage,
that didn't match reality.
In order to address this issue properly, use a seqcount to protect read
and write sections of the code.
Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.
Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better express the semantics of
the variable, as we are only accounting jobs that have been completed.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment
`enabled_ns` twice").
Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call for `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.
Co-developed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A copy performance query job is a job that copy the complete
or partial result of a query to a buffer. In order to copy the result of
a performance query to a buffer, we need to get the values from the
performance monitors.
So, create a user extension for the CPU job that enables the creation
of a copy performance query job. This user extension will allow the creation
of a CPU job that copy the results of a performance query to a BO with the
possibility to indicate the availability with a availability bit.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A reset performance query job is a job that resets the
performance queries by resetting the values of the perfmons. Moreover,
we also reset the syncobjs related to the availability of the query.
So, create a user extension for the CPU job that enables the creation
of a reset performance job. This user extension will allow the creation of
a CPU job that resets the perfmons values and resets the availability syncobj.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A copy timestamp query job is a job that copy the complete
or partial result of a query to a buffer. As V3D doesn't provide any
mechanism to obtain a timestamp from the GPU, it is a job that needs
CPU intervention.
So, create a user extension for the CPU job that enables the creation
of a copy timestamp query job. This user extension will allow the creation
of a CPU job that copy the results of a timestamp query to a BO with the
possibility to indicate the timestamp availability with a availability bit.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A reset timestamp job is a job that resets the timestamp
queries based on the value offset of the first query. As V3D doesn't
provide any mechanism to obtain a timestamp from the GPU, it is a job
that needs CPU intervention.
So, create a user extension for the CPU job that enables the creation
of a reset timestamp job. This user extension will allow the creation of
a CPU job that resets the timestamp value in the timestamp BO and resets
the availability syncobj.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A timestamp query job is a job that calculates the
query timestamp and updates the query availability by signaling a
syncobj. As V3D doesn't provide any mechanism to obtain a timestamp
from the GPU, it is a job that needs CPU intervention.
So, create a user extension for the CPU job that enables the creation
of a timestamp query job. This user extension will allow the creation of
a CPU job that performs the timestamp query calculation and updates the
timestamp BO with the proper value.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. An indirect CSD job is a job that, when executed in the
queue, will map the indirect buffer, read the dispatch parameters, and
submit a regular dispatch. Therefore, it is a job that needs CPU
intervention.
So, create a user extension for the CPU job that enables the creation
of an indirect CSD. This user extension will allow the creation of a CSD
job linked to a CPU job. The CPU job will wait for the indirect CSD job
dependencies and, once they are signaled, it will update the CSD job
parameters.
Co-developed-by: Melissa Wen <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Create tracepoints to track the three major events of a CPU job
lifetime:
1. Submission of a `v3d_submit_cpu` IOCTL
2. Beginning of the execution of a CPU job
3. Ending of the execution of a CPU job
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Create a new type of job, a CPU job. A CPU job is a type of job that
performs operations that requires CPU intervention. The overall idea is
to use user extensions to enable different types of CPU job, allowing the
CPU job to perform different operations according to the type of user
extension. The user extension ID identify the type of CPU job that must
be dealt.
Having a CPU job is interesting for synchronization purposes as a CPU
job has a queue like any other V3D job and can be synchoronized by the
multisync extension.
Signed-off-by: Melissa Wen <[email protected]>
Co-developed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The previous patch exposed the accumulated amount of active time per
client for each V3D queue. But this doesn't provide a global notion of
the GPU usage.
Therefore, provide the accumulated amount of active time for each V3D
queue (BIN, RENDER, CSD, TFU and CACHE_CLEAN), considering all the jobs
submitted to the queue, independent of the client.
This data is exposed through the sysfs interface, so that if the
interface is queried at two different points of time the usage percentage
of each of the queues can be calculated.
Co-developed-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Acked-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch exposes the accumulated amount of active time per client
through the fdinfo infrastructure. The amount of active time is exposed
for each V3D queue: BIN, RENDER, CSD, TFU and CACHE_CLEAN.
In order to calculate the amount of active time per client, a CPU clock
is used through the function local_clock(). The point where the jobs has
started is marked and is finally compared with the time that the job had
finished.
Moreover, the number of jobs submitted to each queue is also exposed on
fdinfo through the identifier "v3d-jobs-<queue>".
Co-developed-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Acked-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch updates a number of register addresses that have
been changed in Raspberry Pi 5 (V3D 7.1) and updates the
code to use the corresponding registers and addresses based
on the actual V3D version.
Signed-off-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.
A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Luben Tuikov <[email protected]>
|
|
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Abhinav Kumar <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Christian König <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Remove redundant error message (since now it is very similar to what
we do in drm_sched_init) and centralize all error handling in a
unique place, as we follow the same steps in any case of failure.
Signed-off-by: Melissa Wen <[email protected]>
Acked-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Add device pointer so scheduler's printing can use
DRM_DEV_ERROR() instead, which makes life easier under multiple GPU
scenario.
v2: amend all calls of drm_sched_init()
v3: fill dev pointer for all drm_sched_init() calls
Signed-off-by: Jiawei Gu <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
With the prep work out of the way this isn't tricky anymore.
Aside: The chaining of the various jobs is a bit awkward, with the
possibility of failure in bad places. I think with the
drm_sched_job_init/arm split and maybe preloading the
job->dependencies xarray this should be fixable.
v2: Rebase over renamed function names for adding dependencies.
Reviewed-by: Melissa Wen <[email protected]> (v1)
Acked-by: Emma Anholt <[email protected]>
Cc: Melissa Wen <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Emma Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Prep work for using the scheduler dependency handling. We need to call
drm_sched_job_init earlier so we can use the new drm_sched_job_await*
functions for dependency handling here.
v2: Slightly better commit message and rebase to include the
drm_sched_job_arm() call (Emma).
v3: Cleanup jobs under construction correctly (Emma)
v4: Rebase over perfmon patch
Reviewed-by: Melissa Wen <[email protected]> (v3)
Acked-by: Emma Anholt <[email protected]>
Cc: Melissa Wen <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Emma Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The V3D engine has several hardware performance counters that can of
interest for userspace performance analysis tools.
This exposes new ioctls to create and destroy performance monitor
objects, as well as to query the counter values.
Each created performance monitor object has an ID that can be attached
to CL/CSD submissions, so the driver enables the requested counters when
the job is submitted, and updates the performance monitor values when
the job is done.
It is up to the user to ensure all the jobs have been finished before
getting the performance monitor values. It is also up to the user to
properly synchronize BCL jobs when submitting jobs with different
performance monitors attached.
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Emma Anholt <[email protected]>
To: [email protected]
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Acked-by: Melissa Wen <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
reset. This leads to extra complexity when we need to synchronize timeout
works with the reset work. One solution to address that is to have an
ordered workqueue at the driver level that will be used by the different
schedulers to queue their timeout work. Thanks to the serialization
provided by the ordered workqueue we are guaranteed that timeout
handlers are executed sequentially, and can thus easily reset the GPU
from the timeout handler without extra synchronization.
v5:
* Add a new paragraph to the timedout_job() method
v3:
* New patch
v4:
* Actually use the timeout_wq to queue the timeout work
Suggested-by: Daniel Vetter <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Acked-by: Christian König <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Allow multiple schedulers to share the load balancing score.
This is useful when one engine has different hw rings.
Signed-off-by: Christian König <[email protected]>
Reviewed-and-Tested-by: Leo Liu <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Looks like this was not correctly adjusted.
Signed-off-by: Christian König <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Fixes: a6a1f036c74e ("drm/scheduler: Job timeout handler returns status (v3)")
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch does not change current behaviour.
The driver's job timeout handler now returns
status indicating back to the DRM layer whether
the device (GPU) is no longer available, such as
after it's been unplugged, or whether all is
normal, i.e. current behaviour.
All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been accordingly renamed and return the
would've-been default value of
DRM_GPU_SCHED_STAT_NOMINAL to restart the task's
timeout timer--this is the old behaviour, and is
preserved by this patch.
v2: Use enum as the status of a driver's job
timeout callback method.
v3: Return scheduler/device information, rather
than task information.
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Christian König <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Cc: Eric Anholt <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Steven Price <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/415095/
|
|
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/v3d/v3d_sched.c:75: warning: Function parameter or member 'sched_job' not described in 'v3d_job_dependency'
drivers/gpu/drm/v3d/v3d_sched.c:75: warning: Function parameter or member 's_entity' not described in 'v3d_job_dependency'
Cc: Eric Anholt <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Some identifiers have different names between their prototypes
and the kernel-doc markup.
Others need to be fixed, as kernel-doc markups should use this format:
identifier - description
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/12d4ca26f6843618200529ce5445063734d38c04.1605521731.git.mchehab+huawei@kernel.org
|
|
We already have it in v3d_dev->drm.dev with zero additional pointer
chasing. Personally I don't like duplicated pointers like this
because:
- reviewers need to check whether the pointer is for the same or
different objects if there's multiple
- compilers have an easier time too
But also a bit a bikeshed, so feel free to ignore.
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Eric Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
By this we avoid holding lock around ring mirror list in drm_sched_stop
which should solve a deadlock reported by a user.
v2: Remove unused variable.
v4: Move guilty job free into sched code.
v5:
Move sched->hw_rq_count to drm_sched_start to account for counter
decrement in drm_sched_stop even when we don't call resubmit jobs
if guily job did signal.
v6: remove unused variable
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
Acked-by: Chunming Zhou <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
It is the expectation of existing userspace (X11 + Mesa, in
particular) that jobs submitted to the kernel against a shared BO will
get implicitly synchronized by their submission order. If we want to
allow clever userspace to disable implicit synchronization, we should
do that under its own submit flag (as amdgpu and lima do).
Note that we currently only implicitly sync for the rendering pass,
not binning -- if you texture-from-pixmap in the binning vertex shader
(vertex coordinate generation), you'll miss out on synchronization.
Fixes flickering when multiple clients are running in parallel,
particularly GL apps and compositors.
v2: Fix a missing refcount on the CSD done fence for L2 cleaning.
Signed-off-by: Eric Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Rob Clark <[email protected]>
|
|
The compute shader dispatch interface is pretty simple -- just pass in
the regs that userspace has passed us, with no CLs to run. However,
with no CL to run it means that we need to do manual cache flushing of
the L2 after the HW execution completes (for SSBO, atomic, and
image_load_store writes that are the output of compute shaders).
This doesn't yet expose the L2 cache's ability to have a region of the
address space not write back to memory (which could be used for
shared_var storage).
So far, the Mesa side has been tested on V3D v4.2 simpenrose (passing
the ES31 tests), and on the kernel side on 7278 (failing atomic
compswap tests in a way that doesn't reproduce on simpenrose).
v2: Fix excessive allocation for the clean_job (reported by Dan
Carpenter). Keep refs on jobs until clean_job is finished, to
avoid spurious MMU errors if the output BOs are freed by userspace
before L2 cleaning is finished.
Signed-off-by: Eric Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Rob Clark <[email protected]>
|
|
The CL submission had two jobs embedded in an exec struct. When I
added TFU support, I had to replicate some of the exec stuff and some
of the job stuff. As I went to add CSD, it became clear that actually
what was in exec should just be in the two CL jobs, and it would let
us share a lot more code between the 4 queues.
v2: Fix missing error path in TFU ioctl's bo[] allocation.
Signed-off-by: Eric Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Rob Clark <[email protected]>
|
|
We have another thing called the "done fence" that tracks when the
scheduler considers the job done, and having the shared name was
confusing.
Signed-off-by: Eric Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Dave Emett <[email protected]>
|