Compiling AMD GPU drivers displays two warnings:
drivers/gpu/drm/scheduler/sched_main.c:738: warning: Function parameter or member 'file' not described in 'drm_sched_job_add_syncobj_dependency'
drivers/gpu/drm/scheduler/sched_main.c:738: warning: Excess function parameter 'file_private' description in 'drm_sched_job_add_syncobj_dependency'
Get rid of them by renaming the parameter in the function's kernel-doc description.
Signed-off-by: Caio Novais <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.4-rc1:
Note: Only changes since pull request from 2023-02-23 are included here.
UAPI Changes:
- Convert rockchip bindings to YAML.
- Constify kobj_type structure in dma-buf.
- FBDEV cmdline parser fixes, and other small fbdev fixes for mode
parsing.
Cross-subsystem Changes:
- Add Neil Armstrong as linaro maintainer.
- Actually signal the private stub dma-fence.
Core Changes:
- Add function for adding syncobj dep to sched_job and use it in panfrost, v3d.
- Improve DisplayID 2.0 topology parsing and EDID parsing in general.
- Add a gem eviction function and callback for generic GEM shrinker
purposes.
- Prepare to convert the shmem helper to use the GEM reservation lock instead of
its own locking. (The actual commit itself got reverted for now.)
- Move the suballocator from radeon and amdgpu drivers to core in preparation
for Xe.
- Assorted small fixes and documentation.
- Fixes to HPD polling.
- Assorted small fixes in simpledrm, bridge, accel, shmem-helper,
and the selftest of format-helper.
- Remove dummy resource when ttm bo is created, and during pipelined
gutting. Fix all drivers to accept a NULL ttm_bo->resource.
- Handle pinned BO moving prevention in ttm core.
- Set drm panel-bridge orientation before connector is registered.
- Remove dumb_destroy callback.
- Add documentation to the GEM_CLOSE, PRIME_HANDLE_TO_FD, PRIME_FD_TO_HANDLE, and GETFB2 ioctls.
- Add atomic enable_plane callback, use it in ast, mgag200, tidss.
Driver Changes:
- Use drm_gem_objects_lookup in vc4.
- Assorted small fixes to virtio, ast, bridge/tc358762, meson, nouveau.
- Allow virtio KMS to be disabled and compiled out.
- Add Radxa 8/10HD, Samsung AMS495QA01 panels.
- Fix ivpu compiler errors.
- Assorted fixes to drm/panel, malidp, rockchip, ivpu, amdgpu, vgem,
nouveau, vc4.
- Assorted cleanups, simplifications and fixes to vmwgfx.
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Backmerging to get v6.3-rc1 and sync with the other DRM trees.
Signed-off-by: Thomas Zimmermann <[email protected]>
|
|
In order to add a syncobj's fence as a dependency to a job, it is
necessary to call drm_syncobj_find_fence() to find the fence and then
add the dependency with drm_sched_job_add_dependency(). So, wrap these
steps in one single function, drm_sched_job_add_syncobj_dependency().
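For illustration, such a wrapper boils down to roughly the following (a minimal sketch, not a verbatim copy of the upstream helper):

  int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
                                           struct drm_file *file,
                                           u32 handle, u32 point)
  {
          struct dma_fence *fence;
          int ret;

          /* look up the fence currently backing the syncobj (the timeline
           * point is optional and may be 0) */
          ret = drm_syncobj_find_fence(file, handle, point, 0, &fence);
          if (ret)
                  return ret;

          /* hand the fence reference over to the job's dependency tracking */
          return drm_sched_job_add_dependency(job, fence);
  }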
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
drm-next
This time we've added support for reporting of GPU load via the common
fdinfo format, as already supported by multiple other drivers. Improved
diagnostic messages for MMU faults. And finally added experimental
support for driving the VeriSilicon NPU cores, which are very close
relatives to the GPU designs, so close in fact that they can run the
same compute instruction set, but with a big NN-fabric/matrix/tensor
execution array glued to the side.
Signed-off-by: Dave Airlie <[email protected]>
From: Lucas Stach <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Track the accumulated time that jobs from this entity were active
on the GPU. This allows drivers using the scheduler to trivially
implement DRM fdinfo reporting when the hardware doesn't provide more
specific information than signalling job completion anyway.
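As a hedged sketch, a driver could then report that time in the common fdinfo key format along these lines ('my_file', its entity array and the engine name are illustrative, not from this patch):

  static void my_print_gpu_time(struct drm_printer *p, struct my_file *mfile)
  {
          u64 busy_ns = 0;
          int i;

          /* elapsed_ns is the scheduler-maintained accumulated GPU time */
          for (i = 0; i < MY_NUM_ENGINES; i++)
                  busy_ns += mfile->entities[i].elapsed_ns;

          drm_printf(p, "drm-engine-gpu:\t%llu ns\n", busy_ns);
  }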
[Bagas: Append missing colon to @elapsed_ns]
Signed-off-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
|
|
This interface is not working as it should.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Backmerging to get v6.1-rc6 into drm-misc-next.
Signed-off-by: Thomas Zimmermann <[email protected]>
|
|
drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy()
while a userspace process is exiting or being killed. It's invoked first
when the sched entity is flushed and a second time when the entity is released.
This causes a lockup within wait_for_completion(entity_idle) due to how the
completion API works.
Calling wait_for_completion() more times than complete() was invoked is an
error condition that causes a lockup because a completion internally uses a
counter for complete/wait calls. complete_all() must be used instead
in such cases.
This patch fixes a lockup of the Panfrost driver that is reproducible by killing
any application in the middle of a 3D drawing operation.
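The underlying semantics can be shown with a minimal sketch of the plain completion API (not the scheduler code itself):

  static struct completion entity_idle;

  static void completion_example(void)
  {
          init_completion(&entity_idle);

          complete(&entity_idle);                 /* done count becomes 1 */
          wait_for_completion(&entity_idle);      /* consumes it, returns */
          wait_for_completion(&entity_idle);      /* count is 0 again: hangs */

          /* complete_all() instead marks the completion permanently done, so
           * any number of later wait_for_completion() calls return at once. */
  }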
Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Signed-off-by: Dmitry Osipenko <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.2-2022-11-18:
amdgpu:
- SR-IOV fixes
- Clean up DC checks
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- Don't enable degamma on asics which don't support it
- IP discovery fixes
- BACO fixes
- Fix vbios allocation handling when vkms is enabled
- Drop buggy tdr advanced mode GPU reset handling
- Fix the build when DCN is not set in kconfig
- MST DSC fixes
- Userptr fixes
- FRU and RAS EEPROM fixes
- VCN 4.x RAS support
- Aldrebaran CU occupancy reporting fix
- PSP ring cleanup
amdkfd:
- Memory limit fix
- Enable cooperative launch on gfx 10.3
amd-drm-next-6.2-2022-11-11:
amdgpu:
- SMU 13.x updates
- GPUVM TLB race fix
- DCN 3.1.4 updates
- DCN 3.2.x updates
- PSR fixes
- Kerneldoc fix
- Vega10 fan fix
- GPUVM locking fixes in error pathes
- BACO fix for Beige Goby
- EEPROM I2C address cleanup
- GFXOFF fix
- Fix DC memory leak in error pathes
- Flexible array updates
- Mtype fix for GPUVM PTEs
- Move Kconfig into amdgpu directory
- SR-IOV updates
- Fix possible memory leak in CS IOCTL error path
amdkfd:
- Fix possible memory overrun
- CRIU fixes
radeon:
- ACPI ref count fix
- HDA audio notifier support
- Move Kconfig into radeon directory
UAPI:
- Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI.
These are used internally in the driver to align location based memory coherency
requirements from memory allocated in the KFD with how we manage GPUVM PTEs. They
are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now.
They are just used internally in the kernel driver for now for existing KFD memory
allocations. So a change to the UAPI header, but no functional change in the UAPI.
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Airlie <[email protected]>
|
|
This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75.
This feature basically re-submits one job after another to
figure out which one was the one causing a hang.
This is obviously incompatible with gang-submit which requires
that multiple jobs run at the same time. It's also absolutely
not helpful to crash the hardware multiple times if a clean
recovery is desired.
For testing and debugging environments we should rather disable
recovery altogether to be able to inspect the state with a hardware
debugger.
In addition to that, the software implementation is clearly buggy and causes
reference count issues for the hardware fence.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Not used any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Add a new function to update job dependencies from a resv obj.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The current default Round-Robin GPU scheduling can result in starvation
of entities which have a large number of jobs, in favour of entities which
have a very small number of jobs (single digit).
This can be illustrated in the following diagram, where jobs are
alphabetized to show their chronological order of arrival, where job A is
the oldest, B is the second oldest, and so on, to J, the most recent job to
arrive.
---> entities
j | H-F-----A--E--I--
o | --G-----B-----J--
b | --------C--------
s\/ --------D--------
WLOG, assuming all jobs are "ready", then a R-R scheduling will execute them
in the following order (a slice off of the top of the entities' list),
H, F, A, E, I, G, B, J, C, D.
However, to mitigate job starvation, we'd rather execute C and D before E,
and so on, given, of course, that they're all ready to be executed.
So, if all jobs are ready at this instant, the order of execution for this
and the next 9 instances of picking the next job to execute, should really
be,
A, B, C, D, E, F, G, H, I, J,
which is their chronological order. The only reason for this order to be
broken, is if an older job is not yet ready, but a younger job is ready, at
an instant of picking a new job to execute. For instance if job C wasn't
ready at time 2, but job D was ready, then we'd pick job D, like this:
0 +1 +2 ...
A, B, D, ...
And from then on, C would be preferred before all other jobs, if it is ready
at the time when a new job for execution is picked. So, if C became ready
two steps later, the execution order would look like this:
......0 +1 +2 ...
A, B, D, E, C, F, G, H, I, J
This is what the FIFO GPU scheduling algorithm achieves. It uses a
Red-Black tree to keep jobs sorted in chronological order, where picking
the oldest job is O(1) (we use the "cached" structure), and balancing the
tree is O(log n). IOW, it picks the *oldest ready* job to execute now.
The implementation is already in the kernel, and this commit only changes
which GPU scheduling algorithm is used by default.
This was tested and achieves about 1% better performance than the
Round-Robin algorithm.
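A minimal sketch of the "oldest ready job first" pick over a cached red-black tree (illustrative only; the in-tree code is drm_sched_rq_select_entity_fifo()):

  static struct drm_sched_entity *
  pick_oldest_ready_entity(struct rb_root_cached *root)
  {
          struct rb_node *rb;

          /* entities are ordered by the timestamp of their oldest pending job;
           * rb_first_cached() is O(1), tree rebalancing is O(log n) */
          for (rb = rb_first_cached(root); rb; rb = rb_next(rb)) {
                  struct drm_sched_entity *entity =
                          rb_entry(rb, struct drm_sched_entity, rb_tree_node);

                  if (drm_sched_entity_is_ready(entity))
                          return entity;  /* oldest entity with a ready job */
          }

          return NULL;
  }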
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Direct Rendering Infrastructure - Development <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Christian König <[email protected]>
|
|
Let's kick-off this release cycle.
Signed-off-by: Maxime Ripard <[email protected]>
|
|
This reverts commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86.
This is causing instability on Linus' desktop, and I'm seeing
oops with VK CTS runs.
netconsole got me the following oops:
[ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1234.778782] #PF: supervisor read access in kernel mode
[ 1234.778787] #PF: error_code(0x0000) - not-present page
[ 1234.778791] PGD 0 P4D 0
[ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
[ 1234.778809] Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 5603 07/28/2020
[ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
[ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f
ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53
48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00
00 f0
[ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
[ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
[ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
[ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
[ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
[ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
[ 1234.778912] FS: 00007f35e7008580(0000) GS:ffff95428ebc0000(0000)
knlGS:0000000000000000
[ 1234.778916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
[ 1234.778924] Call Trace:
[ 1234.778981] <IRQ>
[ 1234.778989] dma_fence_signal_timestamp_locked+0x6a/0xe0
[ 1234.778999] dma_fence_signal+0x2c/0x50
[ 1234.779005] amdgpu_fence_process+0xc8/0x140 [amdgpu]
[ 1234.779234] sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
[ 1234.779395] amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
[ 1234.779609] amdgpu_ih_process+0x80/0x100 [amdgpu]
[ 1234.779783] amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[ 1234.779940] __handle_irq_event_percpu+0x46/0x190
[ 1234.779946] handle_irq_event+0x34/0x70
[ 1234.779949] handle_edge_irq+0x9f/0x240
[ 1234.779954] __common_interrupt+0x66/0x100
[ 1234.779960] common_interrupt+0xa0/0xc0
[ 1234.779965] </IRQ>
[ 1234.779968] <TASK>
[ 1234.779971] asm_common_interrupt+0x22/0x40
[ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
[ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41
54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30
48 8b 18 <48> 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48
83 ea
[ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202
Revert it for now and figure it out later.
Signed-off-by: Dave Airlie <[email protected]>
|
|
Otherwise we would crash if the job is not resubmitted.
v2: fix second usage of s_fence->parent as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
When many entities are competing for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed with GPUVis.
The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under the stress
of many entities and long job queues within an entity, some
jobs could be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.
Fix:
Add a FIFO selection policy for entities in the run queue: choose the next
entity on the run queue in such an order that if a job on one entity arrived
earlier than a job on another entity, the first job will start
executing earlier, regardless of the length of the entity's job
queue.
v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue
Drop default option in module control parameter.
v3:
Various cosmetic fixes and minor refactoring of the fifo update function. (Luben)
v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)
v6: Add missing drm_sched_rq_remove_fifo_locked
v7: Fix ts sampling bug and more cosmetic stuff (Luben)
v8: Fix module parameter string (Luben)
Cc: Luben Tuikov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Direct Rendering Infrastructure - Development <[email protected]>
Cc: AMD Graphics <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Tested-by: Yunxiang Li (Teddy) <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Use the parent fence instead of the finished fence
to get the job status. This change avoids a GPU
scheduler timeout error which can cause a GPU reset.
Signed-off-by: Arvind Yadav <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Christian König <[email protected]>
|
|
Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c.
Quashes these warnings:
include/drm/gpu_scheduler.h:332: warning: missing initial short description on line:
* struct drm_sched_backend_ops
include/drm/gpu_scheduler.h:412: warning: missing initial short description on line:
* struct drm_gpu_scheduler
include/drm/gpu_scheduler.h:461: warning: Function parameter or member 'dev' not described in 'drm_gpu_scheduler'
drivers/gpu/drm/scheduler/sched_main.c:201: warning: missing initial short description on line:
* drm_sched_dependency_optimized
drivers/gpu/drm/scheduler/sched_main.c:995: warning: Function parameter or member 'dev' not described in 'drm_sched_init'
Fixes: 2d33948e4e00 ("drm/scheduler: add documentation")
Fixes: 8ab62eda177b ("drm/sched: Add device pointer to drm_gpu_scheduler")
Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Nayan Deshmukh <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: Jiawei Gu <[email protected]>
Cc: [email protected]
Acked-by: Christian König <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We already discussed that the call to drm_sched_entity_select_rq() needs
to move to drm_sched_job_arm() to be able to set a new scheduler list
between _init() and _arm(). This was just not applied for some reason.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Problem:
This patch caused a negative refcount as described in [1]. In that case the
parent fence did not signal by the time of drm_sched_stop and hence was kept
in the pending list. The assumption was that it would not signal, and so the
fence was put to account for the s_fence->parent refcount. But for amdgpu,
which has an embedded HW fence (always the same parent fence),
drm_sched_fence_release_scheduled was always called and would
still drop the count for the parent fence once more. For jobs that
never signaled, this imbalance was masked by a refcount bug in
amdgpu_fence_driver_clear_job_fences that would not drop the
refcount on the fences that were removed from the fence driver's
fences array (against the previous insertion into the array by the
get in amdgpu_fence_emit).
Fix:
Revert this patch and, by setting s_job->s_fence->parent to NULL
as before, prevent the extra refcount drop in amdgpu when
drm_sched_fence_release_scheduled is called on job release.
Also align the behaviour of drm_sched_resubmit_jobs_ext with that of
drm_sched_main when submitting jobs: take a refcount for the
new parent fence pointer and drop the refcount of the original kref_init
for the new HW fence creation (or the fake new HW fence in amdgpu - see the next patch).
[1] - https://lore.kernel.org/all/[email protected]/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3
Signed-off-by: Andrey Grodzovsky <[email protected]>
Tested-by: Yiqing Yao <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This change adds the dma_resv_usage enum and allows us to specify why a
dma_resv object is queried for its containing fences.
In addition to that, a dma_resv_usage_rw() helper function is added to aid
retrieving the fences for a read or write userspace submission.
This is then deployed to the different query functions of the dma_resv
object and all of their users. Where the write parameter was previously
true we now use DMA_RESV_USAGE_WRITE, and DMA_RESV_USAGE_READ otherwise.
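As a sketch of the helper in use (the surrounding function and handle_dependency() are hypothetical):

  static int collect_implicit_fences(struct drm_gem_object *obj, bool write)
  {
          struct dma_resv_iter cursor;
          struct dma_fence *fence;
          int ret;

          /* a new write must wait for existing readers and writers, a new read
           * only for existing writers; dma_resv_usage_rw() encodes exactly that */
          dma_resv_for_each_fence(&cursor, obj->resv,
                                  dma_resv_usage_rw(write), fence) {
                  ret = handle_dependency(fence); /* hypothetical driver hook */
                  if (ret)
                          return ret;
          }

          return 0;
  }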
v2: add KERNEL/OTHER in separate patch
v3: some kerneldoc suggestions by Daniel
v4: some more kerneldoc suggestions by Daniel, fix missing cases lost in
the rebase pointed out by Bas.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
You really need to hold the reservation here or all kinds of funny
things can happen between grabbing the dependencies and inserting the
new fences.
v2: Fix commit summary (Christian)
Acked-by: Melissa Wen <[email protected]>
Reviewed-by: "Christian König" <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Luben Tuikov <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Add a device pointer so the scheduler's printing can use
DRM_DEV_ERROR() instead, which makes life easier in multi-GPU
scenarios.
v2: amend all calls of drm_sched_init()
v3: fill dev pointer for all drm_sched_init() calls
Signed-off-by: Jiawei Gu <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
drm_sched_job_add_dependency() could drop the last ref, so we need to do
the dma_fence_get() first.
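The corrected loop then looks roughly like this (a condensed sketch; at the time the iterator still took a bool 'write' rather than an enum dma_resv_usage):

  int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
                                              struct drm_gem_object *obj,
                                              bool write)
  {
          struct dma_resv_iter cursor;
          struct dma_fence *fence;
          int ret;

          dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
                  /* take our own reference before handing the fence over,
                   * since the scheduler may drop the last reference */
                  dma_fence_get(fence);

                  ret = drm_sched_job_add_dependency(job, fence);
                  if (ret) {
                          dma_fence_put(fence);
                          return ret;
                  }
          }

          return 0;
  }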
Cc: Christian König <[email protected]>
Fixes: 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
Signed-off-by: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Amit Pundir <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
Trivial fix since we now need to grab a reference to the fence we have
added. Previously the dma_resv functions were doing that for us.
Signed-off-by: Christian König <[email protected]>
Fixes: 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Daniel Vetter <[email protected]>
Reported-by: Nicolas Frattaroli <[email protected]>
References: https://lore.kernel.org/dri-devel/2023306.UmlnhvANQh@archbook/
Tested-by: Nicolas Frattaroli <[email protected]>
Tested-by: Yassine Oudjana <[email protected]>
|
|
Simplifying the code a bit.
v2: use dma_resv_for_each_fence
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
issue:
in cleanup_job the cancel_delayed_work will cancel a TO timer
even if its corresponding job is still running.
fix:
do not cancel the timer in cleanup_job; instead do the cancelling
only when the heading job is signaled, and if there is a "next" job
we start the timeout again.
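A hedged sketch of the described flow (field names follow struct drm_gpu_scheduler, but this is not the exact upstream code):

  static struct drm_sched_job *
  sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
  {
          struct drm_sched_job *job;

          spin_lock(&sched->job_list_lock);
          job = list_first_entry_or_null(&sched->pending_list,
                                         struct drm_sched_job, list);
          if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
                  /* cancel the TO timer only once the heading job signaled */
                  cancel_delayed_work(&sched->work_tdr);
                  list_del_init(&job->list);
                  /* start the timeout again if there is a next job */
                  if (!list_empty(&sched->pending_list))
                          mod_delayed_work(system_wq, &sched->work_tdr,
                                           sched->timeout);
          } else {
                  job = NULL;
          }
          spin_unlock(&sched->job_list_lock);

          return job;
  }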
v2:
further cleanup the logic, and do the TDR timer cancelling if the signaled job
is the last one in its scheduler.
v3:
change the issue description
remove the cancel_delayed_work in the begining of the cleanup_job
recover the implement of drm_sched_job_begin.
v4:
remove the kthread_should_park() checking in cleanup_job routine,
we should cleanup the signaled job asap
TODO:
1) introduce pause/resume scheduler in job_timeout to serialize the handling
of the scheduler and job_timeout.
2) drop the bad job's del and insert in the scheduler due to the above serialization
(no race issue anymore with the serialization)
Tested-by: jingwen <jingwen.chen@@amd.com>
Signed-off-by: Monk Liu <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
drm_sched_job_cleanup() will pass an uninitialized fence to
drm_sched_fence_free(), which will cause to_drm_sched_fence() to return
a NULL fence object, causing a NULL pointer deref when this NULL object
is passed to kmem_cache_free().
Let's create a new drm_sched_fence_free() function that takes a
drm_sched_fence pointer and suffix the old function with _rcu. While at
it, complain if drm_sched_fence_free() is passed an initialized fence
or if drm_sched_fence_free_rcu() is passed an uninitialized fence.
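The resulting split looks roughly like this (a sketch; sched_fence_slab is the scheduler's fence slab cache):

  void drm_sched_fence_free(struct drm_sched_fence *fence)
  {
          /* only legal for fences that were never initialized */
          if (!WARN_ON_ONCE(fence->finished.ops))
                  kmem_cache_free(sched_fence_slab, fence);
  }

  static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
  {
          struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
          struct drm_sched_fence *fence = to_drm_sched_fence(f);

          /* initialized fences only; to_drm_sched_fence() returns NULL otherwise */
          if (!WARN_ON_ONCE(!fence))
                  kmem_cache_free(sched_fence_slab, fence);
  }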
Fixes: dbe48d030b28 ("drm/sched: Split drm_sched_job_init")
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Instead of just a callback we can just glue in the gem helpers that
panfrost, v3d and lima currently use. There really aren't that many
ways to skin this cat.
v2/3: Rebased.
v4: Repaint this shed. The functions are now called _add_dependency()
and _add_implicit_dependency()
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]> (v3)
Reviewed-by: Steven Price <[email protected]> (v1)
Acked-by: Melissa Wen <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Luben Tuikov <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This is a very confusingly named function, because not only does it
init an object, it arms it and provides a point of no return for
pushing a job into the scheduler. It would be nice if that were a bit
clearer in the interface.
But the real reason is that I want to push the dependency tracking
helpers into the scheduler code, and that means drm_sched_job_init
must be called a lot earlier, without arming the job.
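A hedged sketch of the driver-side submit flow this split enables; the scheduler entry points are real, 'struct my_job' and my_add_dependencies() are illustrative, and signatures are as of this series (drm_sched_entity_push_job() later lost its entity parameter):

  static int my_submit(struct my_job *job, struct drm_sched_entity *entity)
  {
          int ret;

          ret = drm_sched_job_init(&job->base, entity, job);
          if (ret)
                  return ret;

          /* dependencies are gathered here, before the point of no return */
          ret = my_add_dependencies(job);
          if (ret)
                  goto err_cleanup;

          drm_sched_job_arm(&job->base);                  /* point of no return */
          drm_sched_entity_push_job(&job->base, entity);
          return 0;

  err_cleanup:
          /* handles being called both before and after drm_sched_job_arm() */
          drm_sched_job_cleanup(&job->base);
          return ret;
  }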
v2:
- don't change .gitignore (Steven)
- don't forget v3d (Emma)
v3: Emma noticed that I leak the memory allocated in
drm_sched_job_init if we bail out before the point of no return in
subsequent driver patches. To be able to fix this change
drm_sched_job_cleanup() so it can handle being called both before and
after drm_sched_job_arm().
Also improve the kerneldoc for this.
v4:
- Fix the drm_sched_job_cleanup logic, I inverted the booleans, as
usual (Melissa)
- Christian pointed out that drm_sched_entity_select_rq() also needs
to be moved into drm_sched_job_arm, which made me realize that the
job->id definitely needs to be moved too.
Shuffle things to fit between job_init and job_arm.
v5:
Reshuffle the split between init/arm once more, amdgpu abuses
drm_sched.ready to signal gpu reset failures. Also document this
somewhat. (Christian)
v6:
Rebase on top of the msm drm/sched support. Note that the
drm_sched_job_init() call is completely misplaced, and hence also the
split-out drm_sched_entity_push_job(). I've put in a FIXME which the next
patch will address.
v7: Drop the FIXME in msm, after discussions with Rob I agree it shouldn't
be a problem where it is now.
Acked-by: Christian König <[email protected]>
Acked-by: Melissa Wen <[email protected]>
Cc: Melissa Wen <[email protected]>
Acked-by: Emma Anholt <[email protected]>
Acked-by: Steven Price <[email protected]> (v2)
Reviewed-by: Boris Brezillon <[email protected]> (v5)
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Adam Borowski <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Paul Menzel <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: Deepak R Varma <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Kevin Wang <[email protected]>
Cc: Chen Li <[email protected]>
Cc: Luben Tuikov <[email protected]>
Cc: "Marek Olšák" <[email protected]>
Cc: Dennis Li <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Sonny Jiang <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Tian Tao <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Emma Anholt <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
reset. This leads to extra complexity when we need to synchronize timeout
works with the reset work. One solution to address that is to have an
ordered workqueue at the driver level that will be used by the different
schedulers to queue their timeout work. Thanks to the serialization
provided by the ordered workqueue we are guaranteed that timeout
handlers are executed sequentially, and can thus easily reset the GPU
from the timeout handler without extra synchronization.
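A minimal sketch, assuming a panfrost-like driver structure (names illustrative; the drm_sched_init() parameters are elided since the exact signature has changed over time):

  static int my_gpu_init_reset_wq(struct my_gpu *gpu)
  {
          /* one ordered workqueue per device, shared by all schedulers */
          gpu->reset_wq = alloc_ordered_workqueue("my-gpu-reset", 0);
          if (!gpu->reset_wq)
                  return -ENOMEM;

          /* pass gpu->reset_wq as the new timeout_wq argument of
           * drm_sched_init() for every hardware queue's scheduler, so that
           * all timeout handlers run strictly one at a time */
          return 0;
  }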
v5:
* Add a new paragraph to the timedout_job() method
v3:
* New patch
v4:
* Actually use the timeout_wq to queue the timeout work
Suggested-by: Daniel Vetter <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Acked-by: Christian König <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The panfrost driver tries to kill in-flight jobs on FD close after
destroying the FD scheduler entities. For this to work properly, we
need to make sure the jobs popped from the scheduler entities have
been queued at the HW level before declaring the entity idle, otherwise
we might iterate over a list that doesn't contain those jobs.
Suggested-by: Lucas Stach <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Cc: Lucas Stach <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Problem: If the scheduler is already stopped by the time the sched_entity
is released and the entity's job_queue is not empty, I encountered
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes true.
Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wake up all those processes stuck in sched_entity flushing,
as the scheduler main thread which wakes them up is stopped by now.
v2:
Reverse the order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinsertion back into the rq due
to a race.
v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle
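A hedged sketch of the v3 approach (close to, but not exactly, the upstream code):

  void drm_sched_fini(struct drm_gpu_scheduler *sched)
  {
          struct drm_sched_entity *s_entity;
          int i;

          kthread_stop(sched->thread);

          for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
                  struct drm_sched_rq *rq = &sched->sched_rq[i];

                  spin_lock(&rq->lock);
                  list_for_each_entry(s_entity, &rq->entities, list)
                          /* lets drm_sched_entity_is_idle() return true */
                          s_entity->stopped = true;
                  spin_unlock(&rq->lock);
          }

          /* wake up processes stuck in drm_sched_entity_flush() */
          wake_up_all(&sched->job_scheduled);
  }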
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We don't want to rearm the timer if the driver hook reports
that the device is gone.
v5: Update the drm_gpu_sched_stat values in the code.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Update the timestamp of the scheduled fence on HW
completion of the previous fences.
This allows more accurate tracking of the fence
execution in HW.
v2 (chk): drop the flag check and improve the comment
Signed-off-by: David M Nieto <[email protected]>
Signed-off-by: Roy Sun <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
[Why]
The previous TDR design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because the gfx and compute rings share
internal GC hardware.
[How]
This patch implements an advanced TDR mode. It involves an additional
synchronous pre-resubmit step (Step0 Resubmit) before the normal resubmit
step in order to find the real bad job.
1. At the Step0 Resubmit stage, it synchronously submits and waits for the
first job to be signaled. If it times out, we identify it as guilty
and do a HW reset. After that, we do the normal resubmit step to
resubmit the remaining jobs.
2. For a whole GPU reset (VRAM lost), resubmit the old way.
v2: squash in build fix (Alex)
Signed-off-by: Jack Zhang <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allow multiple schedulers to share the load balancing score.
This is useful when one engine has different hw rings.
Signed-off-by: Christian König <[email protected]>
Reviewed-and-Tested-by: Leo Liu <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch does not change current behaviour.
The driver's job timeout handler now returns
a status indicating back to the DRM layer
whether the device (GPU) is no longer available,
such as after it's been unplugged, or whether all
is normal, i.e. current behaviour.
All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been updated accordingly and return the
would-have-been default value of
DRM_GPU_SCHED_STAT_NOMINAL to restart the task's
timeout timer; this is the old behaviour, and is
preserved by this patch.
v2: Use enum as the status of a driver's job
timeout callback method.
v3: Return scheduler/device information, rather
than task information.
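A hedged sketch of a driver's .timedout_job() callback under the new contract ('my_*' names are illustrative):

  static enum drm_gpu_sched_stat my_timedout_job(struct drm_sched_job *sched_job)
  {
          struct my_job *job = to_my_job(sched_job);

          /* device unplugged or otherwise gone: tell the scheduler */
          if (!my_device_is_present(job->dev))
                  return DRM_GPU_SCHED_STAT_ENODEV;

          my_reset_hw(job->dev);

          /* all is normal: restart the task's timeout timer as before */
          return DRM_GPU_SCHED_STAT_NOMINAL;
  }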
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Christian König <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Cc: Eric Anholt <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Steven Price <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/415095/
|
|
To avoid any possible use after free.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/414814/
CC: [email protected]
Signed-off-by: Christian König <[email protected]>
|
|
Required backmerge since we will be based on top of v5.11, and there
has been a request to backmerge already to upstream some features.
Signed-off-by: Maarten Lankhorst <[email protected]>
|
|
git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.11-2020-12-09:
amdgpu:
- SR-IOV fixes
- Navy Flounder updates
- Sienna Cichlid updates
- Dimgrey Cavefish updates
- Vangogh updates
- Misc SMU fixes
- Misc display fixes
- Last big hunk of W=1 warning fixes
- Cursor validation fixes
- CI BACO updates
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Airlie <[email protected]>
|
|
The job done callback is called from various
places, in two roles: as the job-done handler,
and as a fence callback.
Reduce the callback to a core function
which just completes the job, and add a
second function with the fence-callback
prototype which calls the former to complete
the job.
This is used in later patches by the completion
code.
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/405574/
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Christian König <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
Rename "ring_mirror_list" to "pending_list",
to describe what something is, not what it does,
how it's used, or how the hardware implements it.
This also abstracts away the actual hardware
implementation, i.e. how the low-level driver
communicates with the device it drives (ring, CAM,
etc.) shouldn't be exposed to DRM.
The pending_list keeps jobs that have been
submitted and are thus out of our control.
Usually this means they are pending execution
in hardware, but "out of our control" is the
more general (inclusive) definition.
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/405573/
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Christian König <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
Rename "node" to "list" in struct drm_sched_job,
in order to make it consistent with what we see
being used throughout gpu_scheduler.h, for
instance in struct drm_sched_entity, as well as
the rest of DRM and the kernel.
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/403515/
Cc: Alexander Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Christian König <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
Some identifiers have different names between their prototypes
and the kernel-doc markup.
Others need to be fixed, as kernel-doc markups should use this format:
identifier - description
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/12d4ca26f6843618200529ce5445063734d38c04.1605521731.git.mchehab+huawei@kernel.org
|
|
parameter
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/scheduler/sched_main.c:74: warning: Function parameter or member 'sched' not described in 'drm_sched_rq_init'
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.10-2020-09-03:
amdgpu:
- RAS fixes
- Sienna Cichlid updates
- Navy Flounder updates
- DCE6 (SI) support in DC
- Enable plane rotation
- Rework pre-OS vram reservation handling during driver init
- Add standard interface to dump GPU metrics table from SMU
- Rework tiling and tmz state handling in atomic commits
- Pstate fixes
- Add voltage and power hwmon interfaces for renoir
- SW CTF fixes
- S/G display fix for Raven
- Print client strings for vmfaults for vega and newer
- Manual fan control fixes
- Display updates
- Reorg power management directory structure
- Misc bug fixes
- Misc code cleanups
amdkfd:
- Topology fixes
- Add SMI events for thermal throttling and GPU resets
radeon:
- switch from pci_* to dma_* for dma allocations
- PLL fix
Scheduler:
- Clean up priority levels
UAPI:
- amdgpu INFO IOCTL query update for TMZ state
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049
- amdkfd SMI event interface updates
https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/therm_thrott
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.
Rename DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
and separate it by a blank line, as it represents
the (total) count of said priorities and is used
as such in loops throughout the code. (With
0-based indexing, the count is one past the
highest index.)
Remove the redundant word HIGH from the priority
names, and rename *KERNEL* to *HIGH*, as that is
what it really means: high.
v2: Add back KERNEL and remove SW and HW,
in lieu of a single HIGH between NORMAL and KERNEL.
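For reference, a sketch of the resulting enum after this change (v2); the authoritative definition lives in include/drm/gpu_scheduler.h:

  enum drm_sched_priority {
          DRM_SCHED_PRIORITY_MIN,
          DRM_SCHED_PRIORITY_NORMAL,
          DRM_SCHED_PRIORITY_HIGH,
          DRM_SCHED_PRIORITY_KERNEL,

          DRM_SCHED_PRIORITY_COUNT,
          DRM_SCHED_PRIORITY_UNSET = -2
  };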
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|