blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-09-02	drm/amdgpu/gfx9.4.3: Implement compute pipe reset	Prike Liang	1	-19/+108
	Implement the compute pipe reset, and the driver will fallback to pipe reset when queue reset fails. The pipe reset only deactivates the queue which is scheduled in the pipe, and meanwhile the MEC pipe will be reset to the firmware _start pointer. So, it seems pipe reset will cost more cycles than the queue reset; therefore, the driver tries to recover by doing queue reset first. Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Prike Liang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-30	fs: move FMODE_UNSIGNED_OFFSET to fop_flags	Christian Brauner	1	-0/+1
	This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-08-29	drm/amdgpu: always allocate cleared VRAM for GEM allocations	Alex Deucher	1	-0/+3
	This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (v1) Signed-off-by: Alex Deucher <[email protected]> Cc: Arunpravin Paneer Selvam <[email protected]> Cc: Christian König <[email protected]>
2024-08-29	drm/amdgpu/mes: add mes mapping legacy queue switch	Jack Xiao	4	-20/+43
	For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: f9d8c5c7855d ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <[email protected]> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu/gfx12: return early in preempt_ib()	Alex Deucher	1	-0/+3
	When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu/gfx11: return early in preempt_ib()	Alex Deucher	1	-0/+3
	When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu: Move the dumping log out of for loop	Sunil Khatri	1	-3/+2
	log message "Dumping IP State Completed" needs to be logged only once when state dumping is complete. Hence moving it out of the for loop. Signed-off-by: Sunil Khatri <[email protected]> Acked-by: Trigger Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amd/amdgpu: move drain_workqueue before shutdown is set	Victor Zhao	1	-3/+3
	[background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu: Do core dump immediately when job tmo	Trigger Huang	1	-1/+67
	Do the coredump immediately after a job timeout to get a closer representation of GPU's error status. V2: This will skip printing vram_lost as the GPU reset is not happened yet (Alex) V3: Unconditionally call the core dump as we care about all the reset functions(soft-recovery and queue reset and full adapter reset, Alex) V4: Do the dump after adev->job_hang = true (Sunil) Signed-off-by: Trigger Huang <[email protected]> Acked-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu: skip printing vram_lost if needed	Trigger Huang	3	-14/+15
	The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu/gfx9: put queue resets behind a debug option	Alex Deucher	3	-0/+14
	Pending extended validation. Reviewed-and-tested-by: Jiadong Zhu <[email protected]> Acked-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu: add experimental resets debug flag	Alex Deucher	2	-0/+7
	Add this flag to enable experimental resets for testing before they are fully validated. Reviewed-and-tested-by: Jiadong Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-28	drm/amdgpu: support for gc_info table v1.3	Likun Gao	2	-0/+17
	Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 875ff9a7ee8824200885384effa7743892a34ed6)
2024-08-28	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs	Alex Deucher	1	-1/+1
	This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 40318a2406bd426c6f4591269669c04e8eda571d)
2024-08-27	Merge tag 'amd-drm-next-6.12-2024-08-26' of ↵	Daniel Vetter	118	-833/+5098
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targetting queues on recommended SDMA engines Proposed userspace: https://github.com/ROCm/ROCR-Runtime/commit/2f588a24065f41c208c3701945e20be746d8faf7 https://github.com/ROCm/ROCR-Runtime/commit/eb30a5bbc7719c6ffcf2d2dd2878bc53a47b3f30 drm/buddy: - Add start address support for trim function From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-08-27	Merge v6.11-rc5 into drm-next	Daniel Vetter	29	-259/+488
	amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <[email protected]>
2024-08-23	drm/amdkfd: Change kfd/svm page fault drain handling	Xiaogang Chen	4	-6/+7
	When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and not handle any incoming pages fault of this process until a deferred work item got executed by default system wq. The time period of "not handle page fault" can be long and is unpredicable. That is advese to kfd performance on page faults recovery. This patch uses time stamp of incoming page fault to decide to drop or recover page fault. When app unmap vm ranges kfd records each gpu device's ih ring current time stamp. These time stamps are used at kfd page fault recovery routine. Any page fault happened on unmapped ranges after unmap events is application bug that accesses vm range after unmap. It is not driver work to cover that. By using time stamp of page fault do not need drain page faults at deferred work. So, the time period that kfd does not handle page faults is reduced and can be controlled. Signed-off-by: Xiaogang.Chen <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-23	drm/amdgpu: support for gc_info table v1.3	Likun Gao	2	-0/+17
	Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-23	drm/amdgpu: add list empty check to avoid null pointer issue	Yang Wang	1	-0/+10
	Add list empty check to avoid null pointer issues in some corner cases. - list_for_each_entry_safe() Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-23	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs	Alex Deucher	1	-1/+1
	This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-23	drm/amdgpu: Retire query_utcl2_poison_status callback	Hawking Zhang	7	-74/+0
	Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-23	drm/amdgpu: Take IOMMU remapping into account for p2p checks	Rahul Jain	1	-8/+35
	when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remapping DMA v5: (Felix, Alex) - fixing comment as per Alex feedback - refactor code as per Felix v4: (Alex) - fix the comment and description v3: - remove iommu_remap variable v2: (Alex) - Fix as per review comments - add new function amdgpu_device_check_iommu_remap to check if iommu remap Signed-off-by: Rahul Jain <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: fix eGPU hotplug regression	Alex Deucher	1	-1/+1
	The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit 959056982a9b ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: 959056982a9b ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Ma Jun <[email protected]> (cherry picked from commit c69b07f7bbc905022491c45097923d3487479529) Cc: [email protected] # 6.10.x
2024-08-20	drm/amdgpu: Validate TA binary size	Candice Li	1	-0/+3
	Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit c0a04e3570d72aaf090962156ad085e37c62e442) Cc: [email protected]
2024-08-20	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1	Alex Deucher	1	-8/+10
	The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 2dc3851ef7d9c5439ea8e9623fc36878f3b40649) Cc: [email protected]
2024-08-20	drm/amdgpu: fixing rlc firmware loading failure issue	Yang Wang	1	-2/+3
	Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit 849e133c973c ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: 3af2c80ae2f5 ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 89ec85d16eb8110d88c273d1d34f1fe5a70ba8cc)
2024-08-20	drm/amd/gfx11: move the gfx mutex into the caller	Alex Deucher	1	-4/+3
	Otherwise we can fail to drop the software mutex when we fail to take the hardware mutex. Fixes: 76acba7b7f12 ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore") Reported-by: Dan Carpenter <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amd/amdgpu: allow use kiq to do hdp flush under sriov	Victor Zhao	4	-4/+4
	when use cpu to do page table update under sriov runtime, since mmio access is blocked, kiq has to be used to flush hdp. change WREG32_NO_KIQ to WREG32 to allow kiq. Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: fix eGPU hotplug regression	Alex Deucher	1	-1/+1
	The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit 959056982a9b ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: 959056982a9b ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Ma Jun <[email protected]>
2024-08-20	drm/amdgpu: Validate TA binary size	Candice Li	1	-0/+3
	Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11	Mukul Joshi	3	-36/+69
	Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings	Srinivasan Shanmugam	2	-0/+6
	This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_4_3 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. The commit also includes a check for the type of the ring. If the type of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the `enforce_isolation` structure in the `gfx` structure of the `amdgpu_device` is set to the `xcp_id` of the ring. This ensures that the correct `xcp_id` is used when enforcing isolation on compute rings. The `xcp_id` is an identifier for an XCP partition, and different rings can be associated with different XCP partitions. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]>
2024-08-20	drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings	Srinivasan Shanmugam	1	-0/+6
	This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Suggested-by: Christian König <[email protected]>
2024-08-20	drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization	Srinivasan Shanmugam	4	-0/+200
	This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]> Suggested-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdkfd: APIs to stop/start KFD scheduling	Amber Lin	2	-0/+32
	Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added but those queues won't be mapped to runlist (so not scheduled) until amdgpu_amdkfd_start_sched is called. v2: fix build (Alex) Signed-off-by: Amber Lin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware	Srinivasan Shanmugam	1	-0/+1
	This commit extends the cleaner shader feature to support GFX9.4.4 hardware. The cleaner shader feature is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This operation needs to be performed in isolation, while no other tasks should be running on the GPU at the same time. Previously, the cleaner shader feature was implemented for GFX9.4.3 hardware. This commit adds support for GFX9.4.4 hardware by allowing the cleaner shader to be used with this hardware version. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3	Srinivasan Shanmugam	2	-0/+191
	This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps improve GPU performance and resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. v2: fix copyright date (Alex) Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware	Srinivasan Shanmugam	2	-5/+78
	The patch modifies the gfx_v9_4_3_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware	Srinivasan Shanmugam	2	-5/+66
	The patch modifies the gfx_v9_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution	Srinivasan Shanmugam	1	-0/+4
	This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet is a command packet used to instruct the GPU to execute the cleaner shader. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution of the cleaner shader on the GPU. The packet consists of a header followed by a RESERVED field, which is programmed to zero. When the GPU receives this packet, it fetches and executes the cleaner shader instructions from the location specified in the packet. The cleaner shader feature helps to enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: Add sysfs interface for running cleaner shader	Srinivasan Shanmugam	1	-0/+134
	This patch adds a new sysfs interface for running the cleaner shader on AMD GPUs. The cleaner shader is used to clear GPU memory before it's reused, which can help prevent data leakage between different processes. The new sysfs file is write-only and is named `run_cleaner_shader`. Write the number of the partition to this file to trigger the cleaner shader on that partition. There is only one partition on GPUs which do not support partitioning. Changes made in this patch: - Added `amdgpu_set_run_cleaner_shader` function to handle writes to the `run_cleaner_shader` sysfs file. - Added `run_cleaner_shader` to the list of device attributes in `amdgpu_device_attrs`. - Updated `default_attr_update` to handle `run_cleaner_shader`. - Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device attributes. v2: fix error handling (Alex) Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]>
2024-08-20	drm/amdgpu: Add enforce_isolation sysfs attribute	Srinivasan Shanmugam	4	-0/+107
	This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for each ring. When it's disabled, the reserved VMIDs are freed. The set function locks a mutex before changing the 'enforce_isolation' flag and the VMIDs, and unlocks it afterwards. This ensures that these operations are atomic and prevents race conditions and other concurrency issues. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: Enforce isolation as part of the job	Srinivasan Shanmugam	3	-1/+6
	This patch adds a new parameter 'enforce_isolation' to the amdgpu_job structure. This parameter is used to determine whether shader isolation should be enforced for a job. The enforce_isolation parameter is then stored in the amdgpu_job structure and used when flushing the VM. The enforce_isolation field of the amdgpu_job structure is set directly after the job is allocated This change allows more fine-grained control over shader isolation, making it possible to enforce isolation on a per-job basis rather than globally. This can be useful in scenarios where only certain jobs require isolation. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]>
2024-08-16	drm/amdgpu: abort KIQ waits when there is a pending reset	Victor Skvortsov	2	-1/+8
	Stop waiting for the KIQ to return back when there is a reset pending. It's quite likely that the KIQ will never response. Signed-off-by: Koenig Christian <[email protected]> Suggested-by: Lazar Lijo <[email protected]> Tested-by: Victor Skvortsov <[email protected]> Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-16	drm/amdgpu: Make enforce_isolation setting per GPU	Srinivasan Shanmugam	5	-8/+21
	This commit makes enforce_isolation setting to be per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU is used to determine whether to enforce isolation between graphics and compute processes on that GPU. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU and partition is used to determine whether to enforce isolation between graphics and compute processes on that GPU and partition. This allows the enforce_isolation setting to be controlled individually for each GPU and each partition, which is useful in a system with multiple GPUs and partitions where different isolation settings might be desired for different GPUs and partitions. v2: fix loop in amdgpu_vmid_mgr_init() (Alex) Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Suggested-by: Christian König <[email protected]>
2024-08-16	drm/amdgpu: Emit cleaner shader at end of IB submission	Alex Deucher	2	-0/+6
	This commit introduces the emission of a cleaner shader at the end of the IB submission process. This is achieved by adding a new function pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If the `emit_cleaner_shader` function is set in the ring functions, it is called during the VM flush process. The cleaner shader is only emitted if the `enable_cleaner_shader` flag is set in the `amdgpu_device` structure. This allows the cleaner shader emission to be controlled on a per-device basis. By emitting a cleaner shader at the end of the IB submission, we can ensure that the VM state is properly cleaned up after each submission. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]>
2024-08-16	drm/amdgpu: Add infrastructure for Cleaner Shader feature	Srinivasan Shanmugam	2	-0/+49
	The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]>
2024-08-16	drm/amdgpu: handle enforce isolation on non-0 gfxhub	Alex Deucher	1	-1/+1
	Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-16	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1	Alex Deucher	1	-8/+10
	The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-16	drm/amdgpu: add vcn ip dump support for vcn_v2_6	Sunil Khatri	1	-2/+2
	Add support for logging the registers in devcoredump buffer for vcn_v2_6. Signed-off-by: Sunil Khatri <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>