blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-09-18	drm/amdgpu: nuke the VM PD/PT shadow handling	Christian König	1	-85/+2
	This was only used as workaround for recovering the page tables after VRAM was lost and is no longer necessary after the function amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only proplematic case left is SVM and that is most likely not recoverable in any way when VRAM is lost. Signed-off-by: Christian König <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-17	drm/amdgpu: Fix missing check pcie_p2p module param	Bob Zhou	1	-1/+1
	The module param pcie_p2p should be checked for kfd p2p feature, so add it. Fixes: 75f0efbc4b3b ("drm/amdgpu: Take IOMMU remapping into account for p2p checks") Signed-off-by: Bob Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amd/amdgpu: move drain_workqueue before shutdown is set	Victor Zhao	1	-3/+3
	[background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29	drm/amdgpu: skip printing vram_lost if needed	Trigger Huang	1	-1/+1
	The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-27	Merge tag 'amd-drm-next-6.12-2024-08-26' of ↵	Daniel Vetter	1	-46/+86
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targetting queues on recommended SDMA engines Proposed userspace: https://github.com/ROCm/ROCR-Runtime/commit/2f588a24065f41c208c3701945e20be746d8faf7 https://github.com/ROCm/ROCR-Runtime/commit/eb30a5bbc7719c6ffcf2d2dd2878bc53a47b3f30 drm/buddy: - Add start address support for trim function From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-08-23	drm/amdgpu: Take IOMMU remapping into account for p2p checks	Rahul Jain	1	-8/+35
	when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remapping DMA v5: (Felix, Alex) - fixing comment as per Alex feedback - refactor code as per Felix v4: (Alex) - fix the comment and description v3: - remove iommu_remap variable v2: (Alex) - Fix as per review comments - add new function amdgpu_device_check_iommu_remap to check if iommu remap Signed-off-by: Rahul Jain <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization	Srinivasan Shanmugam	1	-0/+16
	This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]> Suggested-by: Alex Deucher <[email protected]>
2024-08-20	drm/amdgpu: Add enforce_isolation sysfs attribute	Srinivasan Shanmugam	1	-0/+2
	This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for each ring. When it's disabled, the reserved VMIDs are freed. The set function locks a mutex before changing the 'enforce_isolation' flag and the VMIDs, and unlocks it afterwards. This ensures that these operations are atomic and prevents race conditions and other concurrency issues. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-16	drm/amdgpu: Make enforce_isolation setting per GPU	Srinivasan Shanmugam	1	-0/+5
	This commit makes enforce_isolation setting to be per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU is used to determine whether to enforce isolation between graphics and compute processes on that GPU. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU and partition is used to determine whether to enforce isolation between graphics and compute processes on that GPU and partition. This allows the enforce_isolation setting to be controlled individually for each GPU and each partition, which is useful in a system with multiple GPUs and partitions where different isolation settings might be desired for different GPUs and partitions. v2: fix loop in amdgpu_vmid_mgr_init() (Alex) Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Suggested-by: Christian König <[email protected]>
2024-08-16	drm/amdgpu/gfx11: add a mutex for the gfx semaphore	Alex Deucher	1	-0/+1
	This will be used in more places in the future so add a mutex. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-13	drm/amdgpu: Disable dpm_enabled flag while VF is in reset	Victor Skvortsov	1	-4/+2
	VFs do not perform HW fini/suspend in FLR, so the dpm_enabled is incorrectly kept enabled. Add interface to disable it in virt_pre_reset call. v2: Made implementation generic for all asics v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-06	drm/amdgpu: report bad status in GPU recovery	Tao Zhou	1	-2/+8
	Instead of printing GPU reset failed. v2: add check for reset_context->src. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-06	drm/amdgpu: update bad state check in GPU recovery	Tao Zhou	1	-1/+1
	Return RMA status without message print. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-06	drm/amdgpu: Clean up the register dump via debugfs list	Sunil Khatri	1	-19/+0
	debugfs register list for dump is cleaned as it have some issues related to proper power state of the IP before register read. Since the above mentioned is removed we no longer want this to be dumped part of the devcoredump and hence removed. Reviewed-by: Christian König <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-29	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann	1	-81/+108
	Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <[email protected]>
2024-07-27	drm/amdgpu: trigger ip dump before suspend of IP's	Sunil Khatri	1	-30/+30
	Problem: IP dump right now is done post suspend of all IP's which for some IP's could change power state and software state too which we do not want to reflect in the dump as it might not be same at the time of hang. Solution: IP should be dumped as close to the HW state when the GPU was in hung state without trying to reinitialize any resource. Acked-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-25	drm/scheduler: remove full_recover from drm_sched_start	Christian König	1	-2/+2
	This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-07-24	drm/amdgpu: skip kfd init if GFX is not ready.	Yifan Zhang	1	-1/+5
	avoid kfd init crash in that case. Signed-off-by: Yifan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Tested-by: Jesse Zhang <[email protected]> Reviewed-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08	drm/amdgpu: sysfs node disable query error count during gpu reset	YiPeng Chai	1	-1/+2
	Sysfs node disable query error count during gpu reset. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27	drm/amdgpu: process RAS fatal error MB notification	Vignesh Chander	1	-1/+8
	For RAS error scenario, VF guest driver will check mailbox and set fed flag to avoid unnecessary HW accesses. additionally, poll for reset completion message first to avoid accidentally spamming multiple reset requests to host. v2: add another mailbox check for handling case where kfd detects timeout first v3: set host_flr bit and use wait_for_reset Signed-off-by: Vignesh Chander <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27	drm/amdgpu: Fix pci state save during mode-1 reset	Lijo Lazar	1	-2/+5
	Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19	drm/amdgpu: fix using the reserved VMID with gang submit	Christian König	1	-4/+17
	We need to ensure that even when using a reserved VMID that the gang members can still run in parallel. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14	drm/amdgpu: create amdgpu_ras_in_recovery to simplify code	Tao Zhou	1	-11/+2
	Reduce redundant code and user doesn't need to pay attention to RAS details. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14	drm/amdgpu: refine gpu_info firmware loading	Yang Wang	1	-5/+4
	refine gpu_info firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14	drm/amdgpu: fix sriov host flr handler	Yunxiang Li	1	-0/+2
	We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen. In the current state since we take tens of seconds to stop everything, it is very likely that host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also let us tell how slow ready to reset actually is from the host. The ready to reset speed can be improved later. Signed-off-by: Yunxiang Li <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05	drm/amdkfd: add reset cause in gpu pre-reset smi event	Eric Huang	1	-1/+1
	reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29	drm/amdgpu: Add lock around VF RLCG interface	Victor Skvortsov	1	-0/+1
	flush_gpu_tlb may be called from another thread while device_gpu_recover is running. Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29	drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()	Alex Deucher	1	-7/+12
	Use current speed/width on devices which don't support dynamic PCIe switching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23	drm/amdgpu: add prints in IP State dump	Sunil Khatri	1	-0/+2
	add prints before and after ip state is dumped. It avoids user to think of system being stuck/hung as dump could take some time after a gpu hang. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23	drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw	Victor Zhao	1	-2/+2
	the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance. Fix the miss matched code. Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17	drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit	Friedrich Vock	1	-1/+1
	The special case for VM passthrough doesn't check adev->nbio.funcs before dereferencing it. If GPUs that don't have an NBIO block are passed through, this leads to a NULL pointer dereference on startup. Signed-off-by: Friedrich Vock <[email protected]> Fixes: 1bece222eabe ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid") Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-13	drm/amdgpu: fix dereference after null check	Jesse Zhang	1	-1/+1
	check the pointer hive before use. Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02	drm/amd/amdgpu: Check tbo resource pointer	Asad Kamal	1	-1/+2
	Validate tbo resource pointer, skip if NULL Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02	drm/amdgpu: Add gfx v9_4_4 ip block	Hawking Zhang	1	-2/+5
	Add gfx v9_4_4 ip block support Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02	drm/amdgpu: Move ras resume into SRIOV function	Yunxiang Li	1	-7/+5
	This is part of the reset, move it into the reset function. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Emily Deng <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02	drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic	Yunxiang Li	1	-25/+22
	The retry loop for SRIOV reset have refcount and memory leak issue. Depending on which function call fails it can potentially call amdgpu_amdkfd_pre/post_reset different number of times and causes kfd_locked count to be wrong. This will block all future attempts at opening /dev/kfd. The retry loop also leakes resources by calling amdgpu_virt_init_data_exchange multiple times without calling the corresponding fini function. Align with the bare-metal reset path which doesn't have these issues. This means taking the amdgpu_amdkfd_pre/post_reset functions out of the reset loop and calling amdgpu_device_pre_asic_reset each retry which properly free the resources from previous try by calling amdgpu_virt_fini_data_exchange. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Emily Deng <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02	drm/amdgpu: Add reset_context flag for host FLR	Yunxiang Li	1	-5/+8
	There are other reset sources that pass NULL as the job pointer, such as amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if the FLR comes from the host does not work. Add a flag in reset_context to explicitly mark host triggered reset, and set this flag when we receive host reset notification. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Emily Deng <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02	drm/amdgpu: Fix two reset triggered in a row	Yunxiang Li	1	-9/+10
	Some times a hang GPU causes multiple reset sources to schedule resets. The second source will be able to trigger an unnecessary reset if they schedule after we call amdgpu_device_stop_pending_resets. Move amdgpu_device_stop_pending_resets to after the reset is done. Since at this point the GPU is supposedly in a good state, any reset scheduled after this point would be a legitimate reset. Remove unnecessary and incorrect checks for amdgpu_in_reset that was kinda serving this purpose. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26	drm/amdgpu: skip ip dump if devcoredump flag is set	Sunil Khatri	1	-6/+7
	Do not dump the ip registers during driver reload in passthrough environment. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26	drm/amdgpu: add a spinlock to wb allocation	Alex Deucher	1	-1/+10
	As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Reviewed-by: Shaoyun.liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26	drm/amdgpu: dump ip state before reset for each ip	Sunil Khatri	1	-0/+7
	Invoke the dump_ip_state function for each ip before the asic resets and save the register values for debugging via devcoredump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: Skip the coredump collection on reset during driver reload	Ahmad Rehman	1	-2/+5
	In passthrough environment, the driver triggers the mode-1 reset on reload. The reset causes the core dump collection which is delayed task and prevents driver from unloading until it is completed. Since we do not need to collect data on "reset on reload" case, we can skip core dump collection. v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well. Signed-off-by: Ahmad Rehman <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: refactoring the runtime pm mode detection code	Ma Jun	1	-0/+75
	refactor the code of runtime pm mode detection to support amdgpu_runtime_pm =2 and 1 two cases Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: Add missing space to DRM_WARN() message	Thorsten Blum	1	-1/+1
	s/,please/, please/ Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	amd/amdgpu: improve VF recover time	Zhigang Luo	1	-0/+1
	1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatel error detected flag. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu/pm: Add support for MACO flag checking	Ma Jun	1	-3/+5
	Add support for MACO flag checking. MACO mode only works if BACO is supported. Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amd/pm: fix the high voltage issue after unload	Kenneth Feng	1	-11/+15
	fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards	Danijel Slivka	1	-0/+7
	Apply this rule to all newer asics in sriov case. For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. Moved the check to amdgpu_device_init() to ensure it is done after amdgpu_device_ip_early_init() where the IP versions are discovered. Signed-off-by: Danijel Slivka <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-22	drm/amd: Flush GFXOFF requests in prepare stage	Mario Limonciello	1	-0/+2
	If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <[email protected]> # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <[email protected]> # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <[email protected]> # 6.1.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <[email protected]> # 6.1.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <[email protected]> # 6.6.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <[email protected]> # 6.6.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <[email protected]> # 6.6.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <[email protected]> # 6.6.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <[email protected]> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-22	drm/amdgpu: refactor code to split devcoredump code	Sunil Khatri	1	-0/+1
	Refractor devcoredump code into new files since its functionality is expanded further and better to slit and devcoredump to have its own file. v2: Fix the build failure caught by arm compiler of implicit function declaration with #ifdef v3: squash in fix for implicit declaration error Cc: Ivan Lipski <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>