blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-04-22	Merge tag 'amd-drm-next-6.10-2024-04-19' of ↵	Dave Airlie	26	-198/+316
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-19: amdgpu: - DC resource allocation logic updates - DC IPS fixes - DC YUV fixes - DMCUB fixes - DML2 fixes - Devcoredump updates - USB-C DSC fix - Misc display code cleanups - PSR fixes - MES timeout fix - RAS updates - UAF fix in VA IOCTL - Fix visible VRAM handling during faults - Fix IP discovery handling during PCI rescans - Misc code cleanups - PSP 14 updates - More runtime PM code rework - SMU 14.0.2 support - GPUVM page fault redirection to secondary IH rings for IH 6.x - Suspend/resume fixes - SR-IOV fixes amdkfd: - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix radeon: - Silence UBSAN warnings related to flexible arrays Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-04-18	drm/amdkfd: make sure VM is ready for updating operations	Lang Yu	1	-14/+20
	When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: Fix leak when GPU memory allocation fails	Mukul Joshi	1	-0/+1
	Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: remove virt_init_data_exchange from poison consumption handler	Zhigang Luo	1	-2/+0
	Host will initiate an FLR for all poison consumption. Guest should wait for FLR message to re-init data exchange. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: Change AID detection logic	Lijo Lazar	1	-2/+4
	On GFX 9.4.3 SOCs, only 2 SDMA instances need to be available to be considered as a valid AID. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: enable redirection of irq's for IH V6.1	Sunil Khatri	1	-0/+15
	Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: Skip the coredump collection on reset during driver reload	Ahmad Rehman	3	-2/+7
	In passthrough environment, the driver triggers the mode-1 reset on reload. The reset causes the core dump collection which is delayed task and prevents driver from unloading until it is completed. Since we do not need to collect data on "reset on reload" case, we can skip core dump collection. v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well. Signed-off-by: Ahmad Rehman <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm/amdgpu: enable redirection of irq's for IH V6.0	Sunil Khatri	1	-0/+15
	Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Reviewed-by: Christian König <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm:amdgpu: enable IH ring1 for IH v6.1	Sunil Khatri	1	-2/+9
	We need IH ring1 for handling the pagefault interrupts which over flow in default ring for specific usecases. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-18	drm:amdgpu: enable IH RB ring1 for IH v6.0	Sunil Khatri	1	-2/+9
	We need IH ring1 for handling the pagefault interrupts which are overflowing the default ring for specific usecases. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-17	drm/amdgpu: fix visible VRAM handling during faults	Christian König	5	-57/+53
	When we removed the hacky start code check we actually didn't took into account that all VRAM pages needs to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is neglitible. Signed-off-by: Christian König <[email protected]> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-04-17	drm/amdgpu: validate the parameters of bo mapping operations more clearly	xinhui pan	1	-26/+46
	Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: [email protected] Reported-by: Vlad Stolyarov <[email protected]> Suggested-by: Christian König <[email protected]> Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-17	drm/amdgpu: remove invalid resource->start check v2	Christian König	1	-4/+0
	The majority of those where removed in the commit aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") But this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start address for visible BOs this will now trigger invalid mapping errors. v2: also remove the unused variable Signed-off-by: Christian König <[email protected]> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") CC: [email protected] Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-17	Merge tag 'amd-drm-next-6.10-2024-04-13' of ↵	Dave Airlie	70	-768/+1654
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-13: amdgpu: - HDCP fixes - ODM fixes - RAS fixes - Devcoredump improvements - Misc code cleanups - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - SR-IOV fixes - Suspend and Resume fixes - DCN 3.5.x Updates - Z8 fixes - UMSCH fixes - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - VCN partitioning fix - DC DWB fixes - VSC SDP fixes - DCN 3.1.6 fix - GC 11.5 fixes - Remove invalid TTM resource start check - DCN 1.0 fixes amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace radeon: - Misc code cleanups From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Dave Airlie <[email protected]>
2024-04-16	drm/amd/swsmu: support smu block discovery for smu v14	Kenneth Feng	1	-0/+2
	Support for smu ip block add for SMU v14. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: rename DBG_DRV to HAD_DRV for psp v14	Hawking Zhang	2	-1/+3
	Add a psp bl command enum for HAD_DRV. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: refactoring the runtime pm mode detection code	Ma Jun	3	-47/+77
	refactor the code of runtime pm mode detection to support amdgpu_runtime_pm =2 and 1 two cases Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: Load ipkeymgr drv for psp v14	Hawking Zhang	4	-1/+28
	while DBG_DRV is renamed to HAD_DRV for psp v14, part of its APIs/functionality is moved to a new component named Ipkeymgr_Drv. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: Add missing space to DRM_WARN() message	Thorsten Blum	1	-1/+1
	s/,please/, please/ Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: Fix discovery initialization failure during pci rescan	Ma Jun	1	-11/+6
	Waiting for system ready to fix the discovery initialization failure issue. This failure usually occurs when dGPU is removed and then rescanned via command line. It's caused by following two errors: [1] vram size is 0 [2] wrong binary signature Signed-off-by: Ma Jun <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: fix visible VRAM handling during faults	Christian König	5	-57/+53
	When we removed the hacky start code check we actually didn't took into account that all VRAM pages needs to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is neglitible. Signed-off-by: Christian König <[email protected]> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-04-16	drm/amdgpu: validate the parameters of bo mapping operations more clearly	xinhui pan	1	-26/+46
	Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: [email protected] Reported-by: Vlad Stolyarov <[email protected]> Suggested-by: Christian König <[email protected]> Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: add new aca smu callback func parse_error_code()	Yang Wang	2	-15/+9
	add new aca smu callback parse_error_code{} to avoid specific asic check in amdgpu_aca.c file Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: increase mes submission timeout	Jonathan Kim	1	-1/+1
	MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu: add missing vbios version from devcoredump	Sunil Khatri	1	-4/+5
	Add vbios version in the devcoredump along with formatting the information with proper alignment. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-16	drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft reset	Alex Deucher	1	-10/+5
	Need to take the srbm_mutex and while we are here, use the helper function soc21_grbm_select(); Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-12	drm/amdgpu: remove invalid resource->start check v2	Christian König	1	-4/+0
	The majority of those where removed in the commit aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") But this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start address for visible BOs this will now trigger invalid mapping errors. v2: also remove the unused variable Signed-off-by: Christian König <[email protected]> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") CC: [email protected] Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-12	drm/amdgpu/sdma6: set sdma hang watchdog	Jack Xiao	1	-0/+7
	Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-12	drm/amd/amdgpu: Update PF2VF Header	Luqmaan Irshad	1	-1/+3
	Adding a new field for GPU Capacity to align the header with the host. Signed-off-by: Luqmaan Irshad <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-10	drm/amdgpu: differentiate external rev id for gfx 11.5.0	Yifan Zhang	1	-1/+4
	This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09	drm/amdgpu: fix incorrect number of active RBs for gfx11	Tim Huang	1	-1/+1
	The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09	drm/amdgpu: clear set_q_mode_offs when VM changed	ZhenGuo Yin	1	-0/+1
	[Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <[email protected]> Signed-off-by: ZhenGuo Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09	drm/amdgpu: Fix VCN allocation in CPX partition	Lijo Lazar	1	-4/+11
	VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: James Zhu <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amd/pm: fix the high voltage issue after unload	Kenneth Feng	1	-11/+15
	fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2	Tao Zhou	1	-13/+3
	SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: add smu 14.0.1 discovery support	Yifan Zhang	1	-0/+1
	This patch to add smu 14.0.1 support Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu : Increase the mes log buffer size as per new MES FW version	shaoyunl	2	-3/+3
	From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu : Add mes_log_enable to control mes log feature	shaoyunl	4	-3/+20
	The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: Reset dGPU if suspend got aborted	Lijo Lazar	1	-0/+25
	For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09	drm/amdgpu/umsch: reinitialize write pointer in hw init	Lang Yu	1	-0/+2
	Otherwise the old one will be used during GPU reset. That's not expected. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09	drm/amdgpu: Refine IB schedule error logging	Lijo Lazar	1	-2/+5
	Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: always force full reset for SOC21	Alex Deucher	1	-2/+0
	There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09	drm/amdgpu: differentiate external rev id for gfx 11.5.0	Yifan Zhang	1	-1/+4
	This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: fix incorrect number of active RBs for gfx11	Tim Huang	1	-1/+1
	The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	amd/amdgpu: improve VF recover time	Zhigang Luo	3	-1/+3
	1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatel error detected flag. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: Set fatal errror detected flag earlier	Lijo Lazar	1	-13/+28
	In case of fatal errors, set FED status when interrupt is received. Set the flag on other devices in the hive before RAS recovery work. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: clear set_q_mode_offs when VM changed	ZhenGuo Yin	1	-0/+1
	[Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <[email protected]> Signed-off-by: ZhenGuo Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: retire UMC v12 mca_addr_to_pa	Tao Zhou	3	-161/+7
	RAS TA will handle it, the function is useless. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov	chongli2	5	-3/+64
	support MES command SET_HW_RESOURCE1 in sriov Signed-off-by: chongli2 <[email protected]> Reviewed-by: Jingwen Chen <[email protected]> Acked-by: Jingwen Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09	drm/amdgpu: update check condition for XGMI ACA UE	Tao Zhou	1	-1/+3
	Check more possible ext error codes. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>