aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09drm/amdgpu: add RAS POISON interrupt funcs for vcn_v4_0Horatio Zhang1-6/+30
Add ras_poison_irq and functions. And fix the amdgpu_irq_put call trace in vcn_v4_0_hw_fini. [ 44.563572] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu] [ 44.563629] RSP: 0018:ffffb36740edfc90 EFLAGS: 00010246 [ 44.563630] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [ 44.563630] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 44.563631] RBP: ffffb36740edfcb0 R08: 0000000000000000 R09: 0000000000000000 [ 44.563631] R10: 0000000000000000 R11: 0000000000000000 R12: ffff954c568e2ea8 [ 44.563631] R13: 0000000000000000 R14: ffff954c568c0000 R15: ffff954c568e2ea8 [ 44.563632] FS: 0000000000000000(0000) GS:ffff954f584c0000(0000) knlGS:0000000000000000 [ 44.563632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 44.563633] CR2: 00007f028741ba70 CR3: 000000026ca10000 CR4: 0000000000750ee0 [ 44.563633] PKRU: 55555554 [ 44.563633] Call Trace: [ 44.563634] <TASK> [ 44.563634] vcn_v4_0_hw_fini+0x62/0x160 [amdgpu] [ 44.563700] vcn_v4_0_suspend+0x13/0x30 [amdgpu] [ 44.563755] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu] [ 44.563806] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu] [ 44.563858] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu] [ 44.563909] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu] [ 44.564006] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu] [ 44.564061] process_one_work+0x21f/0x400 [ 44.564062] worker_thread+0x200/0x3f0 [ 44.564063] ? process_one_work+0x400/0x400 [ 44.564064] kthread+0xee/0x120 [ 44.564065] ? kthread_complete_and_exit+0x20/0x20 [ 44.564066] ret_from_fork+0x22/0x30 Suggested-by: Hawking Zhang <[email protected]> Signed-off-by: Horatio Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS POISON interrupt funcs for vcn_v2_6Horatio Zhang1-4/+21
Add ras_poison_irq and functions. Suggested-by: Hawking Zhang <[email protected]> Signed-off-by: Horatio Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: separate ras irq from vcn instance irq for UVD_POISONHoratio Zhang2-1/+29
Separate vcn RAS poison consumption handling from the instance irq, and register dedicated ras_poison_irq src and funcs for UVD_POISON. v2: - Separate ras irq from vcn instance irq - Improve the subject and code comments v3: - Split the patch into three parts - Improve the code comments Suggested-by: Hawking Zhang <[email protected]> Signed-off-by: Horatio Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Remove IMU ucode in vf2pfYuanShang2-2/+0
The IMU firmware is loaded on the host side, not the guest. Remove IMU in vf2pf ucode id enum. Signed-off-by: YuanShang <[email protected]> Reviewed-By: Horace Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: fix the memory override in kiq ring structShiwu Zhang1-2/+2
This is introduced by the code merge and will let the adev->gfx.kiq[0].ring struct being overrided Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add the smu_v13_0_6 and gfx_v9_4_3 ip blockShiwu Zhang1-0/+5
Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: don't enable secure display on incompatible platformsJesse Zhang1-1/+7
[why] [drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7) [drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4) amdgpu 0000:04:00.0: amdgpu: Secure display: Generic Failure. [how] don't enable secure display on incompatible platforms Suggested-by: Aaron Liu <[email protected]> Signed-off-by: Jesse zhang <[email protected]> Reviewed-by: Aaron Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm:amd:amdgpu: Fix missing buffer object unlock in failure pathSukrut Bellary2-2/+6
smatch warning - 1) drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:3615 gfx_v9_0_kiq_resume() warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'. 2) drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:6901 gfx_v10_0_kiq_resume() warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'. Signed-off-by: Sukrut Bellary <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: Fix artifacting on eDP panels when engaging freesync video modeAurabindo Pillai1-0/+2
[Why] When freesync video mode is enabled, switching resolution from native mode to one of the freesync video compatible modes can trigger continous artifacts on some eDP panels when running under KDE. The articating can be seen in the attached bug report. [How] Fix this by restricting updates that require full commit by using the same checks for stream and scaling changes in the the enable pass of dm_update_crtc_state() along with the check for compatible timings for freesync vide mode. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Fixes: da5e14909776 ("drm/amd/display: Fix hang when skipping modeset") Signed-off-by: Aurabindo Pillai <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Validate VM ioctl flags.Bas Nieuwenhuizen1-0/+4
None have been defined yet, so reject anybody setting any. Mesa sets it to 0 anyway. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-06-09drm/amdgpu: remove unnecessary (void*) conversionsSu Hui6-7/+7
No need cast (void*) to (struct amdgpu_device *). Signed-off-by: Su Hui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: fix incorrect pcie_gen_mask in passthrough caseTong Liu011-3/+3
[why] Passthrough case is treated as root bus and pcie_gen_mask is set as default value that does not support GEN 3 and GEN 4 for PCIe link speed. So PCIe link speed will be downgraded at smu hw init in passthrough condition [how] Move get pci info after detect virtualization and check if it is passthrough case when set pcie_gen_mask Signed-off-by: Tong Liu01 <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: drop unused count variable in create_eml_sink()Hamza Mahfooz1-2/+1
Since, we are only interested in having drm_edid_override_connector_update(), update the value of connector->edid_blob_ptr. We don't care about the return value of drm_edid_override_connector_update() here. So, drop count. Fixes: 550e5d23f147 ("drm/amd/display: assign edid_blob_ptr with edid from debugfs") Reported-by: kernel test robot <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: drop unused function set_abm_event()Hamza Mahfooz2-14/+0
set_abm_event() is never actually used. So, drop it. Fixes: b8fe56375f78 ("drm/amd/display: Refactor ABM feature") Reported-by: kernel test robot <[email protected]> Reported-by: Tom Rix <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: drop redundant memset() in get_available_dsc_slices()Hamza Mahfooz1-2/+0
get_available_dsc_slices() returns the number of indices set, and all of the users of get_available_dsc_slices() don't cross the returned bound when iterating over available_slices[]. So, the memset() in get_available_dsc_slices() is redundant and can be dropped. Fixes: 97bda0322b8a ("drm/amd/display: Add DSC support for Navi (v2)") Reported-by: Christophe JAILLET <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: fix S3 issue if MQD in VRAMJack Xiao3-0/+10
1. Need flush HDP for MQD putting in vram 2. Zero out mes MQD Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Fix warnings in amdgpu _sdma, _ucode.cSrinivasan Shanmugam2-3/+4
Fix below checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Comparisons should place the constant on the right side of the test WARNING: Missing a blank line after declarations Cc: Luben Tuikov <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/amdgpu: Fix errors & warnings in amdgpu _uvd, _vce.cSrinivasan Shanmugam2-59/+63
Fix below checkpatch errors & warnings: In amdgpu_uvd.c: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Prefer 'unsigned int *' to bare use of 'unsigned *' WARNING: Missing a blank line after declarations WARNING: %Lx is non-standard C, use %llx ERROR: space required before the open parenthesis '(' ERROR: space required before the open brace '{' WARNING: %LX is non-standard C, use %llX WARNING: Block comments use * on subsequent lines +/* multiple fence commands without any stream commands in between can + crash the vcpu so just try to emmit a dummy create/destroy msg to WARNING: Block comments use a trailing */ on a separate line + avoid this */ WARNING: braces {} are not necessary for single statement blocks + for (j = 0; j < adev->uvd.num_enc_rings; ++j) { + fences += amdgpu_fence_count_emitted(&adev->uvd.inst[i].ring_enc[j]); + } In amdgpu_vce.c: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Missing a blank line after declarations WARNING: %Lx is non-standard C, use %llx WARNING: Possible repeated word: 'we' ERROR: space required before the open parenthesis '(' Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Acked-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3YiPeng Chai3-2/+25
perform mode2 reset for sdma fed error on gfx v11_0_3. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/amdgpu: Fix errors & warnings in amdgpu_vcn.cSrinivasan Shanmugam1-18/+17
Fix below checkpatch insisted error & warnings: ERROR: space required before the open brace '{' WARNING: braces {} are not necessary for any arm of this statement + if ((type == VCN_ENCODE_RING) && (vcn_config & VCN_BLOCK_ENCODE_DISABLE_MASK)) { [...] + } else if ((type == VCN_DECODE_RING) && (vcn_config & VCN_BLOCK_DECODE_DISABLE_MASK)) { [...] + } else if ((type == VCN_UNIFIED_RING) && (vcn_config & VCN_BLOCK_QUEUE_DISABLE_MASK)) { [...] ERROR: code indent should use tabs where possible WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: braces {} are not necessary for single statement blocks + for (i = 0; i < adev->vcn.num_enc_rings; ++i) { + fence[j] += amdgpu_fence_count_emitted(&adev->vcn.inst[j].ring_enc[i]); + ERROR: space required before the open parenthesis '(' WARNING: Missing a blank line after declarations WARNING: please, no spaces at the start of a line WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Acked-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/amdgpu: Fix warnings in amdgpu_encoders.cSrinivasan Shanmugam1-6/+7
Fix below checkpatch warnings: WARNING: Missing a blank line after declarations + struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector); + amdgpu_encoder->active_device = amdgpu_encoder->devices & amdgpu_connector->devices; WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Acked-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: fix stack size in svm_range_validate_and_mapAlex Deucher1-23/+33
Allocate large local variable on heap to avoid exceeding the stack size: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function ‘svm_range_validate_and_map’: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1690:1: warning: the frame size of 2360 bytes is larger than 2048 bytes [-Wframe-larger-than=] Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/amdgpu: Fix errors & warnings in amdgpu_ttm.cSrinivasan Shanmugam1-12/+13
Fix below checkpatch insisted error & warnings: ERROR: Macros with complex values should be enclosed in parentheses WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: braces {} are not necessary for single statement blocks WARNING: Block comments use a trailing */ on a separate line WARNING: Missing a blank line after declarations Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Acked-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/vcn4: fix endian conversionAlex Deucher1-1/+1
sq.is_enabled is a byte so there is no need to endian swap it. Acked-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/gmc9: fix 64 bit division in partition codeAlex Deucher5-13/+35
Rework logic or use do_div() to avoid problems on 32 bit. v2: add a missing case for XCP macro v3: fix out of bounds array access v4: fix xcp handling harder Acked-by: Guchun Chen <[email protected]> (v1) Reviewed-by: Mukul Joshi <[email protected]> (v3) Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: initialize RAS for gfx_v9_4_3Tao Zhou1-1/+17
Register GFX RAS functions and initialize GFX RAS. v2: remove xcp operations. v3: reuse the return value of gfx_ras_sw_init. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add sq timeout status functions for gfx_v9_4_3Tao Zhou1-0/+98
Query and reset sq timeout status. v2: change instance from 0 to xcc_id for register access. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS error count reset for gfx_v9_4_3Tao Zhou1-0/+38
Add GFX RAS error count reset function. v2: remove xcp operation. only select_se_sh when instance number is more than 1. v3: add check for se_num before select_se_sh. change instance from 0 to xcc_id for register access. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS error count query for gfx_v9_4_3Tao Zhou1-0/+56
Query GFX RAS ce/ue count. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS error count definitions for gfx_v9_4_3Tao Zhou1-0/+740
Prepare for the query of GFX RAS ce/ue count. v2: remove xcp operation. only select_se_sh when instance number is more than 1. v3: add more CE/UE registsers to query list. add check for se_num before select_se_sh. change instance from 0 to xcc_id for register access. v4: move gfx memory id definitions to gfx_v9_4_3. v5: create a dedicated patch for adding error count query function. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS definitions for GFXTao Zhou1-0/+39
Add common GFX RAS definitions. v2: remove instance from amdgpu_gfx_ras_reg_entry, amdgpu_ras_err_status_reg_entry has already defined it. v3: remove memory id definitions from amdgpu_gfx.h, they are related to IP version. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add gc v9_4_3 ras error status registersHawking Zhang2-0/+1304
GC v9_4_3 introduces UE|CE_ERR_STATUS_LO|HI to log hardware errors Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS status reset for gfx_v9_4_3Tao Zhou1-0/+41
Reset GFX RAS status registers. v2: fix typo in title. remove xcp operation. v3: change instance from 0 to xcc_id for register access. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add RAS status query for gfx_v9_4_3Tao Zhou1-0/+75
Query GFX RAS status. v2: remove xcp operation. v3: change instance from 0 to xcc_id for register access. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add GFX RAS common functionTao Zhou2-0/+23
The common function can help reduce redundant code. v2: remove xcp operation, only need to do RAS operations for all instances. v3: remove check for GFX RAS support, will be checked in higher level. add amdgpu prefix for the function name. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Do not access members of xcp w/o check (v2)Hawking Zhang4-7/+7
Not all the asic needs xcp. ensure check xcp availabity before accessing its member. v2: add missing change in kfd_topology.c Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Fix null ptr accessHawking Zhang1-6/+8
Avoid access null xcp_mgr pointer. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add check for RAS instance maskTao Zhou1-0/+38
The mask is only needed to be set when RAS block instance number is more than 1 and invalid bits should be also masked out. We only check valid bits for GFX and SDMA block for now, and will add check for other RAS blocks in the future. v2: move the check under injection operation since the mask is only used by RAS error inject. v3: add valid bits handling for SDMA. v4: print message if the mask is adjusted. Signed-off-by: Tao Zhou <[email protected]> Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2Tao Zhou2-48/+0
No special requirement in RAS injection for the two versions, switch to use default injection interface. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: reorganize RAS injection flowTao Zhou1-7/+6
So GFX RAS injection could use default function if it doesn't define its own injection interface. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add instance mask for RAS injectTao Zhou8-19/+56
User can specify injected instances by the mask. For backward compatibility, the mask value is incorporated into sub block index without interface change of RAS TA. User uses logical mask and driver should convert it to physical value before sending it to RAS TA. v2: update parameter name. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: convert logical instance mask to physical oneTao Zhou3-3/+28
Convert instance mask for the convenience of RAS TA. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Enable IH CAM on GFX9.4.3Mukul Joshi2-3/+4
This patch enables IH CAM on GFX9.4.3 ASIC. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Correct get_xcp_mem_id calculationPhilip Yang1-3/+2
Current calculation only works for NPS4/QPX mode, correct it for NPS4/CPX mode. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Refactor migrate init to support partition switchPhilip Yang6-16/+23
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device because it setup zone devive pgmap for page migration and keep it in kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it only once in amdgpu_device_ip_init after adev ip blocks are initialized, but before amdgpu_amdkfd_device_init initialize kfd nodes which enable SVM support based on pgmap. svm_range_set_max_pages is called by kgd2kfd_device_init everytime after switching compute partition mode. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: route ioctls on primary node of XCPs to primary deviceShiwu Zhang1-0/+1
During XCP init, unlike the primary device, there is no amdgpu_device attached to each XCP's drm_device In case that user trying to open/close the primary node of XCP drm_device this rerouting is to solve the NULL pointer issue causing by referring to any member of the amdgpu_device BUG: unable to handle page fault for address: 0000000000020c80 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page Oops: 0002 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> lock_timer_base+0x6b/0x90 try_to_del_timer_sync+0x2b/0x80 del_timer_sync+0x29/0x40 flush_delayed_work+0x1c/0x50 amdgpu_driver_open_kms+0x2c/0x280 [amdgpu] drm_file_alloc+0x1b3/0x260 [drm] drm_open+0xaa/0x280 [drm] drm_stub_open+0xa2/0x120 [drm] chrdev_open+0xa6/0x1c0 Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: APU mode set max svm range pagesPhilip Yang3-10/+17
svm_migrate_init set the max svm range pages based on the KFD nodes partition size. APU mode don't init pgmap because there is no migration. kgd2kfd_device_init calls svm_migrate_init after KFD nodes allocation and initialization. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Fix memory reporting on GFX 9.4.3Mukul Joshi5-31/+84
This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU by reporting available memory on a per partition basis. If its an APU, available and used memory calculations take into account system and TTM memory. v2: squash in fix ("drm/amdkfd: Fix array out of bound warning") squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3") Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Move local_mem_info to kfd_nodeMukul Joshi7-19/+36
We need to track memory usage on a per partition basis. To do that, store the local memory information in KFD node instead of kfd device. v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info") Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: use xcp partition ID for amdgpu_gemJames Zhu1-3/+5
Find xcp_id from amdgpu_fpriv, use it for amdgpu_gem_object_create. Signed-off-by: James Zhu <[email protected]> Acked-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>