aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09drm/amd/display: Refactor fast update to use new HWSS build sequenceAlvin Lee20-30/+693
[Description] - Refactor HW sequencer to use a build / execute sequence - Also move gamma updates to become fast v2: squash in build fix ("drm/amd/display: Fix guarding of 'if (dc->debug.visual_confirm)'") Acked-by: Stylon Wang <[email protected]> Signed-off-by: Alvin Lee <[email protected]> Reviewed-by: Jun Lei <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: fix dcn315 single stream crb allocationDmytro Laktyushkin1-3/+12
Change to improve avoiding asymetric crb calculations for single stream scenarios. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Acked-by: Stylon Wang <[email protected]> Signed-off-by: Dmytro Laktyushkin <[email protected]> Reviewed-by: Charlene Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: change reserved vram info printYiPeng Chai1-3/+4
The link object of mgr->reserved_pages is the blocks variable in struct amdgpu_vram_reservation, not the link variable in struct drm_buddy_block. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Arunpravin Paneer Selvam <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-06-09drm/amdgpu: fix xclk freq on CHIP_STONEYChia-I Wu1-2/+9
According to Alex, most APUs from that time seem to have the same issue (vbios says 48Mhz, actual is 100Mhz). I only have a CHIP_STONEY so I limit the fixup to CHIP_STONEY Signed-off-by: Chia-I Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09Revert "drm/amdgpu: switch to golden tsc registers for raven/raven2"Alex Deucher1-40/+0
This reverts commit f03eb1d26c2739b75580f58bbab4ab2d5d3eba46. This results in inconsistent timing reported via asynchronous GPU queries. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html Cc: [email protected] Cc: [email protected] Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09Revert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according ↵Alex Deucher1-14/+19
to revision id" This reverts commit 9d2d1827af295fd6971786672c41c4dba3657154. This results in inconsistent timing reported via asynchronous GPU queries. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html Cc: [email protected] Cc: [email protected] Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09Revert "drm/amdgpu: change the reference clock for raven/raven2"Alex Deucher1-3/+4
This reverts commit fbc24293ca16b3b9ef891fe32ccd04735a6f8dc1. This results in inconsistent timing reported via asynchronous GPU queries. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html Cc: [email protected] Cc: [email protected] Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: convert vcn/jpeg logical mask to physical maskStanley.Yang1-0/+4
Changed from V1: Remove amdgpu_ras_logical_mask_to_physical_mask due to GET_MASK provides same feature. Support convert VCN/JPEG logical mask to physical mask. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: support check vcn jpeg block maskStanley.Yang1-1/+5
Support VCN/JPEG instance mask checking, pass logical mask directly except GFX/SDMA/VCN/JPEG blocks. Changed from V1: correct a typo Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: pass xcc mask to ras taStanley.Yang2-0/+3
pass xcc mask to ras ta, ras ta will compare the mask with the one from chiplet topology. Changed from V1: Remove IP version checking. Set ras_cmd->ras_init_message.init_flags.xcc_mask directly due to xcc_mask is common structres to all the devices. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/pm: update smu-driver if header for smu 13.0.0 and smu 13.0.10Kenneth Feng1-8/+25
update smu-driver if header for smu 13.0.0 and smu 13.0.10 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/pm: notify driver unloading to PMFW for SMU v13.0.6 dGPULe Ma1-9/+7
Per requested, follow the same sequence as APU to send only PPSMC_MSG_PrepareForDriverUnload to PMFW during driver unloading. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Shiwu Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Mark 'kgd_gfx_aldebaran_clear_address_watch' & ↵Srinivasan Shanmugam2-4/+4
'kgd_gfx_v11_clear_address_watch' functions as static Below two functions cause a warning because they lack a prototype: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c:164:10: warning: no previous prototype for ‘kgd_gfx_aldebaran_clear_address_watch’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c:782:10: warning: no previous prototype for ‘kgd_gfx_v11_clear_address_watch’ [-Wmissing-prototypes] There are no callers from other files, so just mark them as 'static'. Also fixes the following checks: CHECK: Alignment should match open parenthesis +static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev, uint32_t watch_id) CHECK: Alignment should match open parenthesis +static uint32_t kgd_gfx_v11_clear_address_watch(struct amdgpu_device *adev, uint32_t watch_id) Cc: Felix Kuehling <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+Aurabindo Pillai2-12/+13
For FPO/FAMS, DMCUB will try to change the output timings by writing to the OTG registers. However, the timings written directly to the OTG registers will not be honoured unless VMIN/VMAX selector registers are programmed with the right bits and trigger source is selected correctly. Proper solution needs to go into DMCUB but will require additional state tracking to ensure that the selectors are set and reset correctly as per driver state. Until fix is merged into firmware, apply the workaround in driver to unconditionally write OTG vmin/vmax selectors. Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09Revert "drm/amd/display: Only use ODM2:1 policy for high pixel rate displays"Aurabindo Pillai2-2/+0
This reverts commit 047783cdd5f604d87398236beb4971abb4d43293 since it causes higher power consumption for single display use case (4k60). Also, this patch introduced a 35% performance drop in a Vulkan benchmark. * The patch disabled the ODM-combination on most popular monitors, including 4K, 2K and FHD monitors at 60Hz. * ODM-combination can halve the DPP clock to save power, that is the reason why we introduce ODM-combination, and the PM log shows single pipe consumes more power at 4K@60Hz. * ODM-combination has 2 de-tiled buffer involved, which provides longer self-sustained time, that benefit to the memory power optimization. Signed-off-by: Aurabindo Pillai <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: Add gnu_printf format attribute for snprintf_count()Srinivasan Shanmugam1-4/+5
Fix the following W=1 kernel build warning: display/dc/dcn10/dcn10_hw_sequencer_debug.c: In function ‘snprintf_count’: display/dc/dcn10/dcn10_hw_sequencer_debug.c:56:2: warning: function ‘snprintf_count’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] Use the __printf() attribute to let the compiler warn if invalid format strings are passed in. And fix the following checks: CHECK: Avoid CamelCase: <pBuf> +unsigned int __printf(3, 4) snprintf_count(char *pBuf, unsigned int bufSize, char *fmt, ...) CHECK: Avoid CamelCase: <bufSize> +unsigned int __printf(3, 4) snprintf_count(char *pBuf, unsigned int bufSize, char *fmt, ...) Cc: Hamza Mahfooz <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Aurabindo Pillai <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: Address kdoc warnings in dcn30_fpu.cSrinivasan Shanmugam1-3/+12
Fixes the following gcc with W=1: display/dc/dml/dcn30/dcn30_fpu.c:677: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Finds dummy_latency_index when MCLK switching using firmware based display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'dc' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'context' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'pipes' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'pipe_cnt' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'vlevel' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' Cc: Rodrigo Siqueira <[email protected]> Cc: Aurabindo Pillai <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: fix compilation error due to shifting negative valueGONG, Ruiqi1-1/+1
Currently compiling linux-next with allmodconfig triggers the following error: ./drivers/gpu/drm/amd/amdgpu/../display/include/fixed31_32.h: In function ‘dc_fixpt_truncate’: ./drivers/gpu/drm/amd/amdgpu/../display/include/fixed31_32.h:528:22: error: left shift of negative value [-Werror=shift-negative-value] 528 | arg.value &= (~0LL) << (FIXED31_32_BITS_PER_FRACTIONAL_PART - frac_bits); | ^~ Use `unsigned long long` instead. Signed-off-by: GONG, Ruiqi <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array membersGustavo A. R. Silva1-3/+3
Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead. Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length arrays in a union into flexible-array members. And replace a one-element array with a C99 flexible-array member. Address the following warnings found with GCC-13 and -fstrict-flex-arrays=3 enabled: drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1009:89: warning: array subscript kk is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=] drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1007:94: warning: array subscript kk is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=] drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1310:94: warning: array subscript k is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=] drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1309:57: warning: array subscript k is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=] This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/21 Link: https://github.com/KSPP/linux/issues/193 Link: https://github.com/KSPP/linux/issues/300 Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1] Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09amdgpu: validate offset_in_bo of drm_amdgpu_gem_vaChia-I Wu1-8/+8
This is motivated by OOB access in amdgpu_vm_update_range when offset_in_bo+map_size overflows. v2: keep the validations in amdgpu_vm_bo_map v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map rather than to amdgpu_gem_va_ioctl Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters") Reviewed-by: Christian König <[email protected]> Signed-off-by: Chia-I Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: fix debug wait on idle for gfx9.4.1Jonathan Kim1-1/+1
Wait calls for amd_ip_block_type not amd_hw_ip_block_type. Reported-by: Hamza Mahfooz <[email protected]> Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: fix seamless odm transitionsDmytro Laktyushkin3-1/+13
Add missing programming and function pointers Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Acked-by: Stylon Wang <[email protected]> Signed-off-by: Dmytro Laktyushkin <[email protected]> Reviewed-by: Charlene Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: add ODM case when looking for first split pipeSamson Tam2-1/+55
[Why] When going from ODM 2:1 single display case to max displays, second odm pipe needs to be repurposed for one of the new single displays. However, acquire_first_split_pipe() only handles MPC case and not ODM case [How] Add ODM conditions in acquire_first_split_pipe() Add commit_minimal_transition_state() in commit_streams() to handle odm 2:1 exit first, and then process new streams Handle ODM condition in commit_minimal_transition_state() Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Acked-by: Stylon Wang <[email protected]> Signed-off-by: Samson Tam <[email protected]> Reviewed-by: Alvin Lee <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: clean up some inconsistent indentingJiapeng Chong1-2/+2
No functional modification involved. drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_dpms.c:2377 link_set_dpms_on() warn: inconsistent indenting. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5376 Acked-by: Alex Deucher <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/display: Fix dc/dcn20/dcn20_optc.c kdocSrinivasan Shanmugam1-5/+12
Fix all kdoc warnings in dc/dcn20/dcn20_optc.c: display/dc/dcn20/dcn20_optc.c:41: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Enable CRTC display/dc/dcn20/dcn20_optc.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst *For the below, I'm not sure how your GSL parameters are stored in your env, display/dc/dcn20/dcn20_optc.c:85: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * There are (MAX_OPTC+1)/2 gsl groups available for use. Cc: Rodrigo Siqueira <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Harry Wentland <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/pm: fulfill the OD support for SMU13.0.7Evan Quan1-43/+402
Fulfill the interfaces for OD settings retrieving and setting. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/pm: Fill metrics data for SMUv13.0.6Lijo Lazar1-41/+66
Populate metrics data table for SMU v13.0.6. Add PCIe link speed/width information also. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/pm: fulfill the OD support for SMU13.0.0Evan Quan1-43/+402
Fulfill the interfaces for OD settings retrieving and setting. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/pm: fulfill SMU13 OD settings init and restoreEvan Quan4-10/+286
Gfxclk fmin/fmax, Uclk fmin/fmax and Gfx v/f curve voltage offset OD settings are supported for SMU13. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: bump kfd ioctl minor version for debug api availabilityJonathan Kim1-1/+0
Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug device snapshot operationJonathan Kim3-2/+83
Similar to queue snapshot, return an array of device information using an entry_size check and return. Unlike queue snapshots, the debugger needs to pass to correct number of devices that exist. If it fails to do so, the KFD will return the number of actual devices so that the debugger can make a subsequent successful call. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug queue snapshot operationJonathan Kim5-0/+90
Allow the debugger to get a snapshot of a specified number of queues containing various queue property information that is copied to the debugger. Since the debugger doesn't know how many queues exist at any given time, allow the debugger to pass the requested number of snapshots as 0 to get the actual number of potential snapshots to use for a subsequent snapshot request for actual information. To prevent future ABI breakage, pass in the requested entry_size. The KFD will return it's own entry_size in case the debugger still wants log the information in a core dump on sizing failure. Also allow the debugger to clear exceptions when doing a snapshot. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug query exception info operationJonathan Kim3-0/+133
Allow the debugger to query additional info based on an exception code. For device exceptions, it's currently only memory violation information. For process exceptions, it's currently only runtime information. Queue exception only report the queue exception status. The debugger has the option of clearing the target exception on query. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug query event operationJonathan Kim3-0/+75
Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug set flags operationJonathan Kim3-0/+61
Allow the debugger to set single memory and single ALU operations. Some exceptions are imprecise (memory violations, address watch) in the sense that a trap occurs only when the exception interrupt occurs and not at the non-halting faulty instruction. Trap temporaries 0 & 1 save the program counter address, which means that these values will not point to the faulty instruction address but to whenever the interrupt was raised. Setting the Single Memory Operations flag will inject an automatic wait on every memory operation instruction forcing imprecise memory exceptions to become precise at the cost of performance. This setting is not permitted on debug devices that support only a global setting of this option. Return the previous set flags to the debugger as well. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug set and clear address watch points operationJonathan Kim13-5/+452
Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception. Allow the debugger to pass in a watch point to a particular memory address per device. Note that there exists only 4 watch points per devices to date, so have the KFD keep track of what watch points are allocated or not. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug suspend and resume process queues operationJonathan Kim11-14/+512
In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that the debugger can correctly crawl the appropriate size of the saved context. The debugger must then also be allowed to resume suspended queues. A queue that is newly created cannot be suspended because queue ids are recycled after destruction so the debugger needs to know that this has occurred. Query functions will be later added that will clear a given queue of its new queue status. A queue cannot be destroyed while it is suspended to preserve its saved context during debugger inspection. Have queue destruction block while a queue is suspended and unblocked when it is resumed. Likewise, if a queue is about to be destroyed, it cannot be suspended. Return the number of queues successfully suspended or resumed along with a per queue status array where the upper bits per queue status show that the request was invalid (new/destroyed queue suspend request, missing queue) or an error occurred (HWS in a fatal state so it can't suspend or resume queues). Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug wave launch mode operationJonathan Kim11-3/+124
Allow the debugger to set wave behaviour on to either normally operate, halt at launch, trap on every instruction, terminate immediately or stall on allocation. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug wave launch override operationJonathan Kim11-2/+351
This operation allows the debugger to override the enabled HW exceptions on the device. On debug devices that only support the debugging of a single process, the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK register. Because they are global, only address watch exceptions are allowed to be enabled. In other words, the debugger must preserve all non-address watch exception states in normal mode operation by barring a full replacement override or a non-address watch override request. For multi-process debugging, all HW exception overrides are per-VMID so all exceptions can be overridden or fully replaced. In order for the debugger to know what is permissible, returned the supported override mask back to the debugger along with the previously enable overrides. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug set exceptions enabled operationJonathan Kim3-0/+41
The debugger subscibes to nofication for requested exceptions on attach. Allow the debugger to change its subsciption later on. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: update process interrupt handling for debug eventsJonathan Kim12-19/+686
The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amd/pm: update SMU13 header files for coming OD supportEvan Quan4-45/+34
Correct the data structures for OD feature support. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug trap enabled flag to tmaJay Cornwall3-0/+28
Trap handler behavior will differ when a debugger is attached. Make the debug trap flag available in the trap handler TMA. Update it when the debug trap ioctl is invoked. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add runtime enable operationJonathan Kim4-4/+150
The debugger can attach to a process prior to HSA enablement (i.e. inferior is spawned by the debugger and attached to immediately before target process has been enabled for HSA dispatches) or it can attach to a running target that is already HSA enabled. Either way, the debugger needs to know the enablement status to know when it can inspect queues. For the scenario where the debugger spawns the target process, it will have to wait for ROCr's runtime enable request from the target. The runtime enable request will be able to see that its process has been debug attached. ROCr raises an EC_PROCESS_RUNTIME signal to the debugger then blocks the target process while waiting the debugger's response. Once the debugger has received the runtime signal, it will unblock the target process. For the scenario where the debugger attaches to a running target process, ROCr will set the target process' runtime status as enabled so that on an attach request, the debugger will be able to see this status and will continue with debug enablement as normal. A secondary requirement is to conditionally enable the trap tempories only if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger attaches with HSA runtime enabled. This is because setting up the trap temporaries incurs a performance overhead that is unacceptable for microbench performance in normal mode for certain customers. In the scenario where the debugger spawns the target process, when ROCr detects that the debugger has attached during the runtime enable request, it will enable the trap temporaries before it blocks the target process while waiting for the debugger to respond. In the scenario where the debugger attaches to a running target process, it will enable to trap temporaries itself. Finally, there is an additional restriction that is required to be enforced with runtime enable and HW debug mode setting. The debugger must first ensure that HW debug mode has been enabled before permitting HW debug mode operations. With single process debug devices, allowing the debugger to set debug HW modes prior to trap activation means that debug HW mode setting can occur before the KFD has reserved the debug VMID (0xf) from the hardware scheduler's VMID allocation resource pool. This can result in the hardware scheduler assigning VMID 0xf to a non-debugged process and having that process inherit debug HW mode settings intended for the debugged target process instead, which is both incorrect and potentially fatal for normal mode operation. With multi process debug devices, allowing the debugger to set debug HW modes prior to trap activation means that non-debugged processes migrating to a new VMID could inherit unintended debug settings. All debug operations that touch HW settings must require trap activation where trap activation is triggered by both debug attach and runtime enablement (target has KFD opened and is ready to dispatch work). Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add send exception operationJonathan Kim8-6/+135
Add a debug operation that allows the debugger to send an exception directly to runtime through a payload address. For memory violations, normal vmfault signals will be applied to notify runtime instead after passing in the saved exception data when a memory violation was raised to the debugger. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add raise exception event functionJonathan Kim4-0/+123
Exception events can be generated from interrupts or queue activitity. The raise event function will save exception status of a queue, device or process then notify the debugger of the status change by writing to a debugger polled file descriptor that the debugger provides during debug attach. For memory violation exceptions, extra exception data will be saved. The debugger will be able to query the saved exception states by query operation that will be provided by follow up patches. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: apply trap workaround for gfx11Jonathan Kim14-31/+122
Due to a HW bug, waves in only half the shader arrays can enter trap. When starting a debug session, relocate all waves to the first shader array of each shader engine and mask off the 2nd shader array as unavailable. When ending a debug session, re-enable the 2nd shader array per shader engine. User CU masking per queue cannot be guaranteed to remain functional if requested during debugging (e.g. user cu mask requests only 2nd shader array as an available resource leading to zero HW resources available) nor can runtime be alerted of any of these changes during execution. Make user CU masking and debugging mutual exclusive with respect to availability. If the debugger tries to attach to a process with a user cu masked queue, return the runtime status as enabled but busy. If the debugger tries to attach and fails to reallocate queue waves to the first shader array of each shader engine, return the runtime status as enabled but with an error. In addition, like any other mutli-process debug supported devices, disable trap temporary setup per-process to avoid performance impact from setup overhead. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add per process hw trap enable and disable functionsJonathan Kim4-2/+190
To enable HW debug mode per process, all devices must be debug enabled successfully. If a failure occures, rewind the enablement of debug mode on the enabled devices. A power management scenario that needs to be considered is HW debug mode setting during GFXOFF. During GFXOFF, these registers will be unreachable so we have to transiently disable GFXOFF when setting. Also, some devices don't support the RLC save restore function for these debug registers so we have to disable GFXOFF completely during a debug session. Cooperative launch also has debugging restriction based on HW/FW bugs. If such bugs exists, the debugger cannot attach to a process that uses GWS resources nor can GWS resources be requested if a process is being debugged. Multi-process debug devices can only enable trap temporaries based on certain runtime scenerios, which will be explained when the runtime enable functions are implemented in a follow up patch. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: expose debug api for mesJonathan Kim4-1/+84
Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process debug API. The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES from clearing the process context when the first queue is added to the scheduler in order to maintain debug mode settings during queue preemption and restore. The MES clears the process context in this case due to an unresolved FW caching bug during normal mode operations. During debug mode, the KFD will hold a reference to the target process so the process context should never go stale and MES can afford to skip this requirement. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: prepare map process for multi-process debug devicesJonathan Kim6-0/+87
Unlike single process debug devices, multi-process debug devices allow debug mode setting per-VMID (non-device-global). Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows the KFD to forward the required SPI debug register write requests. To request a new debug mode setting change, the KFD must be able to preempt all queues then remap all queues with these new setting requests for MAP_PROCESS to take effect. Note that by default, trap enablement in non-debug mode must be disabled for performance reasons for multi-process debug devices due to setup overhead in FW. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>