blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-06-09	drm/amdgpu: pass xcc mask to ras ta	Stanley.Yang	2	-0/+3
	pass xcc mask to ras ta, ras ta will compare the mask with the one from chiplet topology. Changed from V1: Remove IP version checking. Set ras_cmd->ras_init_message.init_flags.xcc_mask directly due to xcc_mask is common structres to all the devices. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/pm: update smu-driver if header for smu 13.0.0 and smu 13.0.10	Kenneth Feng	1	-8/+25
	update smu-driver if header for smu 13.0.0 and smu 13.0.10 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu/pm: notify driver unloading to PMFW for SMU v13.0.6 dGPU	Le Ma	1	-9/+7
	Per requested, follow the same sequence as APU to send only PPSMC_MSG_PrepareForDriverUnload to PMFW during driver unloading. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Shiwu Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: Mark 'kgd_gfx_aldebaran_clear_address_watch' & ↵	Srinivasan Shanmugam	2	-4/+4
	'kgd_gfx_v11_clear_address_watch' functions as static Below two functions cause a warning because they lack a prototype: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c:164:10: warning: no previous prototype for ‘kgd_gfx_aldebaran_clear_address_watch’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c:782:10: warning: no previous prototype for ‘kgd_gfx_v11_clear_address_watch’ [-Wmissing-prototypes] There are no callers from other files, so just mark them as 'static'. Also fixes the following checks: CHECK: Alignment should match open parenthesis +static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device adev, uint32_t watch_id) CHECK: Alignment should match open parenthesis +static uint32_t kgd_gfx_v11_clear_address_watch(struct amdgpu_device adev, uint32_t watch_id) Cc: Felix Kuehling <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+	Aurabindo Pillai	2	-12/+13
	For FPO/FAMS, DMCUB will try to change the output timings by writing to the OTG registers. However, the timings written directly to the OTG registers will not be honoured unless VMIN/VMAX selector registers are programmed with the right bits and trigger source is selected correctly. Proper solution needs to go into DMCUB but will require additional state tracking to ensure that the selectors are set and reset correctly as per driver state. Until fix is merged into firmware, apply the workaround in driver to unconditionally write OTG vmin/vmax selectors. Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	Revert "drm/amd/display: Only use ODM2:1 policy for high pixel rate displays"	Aurabindo Pillai	2	-2/+0
	This reverts commit 047783cdd5f604d87398236beb4971abb4d43293 since it causes higher power consumption for single display use case (4k60). Also, this patch introduced a 35% performance drop in a Vulkan benchmark. * The patch disabled the ODM-combination on most popular monitors, including 4K, 2K and FHD monitors at 60Hz. * ODM-combination can halve the DPP clock to save power, that is the reason why we introduce ODM-combination, and the PM log shows single pipe consumes more power at 4K@60Hz. * ODM-combination has 2 de-tiled buffer involved, which provides longer self-sustained time, that benefit to the memory power optimization. Signed-off-by: Aurabindo Pillai <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: Add gnu_printf format attribute for snprintf_count()	Srinivasan Shanmugam	1	-4/+5
	Fix the following W=1 kernel build warning: display/dc/dcn10/dcn10_hw_sequencer_debug.c: In function ‘snprintf_count’: display/dc/dcn10/dcn10_hw_sequencer_debug.c:56:2: warning: function ‘snprintf_count’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] Use the __printf() attribute to let the compiler warn if invalid format strings are passed in. And fix the following checks: CHECK: Avoid CamelCase: <pBuf> +unsigned int __printf(3, 4) snprintf_count(char pBuf, unsigned int bufSize, char fmt, ...) CHECK: Avoid CamelCase: <bufSize> +unsigned int __printf(3, 4) snprintf_count(char pBuf, unsigned int bufSize, char fmt, ...) Cc: Hamza Mahfooz <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Aurabindo Pillai <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: Address kdoc warnings in dcn30_fpu.c	Srinivasan Shanmugam	1	-3/+12
	Fixes the following gcc with W=1: display/dc/dml/dcn30/dcn30_fpu.c:677: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Finds dummy_latency_index when MCLK switching using firmware based display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'dc' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'context' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'pipes' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'pipe_cnt' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'vlevel' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch' Cc: Rodrigo Siqueira <[email protected]> Cc: Aurabindo Pillai <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: fix compilation error due to shifting negative value	GONG, Ruiqi	1	-1/+1
	Currently compiling linux-next with allmodconfig triggers the following error: ./drivers/gpu/drm/amd/amdgpu/../display/include/fixed31_32.h: In function ‘dc_fixpt_truncate’: ./drivers/gpu/drm/amd/amdgpu/../display/include/fixed31_32.h:528:22: error: left shift of negative value [-Werror=shift-negative-value] 528 \| arg.value &= (~0LL) << (FIXED31_32_BITS_PER_FRACTIONAL_PART - frac_bits); \| ^~ Use `unsigned long long` instead. Signed-off-by: GONG, Ruiqi <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members	Gustavo A. R. Silva	1	-3/+3
	Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead. Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length arrays in a union into flexible-array members. And replace a one-element array with a C99 flexible-array member. Address the following warnings found with GCC-13 and -fstrict-flex-arrays=3 enabled: drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1009:89: warning: array subscript kk is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=] drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1007:94: warning: array subscript kk is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=] drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1310:94: warning: array subscript k is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=] drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1309:57: warning: array subscript k is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=] This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/21 Link: https://github.com/KSPP/linux/issues/193 Link: https://github.com/KSPP/linux/issues/300 Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1] Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	amdgpu: validate offset_in_bo of drm_amdgpu_gem_va	Chia-I Wu	1	-8/+8
	This is motivated by OOB access in amdgpu_vm_update_range when offset_in_bo+map_size overflows. v2: keep the validations in amdgpu_vm_bo_map v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map rather than to amdgpu_gem_va_ioctl Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters") Reviewed-by: Christian König <[email protected]> Signed-off-by: Chia-I Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: fix debug wait on idle for gfx9.4.1	Jonathan Kim	1	-1/+1
	Wait calls for amd_ip_block_type not amd_hw_ip_block_type. Reported-by: Hamza Mahfooz <[email protected]> Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: fix seamless odm transitions	Dmytro Laktyushkin	3	-1/+13
	Add missing programming and function pointers Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Acked-by: Stylon Wang <[email protected]> Signed-off-by: Dmytro Laktyushkin <[email protected]> Reviewed-by: Charlene Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: add ODM case when looking for first split pipe	Samson Tam	2	-1/+55
	[Why] When going from ODM 2:1 single display case to max displays, second odm pipe needs to be repurposed for one of the new single displays. However, acquire_first_split_pipe() only handles MPC case and not ODM case [How] Add ODM conditions in acquire_first_split_pipe() Add commit_minimal_transition_state() in commit_streams() to handle odm 2:1 exit first, and then process new streams Handle ODM condition in commit_minimal_transition_state() Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Acked-by: Stylon Wang <[email protected]> Signed-off-by: Samson Tam <[email protected]> Reviewed-by: Alvin Lee <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: clean up some inconsistent indenting	Jiapeng Chong	1	-2/+2
	No functional modification involved. drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_dpms.c:2377 link_set_dpms_on() warn: inconsistent indenting. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5376 Acked-by: Alex Deucher <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/display: Fix dc/dcn20/dcn20_optc.c kdoc	Srinivasan Shanmugam	1	-5/+12
	Fix all kdoc warnings in dc/dcn20/dcn20_optc.c: display/dc/dcn20/dcn20_optc.c:41: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Enable CRTC display/dc/dcn20/dcn20_optc.c:76: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst For the below, I'm not sure how your GSL parameters are stored in your env, display/dc/dcn20/dcn20_optc.c:85: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst There are (MAX_OPTC+1)/2 gsl groups available for use. Cc: Rodrigo Siqueira <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Harry Wentland <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/pm: fulfill the OD support for SMU13.0.7	Evan Quan	1	-43/+402
	Fulfill the interfaces for OD settings retrieving and setting. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/pm: Fill metrics data for SMUv13.0.6	Lijo Lazar	1	-41/+66
	Populate metrics data table for SMU v13.0.6. Add PCIe link speed/width information also. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/pm: fulfill the OD support for SMU13.0.0	Evan Quan	1	-43/+402
	Fulfill the interfaces for OD settings retrieving and setting. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/pm: fulfill SMU13 OD settings init and restore	Evan Quan	4	-10/+286
	Gfxclk fmin/fmax, Uclk fmin/fmax and Gfx v/f curve voltage offset OD settings are supported for SMU13. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: bump kfd ioctl minor version for debug api availability	Jonathan Kim	1	-1/+0
	Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug device snapshot operation	Jonathan Kim	3	-2/+83
	Similar to queue snapshot, return an array of device information using an entry_size check and return. Unlike queue snapshots, the debugger needs to pass to correct number of devices that exist. If it fails to do so, the KFD will return the number of actual devices so that the debugger can make a subsequent successful call. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug queue snapshot operation	Jonathan Kim	5	-0/+90
	Allow the debugger to get a snapshot of a specified number of queues containing various queue property information that is copied to the debugger. Since the debugger doesn't know how many queues exist at any given time, allow the debugger to pass the requested number of snapshots as 0 to get the actual number of potential snapshots to use for a subsequent snapshot request for actual information. To prevent future ABI breakage, pass in the requested entry_size. The KFD will return it's own entry_size in case the debugger still wants log the information in a core dump on sizing failure. Also allow the debugger to clear exceptions when doing a snapshot. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug query exception info operation	Jonathan Kim	3	-0/+133
	Allow the debugger to query additional info based on an exception code. For device exceptions, it's currently only memory violation information. For process exceptions, it's currently only runtime information. Queue exception only report the queue exception status. The debugger has the option of clearing the target exception on query. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug query event operation	Jonathan Kim	3	-0/+75
	Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug set flags operation	Jonathan Kim	3	-0/+61
	Allow the debugger to set single memory and single ALU operations. Some exceptions are imprecise (memory violations, address watch) in the sense that a trap occurs only when the exception interrupt occurs and not at the non-halting faulty instruction. Trap temporaries 0 & 1 save the program counter address, which means that these values will not point to the faulty instruction address but to whenever the interrupt was raised. Setting the Single Memory Operations flag will inject an automatic wait on every memory operation instruction forcing imprecise memory exceptions to become precise at the cost of performance. This setting is not permitted on debug devices that support only a global setting of this option. Return the previous set flags to the debugger as well. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug set and clear address watch points operation	Jonathan Kim	13	-5/+452
	Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception. Allow the debugger to pass in a watch point to a particular memory address per device. Note that there exists only 4 watch points per devices to date, so have the KFD keep track of what watch points are allocated or not. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug suspend and resume process queues operation	Jonathan Kim	11	-14/+512
	In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that the debugger can correctly crawl the appropriate size of the saved context. The debugger must then also be allowed to resume suspended queues. A queue that is newly created cannot be suspended because queue ids are recycled after destruction so the debugger needs to know that this has occurred. Query functions will be later added that will clear a given queue of its new queue status. A queue cannot be destroyed while it is suspended to preserve its saved context during debugger inspection. Have queue destruction block while a queue is suspended and unblocked when it is resumed. Likewise, if a queue is about to be destroyed, it cannot be suspended. Return the number of queues successfully suspended or resumed along with a per queue status array where the upper bits per queue status show that the request was invalid (new/destroyed queue suspend request, missing queue) or an error occurred (HWS in a fatal state so it can't suspend or resume queues). Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug wave launch mode operation	Jonathan Kim	11	-3/+124
	Allow the debugger to set wave behaviour on to either normally operate, halt at launch, trap on every instruction, terminate immediately or stall on allocation. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug wave launch override operation	Jonathan Kim	11	-2/+351
	This operation allows the debugger to override the enabled HW exceptions on the device. On debug devices that only support the debugging of a single process, the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK register. Because they are global, only address watch exceptions are allowed to be enabled. In other words, the debugger must preserve all non-address watch exception states in normal mode operation by barring a full replacement override or a non-address watch override request. For multi-process debugging, all HW exception overrides are per-VMID so all exceptions can be overridden or fully replaced. In order for the debugger to know what is permissible, returned the supported override mask back to the debugger along with the previously enable overrides. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug set exceptions enabled operation	Jonathan Kim	3	-0/+41
	The debugger subscibes to nofication for requested exceptions on attach. Allow the debugger to change its subsciption later on. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: update process interrupt handling for debug events	Jonathan Kim	12	-19/+686
	The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/pm: update SMU13 header files for coming OD support	Evan Quan	4	-45/+34
	Correct the data structures for OD feature support. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add debug trap enabled flag to tma	Jay Cornwall	3	-0/+28
	Trap handler behavior will differ when a debugger is attached. Make the debug trap flag available in the trap handler TMA. Update it when the debug trap ioctl is invoked. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add runtime enable operation	Jonathan Kim	4	-4/+150
	The debugger can attach to a process prior to HSA enablement (i.e. inferior is spawned by the debugger and attached to immediately before target process has been enabled for HSA dispatches) or it can attach to a running target that is already HSA enabled. Either way, the debugger needs to know the enablement status to know when it can inspect queues. For the scenario where the debugger spawns the target process, it will have to wait for ROCr's runtime enable request from the target. The runtime enable request will be able to see that its process has been debug attached. ROCr raises an EC_PROCESS_RUNTIME signal to the debugger then blocks the target process while waiting the debugger's response. Once the debugger has received the runtime signal, it will unblock the target process. For the scenario where the debugger attaches to a running target process, ROCr will set the target process' runtime status as enabled so that on an attach request, the debugger will be able to see this status and will continue with debug enablement as normal. A secondary requirement is to conditionally enable the trap tempories only if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger attaches with HSA runtime enabled. This is because setting up the trap temporaries incurs a performance overhead that is unacceptable for microbench performance in normal mode for certain customers. In the scenario where the debugger spawns the target process, when ROCr detects that the debugger has attached during the runtime enable request, it will enable the trap temporaries before it blocks the target process while waiting for the debugger to respond. In the scenario where the debugger attaches to a running target process, it will enable to trap temporaries itself. Finally, there is an additional restriction that is required to be enforced with runtime enable and HW debug mode setting. The debugger must first ensure that HW debug mode has been enabled before permitting HW debug mode operations. With single process debug devices, allowing the debugger to set debug HW modes prior to trap activation means that debug HW mode setting can occur before the KFD has reserved the debug VMID (0xf) from the hardware scheduler's VMID allocation resource pool. This can result in the hardware scheduler assigning VMID 0xf to a non-debugged process and having that process inherit debug HW mode settings intended for the debugged target process instead, which is both incorrect and potentially fatal for normal mode operation. With multi process debug devices, allowing the debugger to set debug HW modes prior to trap activation means that non-debugged processes migrating to a new VMID could inherit unintended debug settings. All debug operations that touch HW settings must require trap activation where trap activation is triggered by both debug attach and runtime enablement (target has KFD opened and is ready to dispatch work). Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add send exception operation	Jonathan Kim	8	-6/+135
	Add a debug operation that allows the debugger to send an exception directly to runtime through a payload address. For memory violations, normal vmfault signals will be applied to notify runtime instead after passing in the saved exception data when a memory violation was raised to the debugger. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add raise exception event function	Jonathan Kim	4	-0/+123
	Exception events can be generated from interrupts or queue activitity. The raise event function will save exception status of a queue, device or process then notify the debugger of the status change by writing to a debugger polled file descriptor that the debugger provides during debug attach. For memory violation exceptions, extra exception data will be saved. The debugger will be able to query the saved exception states by query operation that will be provided by follow up patches. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: apply trap workaround for gfx11	Jonathan Kim	14	-31/+122
	Due to a HW bug, waves in only half the shader arrays can enter trap. When starting a debug session, relocate all waves to the first shader array of each shader engine and mask off the 2nd shader array as unavailable. When ending a debug session, re-enable the 2nd shader array per shader engine. User CU masking per queue cannot be guaranteed to remain functional if requested during debugging (e.g. user cu mask requests only 2nd shader array as an available resource leading to zero HW resources available) nor can runtime be alerted of any of these changes during execution. Make user CU masking and debugging mutual exclusive with respect to availability. If the debugger tries to attach to a process with a user cu masked queue, return the runtime status as enabled but busy. If the debugger tries to attach and fails to reallocate queue waves to the first shader array of each shader engine, return the runtime status as enabled but with an error. In addition, like any other mutli-process debug supported devices, disable trap temporary setup per-process to avoid performance impact from setup overhead. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: add per process hw trap enable and disable functions	Jonathan Kim	4	-2/+190
	To enable HW debug mode per process, all devices must be debug enabled successfully. If a failure occures, rewind the enablement of debug mode on the enabled devices. A power management scenario that needs to be considered is HW debug mode setting during GFXOFF. During GFXOFF, these registers will be unreachable so we have to transiently disable GFXOFF when setting. Also, some devices don't support the RLC save restore function for these debug registers so we have to disable GFXOFF completely during a debug session. Cooperative launch also has debugging restriction based on HW/FW bugs. If such bugs exists, the debugger cannot attach to a process that uses GWS resources nor can GWS resources be requested if a process is being debugged. Multi-process debug devices can only enable trap temporaries based on certain runtime scenerios, which will be explained when the runtime enable functions are implemented in a follow up patch. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: expose debug api for mes	Jonathan Kim	4	-1/+84
	Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process debug API. The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES from clearing the process context when the first queue is added to the scheduler in order to maintain debug mode settings during queue preemption and restore. The MES clears the process context in this case due to an unresolved FW caching bug during normal mode operations. During debug mode, the KFD will hold a reference to the target process so the process context should never go stale and MES can afford to skip this requirement. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: prepare map process for multi-process debug devices	Jonathan Kim	6	-0/+87
	Unlike single process debug devices, multi-process debug devices allow debug mode setting per-VMID (non-device-global). Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows the KFD to forward the required SPI debug register write requests. To request a new debug mode setting change, the KFD must be able to preempt all queues then remap all queues with these new setting requests for MAP_PROCESS to take effect. Note that by default, trap enablement in non-debug mode must be disabled for performance reasons for multi-process debug devices due to setup overhead in FW. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: prepare map process for single process debug devices	Jonathan Kim	4	-1/+111
	Older HW only supports debugging on a single process because the SPI debug mode setting registers are device global. The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS for debug purposes. To pin the VMID, the KFD will remove the VMID from the HWS dynamic VMID allocation via SET_RESOUCES so that a debugged process will never migrate away from its pinned VMID. The KFD is responsible for reserving and releasing this pinned VMID accordingly whenever the debugger attaches and detaches respectively. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add configurable grace period for unmap queues	Jonathan Kim	14	-20/+295
	The HWS schedule allows a grace period for wave completion prior to preemption for better performance by avoiding CWSR on waves that can potentially complete quickly. The debugger, on the other hand, will want to inspect wave status immediately after it actively triggers preemption (a suspend function to be provided). To minimize latency between preemption and debugger wave inspection, allow immediate preemption by setting the grace period to 0. Note that setting the preepmtion grace period to 0 will result in an infinite grace period being set due to a CP FW bug so set it to 1 for now. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add gfx11 hw debug mode enable and disable calls	Jonathan Kim	1	-0/+38
	Implement the per-device calls to enable or disable HW debug mode for GFX11. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls	Jonathan Kim	1	-1/+41
	GFX9.4.2 now supports per-VMID debug mode controls registers (SPI_GDBG_PER_VMID_CNTL). Because the KFD lets the HWS handle PASID-VMID mapping, the KFD will forward all debug mode setting register writes to the HWS scheduler using a new MAP_PROCESS API, so instead of writing to registers, return the required register values that the HWS needs to write on debug enable and disable. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add gfx10 hw debug mode enable and disable calls	Jonathan Kim	3	-145/+127
	Similar to GFX9 debug devices, set the hardware debug mode by draining the SPI appropriately prior the mode setting request. Because GFX10 has waves allocated by the work group boundary and each SE's SPI instances do not communicate, the SPI drain time is much longer. This long drain time will be fixed for GFX11 onwards. Also remove a bunch of deprecated misplaced references for GFX10.3. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: fix kfd_suspend_all_processes	Jonathan Kim	1	-1/+1
	Flush delayed restore work in kfd_suspend_all_queues instead of cancelling. Cancelling the work before it runs results in the queues becoming permanently disabled. Flushing the work ensures that the queue suspend/resume state stays balanced. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls	Jonathan Kim	3	-2/+121
	On GFX9.4.1, the implicit wait count instruction on s_barrier is disabled by default in the driver during normal operation for performance requirements. There is a hardware bug in GFX9.4.1 where if the implicit wait count instruction after an s_barrier instruction is disabled, any wave that hits an exception may step over the s_barrier when returning from the trap handler with the barrier logic having no ability to be aware of this, thereby causing other waves to wait at the barrier indefinitely resulting in a shader hang. This bug has been corrected for GFX9.4.2 and onward. Since the debugger subscribes to hardware exceptions, in order to avoid this bug, the debugger must enable implicit wait count on s_barrier for a debug session and disable it on detach. In order to change this setting in the in the device global SQ_CONFIG register, the GFX pipeline must be idle. GFX9.4.1 as a compute device will either dispatch work through the compute ring buffers used for image post processing or through the hardware scheduler by the KFD. Have the KGD suspend and drain the compute ring buffer, then suspend the hardware scheduler and block any future KFD process job requests before changing the implicit wait count setting. Once set, resume all work. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add gfx9 hw debug mode enable and disable calls	Jonathan Kim	2	-0/+101
	Implement the per-device calls to enable or disable HW debug mode for GFX9 prior to GFX9.4.1. GFX9.4.1 and onward will require their own enable/disable sequence as follow on patches. When hardware debug mode setting is requested, waves will inherit these settings in the Shader Processor Input's (SPI) Sequencer Global Block (SQG). This means that the KGD must drain all waves from the SPI into SQG (approximately 96 SPI clock cycles) prior to debug mode setting to ensure that the order of operations that the debugger expects with regards to debug mode setting transaction requests and wave inheritence of that mode is upheld. Also ensure that exception overrides are reset to their original state prior to debug enable or disable. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdkfd: clean up one inconsistent indenting	Yang Li	1	-1/+1
	drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting Signed-off-by: Yang Li <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>