blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-01-19	drm/amdgpu: Remove repeated calls	yipechai	1	-3/+1
	Remove repeated calls. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-18	drm/amdgpu: Fix the code style warnings in amdgpu_ras	yipechai	1	-17/+24
	Fix the code style warnings in amdgpu_ras: 1. ERROR: space required before the open parenthesis '('. 2. WARNING: line length of xxx exceeds 100 columns. 3. ERROR: "foo* bar" should be "foo *bar". 4. WARNING: unnecessary whitespace before a quoted newline. 5. WARNING: space prohibited before semicolon. 6. WARNING: suspect code indent for conditional statements. 7. WARNING: braces {} are not necessary for single statement blocks. Signed-off-by: yipechai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-18	drm/amdgpu: handle denied inject error into critical regions v2	Stanley.Yang	1	-1/+1
	Changed from v1: remove unused brace Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: fix compile warning for ras_block_match_default	yipechai	1	-1/+2
	fix compile warning for ras_block_match_default Signed-off-by: yipechai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Use ARRAY_SIZE to get array length	yipechai	1	-1/+2
	Use ARRAY_SIZE to get array length. Signed-off-by: yipechai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: clean up some inconsistent indenting	Yang Li	1	-3/+5
	Eliminate the follow smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3504 amdgpu_device_init() warn: inconsistent indenting drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1716 amdgpu_ras_error_status_query() warn: if statement not indented drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1058 amdgpu_ras_error_inject() warn: inconsistent indenting Reported-by: Abaci Robot <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: remove unneeded semicolon	Yang Li	1	-1/+1
	Eliminate the following coccicheck warning: ./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon Reported-by: Abaci Robot <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: No longer insert ras blocks into ras_list if it already exists ↵	yipechai	1	-0/+8
	in ras_list No longer insert ras blocks into ras_list if it already exists in ras_list. Signed-off-by: yipechai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Add ras supported check for register_ras_block	yipechai	1	-0/+3
	Add ras supported check for register_ras_block. Signed-off-by: yipechai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Removed redundant ras code	yipechai	1	-48/+19
	Removed redundant ras code. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Adjust error inject function code style in amdgpu_ras.c	yipechai	1	-76/+13
	1. Move xgmi special error inject function from amdgpu_ras.c to xgmi block. 2. Support to use psp_ras_trigger_error as default error inject function in amdgpu_ras.c. If .ras_error_inject isn't defined in ras block, default error inject function will take effect. v2: squash in warning fix (Alex) Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify sdma block to fit for the unified ras block data and ops	yipechai	1	-10/+0
	1.Modify sdma block to fit for the unified ras block data and ops. 2.Change amdgpu_sdma_ras_funcs to amdgpu_sdma_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of sdma ras variable so that sdma ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register sdma ras block into amdgpu device ras block link list. 5.Remove the redundant code about sdma in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of sdma versions. If .ras_late_init and .ras_fini had been defined by the selected sdma version, the defined functions will take effect; if not defined, default fill them with amdgpu_sdma_ras_late_init and amdgpu_sdma_ras_fini. v2: squash in warning fix (Alex) Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify umc block to fit for the unified ras block data and ops	yipechai	1	-15/+15
	1.Modify umc block to fit for the unified ras block data and ops. 2.Change amdgpu_umc_ras_funcs to amdgpu_umc_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of umc ras variable so that umc ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register umc ras block into amdgpu device ras block link list. 5.Remove the redundant code about umc in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of umc versions. If .ras_late_init and .ras_fini had been defined by the selected umc version, the defined functions will take effect; if not defined, default fill them with amdgpu_umc_ras_late_init and amdgpu_umc_ras_fini. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify nbio block to fit for the unified ras block data and ops	yipechai	1	-12/+10
	1.Modify nbio block to fit for the unified ras block data and ops. 2.Change amdgpu_nbio_ras_funcs to amdgpu_nbio_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of mmhub ras variable so that nbio ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register nbio ras block into amdgpu device ras block link list. 5.Remove the redundant code about nbio in amdgpu_ras.c after using the unified ras block. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify mmhub block to fit for the unified ras block data and ops	yipechai	1	-34/+13
	1.Modify mmhub block to fit for the unified ras block data and ops. 2.Change amdgpu_mmhub_ras_funcs to amdgpu_mmhub_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of mmhub ras variable so that mmhub ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register mmhub ras block into amdgpu device ras block link list. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of mmhub versions. If .ras_late_init and .ras_fini had been defined by the selected mmhub version, the defined functions will take effect; if not defined, default fill them with amdgpu_mmhub_ras_late_init and amdgpu_mmhub_ras_fini. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify hdp block to fit for the unified ras block data and ops	yipechai	1	-8/+8
	1.Modify hdp block to fit for the unified ras block data and ops. 2.Change amdgpu_hdp_ras_funcs to amdgpu_hdp_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of hdp ras variable so that hdp ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register hdp ras block into amdgpu device ras block link list. 5.Remove the redundant code about hdp in amdgpu_ras.c after using the unified ras block. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify xgmi block to fit for the unified ras block data and ops	yipechai	1	-3/+7
	1.Modify gmc block to fit for the unified ras block data and ops. 2.Change amdgpu_xgmi_ras_funcs to amdgpu_xgmi_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of gmc ras variable so that gmc ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register gmc ras block into amdgpu device ras block link list. 5.Remove the redundant code about gmc in amdgpu_ras.c after using the unified ras block. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify gfx block to fit for the unified ras block data and ops	yipechai	1	-21/+42
	1.Modify gfx block to fit for the unified ras block data and ops. 2.Change amdgpu_gfx_ras_funcs to amdgpu_gfx_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of gfx ras variable so that gfx ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register gfx ras block into amdgpu device ras block link list. 5.Remove the redundant code about gfx in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of gfx versions. If .ras_late_init and .ras_fini had been defined by the selected gfx version, the defined functions will take effect; if not defined, default fill with amdgpu_gfx_ras_late_init and amdgpu_gfx_ras_fini. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Modify the compilation failed problem when other ras blocks' .h ↵	yipechai	1	-0/+39
	include amdgpu_ras.h Modify the compilation failed problem when other ras blocks' .h include amdgpu_ras.h. v2: squash in forward declaration warning fix (Alex) Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amdgpu: Unify ras block interface for each ras block	yipechai	1	-0/+46
	1. Define unified ops interface for each block. 2. Add ras_block_match function pointer in ops interface, each ras block can customize specail match function to identify itself. 3. Add amdgpu_ras_block_match_default new function. If a ras block doesn't define .ras_block_match, default execute amdgpu_ras_block_match_default to identify this ras block. 4. Define unified basic ras block data for each ras block. 5. Create dedicated amdgpu device ras block link list to manage all of the ras blocks. 6. Add amdgpu_ras_register_ras_block new function interface for each ras block to register itself to ras controlling block. Signed-off-by: yipechai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-14	drm/amd/pm: do not expose implementation details to other blocks out of power	Evan Quan	1	-3/+2
	Those implementation details(whether swsmu supported, some ppt_funcs supported, accessing internal statistics ...)should be kept internally. It's not a good practice and even error prone to expose implementation details. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-11	drm/amdgpu: do not pass ttm_resource_manager to vram_mgr	Nirmoy Das	1	-4/+2
	Do not allow exported amdgpu_vram_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call vram_mgr functions. v2: pass adev's vram_mgr instead of adev Reviewed-by: Christian König <[email protected]> Signed-off-by: Nirmoy Das <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-01-11	drm/amdgpu: Clear garbage data in err_data before usage	Jiawei Gu	1	-0/+1
	Memory of err_data should be cleaned before usage when there're multiple entry in ras ih. Otherwise garbage data from last loop will be used. Signed-off-by: Jiawei Gu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13	drm/amdgpu: fix amdgpu_ras_mca_query_error_status scope	Isabella Basso	1	-3/+3
	This commit fixes the compile-time warning below: warning: no previous prototype for ‘amdgpu_ras_mca_query_error_status’ [-Wmissing-prototypes] Changes since v1: - As suggested by Alexander Deucher: 1. Make function static instead of adding prototype. Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13	drm/amd: fix improper docstring syntax	Isabella Basso	1	-3/+3
	This fixes various warnings relating to erroneous docstring syntax, of which some are listed below: warning: Function parameter or member 'adev' not described in 'amdgpu_atomfirmware_ras_rom_addr' ... warning: expecting prototype for amdgpu_atpx_validate_functions(). Prototype was for amdgpu_atpx_validate() instead ... warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new' ... warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of waves in-flight on this device ... warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13	drm/amdgpu: enable RAS poison flag when GPU is connected to CPU	Tao Zhou	1	-1/+5
	The RAS poison mode is enabled by default on the platform. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07	drm/amdgpu: skip umc ras error count harvest	Stanley.Yang	1	-5/+10
	remove in recovery stat check, skip umc ras err cnt harvest in amdgpu_ras_log_on_err_counter Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07	drm/amdgpu: only skip get ecc info for aldebaran	Stanley.Yang	1	-1/+2
	skip get ecc info for aldebarn through check ip version do not affect other asic type Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-02	drm/amdgpu: skip query ecc info in gpu recovery	Stanley.Yang	1	-0/+4
	this is a workaround due to get ecc info failed during gpu recovery [ 700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table! [ 700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin! [ 704.331171] amdgpu: qcm fence wait loop timeout expired [ 704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption [ 704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin! [ 704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:ffffffffffffffff, as another already in progress [ 704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62 [ 710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007 [ 710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features. [ 710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features! [ 710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <smu> failed -62 Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01	drm/amdgpu: fix disable ras feature failed when unload drvier v2	Stanley.Yang	1	-1/+0
	v2: still need call ras_disable_all_featrures to handle ras initilization failure case. Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw, so ras ta will unload before send ras disable command, ras dsiable operation must before hw fini. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-11-22	drm/amdgpu: query umc error info from ecc_table v2	Stanley.Yang	1	-9/+33
	if smu support ECCTABLE, driver can message smu to get ecc_table then query umc error info from ECCTABLE v2: optimize source code makes logical more reasonable Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-11-22	drm/amdgpu: Add recovery_lock to save bad pages function	Candice Li	1	-0/+2
	Fix race condition failure during UMC UE injection. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13	drm/amdgpu: Fix RAS page retirement with mode2 reset on Aldebaran	Mukul Joshi	1	-12/+21
	During mode2 reset, the GPU is temporarily removed from the mgpu_info list. As a result, page retirement fails because it cannot find the GPU in the GPU list. To fix this, create our own list of GPUs that support MCE notifier based page retirement and use that list to check if the UMC error occurred on a GPU that supports MCE notifier based page retirement. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-06	drm/amdgpu: Register MCE notifier for Aldebaran RAS	Mukul Joshi	1	-0/+141
	On Aldebaran, GPU driver will handle bad page retirement for GPU memory even though UMC is host managed. As a result, register a bad page retirement handler on the mce notifier chain to retire bad pages on Aldebaran. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-04	drm/amdgpu: resolve RAS query bug	John Clements	1	-0/+3
	clear error count when persistant harvesting is not enabled Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-28	drm/amdgpu: skip umc ras irq handling in poison mode (v2)	Tao Zhou	1	-14/+20
	In ras poison mode, umc uncorrectable error will be ignored until the corrupted data consumed by another ras module (such as gfx, sdma). v2: update the debug message and replace dev_warn with dev_info. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-28	drm/amdgpu: set poison supported flag for RAS (v2)	Tao Zhou	1	-2/+30
	Add RAS poison supported flag and tell PSP RAS TA about the info. v2: rename poison mode to poison supported, we can also disable poison mode even we support it. print value of poison supported if ras feature enablement fails. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-23	drm/amdgpu: Remove all code paths under the EAGAIN path in RAS late init	Candice Li	1	-32/+1
	All code paths under the EAGAIN path in RAS late init are unused. Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-23	drm/amdgpu: Updated RAS infrastructure	John Clements	1	-37/+109
	Update RAS infrastructure to support RAS query for MCA subblocks Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-14	drm/amdgpu: Update RAS trigger error block support	John Clements	1	-0/+3
	Added trigger error support for MP0/MP1/MPIO blocks Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-01	drm/amd/amdgpu: add mpio to ras block	Candice Li	1	-0/+1
	Add MPIO to RAS block Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16	drm/amd/amdgpu: remove unnecessary RAS context field	Candice Li	1	-3/+1
	Delete ras_if->name in the RAS ctx structure and remove related lines. Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16	drm/amd/amdgpu: consolidate PSP TA context	Candice Li	1	-1/+1
	Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-08	drm/amdgpu: Return error if no RAS	Luben Tuikov	1	-14/+35
	In amdgpu_ras_query_error_count() return an error if the device doesn't support RAS. This prevents that function from having to always set the values of the integer pointers (if set), and thus prevents function side effects--always to have to set values of integers if integer pointers set, regardless of whether RAS is supported or not--with this change this side effect is mitigated. Also, if no pointers are set, don't count, since we've no way of reporting the counts. Also, give this function a kernel-doc. Cc: Alexander Deucher <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Reported-by: Tom Rix <[email protected]> Fixes: a46751fbcde505 ("drm/amdgpu: Fix RAS function interface") Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/amdgpu: Fix koops when accessing RAS EEPROM	Luben Tuikov	1	-4/+12
	Debugfs RAS EEPROM files are available when the ASIC supports RAS, and when the debugfs is enabled, an also when "ras_enable" module parameter is set to 0. However in this case, we get a kernel oops when accessing some of the "ras_..." controls in debugfs. The reason for this is that struct amdgpu_ras::adev is unset. This commit sets it, thus enabling access to those facilities. Note that this facilitates EEPROM access and not necessarily RAS features or functionality. Cc: Alexander Deucher <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/amdgpu: RAS EEPROM table is now in debugfs	Luben Tuikov	1	-3/+9
	Add "ras_eeprom_size" file in debugfs, which reports the maximum size allocated to the RAS table in EEROM, as the number of bytes and the number of records it could store. For instance, $cat /sys/kernel/debug/dri/0/ras/ras_eeprom_size 262144 bytes or 10921 records $_ Add "ras_eeprom_table" file in debugfs, which dumps the RAS table stored EEPROM, in a formatted way. For instance, $cat ras_eeprom_table Signature Version FirstOffs Size Checksum 0x414D4452 0x00010000 0x00000014 0x000000EC 0x000000DA Index Offset ErrType Bank/CU TimeStamp Offs/Addr MemChl MCUMCID RetiredPage 0 0x00014 ue 0x00 0x00000000607608DC 0x000000000000 0x00 0x00 0x000000000000 1 0x0002C ue 0x00 0x00000000607608DC 0x000000001000 0x00 0x00 0x000000000001 2 0x00044 ue 0x00 0x00000000607608DC 0x000000002000 0x00 0x00 0x000000000002 3 0x0005C ue 0x00 0x00000000607608DC 0x000000003000 0x00 0x00 0x000000000003 4 0x00074 ue 0x00 0x00000000607608DC 0x000000004000 0x00 0x00 0x000000000004 5 0x0008C ue 0x00 0x00000000607608DC 0x000000005000 0x00 0x00 0x000000000005 6 0x000A4 ue 0x00 0x00000000607608DC 0x000000006000 0x00 0x00 0x000000000006 7 0x000BC ue 0x00 0x00000000607608DC 0x000000007000 0x00 0x00 0x000000000007 8 0x000D4 ue 0x00 0x00000000607608DD 0x000000008000 0x00 0x00 0x000000000008 $_ Cc: Alexander Deucher <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Cc: Xinhui Pan <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/amdgpu: Optimize EEPROM RAS table I/O	Luben Tuikov	1	-4/+4
	Split functionality between read and write, which simplifies the code and exposes areas of optimization and more or less complexity, and take advantage of that. Read and write the table in one go; use a separate stage to decode or encode the data, as opposed to on the fly, which keeps the I2C bus busy. Use a single read/write to read/write the table or at most two if the number of records we're reading/writing wraps around. Check the check-sum of a table in EEPROM on init. Update the checksum at the same time as when updating the table header signature, when the threshold was increased on boot. Take advantage of arithmetic modulo 256, that is, use a byte!, to greatly simplify checksum arithmetic. Cc: Alexander Deucher <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/amdgpu: Some renames	Luben Tuikov	1	-8/+8
	Qualify with "ras_". Use kernel's own--don't redefine your own. Cc: Alexander Deucher <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/amdgpu: Use explicit cardinality for clarity	Luben Tuikov	1	-29/+21
	RAS_MAX_RECORD_NUM may mean the maximum record number, as in the maximum house number on your street, or it may mean the maximum number of records, as in the count of records, which is also a number. To make this distinction whether the number is ordinal (index) or cardinal (count), rename this macro to RAS_MAX_RECORD_COUNT. This makes it easy to understand what it refers to, especially when we compute quantities such as, how many records do we have left in the table, especially when there are so many other numbers, quantities and numerical macros around. Also rename the long, amdgpu_ras_eeprom_get_record_max_length() to the more succinct and clear, amdgpu_ras_eeprom_max_record_count(). When computing the threshold, which also deals with counts, i.e. "how many", use cardinal "max_eeprom_records_count", than the quantitative "max_eeprom_records_len". Simplify the logic here and there, as well. Cc: Guchun Chen <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Cc: Alexander Deucher <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/amdgpu: Return result fix in RAS	Luben Tuikov	1	-9/+13
	The low level EEPROM write method, doesn't return 1, but the number of bytes written. Thus do not compare to 1, instead, compare to greater than 0 for success. Other cleanup: if the lower layers returned -errno, then return that, as opposed to overwriting the error code with one-fits-all -EINVAL. For instance, some return -EAGAIN. Cc: Jean Delvare <[email protected]> Cc: Alexander Deucher <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Lijo Lazar <[email protected]> Cc: Stanley Yang <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>