blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-08-31	drm/amdgpu: Free ras cmd input buffer properly	Hawking Zhang	1	-7/+7
	Do not access the pointer for ras input cmd buffer if it is even not allocated. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-31	drm/amdgpu: Enable ras for mp0 v13_0_6 sriov	YiPeng Chai	1	-0/+1
	Enable ras for mp0 v13_0_6 sriov Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-15	drm/amdgpu: mode1 reset needs to recover mp1 for mp0 v13_0_10	YiPeng Chai	1	-0/+2
	Mode1 reset needs to recover mp1 in fatal error case for mp0 v13_0_10. v2: Define a macro to wrap psp function calls. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-15	drm/amdgpu: Remove unnecessary ras cap check	Hawking Zhang	1	-4/+0
	RAS global isr will only be invoked by hardware interrupt. Don't need to query ras capability in isr In addition, amdgpu_ras_interrupt_fatal_error_handler ensures the isr won't be called from guest linux side by accident. The RAS cap check in isr that introduced to fix sriov crash is not needed any more Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-09	drm/amdgpu: Extend poison mode check to SDMA/VCN/JPEG	Candice Li	1	-1/+4
	Treat SDMA/VCN/JPEG as RAS capable IP blocks in poison mode. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-09	drm/amdgpu: add RAS fatal error handler for NBIO v7.9	Tao Zhou	1	-0/+5
	Register RAS fatal error interrupt and add handler. v2: only register NBIO RAS for dGPU platform. change nbio_v7_9_set_ras_controller_irq_state and nbio_v7_9_set_ras_err_event_athub_irq_state to dummy functions. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-07	drm/amdgpu: Issue ras enable_feature for gfx ip only	Hawking Zhang	1	-20/+10
	For non-GFX IP blocks, set up ras obj if ras feature is allowed. For GFX IP blocks, force issue ras enable_feature command to firmware and only set up ras obj if ras feature is allowed Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-07	drm/amdgpu: Apply poison mode check to GFX IP only	Hawking Zhang	1	-0/+1
	For GFX IP that only supports poison consumption, GFX RAS won't be marked as enabled. i.e., hardware doesn't support gfx sram ecc. But driver still needs to issue firmware to enable poison consumption mode for GFX IP. In such case, check poison mode and treat GFX IP as RAS capable IP block. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-07	drm/amdgpu: Only create err_count sysfs when hw_op is supported	Hawking Zhang	1	-13/+18
	Some IP blocks only support partial ras feature and don't have ras counter and/or ras error status register at all. Driver should not create err_count sysfs node for those IP blocks. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-07-25	drm/amdgpu: Check APU flag to disable RAS	Stanley.Yang	1	-1/+2
	Only disable RAS by default for aqua vanjaram on APU platform. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-07-21	drm/amd: Avoid reading the VBIOS part number twice	Mario Limonciello	1	-4/+4
	The VBIOS part number is read both in amdgpu_atom_parse() as well as in atom_get_vbios_pn() and stored twice in the `struct atom_context` structure. Remove the first unnecessary read and move the `pr_info` line from that read into the second. v2: squash in unused variable removal Signed-off-by: Mario Limonciello <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-07-18	drm/amdgpu: Disable RAS by default on APU flatform	Stanley.Yang	1	-2/+11
	Disable RAS feature by default for aqua vanjaram on APU platform. Changed from V1: Splite Disable RAS by default on APU platform into a separated patch. Changed from V2: Avoid to modify global variable amdgpu_ras_enable. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-07-18	drm/amdgpu: Enable aqua vanjaram RAS	Stanley.Yang	1	-0/+1
	Enable RAS for aqua vanjaram. Changed from V1: Split the change in amdgpu_ras_asic_supported into a separated patch. Changed from V2: Avoid to modify global variable amdgpu_ras_enable. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-07-07	drm/amdgpu: skip address adjustment for GFX RAS injection	Tao Zhou	1	-1/+2
	The address parameter of GFX RAS injection isn't related to XGMI node number, keep it unchanged. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Candice Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-30	drm/amdgpu: gpu recovers from fatal error in poison mode	YiPeng Chai	1	-0/+11
	Fatal error occurs in ras poison mode, mode1 reset is used to recover gpu. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-15	drm/amdgpu: Add checking mc_vram_size	Stanley.Yang	1	-1/+2
	Do not compare injection address with mc_vram_size if mc_vram_size is zero. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-15	drm/amdgpu: Optimize checking ras supported	Stanley.Yang	1	-5/+3
	Using "is_app_apu" to identify device in the native APU mode or carveout mode. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-15	drm/amdgpu: Fix usage of UMC fill record in RAS	Luben Tuikov	1	-2/+1
	The fixed commit listed in the Fixes tag below, introduced a bug in amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new amdgpu_umc_fill_error_record() and internally in that new function the physical address (argument "uint64_t retired_page"--wrong name) is right-shifted by AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass "address" to that new function, we should NOT right-shift it, since this results, erroneously, in the page address to be 0 for first 2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses. This commit fixes this bug. Cc: Tao Zhou <[email protected]> Cc: Hawking Zhang <[email protected]> Cc: Alex Deucher <[email protected]> Fixes: 400013b268cb ("drm/amdgpu: add umc_fill_error_record to make code more simple") Signed-off-by: Luben Tuikov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-15	drm/amdgpu: Report ras_num_recs in debugfs	Luben Tuikov	1	-0/+2
	Report the number of records stored in the RAS EEPROM table in debugfs. This can be used by user-space to calculate the capacity of the RAS EEPROM table since "bad_page_cnt_threshold" is also reported in the same place in debugfs. See commit 7fb640714547 ("drm/amdgpu: Add bad_page_cnt_threshold to debugfs"). ras_num_recs can already be inferred by dumping the RAS EEPROM table, also in the same debugfs location, see commit reference c65b0805e77919 (drm/amdgpu: RAS EEPROM table is now in debugfs, 2021-04-08). This commit makes it an integer value easily shown in a single file. Cc: Alex Deucher <[email protected]> Cc: Hawking Zhang <[email protected]> Cc: Tao Zhou <[email protected]> Cc: Stanley Yang <[email protected]> Cc: John Clements <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: Add support EEPROM table v2.1	Stanley.Yang	1	-1/+1
	Add ras info to EEPROM table, app can analyse device ECC status without GPU driver through EEPROM table ras info. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: support check vcn jpeg block mask	Stanley.Yang	1	-1/+5
	Support VCN/JPEG instance mask checking, pass logical mask directly except GFX/SDMA/VCN/JPEG blocks. Changed from V1: correct a typo Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3	YiPeng Chai	1	-1/+7
	perform mode2 reset for sdma fed error on gfx v11_0_3. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add check for RAS instance mask	Tao Zhou	1	-0/+38
	The mask is only needed to be set when RAS block instance number is more than 1 and invalid bits should be also masked out. We only check valid bits for GFX and SDMA block for now, and will add check for other RAS blocks in the future. v2: move the check under injection operation since the mask is only used by RAS error inject. v3: add valid bits handling for SDMA. v4: print message if the mask is adjusted. Signed-off-by: Tao Zhou <[email protected]> Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: reorganize RAS injection flow	Tao Zhou	1	-7/+6
	So GFX RAS injection could use default function if it doesn't define its own injection interface. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: add instance mask for RAS inject	Tao Zhou	1	-7/+16
	User can specify injected instances by the mask. For backward compatibility, the mask value is incorporated into sub block index without interface change of RAS TA. User uses logical mask and driver should convert it to physical value before sending it to RAS TA. v2: update parameter name. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: Adjust the sequence to query ras error info	Hawking Zhang	1	-6/+7
	It turns out STATUS_VALID_FLAG needs to be checked ahead of any other fields. ADDRESS_VALID_FLAG and ERR_INFO_VALID_FLAG only manages ADDRESS and ERR_INFO field respectively. driver should continue poll ERR CNT field even ERR_INFO_VALD_FLAG is not set. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: Enable persistent edc harvesting in APP APU	Hawking Zhang	1	-1/+2
	Persistent edc harvesting is supported in APP APU Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: Add common helper to reset ras error	Hawking Zhang	1	-0/+20
	Add common helper to reset ras error status. It applies to IP blocks that follow the new ras error logging register design, and need to write 0 to reset the error status. For IP blocks that don't support the new design, please still implement ip specific helper. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: Add common helper to query ras error (v2)	Hawking Zhang	1	-0/+119
	Add common helper to query ras error status and log error information, including memory block id and erorr count. The helpers are applicable to IP blocks that follow the new ras error logging design. For IP blocks that don't support the new design, please still implement ip specific helper to query ras error. v2: optimize struct amdgpu_ras_err_status_reg_entry and the implementaion in helper (Lijo/Tao) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-21	drm/amdgpu: Drop pcie_bif ras check from fatal error handler	Candice Li	1	-2/+1
	Some ASICs support fatal error event but do not support pcie_bif ras. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14	Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"	Jane Jian	1	-1/+0
	This reverts commit fe120b9f5ce873516a2604e4ff0c19084be94e8c. This patch impacts sriov multi-vf stability Signed-off-by: Jane Jian <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11	drm/amdgpu: correct ras enabled flag	Stanley.Yang	1	-0/+7
	XGMI RAS should be according to the gmc xgmi physical nodes number, XGMI RAS should not be enabled if xgmi num_physical_nodes is zero. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-31	drm/amdgpu: Add fatal error handling in nbio v4_3	Hawking Zhang	1	-0/+11
	GPU will stop working once fatal error is detected. it will inform driver to do reset to recover from the fatal error. v2: squash in logic fix (Srinivasan) v3: squash in logic fix (Dan) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Candice Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-22	drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV	YiPeng Chai	1	-0/+1
	Enable ras for mp0 v13_0_10 on SRIOV. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-15	drm/amdgpu: drop ras check at asic level for new blocks	Hawking Zhang	1	-3/+0
	amdgpu_ras_register_ras_block should always be invoked by ras_sw_init, where driver needs to check ras caps at ip level, instead of asic level. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-15	drm/amdgpu: Rework pcie_bif ras sw_init	Hawking Zhang	1	-8/+11
	pcie_bif ras blocks needs to be initialized as early as possible to handle fatal error detected in hw_init phase. also align the pcie_bif ras sw_init with other ras blocks Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-02-23	drm/amdgpu: change default behavior of bad_page_threshold parameter	Tao Zhou	1	-3/+4
	Ignore ras umc bad page threshold by default, GPU initialization won't be stopped in this mode. v2: refine the description of bad_page_threshold. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-02-23	drm/amdgpu: exclude duplicate pages from UMC RAS UE count	Tao Zhou	1	-3/+13
	If a UMC bad page is reserved but not freed by an application, the application may trigger uncorrectable error repeatly by accessing the page. v2: add specific function to do the check. v3: remove duplicate pages, calculate new added bad page number. v4: reuse save_bad_pages to calculate new added bad page number. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-01-17	drm/amdgpu: Adjust ras support check condition for special asic	YiPeng Chai	1	-1/+16
	[Why]: Amdgpu ras uses amdgpu_ras_is_supported to check whether the ras block supports the ras function. amdgpu_ras_is_supported uses .ras_enabled to determine whether the ras function of the block is enabled. But for special asic with mem ecc enabled but sram ecc not enabled, some ras blocks support poison mode but their ras function is not enabled on .ras_enabled, these ras blocks will run abnormally. [How]: If the ras block is not supported on .ras_enabled but the asic supports poison mode and the ras block has ras configuration, it can be considered that the ras block supports ras function. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-01-17	drm/amdgpu: Remove unnecessary ras block support check	YiPeng Chai	1	-3/+0
	[Why]: For special asic with mem ecc enabled but sram ecc not enabled, some ras blocks can register their ras configuration to ras list, but these ras blocks are not enabled on .ras_enabled, so it can not get ras block object using amdgpu_ras_get_ras_block. [How]: Remove ras block support check. Even if the ras block checked is not in the ras list, it will return a null pointer and will have no effect. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-01-17	drm/amdgpu: Perform gpu reset after gfx finishes processing ras poison ↵	YiPeng Chai	1	-3/+5
	consumption on gfx_v11_0_3 Perform gpu reset after gfx finishes processing ras poison consumption on gfx_v11_0_3. V2: Move gfx poison consumption handler from hw_ops to ip function level. V3: Adjust the calling position of amdgpu_gfx_poison_consumation_handler. V4: Since gfx v11_0_3 does not have .hw_ops instance, the .hw_ops null pointer check in amdgpu_ras_interrupt_poison_consumption_handler needs to be adjusted. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-01-05	drm/amdgpu: allow query error counters for specific IP block	Hawking Zhang	1	-20/+69
	amdgpu_ras_block_late_init will be invoked in IP specific ras_late_init call as a common helper for all the IP blocks. However, when amdgpu_ras_block_late_init call amdgpu_ras_query_error_count to query ras error counters, amdgpu_ras_query_error_count queries all the IP blocks that support ras query interface. This results to wrong error counters cached in software copies when there are ras errors detected at time zero or warm reset procedure. i.e., in sdma_ras_late_init phase, it counts on sdma/mmhub errors, while, in mmhub_ras_late_init phase, it still counts on sdma/mmhub errors. The change updates amdgpu_ras_query_error_count interface to allow query specific ip error counter. It introduces a new input parameter: query_info. if query_info is NULL, it means query all the IP blocks, otherwise, only query the ip block specified by query_info. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-01-03	drm/amdgpu: remove enable ras cmd call trace	Stanley.Yang	1	-3/+13
	[Why] [ 41.285804] RIP: 0010:amdgpu_ras_feature_enable+0x15c/0x310 [amdgpu] [ 41.285945] Code: 48 89 c1 48 c7 c2 b9 f2 88 c1 48 c7 c0 c0 f2 88 c1 49 8b 3c 24 48 0f 44 d0 48 c7 c6 98 33 80 c1 e8 5f 52 75 d9 e9 fa fe ff ff <0f> 0b e9 66 ff ff ff 48 8b 3d 86 8c 0f da ba 00 04 00 00 be c0 0d [ 41.285946] RSP: 0018:ffffbccdc72efc90 EFLAGS: 00010246 [ 41.285948] RAX: 0000000000000004 RBX: ffff931897406980 RCX: 0000000000000002 [ 41.285949] RDX: 0000000000000dc0 RSI: 0000000000000002 RDI: ffff931500042b00 [ 41.285950] RBP: ffffbccdc72efcc0 R08: 0000000000000002 R09: ffff931885b87000 [ 41.285951] R10: 0000000000ffff10 R11: 0000000000000001 R12: ffff931893e20000 [ 41.285952] R13: 0000000000000001 R14: ffff931885b87000 R15: 0000000000000000 [ 41.285953] FS: 0000000000000000(0000) GS:ffff931c6f200000(0000) knlGS:0000000000000000 [ 41.285954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.285955] CR2: 000055dd6f532008 CR3: 000000061b010006 CR4: 00000000003706e0 [ 41.285956] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 41.285957] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 41.285958] Call Trace: [ 41.285959] <TASK> [ 41.285963] ? gfx_v11_0_early_init+0x250/0x250 [amdgpu] [ 41.286117] gfx_v11_0_late_init+0x8c/0xb0 [amdgpu] [ 41.286271] amdgpu_device_ip_late_init+0x8d/0x3c0 [amdgpu] [ 41.286401] amdgpu_device_init.cold+0x1677/0x1fda [amdgpu] [ 41.286616] ? pci_bus_read_config_word+0x4a/0x70 [ 41.286621] ? do_pci_enable_device+0xdb/0x110 [ 41.286625] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ 41.286762] amdgpu_pci_probe+0x18d/0x3a0 [amdgpu] [ 41.286898] local_pci_probe+0x4b/0x90 [ 41.286901] work_for_cpu_fn+0x1a/0x30 [ 41.286903] process_one_work+0x22b/0x3d0 [ 41.286905] worker_thread+0x223/0x420 [ 41.286907] ? process_one_work+0x3d0/0x3d0 [ 41.286908] kthread+0x12a/0x150 [ 41.286911] ? set_kthread_struct+0x50/0x50 [ 41.286913] ret_from_fork+0x22/0x30 [How] For specific asic, only mem ecc is enabled, sram ecc is not enabled, but it still need to send ras enable cmd to gfx block to support poison mode, so add check posion mode. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-12-15	drm/amdgpu: define RAS query poison mode function	Tao Zhou	1	-21/+33
	1. no need to query poison mode on SRIOV guest side, host can handle it. 2. define the function to simplify code. v2: rename amdgpu_ras_poison_mode_query to amdgpu_ras_query_poison_mode. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-12-15	drm/amdgpu: update VCN/JPEG RAS setting	Tao Zhou	1	-11/+13
	Support VCN/JPEG RAS in both bare metal and SRIOV environment. v2: update commit description. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-12-15	drm/amdgpu: skip RAS error injection in SRIOV	Tao Zhou	1	-0/+4
	Injection on guest is not allowed. v2: return directly in SRIOV environment. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-12-01	drm/amdgpu: use sysfs_emit() to instead of scnprintf()	ye xingchen	1	-1/+1
	Replace the open-code with sysfs_emit() to simplify the code. Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: ye xingchen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-11-17	drm/amdgpu: enable RAS for VCN/JPEG v4.0	Tao Zhou	1	-1/+2
	Set support flag for VCN/JPEG 4.0. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-11-17	drm/amdgpu: Enable mode-1 reset for RAS recovery in fatal error mode	YiPeng Chai	1	-1/+6
	The patch is enabling mode-1 reset for RAS recovery in fatal error mode. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-10-27	drm/amdgpu: remove ras_error_status parameter for UMC poison handler	Tao Zhou	1	-2/+1
	Make the code simpler. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>