blaster4385/linux-IllusionX - Linux kernel with personal config changes for Arch Linux

Age	Commit message	Author	Files	Lines
2019-12-05	drm/amdgpu: export amdgpu_ras_find_obj to use externally	Le Ma	1	-4/+1
	Change it to external interface. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-19	drm/amdgpu: enable ras capability check on arcturus	Hawking Zhang	1	-1/+2
	check hw ras capability via atomfirmware Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06	drm/amdgpu: Improve RAS documentation (v2)	Alex Deucher	1	-7/+33
	Clarify some areas, clean up formatting, add section for unrecoverable error handling. v2: fix grammatical errors Reviewed-by: Yong Zhao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-30	drm/amdgpu: bypass some cleanup work after err_event_athub (v2)	Le Ma	1	-9/+11
	PSP lost connection when err_event_athub occurs. These cleanup work can be skipped in BACO reset. v2: squash in missing include (Alex) Signed-off-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-25	drm/amdgpu: define macros for retire page reservation	Guchun Chen	1	-6/+11
	Easier to maintain. Signed-off-by: Guchun Chen <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-25	drm/amdgpu: refine reboot debugfs operation in ras case (v3)	Guchun Chen	1	-7/+12
	Ras reboot debugfs node allows user one easy control to avoid gpu recovery hang problem and directly reboot system per card basis, after ras uncorrectable error happens. However, it is one common entry, which should get rid of ras_ctrl node and remove ip dependence when inputting by user. So add one new auto_reboot node in ras debugfs dir to achieve this. v2: in commit message, add justification why ras reboot debugfs node is needed. v3: use debugfs_create_bool to create debugfs file for boolean value Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-15	drm/amdgpu: Fix crash on SRIOV for ERREVENT_ATHUB_INTERRUPT interrupt.	Andrey Grodzovsky	1	-0/+6
	Ignore the ERREVENT_ATHUB_INTERRUPT for systems without RAS. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-and-tested-by: Jack Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-10	drm/amdgpu: avoid ras error injection for retired page	Tao Zhou	1	-0/+44
	check whether a page is bad page before umc error injection, bad page should not be accessed again Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
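The check this commit describes can be illustrated with a small userspace sketch. It is not the driver's code: `GPU_PAGE_SHIFT`, `is_retired_page`, and `inject_umc_error` are hypothetical names, and the 4 KiB page size is an assumption made for the example.

```python
# Hypothetical sketch: refuse error injection into an already-retired page.
GPU_PAGE_SHIFT = 12  # assumption: 4 KiB GPU pages


def is_retired_page(retired_pfns, address):
    """Return True if `address` falls inside a page already marked bad."""
    return (address >> GPU_PAGE_SHIFT) in retired_pfns


def inject_umc_error(retired_pfns, address):
    """Pretend to inject an error; reject addresses in retired pages."""
    if is_retired_page(retired_pfns, address):
        # A bad page should not be accessed again, so skip the injection.
        return False
    # A real driver would issue the injection here; this sketch only
    # reports that the address passed the check.
    return True
```

Comparing page frame numbers rather than raw addresses means any address inside a retired page is rejected, not only the exact address that was recorded.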
2019-10-10	drm/amdgpu/ras: document the reboot ras option	Alex Deucher	1	-1/+2
	We recently added it, but never documented it. Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-10	drm/amdgpu/ras: fix typos in documentation	Alex Deucher	1	-2/+2
	Fix a couple of spelling typos. Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-07	drm/amdgpu: Fix error handling in amdgpu_ras_recovery_init	Felix Kuehling	1	-1/+1
	Don't set a struct pointer to NULL before freeing its members. It's hard to see what's happening due to a local pointer-to-pointer data aliasing con->eh_data. Signed-off-by: Felix Kuehling <[email protected]> Tested-by: Philip Cox <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	drm/amdgpu: simplify the access to eeprom_control struct	Tao Zhou	1	-3/+3
	simplify the code of accessing to eeprom_control struct Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	drm/amdgpu: replace mmhub_funcs with mmhub.funcs	Tao Zhou	1	-2/+2
	remove mmhub_funcs in adev Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	drm/amdgpu/ras: fix and update the documentation for RAS	Alex Deucher	1	-7/+46
	Add new sections to amdgpu.rst, fix up formatting issues, add additional documentation to each section. Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	drm/amdgpu: avoid null pointer dereference	Guchun Chen	1	-2/+2
	null ptr should be checked first to avoid null ptr access Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	drm/amdgpu/ras: use GPU PAGE_SIZE/SHIFT for reserving pages	Alex Deucher	1	-1/+2
	We are reserving vram pages so they should be aligned to the GPU page size. Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages	Tao Zhou	1	-1/+6
	There are two cases in which a reserve error should be ignored: 1) a ras bad page has been allocated (used by someone); 2) a ras bad page has been reserved (duplicate error injection for one page); DRM_ERROR is unnecessary for the failure of bad page reserve Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03	docs: drm/amdgpu: Resolve build warnings	Adam Zerella	1	-18/+26
	Some of the documentation formatting could be improved which will resolve some Sphinx amdgpu build warnings e.g WARNING: Unexpected indentation. WARNING: Block quote ends without a blank line; unexpected unindent. WARNING: Inline emphasis start-string without end-string. Signed-off-by: Adam Zerella <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-17	drm/amdgpu: cleanup creating BOs at fixed location (v2)	Christian König	1	-79/+6
	The placement is something TTM/BO internal and the RAS code should avoid touching that directly. Add a helper to create a BO at a fixed location and use that instead. v2: squash in fixes (Alex) Signed-off-by: Christian König <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16	drm/amdgpu: fix ras ctrl debugfs node leak	Guchun Chen	1	-7/+5
	Use debugfs_remove_recursive to remove the whole debugfs directory instead of removing the node one by one. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16	drm/amdgpu: support pcie bif ras query and inject	Guchun Chen	1	-0/+5
	Call pcie bif ras query/inject in amdgpu ras. Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16	drm/amdgpu: enable error injection to XGMI block via debugfs	Hawking Zhang	1	-0/+1
	allow inject error to XGMI block via debugfs node ras_ctrl Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16	drm/amdgpu: Allow to reset to EEPROM table.	Andrey Grodzovsky	1	-0/+25
	The table grows quickly during debug/development effort when multiple RAS errors are injected. Allow to avoid this by setting table header back to empty if needed. v2: Switch to debugfs entry instead of load time parameter. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16	drm/amdgpu: move umc ras init to umc block	Tao Zhou	1	-4/+0
	move umc ras init from ras module to umc block, generic ras module should pay less attention to specific ras block. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16	drm/amdgpu: Avoid RAS recovery init when no RAS support.	Andrey Grodzovsky	1	-1/+6
	Fixes driver load regression on APUs. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place	Tao Zhou	1	-13/+26
	ras recovery_init should be called after ttm init, bad page reserve should be put in front of gpu reset since i2c may be unstable during gpu reset. add cleanup for recovery_init and recovery_fini v2: add more comment and print. remove cancel_work_sync in recovery_init. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: Hook EEPROM table to RAS	Tao Zhou	1	-28/+81
	support eeprom records load and save for ras, move EEPROM records storing to bad page reserving v2: remove redundant check for con->eh_data Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: change ras bps type to eeprom table record structure	Tao Zhou	1	-21/+38
	change bps type from retired page to eeprom table record, prepare for saving umc error records to eeprom Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: Add system auto reboot to RAS.	Andrey Grodzovsky	1	-1/+8
	In case of RAS error allow user to configure auto system reboot through ras_ctrl. This is also part of the temporary workaround for the RAS hang problem. v4: Use latest kernel API for disk sync. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: Avoid HW GPU reset for RAS.	Andrey Grodzovsky	1	-2/+20
	Problem: Under certain conditions, when some IP blocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case we use this interrupt to issue GPU reset. The GPU reset code is modified for such case to avoid HW reset, only stops schedulers, detaches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on previous bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive members. Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: add helper function to do common ras_late_init/fini (v3)	Hawking Zhang	1	-0/+72
	In late_init for ras, the helper function will be used to 1). disable ras feature if the IP block is masked as disabled 2). send enable feature command if the ip block was masked as enabled 3). create debugfs/sysfs node per IP block 4). register interrupt handler v2: check ih_info.cb to decide add interrupt handler or not v3: add ras_late_fini for cleanup all the ras fs node and remove interrupt handler Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13	drm/amdgpu: add ras_controller and err_event_athub interrupt support	Hawking Zhang	1	-0/+14
	Ras controller interrupt and Ras err event athub interrupt are two dedicated interrupts for RAS support. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-23	drm/amdgpu: correct ras error count type	Guchun Chen	1	-3/+3
	Use unsigned long type for the same ras count variable. This will avoid overflow on 64 bit system. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-12	drm/amdgpu: remove ras block's feature status info in sysfs	Tao Zhou	1	-18/+1
	feature mask info is enough for rocm tool, "cat /sys/class/drm/card0/device/ras/features" will get the info like this: feature mask: 0x3ffb Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
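A tool consuming this sysfs output only needs to parse one line. The following is a minimal sketch of such a parser, assuming the output has exactly the `feature mask: 0x3ffb` form quoted above; the function names are hypothetical, and the mapping from bit positions to RAS blocks is not given in this log, so the sketch reports bit positions only.

```python
import re


def parse_feature_mask(text):
    """Extract the integer mask from a line like 'feature mask: 0x3ffb'."""
    match = re.search(r"feature mask:\s*(0x[0-9a-fA-F]+)", text)
    if match is None:
        raise ValueError("no feature mask found in: %r" % text)
    return int(match.group(1), 16)


def enabled_bits(mask):
    """Return the positions of the bits set in `mask`, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Example using the value quoted in the commit message.
mask = parse_feature_mask("feature mask: 0x3ffb")
print(hex(mask), enabled_bits(mask))
```

On a real system the text would come from reading the `features` file under the card's `ras` sysfs directory instead of a literal string.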
2019-08-12	drm/amdgpu: support mmhub ras in amdgpu ras	Tao Zhou	1	-0/+5
	call mmhub ras query/inject in amdgpu ras Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-12	drm/amdgpu: add sub block parameter in ras inject command	Tao Zhou	1	-7/+10
	ras sub block index could be passed from shell command Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-06	drm/amdgpu: update ras sysfs feature info	Tao Zhou	1	-12/+5
	remove confused ras error type info Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02	drm/amdgpu: replace AMDGPU_RAS_UE with AMDGPU_RAS_SUCCESS	Tao Zhou	1	-1/+1
	ce can also trigger interrupt, and even both ce and ue error can be found in one ras query, distinguishing between ce and ue in interrupt handler is unnecessary. Signed-off-by: Tao Zhou <[email protected]> Suggested-by: Guchun Chen <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02	drm/amdgpu: support ce interrupt in ras module	Tao Zhou	1	-4/+8
	correctable error can also trigger interrupt in some ras blocks Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02	drm/amdgpu: add error address query for umc ras	Tao Zhou	1	-0/+5
	umc error address query can get ce/ue error address and clear error status Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: support gfx ras error injection and err_cnt query	Dennis Li	1	-3/+16
	check gfx error count in both ras query function and ras interrupt handler. gfx ras is still disabled by default due to known stability issue found in gpu reset. Signed-off-by: Dennis Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: remove ras_reserve_vram in ras injection	Tao Zhou	1	-11/+10
	error injection address is not in gpu address space Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: add check for ras error type	Tao Zhou	1	-3/+8
	only ue and ce errors are supported Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: allow ras interrupt callback to return error data	Tao Zhou	1	-3/+3
	add error data as parameter for ras interrupt cb and process it Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: add support for recording ras error address	Tao Zhou	1	-1/+1
	more than one error address may be recorded in one query Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: switch to amdgpu_umc structure	Tao Zhou	1	-2/+2
	create new amdgpu_umc structure for more umc settings in future and switch to the new structure Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: add ras error count after each query (v2)	Tao Zhou	1	-0/+11
	v1: increase ras ce/ue error count v2: log the number of correctable and uncorrectable errors Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: query umc error count	Hawking Zhang	1	-1/+10
	check umc error count in both ras query function and ras interrupt handler Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31	drm/amdgpu: move some ras data structure to amdgpu_ras.h	Hawking Zhang	1	-68/+0
	These are common structures that can be included by IP specific source files Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-18	drm/amdgpu: drop ras self test	Hawking Zhang	1	-7/+0
	this function is not needed any more. error injection is the only way to validate ras but it can't be executed in amdgpu_ras_init, where gpu is not even initialized Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>