Age | Commit message (Collapse) | Author | Files | Lines |
|
Add RAS EEPROM table manager to eanble RAS errors to be stored
upon appearance and retrived on driver load.
v2: Fix some prints.
v3:
Fix checksum calculation.
Make table record and header structs packed to do correct byte value sum.
Fix record crossing EEPROM page boundry.
v4:
Fix byte sum val calculation for record - look at sizeof(record).
Fix some style comments.
v5: Add description to EEPROM_TABLE_RECORD_SIZE and syntax fixes.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use unsigned long type for the same ras count variable.
This will avoid overflow on 64 bit system.
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add error data as parameter for ras interrupt cb and process it
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
more than one error address may be recorded in one query
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
These are common structures that can be included by IP specific
source files
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: "David (ChunMing) Zhou" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: xinhui pan <[email protected]>
Cc: Evan Quan <[email protected]>
Cc: Feifei Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The "block" variable can be set by the user through debugfs, so it can
be quite large which leads to shift wrapping here. This means we report
a "block" as supported when it's not, and that leads to array overflows
later on.
This bug is not really a security issue in real life, because debugfs is
generally root only.
Fixes: 36ea1bd2d084 ("drm/amdgpu: add debugfs ctrl node")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add ras suspend function. rename ras_post_init to amdgpu_ras_resume.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Tested-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add badpages node.
it will output badpages list in format
gpu pfn : gpu page size : flags
example
0x00000000 : 0x00001000 : R
0x00000001 : 0x00001000 : R
0x00000002 : 0x00001000 : R
0x00000003 : 0x00001000 : R
0x00000004 : 0x00001000 : R
0x00000005 : 0x00001000 : R
0x00000006 : 0x00001000 : R
0x00000007 : 0x00001000 : P
0x00000008 : 0x00001000 : P
0x00000009 : 0x00001000 : P
flags can be one of below characters
R: reserved.
P: pending for reserve.
F: failed to reserve for some reasons.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add another flag to allow IP do a gpu reset after device init.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable this now to reset the GPU on RAS errors.
This reverts commit 138352e5752aa3e694951d70c8fe8730219f4edf.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Many parts of the whole SW stack can program the ras enablement state
during the boot. Now we handle that case by adding one function which
check the ras flags and choose different code path.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: xinhui pan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add helpes to transalte the two enums. And it will catch bugs
easily.
Signed-off-by: xinhui pan <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add ras post init function.
Do some initialization after all IP have finished their late init.
Add new member flags which will control the ras work flow.
For now, vbios enable ras for us on boot. That might change in the
future.
So there should be a flag from vbios to tell us if ras is enabled or not
on boot. Looks like there is no such info now.
Other bits of the flags are reserved to control other parts of ras.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Currently, it is not clear how ras is supported. Both software and
hardware can set the supported. That is confusing.
Fix it by adding new member hw_supported.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
gpu reset is not stable on vega20 A1.
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
allow userspace enable/disable ras
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add obj management.
add feature control.
add debugfs infrastructure.
add sysfs infrastructure.
add IH infrastructure.
add recovery infrastructure.
It is a framework. Other IPs need call amdgpu_ras_xxx function instead of
psp_ras_xxx functions.
v2: squash in warning fixes
Signed-off-by: xinhui pan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|