aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2021-04-20drm/amdgpu/gmc9: remove dummy read workaround for newer chipsAlex Deucher1-2/+4
Aldebaran has a hw fix so no longer requires the workaround. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: Add mem sync flag for IB allocated by SAJinzhou Su1-0/+2
The buffer of SA bo will be used by many cases. So it's better to invalidate the cache of indirect buffer allocated by SA before commit the IB. Signed-off-by: Jinzhou Su <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: Fix SDMA RAS error reporting on AldebaranMukul Joshi1-7/+28
Fix the following issues with SDMA RAS error reporting: 1. Read the EDC_COUNTER2 register also to fetch error counts for all sub-blocks in SDMA. 2. SDMA RAS on Aldebaran suports single-bit uncorrectable errors only. So, report error count in UE count instead of CE count. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-By: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: Reset RAS error count and status regsMukul Joshi1-0/+6
Reset the RAS error count and error status registers after reading to prevent over reporting error counts on Aldebaran. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-By: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20Revert "drm/amdgpu: workaround the TMR MC address issue (v2)"Oak Zeng4-40/+10
This reverts commit 2f055097daef498da57552f422f49de50a1573e6. 2f055097daef498da57552f422f49de50a1573e6 was a driver workaround when PSP firmware was not ready. Now the PSP fw is ready so we revert this driver workaround. Signed-off-by: Oak Zeng <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefishJiansong Chen1-1/+1
dimgrey_cavefish has similar gc_10_3 ip with sienna_cichlid, so follow its registers offset setting. Signed-off-by: Jiansong Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: resolve erroneous gfx_v9_4_2 printsJohn Clements1-1/+1
resolve bug on aldebaran where gfx error counts will print on driver load when there are no errors present Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: fix a error injection failed issueDennis Li1-1/+1
because "sscanf(str, "retire_page")" always return 0, if application use the raw data for error injection, it always wrongly falls into "op == 3". Change to use strstr instead. Signed-off-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: only harvest gcea/mmea error status in aldebaranHawking Zhang2-13/+19
In aldebaran, driver only needs to harvest SDP RdRspStatus, WrRspStatus and first parity error on RdRsp data. Check error type before harvest error information. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: only harvest gcea/mmea error status in arcturusHawking Zhang2-6/+18
SDP RdRspStatus/WrRspStatus or first parity error on RdRsp data can cause system fatal error in arcturus. GPU will be freezed in such case. Driver needs to harvest these error information before reset the GPU. Check error type to avoid harvest normal gcea/mmea information. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: enable tmz on renoir asicsHuang Rui1-1/+1
The tmz functions are verified on renoir chips as well. So enable it by default. Signed-off-by: Huang Rui <[email protected]> Tested-by: Lang Yu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-20drm/amdgpu: correct default gfx wdt timeout settingHawking Zhang1-2/+2
When gfx wdt was configured to fatal_disable, the timeout period should be configured to 0x0 (timeout disabled) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: fix an error code in init_pmu_entry_by_type_and_add()Dan Carpenter1-1/+3
If the kmemdup() fails then this should return a negative error code but it currently returns success Fixes: b4a7db71ea06 ("drm/amdgpu: add per device user friendly xgmi events for vega20") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15Revert "Revert "drm/amdgpu: Ensure that the modifier requested is supported ↵Qingqing Zhuo1-0/+13
by plane."" This reverts commit 55fa622fe635bfc3f2587d784f6facc30f8fdf12. The regression caused by the original patch has been cleared, thus introduce back the change. Signed-off-by: Qingqing Zhuo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Copy MEC FW version to MEC2 if we skipped loading MEC2Joseph Greathouse1-0/+3
If we skipped loading MEC2 firmware separately from MEC, then MEC2 will be running the same firmware image. Copy the MEC version and feature numbers into MEC2 version and feature numbers. This is needed for things like GWS support, where we rely on knowing what version of firmware is running on MEC2. Leaving these MEC2 entries blank breaks our ability to version-check enables and workarounds. Signed-off-by: Joseph Greathouse <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdkfd: Remove legacy code not acquiring VMsFelix Kuehling2-54/+0
ROCm user mode has acquired VMs from DRM file descriptors for as long as it supported the upstream KFD. Legacy code to support older versions of ROCm is not needed any more. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building ↵Ramesh Errabolu3-12/+27
SG_TABLE's for a VRAM BO Extend current implementation of SG_TABLE construction method to allow exportation of sub-buffers of a VRAM BO. This capability will enable logical partitioning of a VRAM BO into multiple non-overlapping sub-buffers. One example of this use case is to partition a VRAM BO into two sub-buffers, one for SRC and another for DST. Reviewed-by: Christian König <[email protected]> Signed-off-by: Ramesh Errabolu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Add double-sscanf but invertLuben Tuikov1-2/+5
Add back the double-sscanf so that both decimal and hexadecimal values could be read in, but this time invert the scan so that hexadecimal format with a leading 0x is tried first, and if that fails, then try decimal format. Also use a logical-AND instead of nesting double if-conditional. See commit "drm/amdgpu: Fix a bug for input with double sscanf" Cc: Alexander Deucher <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amd/amdgpu: add ASPM support on polarisKenneth Feng1-2/+191
add ASPM support on polaris Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amd/amdgpu: enable ASPM on vegaKenneth Feng3-3/+257
enable ASPM on vega to save the power without the performance hurt. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amd/amdgpu: enable ASPM on navi1xKenneth Feng1-8/+2
enable ASPM on navi1x for the benifit of system power consumption without performance hurt. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amd/sriov no need to config GECC for sriovJack Zhang1-1/+1
No need to config GECC feature here for sriov Leave the host drvier to do the configuration job. Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Fix kernel-doc for the RAS sysfs interfaceLuben Tuikov1-23/+24
Imporve the kernel-doc for the RAS sysfs interface. Fix the grammar, fix the context. Cc: Alexander Deucher <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Add bad_page_cnt_threshold to debugfsLuben Tuikov1-0/+2
Add bad_page_cnt_threshold to debugfs, an optional file system used for debugging, for reporting purposes only--it usually matches the size of EEPROM but may be different depending on the "bad_page_threshold" kernel module option. The "bad_page_cnt_threshold" is a dynamically computed value. It depends on three things: the VRAM size; the size of the EEPROM (or the size allocated to the RAS table therein); and the "bad_page_threshold" module parameter. It is a dynamically computed value, when the amdgpu module is run, on which further parameters and logic depend, and as such it is helpful to see the dynamically computed value in debugfs. Cc: Alexander Deucher <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Fix a bug in checking the result of reserve pageLuben Tuikov1-6/+3
Fix if (ret) --> if (!ret), a bug, for "retire_page", which caused the kernel to recall the method with *pos == end of file, and that bounced back with error. On the first run, we advanced *pos, but returned 0 back to fs layer, also a bug. Fix the logic of the check of the result of amdgpu_reserve_page_direct()--it is 0 on success, and non-zero on error, not the other way around. This patch fixes this bug. Cc: Alexander Deucher <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Fix a bug for input with double sscanfLuben Tuikov1-8/+5
Remove double-sscanf to scan for %llu and 0x%llx, as that is not going to work! The %llu will consume the "0" in "0x" of your input, and the hex value you think you're entering will always be 0. That is, a valid hex value can never be consumed. On the other hand, just entering a hex number without leading 0x will either be scanned as a string and not match, for instance FAB123, or the leading decimal portion is scanned as the %llu, for instance 123FAB will be scanned as 123, which is not correct. Thus remove the first %llu scan and leave only the %llx scan, removing the leading 0x since %llx can scan either. Addresses are usually always hex values, so this suffices. Cc: Alexander Deucher <[email protected]> Cc: Xinhui Pan <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Add graphics cache rinse packet for sdmaJinzhou Su1-0/+28
Add emit mem sync callback for sdma_v5_2 In amdgpu sync object test, three threads created jobs to send GFX IB and SDMA IB in sequence. After the first GFX thread joined, sometimes the third thread will reuse the same physical page to store the SDMA IB. There will be a risk that SDMA will read GFX IB in the previous physical page. So it's better to flush the cache before commit sdma IB. Signed-off-by: Jinzhou Su <[email protected]> Reviewed-by: Huang Rui <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdkfd: change MTYPEs for Aldebaran's HW requirementEric Huang1-8/+5
Due to changes of HW memory model, we need to change Aldebaran MTYPEs to meet HW changes. Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Introduce new SETUP_TMR interfaceOak Zeng2-4/+19
This new interface passes both virtual and physical address to PSP. It is backward compatible with old interface. v2: use a function to simplify tmr physical address calc (Lijo) Signed-off-by: Oak Zeng <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Calling address translation functions to simplify codesOak Zeng12-25/+12
Use amdgpu_gmc_vram_pa and amdgpu_gmc_vram_cpu_pa to simplify codes. No logic change. Signed-off-by: Oak Zeng <[email protected]> Signed-off-by: Harish Kasiviswanathan <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: Introduce functions for vram physical addr calculationOak Zeng2-0/+39
Add one function to calculate BO's GPU physical address. And another function to calculate BO's CPU physical address. v2: Use functions vs macros (Christian) Use more proper function names (Christian) Signed-off-by: Oak Zeng <[email protected]> Suggested-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: update gfx 9.4.2 ras error reportingJohn Clements1-6/+7
only output ras error status if an error bit is set or error counter is set Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-15drm/amdgpu: update mmhub 1.7 ras error reportingJohn Clements1-1/+1
only output ras error status if an error bit is set Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: page retire over debugfs mechanismJohn Clements1-0/+67
added support in RAS debugfs to add bad page for isolated page retirement testing Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: enable ras eeprom on aldebaranJohn Clements1-1/+7
enable ras eeprom loading by default on aldebaran Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: RAS harvest on driver loadJohn Clements1-0/+29
In event of RAS UE + warm reset, error counters shall be harvested and cleared on driver load Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: add DMUB outbox event IRQ source define/complete/debug flagJude Shih1-0/+1
[Why & How] We use outbox interrupt that allows us to do the AUX via DMUB Therefore, we need to add some irq source related definition in the header files; Signed-off-by: Jude Shih <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: Fix size overflowxinhui pan1-1/+1
ttm->num_pages is uint32. Hit overflow when << PAGE_SHIFT directly Fixes: 230c079fdcf4 ("drm/ttm: make num_pages uint32_t") Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: move mmhub ras_func init to ip specific fileHawking Zhang2-19/+19
mmhub ras is always owned by gpu driver. ras_funcs initialization shall be done at ip level, instead of putting it in common gmc interface file Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: Remove unused function amdgpu_bo_fbdev_mmap()Thomas Zimmermann2-21/+0
Remove an unused function. Mapping the fbdev framebuffer is apparently not supported. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09Revert "drm/amdgpu: Ensure that the modifier requested is supported by plane."Qingqing Zhuo1-13/+0
This reverts commit f4a9be998c8ee39a30a68cb775c91928fe10a384. The original commit was found to cause the following two issues on sienna cichlid: 1. Refresh rate locked during vrrdemo 2. Display sticks on flipped landscape mode after changing orientation, and cannot be changed back to regular landscape Signed-off-by: Qingqing Zhuo <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: split gfx callbacks into ras and non-ras onesHawking Zhang8-87/+92
gfx ras is only available in cerntain ip generations. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: split mmhub callbacks into ras and non-ras onesHawking Zhang13-29/+74
mmhub ras is only avaiable in cerntain mmhub ip generation. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: do not register df_mca interrupt in certain configHawking Zhang1-2/+4
df/mca ras is not managed by gpu driver when gpu is connected to cpu through xgmi. gpu driver should register x86 mca notifier for umc ras error notification Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: split umc callbacks to ras and non-ras onesHawking Zhang12-32/+51
umc ras is not managed by gpu driver when gpu is connected to cpu through xgmi. split umc callbacks into ras and non-ras ones so gpu driver only initializes umc ras callbacks when it manages umc ras. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: move xgmi ras functions to xgmi_ras_funcsHawking Zhang6-17/+42
xgmi ras is not managed by gpu driver when gpu is connected to cpu through xgmi. move all xgmi ras functions to xgmi_ras_funcs so gpu driver only initializes xgmi ras functions when it manages xgmi ras. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: split nbio callbacks into ras and non-ras onesHawking Zhang6-30/+63
nbio ras is not managed by gpu driver when gpu is connected to cpu through xgmi. split nbio callbacks into ras and non-ras ones so gpu driver only initializes nbio ras callbacks when it manages nbio ras. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: implement query_ras_error_address callbackHawking Zhang1-0/+90
query_ras_error_address will be invoked to query bad page address when there is poison data in HBM consumed by GPU engines. Signed-off-by: Hawking Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: implement umc query error count callbackHawking Zhang2-0/+92
umc query_ras_error_count will be invoked to query umc correctable and uncorrectable error. It will reset the umc ras error counter after the query. Signed-off-by: Hawking Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: add helper funtion to query umc ras errorHawking Zhang2-0/+77
Add helper functions to query correctable and uncorrectable umc ras error. Signed-off-by: Hawking Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>