aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2021-08-20drm/amdgpu: use the preferred pin domain after the checkChristian König1-5/+5
For some reason we run into an use case where a BO is already pinned into GTT, but should be pinned into VRAM|GTT again. Handle that case gracefully as well. Reviewed-by: Shashank Sharma <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-18drm/amd/amdgpu:flush ttm delayed work before cancel_syncYuBiao Wang1-1/+3
[Why] In some cases when we unload driver, warning call trace will show up in vram_mgr_fini which claims that LRU is not empty, caused by the ttm bo inside delay deleted queue. [How] We should flush delayed work to make sure the delay deleting is done. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amd: consolidate TA shared memory structuresCandice Li4-162/+134
Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: increase max xgmi physical node for aldebaranHawking Zhang1-3/+2
aldebaran supports up to 16 xgmi physical nodes. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarilyEvan Quan1-1/+8
We have a S3 issue on that SKU with BACO enabled. Will bring back this when that root caused. Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: correct MMSCH 1.0 versionZhigang Luo1-3/+1
MMSCH 1.0 doesn't have major/minor version, only verison. Signed-off-by: Zhigang Luo <[email protected]> Reviewed by Shaoyun.liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: get extended xgmi topology dataJonathan Kim4-14/+145
The TA has a limit to the amount of data that can be retrieved from GET_TOPOLOGY. For setups that exceed this limit, the xGMI topology needs to be re-initialized and data needs to be re-fetched from the extended link records by setting a flag in the shared command buffer. The number of hops and the number of links must be accumulated by the driver. Other data points are all fetched from the first request. Because the TA has already exceeded its link record limit, it cannot hold bidirectional information. Otherwise the driver would have to do more than two fetches so the driver has to reflect the topology information in the opposite direction. v2: squashed with internal reviewed fix Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: drop the unnecessary intermediate percent-based transitionEvan Quan1-0/+2
Currently, the readout of fan speed pwm is transited into percent-based and then pwm-based. However, the transition into percent-based is totally unnecessary and make the final output less accurate. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu: remove unnecessary RAS context fieldCandice Li10-14/+6
Delete ras_if->name in the RAS ctx structure and remove related lines. Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu: consolidate PSP TA contextCandice Li11-187/+158
Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR notification.Jiange Zhao2-1/+4
When guest received FLR notification from host, it would lock adapter into reset state. There will be no more job submission and hardware access after that. Then it should send a response to host that it has prepared for host reset. Signed-off-by: Jiange Zhao <[email protected]> Signed-off-by: Peng Ju Zhou <[email protected]> Reviewed-by: Emily.Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu embed hw_fence into amdgpu_jobJack Zhang9-37/+119
Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <[email protected]> Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16Merge tag 'drm-misc-next-2021-08-12' of ↵Dave Airlie3-9/+15
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.15: UAPI Changes: Cross-subsystem Changes: - Add lockdep_assert(once) helpers. Core Changes: - Add lockdep assert to drm_is_current_master_locked. - Fix typos in dma-buf documentation. - Mark drm irq midlayer as legacy only. - Fix GPF in udmabuf_create. - Rename member to correct value in drm_edid.h Driver Changes: - Build fix to make nouveau build with NOUVEAU_BACKLIGHT. - Add MI101AIT-ICP1, LTTD800480070-L6WWH-RT panels. - Assorted fixes to bridge/it66121, anx7625. - Add custom crtc_state to simple helpers, and use it to convert pll handling in mgag200 to atomic. - Convert drivers to use offset-adjusted framebuffer bo mappings. - Assorted small fixes and fix for a use-after-free in vmwgfx. - Convert remaining callers of non-legacy drivers to use linux irqs directly. - Small cleanup in ingenic. - Small fixes to virtio and ti-sn65dsi86. Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-08-12gpu: Bulk conversion to generic_handle_domain_irq()Marc Zyngier1-1/+1
Wherever possible, replace constructs that match either generic_handle_irq(irq_find_mapping()) or generic_handle_irq(irq_linear_revmap()) to a single call to generic_handle_domain_irq(). Signed-off-by: Marc Zyngier <[email protected]>
2021-08-11gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable ↵Tuo Li1-1/+1
access in amdgpu_i2c_router_select_ddc_port() The variable val is declared without initialization, and its address is passed to amdgpu_i2c_get_byte(). In this function, the value of val is accessed in: DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", addr, *val); Also, when amdgpu_i2c_get_byte() returns, val may remain uninitialized, but it is accessed in: val &= ~amdgpu_connector->router.ddc_mux_control_pin; To fix this possible uninitialized-variable access, initialize val to 0 in amdgpu_i2c_router_select_ddc_port(). Reported-by: TOTE Robot <[email protected]> Signed-off-by: Tuo Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-11drm/amdkfd: CWSR with software schedulerMukul Joshi3-1/+94
This patch adds support to program trap handler settings when loading driver with software scheduler (sched_policy=2). Signed-off-by: Mukul Joshi <[email protected]> Suggested-by: Jay Cornwall <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-10drm/amdgpu: Convert to Linux IRQ interfacesThomas Zimmermann3-9/+15
Drop the DRM IRQ midlayer in favor of Linux IRQ interfaces. DRM's IRQ helpers are mostly useful for UMS drivers. Modern KMS drivers don't benefit from using it. DRM IRQ callbacks are now being called directly or inlined. The interrupt number returned by pci_msi_vector() is now stored in struct amdgpu_irq. Calls to pci_msi_vector() can fail and return a negative errno code. Abort initlaizaton in thi case. The DRM IRQ midlayer does not handle this correctly. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-08-10drm/amdgpu: handle VCN instances when harvesting (v2)Alex Deucher1-3/+9
There may be multiple instances and only one is harvested. v2: fix typo in commit message Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1673 Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-10drm/amdgpu: handle VCN instances when harvesting (v2)Alex Deucher1-3/+9
There may be multiple instances and only one is harvested. v2: fix typo in commit message Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1673 Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-09drm/amdgpu: Removed unnecessary if statementSergio Miguéns Iglesias1-3/+0
There was an "if" statement that did nothing so it was removed. Signed-off-by: Sergio Miguéns Iglesias <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-09drm/amdgpu: fix kernel-doc warnings on non-kernel-doc commentsRandy Dunlap1-3/+3
Don't use "begin kernel-doc notation" (/**) for comments that are not kernel-doc. This eliminates warnings reported by the 0day bot. drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:89: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * This shader is used to clear VGPRS and LDS, and also write the input drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:209: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * The below shaders are used to clear SGPRS, and also write the input drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:301: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * This shader is used to clear the uninitiated sgprs after the above Fixes: 0e0036c7d13b ("drm/amdgpu: fix no full coverage issue for gprs initialization") Signed-off-by: Randy Dunlap <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: Dennis Li <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2021-08-09drm/amd/amdgpu: skip locking delayed work if not initialized.YuBiao Wang1-1/+2
When init failed in early init stage, amdgpu_object has not been initialized, so hasn't the ttm delayed queue functions. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Emily.Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-09drm/amdgpu: Extend full access wait time in guestVictor Zhao1-4/+12
- Extend wait time and add retry, currently 6s * 2times - Change timing algorithm Signed-off-by: Victor Zhao <[email protected]> Signed-off-by: Peng Ju Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-06drm/amdgpu: check for allocation failure in amdgpu_vkms_sw_init()Dan Carpenter1-0/+2
Check whether the kcalloc() fails and return -ENOMEM if it does. Fixes: 84ec374bd58036 ("drm/amdgpu: create amdgpu_vkms (v4)") Reviewed-by: Christian König <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-06drm/amdgpu: don't enable baco on boco platforms in runpmAlex Deucher1-0/+2
If the platform uses BOCO, don't use BACO in runtime suspend. We could end up executing the BACO path if the platform supports both. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1669 Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-06drm/amdgpu: set RAS EEPROM address from VBIOSJohn Clements3-0/+45
update to latest atombios fw table [Backport to 5.14 - Alex] Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1670 Signed-off-by: John Clements <[email protected]> Reviewed-by: Hawking Zhang <[email protected]>. Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-05drm/amdgpu: drop redundant null-pointer checks in amdgpu_ttm_tt_populate() ↵Tuo Li1-2/+2
and amdgpu_ttm_tt_unpopulate() The varialbe gtt in the function amdgpu_ttm_tt_populate() and amdgpu_ttm_tt_unpopulate() is guaranteed to be not NULL in the context. Thus the null-pointer checks are redundant and can be dropped. Reported-by: TOTE Robot <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Tuo Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: don't enable baco on boco platforms in runpmAlex Deucher1-0/+2
If the platform uses BOCO, don't use BACO in runtime suspend. We could end up executing the BACO path if the platform supports both. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1669 Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: Put MODE register in wave debug infoJoseph Greathouse5-0/+5
Add the MODE register into the per-wave debug information. This register holds state such as FP rounding and denorm modes, which exceptions are enabled, and active clamping modes. Signed-off-by: Joseph Greathouse <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: set RAS EEPROM address from VBIOSJohn Clements3-0/+58
update to latest atombios fw table Signed-off-by: John Clements <[email protected]> Reviewed-by: Hawking Zhang <[email protected]>. Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amd/amdgpu: Recovery vcn instance iterate.Peng Ju Zhou1-13/+20
The previous logic is recording the amount of valid vcn instances to use them on SRIOV, it is a hard task due to the vcn accessment is based on the index of the vcn instance. Check if the vcn instance enabled before do instance init. Signed-off-by: Peng Ju Zhou <[email protected]> Reviewed-by: Monk Liu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: added synchronization for psp cmd buf accessJohn Clements1-66/+139
resolved race condition accessing psp cmd submission memory Signed-off-by: John Clements <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: update PSP BL cmd IDsJohn Clements1-3/+3
resolved bug with incorrect PSP BL cmd IDs Signed-off-by: John Clements <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: add DID for beige gobyChengming Gui1-0/+7
Add device ids. Signed-off-by: Chengming Gui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amd/amdgpu: remove redundant host to psp cmd buf allocationsCandice Li1-161/+66
Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: replace dce_virtual with amdgpu_vkms (v3)Ryan Taylor10-290/+228
Move dce_virtual into amdgpu_vkms and update all references to dce_virtual with amdgpu_vkms. v2: Removed more references to dce_virtual. v3: Restored display modes from previous implementation. Signed-off-by: Ryan Taylor <[email protected]> Reported-by: kernel test robot <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: cleanup dce_virtualRyan Taylor1-565/+3
Remove obsolete functions and variables from dce_virtual. Signed-off-by: Ryan Taylor <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: create amdgpu_vkms (v4)Ryan Taylor7-11/+493
Modify the VKMS driver into an api that dce_virtual can use to create virtual displays that obey drm's atomic modesetting api. v2: Made local functions static. v3: Switched vkms_output kzalloc for kcalloc. Cleanup patches by moving display mode fixes to this patch. v4: Update atomic_check and atomic_update to comply with new kms api. Signed-off-by: Ryan Taylor <[email protected]> Reported-by: kernel test robot <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05gpu/drm/amd: Remove duplicated include of drm_drv.hzhouchuangao1-2/+0
Duplicate include header file <drm/drm_drv.h> line 28: #include <drm/drm_drv.h> line 44: #include <drm/drm_drv.h> Signed-off-by: zhouchuangao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)Guchun Chen3-10/+11
In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop scheduler in s3 test, otherwise, fence related failure will arrive after resume. To fix this and for a better clean up, move drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and should never be called in hw_fini. v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to keep sw_init and sw_fini paired. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668 Fixes: 8d35a2596164c1 ("drm/amdgpu: adjust fence driver enable sequence") Suggested-by: Christian König <[email protected]> Tested-by: Mike Lothian <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: Fix channel_index table layout for AldebaranMukul Joshi3-12/+12
Fix the channel_index table layout to fetch the correct channel_index when calculating physical address from normalized address during page retirement. Also, fix the number of UMC instances and number of channels within each UMC instance for Aldebaran. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-By: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: fix checking pmops when PM_SLEEP is not enabledRandy Dunlap1-1/+1
'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP is set/enabled. OTOH, when both SUSPEND and HIBERNATION are not set, PM_SLEEP is not set, so this variable cannot be used. ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c: In function ‘amdgpu_acpi_is_s0ix_active’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:1046:11: error: ‘pm_suspend_target_state’ undeclared (first use in this function); did you mean ‘__KSYM_pm_suspend_target_state’? return pm_suspend_target_state == PM_SUSPEND_TO_IDLE; ^~~~~~~~~~~~~~~~~~~~~~~ __KSYM_pm_suspend_target_state Also use shorter IS_ENABLED(CONFIG_foo) notation for checking the 2 config symbols. Fixes: 91e273712ab8dd ("drm/amdgpu: Check pmops for desired suspend state") Signed-off-by: Randy Dunlap <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: add DID for beige gobyChengming Gui1-0/+7
Add device ids. Signed-off-by: Chengming Gui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: fix checking pmops when PM_SLEEP is not enabledRandy Dunlap1-1/+1
'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP is set/enabled. OTOH, when both SUSPEND and HIBERNATION are not set, PM_SLEEP is not set, so this variable cannot be used. ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c: In function ‘amdgpu_acpi_is_s0ix_active’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:1046:11: error: ‘pm_suspend_target_state’ undeclared (first use in this function); did you mean ‘__KSYM_pm_suspend_target_state’? return pm_suspend_target_state == PM_SUSPEND_TO_IDLE; ^~~~~~~~~~~~~~~~~~~~~~~ __KSYM_pm_suspend_target_state Also use shorter IS_ENABLED(CONFIG_foo) notation for checking the 2 config symbols. Fixes: 91e273712ab8dd ("drm/amdgpu: Check pmops for desired suspend state") Signed-off-by: Randy Dunlap <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-02drm/amdgpu: fix the doorbell missing when in CGPG issue for renoir.Yifan Zhang1-1/+20
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-02Revert "Revert "drm/amdkfd: Only apply TLB flush optimization on ALdebaran""Eric Huang1-0/+6
This reverts commit 53d0533049a573298f74ae07a39db14163960e68. Revert reason: The issue has been resolved. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-02Revert "Revert "drm/amdgpu: Fix warning of Function parameter or member not ↵Eric Huang1-0/+1
described"" This reverts commit 4e7b93ca52fb228b177168d436449c5671415a72. Revert reason: The issue has been resolved. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-02Revert "Revert "drm/amdkfd: Make TLB flush conditional on mapping""Eric Huang2-9/+12
This reverts commit 7ed9876c9793bfe96fed58ba645d6c8e32f26001. Revert reason: The issue has been resolved. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-02Revert "Revert "drm/amdgpu: Add table_freed parameter to amdgpu_vm_bo_update""Eric Huang4-10/+10
This reverts commit 024d8811c90ed56d8b90cdcf71e51c9fedeff460. Revert reason: The issue has been resolved. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-02drm/amdgpu: Fix out-of-bounds read when update mappingxinhui pan1-1/+2
If one GTT BO has been evicted/swapped out, it should sit in CPU domain. TTM only alloc struct ttm_resource instead of struct ttm_range_mgr_node for sysMem. Now when we update mapping for such invalidated BOs, we might walk out of bounds of struct ttm_resource. Three possible fix: 1) Let sysMem manager alloc struct ttm_range_mgr_node, like ttm_range_manager does. 2) Pass pages_addr to update_mapping function too, but need memset pages_addr[] to zero when unpopulate. 3) Init amdgpu_res_cursor directly. bug is detected by kfence. ================================================================== BUG: KFENCE: out-of-bounds read in amdgpu_vm_bo_update_mapping+0x564/0x6e0 Out-of-bounds read at 0x000000008ea93fe9 (64B right of kfence-#167): amdgpu_vm_bo_update_mapping+0x564/0x6e0 [amdgpu] amdgpu_vm_bo_update+0x282/0xa40 [amdgpu] amdgpu_vm_handle_moved+0x19e/0x1f0 [amdgpu] amdgpu_cs_vm_handling+0x4e4/0x640 [amdgpu] amdgpu_cs_ioctl+0x19e7/0x23c0 [amdgpu] drm_ioctl_kernel+0xf3/0x180 [drm] drm_ioctl+0x2cb/0x550 [drm] amdgpu_drm_ioctl+0x5e/0xb0 [amdgpu] kfence-#167 [0x000000008e11c055-0x000000001f676b3e ttm_sys_man_alloc+0x35/0x80 [ttm] ttm_resource_alloc+0x39/0x50 [ttm] ttm_bo_swapout+0x252/0x5a0 [ttm] ttm_device_swapout+0x107/0x180 [ttm] ttm_global_swapout+0x6f/0x130 [ttm] ttm_tt_populate+0xb1/0x2a0 [ttm] ttm_bo_handle_move_mem+0x17e/0x1d0 [ttm] ttm_mem_evict_first+0x59d/0x9c0 [ttm] ttm_bo_mem_space+0x39f/0x400 [ttm] ttm_bo_validate+0x13c/0x340 [ttm] ttm_bo_init_reserved+0x269/0x540 [ttm] amdgpu_bo_create+0x1d1/0xa30 [amdgpu] amdgpu_bo_create_user+0x40/0x80 [amdgpu] amdgpu_gem_object_create+0x71/0xc0 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x2f2/0xcd0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0xe2/0x330 [amdgpu] kfd_ioctl+0x461/0x690 [amdgpu] Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>