|
The KFD pre_reset should be called before the reset is executed. It holds
the lock to prevent other ROCm processes from sending packets to the HIQ
while the host executes the real reset on the HW.
Signed-off-by: shaoyunl <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Properly handle SI DC support when CONFIG_DRM_AMD_DC_SI is not
set.
Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support")
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove the duplicated kfd_resume_iommu call, which already runs
in amdgpu_amdkfd_device_init.
Tested-By: Ken Moffat <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
In advanced TDR mode, the real bad job is resubmitted twice, and
drm_sched_resubmit_jobs_ext contains a dma_fence_put, so the bad job
is put one more time than other jobs.
[How]
Add a dma_fence_get before resubmitting the job in
amdgpu_device_recheck_guilty_jobs, and put the fence for normal jobs.
Signed-off-by: Jingwen Chen <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
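
The reference imbalance described above can be illustrated with a toy model. This is only a sketch in userspace C, not the driver code: `toy_fence`, `toy_resubmit` and the two `*_delta` helpers are hypothetical names, and the model assumes that each resubmission drops exactly one reference.

```c
#include <assert.h>

/* Toy stand-in for a reference-counted fence; not the kernel's dma_fence. */
struct toy_fence {
    int refcount;
};

static void toy_fence_get(struct toy_fence *f) { f->refcount++; }
static void toy_fence_put(struct toy_fence *f) { f->refcount--; }

/* Assumption: models a resubmit helper that drops one reference per call. */
static void toy_resubmit(struct toy_fence *f) { toy_fence_put(f); }

/* Net refcount change for a job that is resubmitted only once. */
static int toy_other_job_delta(void)
{
    struct toy_fence f = { .refcount = 2 };

    toy_resubmit(&f);
    return f.refcount - 2;
}

/*
 * Net refcount change for the real bad job, which is resubmitted twice:
 * once while rechecking for guilt and once more afterwards. With
 * take_extra_ref set, an extra reference is taken before the first
 * resubmission so that the second put is paired.
 */
static int toy_bad_job_delta(int take_extra_ref)
{
    struct toy_fence f = { .refcount = 2 };

    if (take_extra_ref)
        toy_fence_get(&f);
    toy_resubmit(&f); /* recheck pass */
    toy_resubmit(&f); /* regular resubmission */
    return f.refcount - 2;
}
```

Without the extra reference the bad job ends up one put below the other jobs; with it, both kinds of job see the same net change.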
|
|
amdgpu_fence_driver_sw_fini() should be executed before
amdgpu_device_ip_fini(), otherwise fence driver resource
won't be properly freed, as adev->rings have been torn down.
Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early and late")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in.
Signed-off-by: Dave Airlie <[email protected]>
|
|
[Why]
drm_irq_uninstall is called in irq_fini_hw so that irq is disabled in sw
stage. SMU (and maybe other IP blocks) fini_hw will call irq_put for
cleanup and the whole cleanup process will be skipped because of
drm->irq_enable = false.
[How]
Move ip_fini_early before irq_fini_hw.
Signed-off-by: YuBiao Wang <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When the IOMMU is disabled in the SBIOS and KFD takes the IOMMUv2 path,
IOMMUv2 init will fail. But this failure should not block amdgpu driver init.
Reported-by: youling <[email protected]>
Tested-by: youling <[email protected]>
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
adev_to_drm is used everywhere, so update recent changes to use it
when accessing the drm_device pointer from amdgpu_device.
Signed-off-by: Guchun Chen <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Unify BO evicting functionality for possible memory
types in amdgpu_ttm.c.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In the current code, when the PCI error state pci_channel_io_normal is detected,
PCI_ERS_RESULT_CAN_RECOVER is reported to the PCI driver. The PCI
driver then continues with the PCI resume callback report_resume via
pci_walk_bridge, and the callback finally ends up in amdgpu_pci_resume,
where the write lock is released unconditionally without having been
acquired first. In this case, a deadlock happens when other threads
try to acquire the read lock.
To fix this, add a member to the amdgpu_device structure to cache
pci_channel_state, and only continue execution in amdgpu_pci_resume
when it is pci_channel_io_frozen.
Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
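
The caching approach described above can be sketched roughly as follows. This is a userspace model under stated assumptions, not the driver code: the `toy_*` types and functions are hypothetical, the enum values are illustrative, and the model assumes that only the frozen path takes the write lock in the error-detected callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy copies of the two channel states named above; values are illustrative. */
enum toy_channel_state { TOY_IO_NORMAL, TOY_IO_FROZEN };

struct toy_device {
    enum toy_channel_state cached_state; /* models the new cached member */
    int write_lock_depth;                /* > 0 while the write lock is held */
};

/* Error-detected callback: remember the state. Assumption: only the
 * frozen path acquires the write lock here. */
static void toy_error_detected(struct toy_device *d, enum toy_channel_state s)
{
    d->cached_state = s;
    if (s == TOY_IO_FROZEN)
        d->write_lock_depth++;
}

/* Resume callback: release the lock only if the frozen path took it.
 * Returns true when the lock was released. */
static bool toy_resume(struct toy_device *d)
{
    if (d->cached_state != TOY_IO_FROZEN)
        return false;
    d->write_lock_depth--;
    return true;
}
```

With the early return in place, a normal-state error followed by resume leaves the lock depth untouched instead of releasing a lock that was never taken.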
|
|
This patch is to fix clinfo failure in Raven/Picasso:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.2 AMD-APP (3364.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Tested-by: James Zhu <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In the current code, when the PCI error state pci_channel_io_normal is detected,
PCI_ERS_RESULT_CAN_RECOVER is reported to the PCI driver. The PCI
driver then continues with the PCI resume callback report_resume via
pci_walk_bridge, and the callback finally ends up in amdgpu_pci_resume,
where the write lock is released unconditionally without having been
acquired first. In this case, a deadlock happens when other threads
try to acquire the read lock.
To fix this, add a member to the amdgpu_device structure to cache
pci_channel_state, and only continue execution in amdgpu_pci_resume
when it is pci_channel_io_frozen.
Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Make sure that we notice this in error reports.
Signed-off-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit 728e7e0cd61899208e924472b9e641dbeb0775c4.
Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.
GPU reset can never be delayed by userspace, even for debugging;
otherwise we can run into in-kernel deadlocks.
Signed-off-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch is to fix clinfo failure in Raven/Picasso:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.2 AMD-APP (3364.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Tested-by: James Zhu <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a new asic type for asics where we don't have an
explicit entry in the PCI ID list. We don't need
an asic type for these asics, other than something higher
than the existing ones, so just use this for all new
asics.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We are not going to support any new chips with the old
non-DC code so make it the default.
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rather than hardcoding based on asic_type, use the IP
discovery table to configure the driver.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rather than hardcoding based on asic_type, use the IP
discovery table to configure the driver.
v2: rebase
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rather than hardcoding based on asic_type, use the IP
discovery table to configure the driver.
Only tested on Navi10 so far.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Drive the asic setup from the IP discovery table rather than
hardcoded settings based on asic type.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Add the display-related cyan_skillfish files.
The makefile is controlled by the CONFIG_DRM_AMD_DC_DCN201 flag.
v2: squash in clang fixes from Harry, Nathan
v3: squash in missing CONFIG_DRM_AMD_DC check (Alex)
Signed-off-by: Charlene Liu <[email protected]>
Signed-off-by: Zhan Liu <[email protected]>
Reviewed-by: Charlene Liu <[email protected]>
Acked-by: Jun Lei <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Handle all DMA IOMMU group related dependencies before the
group is removed and we try to access it after free.
v2:
Move the actual handling function to TTM
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
adev->rmmio is set to be NULL in amdgpu_device_unmap_mmio to prevent
access after pci_remove, however, in SRIOV case, amdgpu_virt_release_full_gpu
will still use adev->rmmio for access after amdgpu_device_unmap_mmio.
This patch moves that SRIOV call earlier, to the fini_early stage.
Fixes: 07775fc13878 ("drm/amdgpu: Unmap all MMIO mappings")
Cc: Andrey Grodzovsky <[email protected]>
Signed-off-by: Leslie Shi <[email protected]>
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Separate iommu_resume from kfd_resume, and move it before
other amdgpu ip init/resume.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Separate iommu_resume from kfd_resume, and move it before
other amdgpu ip init/resume.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The AtomicOp Requester Enable bit is reserved in VFs, and the PF value applies to all
associated VFs, so the guest driver cannot directly enable AtomicOps for a VF; it
depends on the PF to enable them. In the current design, the amdgpu driver gets the enabled
AtomicOps bits through private pf2vf data.
Signed-off-by: shaoyunl <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
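
A rough model of the decision described above: on a VF, trust what the PF reports, and otherwise use the result of enabling AtomicOps directly. Everything here is a hypothetical sketch; the `toy_*` names, the bit values, and the requirement that both 32-bit and 64-bit completer bits be present are assumptions, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real capability bits live in the PCIe headers. */
#define TOY_ATOMIC_COMP32 (1u << 0)
#define TOY_ATOMIC_COMP64 (1u << 1)

struct toy_adev {
    bool is_vf;                  /* running as an SR-IOV virtual function */
    uint32_t pf2vf_atomic_flags; /* hypothetical field filled from pf2vf data */
    bool host_enable_ok;         /* result of enabling AtomicOps directly */
};

/* Decide whether AtomicOps can be used. A VF cannot set the enable bit
 * itself, so it relies on the flags the PF passed down. */
static bool toy_have_atomics(const struct toy_adev *adev)
{
    const uint32_t want = TOY_ATOMIC_COMP32 | TOY_ATOMIC_COMP64;

    if (adev->is_vf)
        return (adev->pf2vf_atomic_flags & want) == want;
    return adev->host_enable_ok;
}
```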
|
|
schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.
This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).
To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.
v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
v4:
* Fix race condition between amdgpu_gfx_off_ctrl incrementing
adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
checking for it to be 0 (Evan Quan)
Cc: [email protected]
Reviewed-by: Evan Quan <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]> # v3
Acked-by: Christian König <[email protected]> # v3
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
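
The counting rule described above (cancel on the 0 to 1 transition, schedule only on the 1 to 0 transition) can be sketched as a small state machine. This is a userspace model, not the driver code: `toy_gfx` and `toy_gfx_off_ctrl` are hypothetical, and a boolean stands in for the delayed work item.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_gfx {
    unsigned int req_count; /* outstanding "disable GFXOFF" requests */
    bool work_pending;      /* stands in for the queued delayed work */
    int schedules;          /* how often the work was scheduled */
    int cancels;            /* how often the work was cancelled */
};

/*
 * enable == false: a caller needs GFXOFF disabled.
 * enable == true:  a caller no longer needs GFXOFF disabled.
 */
static void toy_gfx_off_ctrl(struct toy_gfx *g, bool enable)
{
    if (enable) {
        if (g->req_count == 0)
            return; /* count was already 0: do not schedule again */
        if (--g->req_count == 0) {
            /* 1 -> 0: schedule the delayed re-enable */
            g->work_pending = true;
            g->schedules++;
        }
    } else {
        if (g->req_count++ == 0) {
            /* 0 -> 1: cancel any pending re-enable */
            g->work_pending = false;
            g->cancels++;
        }
    }
}
```

Nested disable requests only touch the counter, so the delayed work is scheduled once per outermost disable/enable pair.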
|
|
[Why]
In some cases when we unload driver, warning call trace
will show up in vram_mgr_fini which claims that LRU is not empty, caused
by the ttm bo inside delay deleted queue.
[How]
We should flush delayed work to make sure the delay deleting is done.
Signed-off-by: YuBiao Wang <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Why: Previously the hw fence was allocated separately from the job.
This caused lifetime issues and corner cases.
Ideally, one fence manages the lifetime of both the job and the fence
itself, which also simplifies the design of the gpu-scheduler.
How:
We propose to embed hw_fence into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test and submissions without a parent job, keep the
legacy way of creating a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5:
add missing handling in amdgpu_fence_enable_signaling
Signed-off-by: Jingwen Chen <[email protected]>
Signed-off-by: Jack Zhang <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Monk Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When init fails in the early init stage, amdgpu_object has
not been initialized, and neither have the ttm delayed queue functions.
Signed-off-by: YuBiao Wang <[email protected]>
Reviewed-by: Emily.Deng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In amdgpu_fence_driver_hw_fini, there is no need to call drm_sched_fini to stop
the scheduler in the s3 test; otherwise, fence-related failures show up
after resume. To fix this and for a better cleanup, move drm_sched_fini
from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown and
should never be called in hw_fini.
v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
to keep sw_init and sw_fini paired.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668
Fixes: 8d35a2596164c1 ("drm/amdgpu: adjust fence driver enable sequence")
Suggested-by: Christian König <[email protected]>
Tested-by: Mike Lothian <[email protected]>
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.15-2021-07-29:
amdgpu:
- VCN/JPEG power down sequencing fixes
- Various navi pcie link handling fixes
- Clockgating fixes
- Yellow Carp fixes
- Beige Goby fixes
- Misc code cleanups
- S0ix fixes
- SMU i2c bus rework
- EEPROM handling rework
- PSP ucode handling cleanup
- SMU error handling rework
- AMD HDMI freesync fixes
- USB PD firmware update rework
- MMIO based vram access rework
- Misc display fixes
- Backlight fixes
- Add initial Cyan Skillfish support
- Overclocking fixes suspend/resume
amdkfd:
- Sysfs leak fix
- Add counters for vm faults and migration
- GPUVM TLB optimizations
radeon:
- Misc fixes
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Previously, the fence driver was enabled per ring during sw init of each IP block.
Change this to enable all fence drivers at the same time after
amdgpu_device_ip_init has finished.
Rename some fence-related functions to make them easier to read.
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On Sienna Cichlid, in pass-through mode, if we unload the driver in BACO
mode (RTPM), the kernel receives thousands of interrupts.
That's because there is a doorbell monitor interrupt on BIF, so KVM keeps
injecting interrupts into the guest VM. So we should clear the doorbell
interrupt status after BACO exit.
v2: Modify coding style and commit message
Signed-off-by: Chengzhe Liu <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use FAMILY_NV for cyan_skillfish.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add cyan_skillfish asic family.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Currently all timed-out jobs are considered guilty. In the SRIOV
multi-VF use case, the VF FLR happens first and the job timeout is
found afterwards. Several jobs can time out within a very small time slice.
If an innocent sdma job's timeout is found before the real bad
job, the innocent sdma job is marked guilty. This leads
to a page fault after the job is resubmitted.
[How]
If the job is a kernel job, always consider it not guilty.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Jingwen Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
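
The rule above amounts to a guard in front of the code that blames a job. The sketch below is a hypothetical userspace model: `toy_job`, its `karma` counter, and the convention that a kernel job is one without an owning VM are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_job {
    const void *vm; /* assumption: NULL marks a kernel-submitted job */
    int karma;      /* incremented each time the job is blamed */
};

static bool toy_job_is_kernel(const struct toy_job *job)
{
    return job->vm == NULL;
}

/* Blame a timed-out job, unless it is a kernel job. */
static void toy_mark_guilty(struct toy_job *job)
{
    if (job && !toy_job_is_kernel(job))
        job->karma++;
}
```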
|
|
The callback functions are used for SRIOV read/write instead
of just for rlcg read/write
Signed-off-by: Roy Sun <[email protected]>
Reviewed-by: Zhou pengju <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.15-rc1:
UAPI Changes:
- Remove sysfs stats for dma-buf attachments, as it causes a performance regression.
Previous merge is not in a rc kernel yet, so no userspace regression possible.
Cross-subsystem Changes:
- Sanitize user input in kyro's viewport ioctl.
- Use refcount_t in fb_info->count
- Assorted fixes to dma-buf.
- Extend x86 efifb handling to all archs.
- Fix neofb divide by 0.
- Document corpro,gm7123 bridge dt bindings.
Core Changes:
- Slightly rework drm master handling.
- Cleanup vgaarb handling.
- Assorted fixes.
Driver Changes:
- Add support for ws2401 panel.
- Assorted fixes to stm, ast, bochs.
- Demidlayer ingenic irq.
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The VGA arbitration is entirely based on pci_dev structures, so just pass
that back to the set_vga_decode callback.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
All callers pass NULL as the irq_set_state argument, so remove it and
the ->irq_set_state member in struct vga_device.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
Add a trivial wrapper for the unregister case that sets all fields to
NULL.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
Split amdgpu_device_access_vram() into:
1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access vram
2. amdgpu_device_aper_access(): using the vram aperture to access vram (optional)
Signed-off-by: Kevin Wang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
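
One way the two paths could fit together is sketched below: a fast path that handles whatever part of the range is reachable through the aperture, and a slow path for the remainder. This is a userspace model, not the driver code. The `toy_*` names, the sizes, and the "try the aperture first, then fall back" composition are assumptions, and the slow path only mimics an index/data register pair with a byte-wise loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_VRAM_SIZE    64
#define TOY_VISIBLE_SIZE 32 /* bytes reachable through the aperture */

static uint8_t toy_vram[TOY_VRAM_SIZE];
static size_t toy_aper_bytes; /* bytes served by the fast path */
static size_t toy_mm_bytes;   /* bytes served by the slow path */

/* Fast path: copy the part of the range inside the visible window.
 * Returns the number of bytes handled, possibly 0. */
static size_t toy_aper_read(size_t pos, void *buf, size_t size)
{
    size_t n;

    if (pos >= TOY_VISIBLE_SIZE)
        return 0;
    n = TOY_VISIBLE_SIZE - pos;
    if (n > size)
        n = size;
    memcpy(buf, toy_vram + pos, n);
    toy_aper_bytes += n;
    return n;
}

/* Slow path: one byte at a time, standing in for index/data registers. */
static void toy_mm_read(size_t pos, void *buf, size_t size)
{
    uint8_t *out = buf;

    for (size_t i = 0; i < size; i++)
        out[i] = toy_vram[pos + i];
    toy_mm_bytes += size;
}

/* Combined read; the caller must keep pos + size within TOY_VRAM_SIZE. */
static void toy_vram_read(size_t pos, void *buf, size_t size)
{
    size_t done = toy_aper_read(pos, buf, size);

    if (done < size)
        toy_mm_read(pos + done, (uint8_t *)buf + done, size - done);
}
```

A read that straddles the end of the visible window is served partly by each path, while the caller sees a single contiguous result.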
|
|
This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.
It is not true (as stated in the reverted commit changelog) that we never
unmap the BAR on failure; it actually does happen properly on
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
amdgpu_device_fini() error path.
What's worse, this commit actually completely breaks resource freeing on
probe failure (like e.g. failure to load microcode), as
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
early, leaving all the resources that'd normally be freed in
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
to all sorts of oopses when someone tries to, for example, access the
sysfs and procfs resources which are still around while the driver is
gone.
Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Reported-by: Vojtech Pavlik <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In some asics, we need to adjust the behavior according to the apu flags
at very early stage.
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Aaron Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable gpu recovery for beige_goby.
Signed-off-by: Chengming Gui <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move shadow_list to struct amdgpu_bo_vm as shadow BOs
are part of PT/PD BOs.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.14-2021-06-09:
amdgpu:
- SR-IOV fixes
- Smartshift updates
- GPUVM TLB flush updates
- 16bpc fixed point display fix for DCE11
- BACO cleanups and core refactoring
- Aldebaran updates
- Initial Yellow Carp support
- RAS fixes
- PM API cleanup
- DC visual confirm updates
- DC DP MST fixes
- DC DML fixes
- Misc code cleanups and bug fixes
amdkfd:
- Initial Yellow Carp support
radeon:
- memcpy_to/from_io fixes
UAPI:
- Add Yellow Carp chip family id
Used internally in the kernel driver and by mesa
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|