Age | Commit message (Collapse) | Author | Files | Lines |
|
Add secondary instance version info for vega20, arcturure, and
aldebaran.
Acked-by: Christian König <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove GPRs init for ALDEBARAN in gpu reset temporarily, will add the init once the
algorithm is stable.
v2: Only remove GPRs init in gpu reset.
v3: Suspend needs it, only skip it in gpu reset.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Skip GPRs init in specific condition since current GPRs init algorithm only works for some CU settings.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
TA version should only be displayed in firmware version column.
Signed-off-by: Candice Li <[email protected]>
Reviewed-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
amdgpu_fence_driver_sw_fini() should be executed before
amdgpu_device_ip_fini(), otherwise fence driver resource
won't be properly freed as adev->rings have been tore down.
Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early and late")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Currently, all kfd BOs use same destruction routine. But pinned
BOs are not unpinned properly. Separate them from general routine.
v2 (Felix):
Add safeguard to prevent user space from freeing signal BO.
Kunmap signal BO in the event of setting event page error.
Just kunmap signal BO to avoid duplicating the code.
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The userptr can be unmapped by application and still registered to
driver, restore userptr work return user pages will get -EFAULT bad
address error. Pretend this error as succeed. GPU access this userptr
will have VM fault later, it is better than application soft hangs with
stalled user mode queues.
v2: squash in warning fix (Alex)
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When a GPU hits the bad_page_threshold, it will not be initialized by
the amdgpu driver. This means that the table cannot be cleared, nor can
information gathering be performed (getting serial number, BDF, etc).
If the bad_page_threshold kernel parameter is set to -2,
continue to initialize the GPU, while printing a warning to dmesg that
this action has been done
v2: squash in Luben's fix to restore RAS info reporting
Cc: Luben Tuikov <[email protected]>
Cc: Mukul Joshi <[email protected]>
Signed-off-by: Kent Russell <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
dmesg doesn't warn when the number of bad pages approaches the
threshold for page retirement. WARN when the number of bad pages
is at 90% or greater for easier checks and planning, instead of waiting
until the GPU is full of bad pages.
Cc: Luben Tuikov <[email protected]>
Cc: Mukul Joshi <[email protected]>
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in.
Signed-off-by: Dave Airlie <[email protected]>
|
|
In order to better track where in the kernel the dma-buf code is used,
put the symbols in the namespace DMA_BUF and modify all users of the
symbols to properly import the namespace to not break the build at the
same time.
Now the output of modinfo shows the use of these symbols, making it
easier to watch for users over time:
$ modinfo drivers/misc/fastrpc.ko | grep import
import_ns: DMA_BUF
Cc: "Pan, Xinhui" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: [email protected]
Acked-by: Daniel Vetter <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Sumit Semwal <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The extended bits were not available for use on vega20 and
presumably arcturus as well.
Fixes: a0f9f854666834 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12")
Reviewed-and-tested-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Some navy flounder boards do not properly mark harvested
VCN instances. Fix that here.
v2: use IP versions
Fixes: 1b592d00b4ac83 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Reviewed-and-tested-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to use the id variable, just use the constant
plus instance offset directly.
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to use the tmp variable, just use the constant
directly.
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Roughly the same code was present in all VCN versions.
Consolidate it into a single function.
v2: use AMDGPU_UCODE_ID_VCN + i, check if num_inst >= 2
Signed-off-by: Alex Deucher <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Reviewed-by: James Zhu <[email protected]>
|
|
Only enable firmware for the instance that is enabled.
v2: use AMDGPU_UCODE_ID_VCN + i
Fixes: 1b592d00b4ac83 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Reviewed-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add dummy_page_addr to sriov msg for host driver to set
GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly.
v2:
should update vf2pf msg instead
Signed-off-by: Jingwen Chen <[email protected]>
Reviewed-by: Horace Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
PSP firmware will be responsible for applying the GRBM CAM remapping in
the production. And the GRBM_CAM_INDEX / GRBM_CAM_DATA registers will be
protected by PSP under security policy. So remove it according to the
new security policy.
Signed-off-by: Huang Rui <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02.
The external rev_id for B0 and B1 is 0x20.
The original expression is not suitable for B1.
v2: squash in fix for display code (Alex)
Signed-off-by: Aaron Liu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Change the error message when the bad_page_threshold is reached,
explicitly stating that the GPU will not be initialized.
Cc: Luben Tuikov <[email protected]>
Cc: Mukul Joshi <[email protected]>
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used since IP enumeration is driven by the IP
discovery table now.
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used since IP enumeration is now driven by
amdgpu IP discovery code.
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used since IP enumeration is now driven by
amdgpu IP discovery code.
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check was incorrectly converted to IP version checking.
Fixes: 4b0ad8425498ba ("drm/amdgpu/gfx10: convert to IP version checking")
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
show() must not use snprintf() when formatting the value to be
returned to user space.
Fix the following coccicheck warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c:427:
WARNING: use scnprintf or sprintf.
Signed-off-by: Qing Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02.
The external rev_id for B0 and B1 is 0x20.
The original expression is not suitable for B1.
v2: squash in fix for display code (Alex)
Signed-off-by: Aaron Liu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Otherwise, hw_id_name string is NULL for SDMA 2 and 3 when dumping
ip version from VBIOS.
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Output a warning message if RAS TA returns UNSUPPORTED_ERROR_INJ status.
v2: implement it in psp_ras_ta_check_status function.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Create new function to check status returned by RAS TA.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
drm_irq_uninstall is called in irq_fini_hw so that irq is disabled in sw
stage. SMU (and maybe other IP blocks) fini_hw will call irq_put for
cleanup and the whole cleanup process will be skipped because of
drm->irq_enable = false.
[How]
Move ip_fini_early before irq_fini_hw.
Signed-off-by: YuBiao Wang <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Some registers' access will fail without PSP RL after resume.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We should unreference a gem object instead of an amdgpu bo here.
Fixes: fd9a9f8801de ("drm/amdgpu: Use GEM obj reference for KFD BOs")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
psp_check_pmfw_centralized_cstate_management
Missed a few asics.
v2: update comment
Fixes: 82d05736c47b19 ("drm/amdgpu/amdgpu_psp: convert to IP version checking")
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When creating unregistered new svm range to recover retry fault, avoid
new svm range to overlap with ranges or userptr ranges managed by TTM,
otherwise svm migration will trigger TTM or userptr eviction, to evict
user queues unexpectedly.
Change helper amdgpu_ttm_tt_affect_userptr to return userptr which is
inside the range. Add helper svm_range_check_vm_userptr to scan all
userptr of the vm, and return overlap userptr bo start, last.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When IOMMU disabled in sbios and kfd in iommuv2 path, iommuv2
init will fail. But this failure should not block amdgpu driver init.
Reported-by: youling <[email protected]>
Tested-by: youling <[email protected]>
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
During mode2 reset, the GPU is temporarily removed from the
mgpu_info list. As a result, page retirement fails because it
cannot find the GPU in the GPU list.
To fix this, create our own list of GPUs that support MCE notifier
based page retirement and use that list to check if the UMC error
occurred on a GPU that supports MCE notifier based page retirement.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the missing call to re-enable RAS error injections on the Aldebaran
mode2 reset code path.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Display support for cyan skillfish is ready now. Enable it!
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It's used internally by firmware. Using it in the driver
could conflict with firmware.
v2: squash in fix for navi1x (Alex)
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It's used internally by firmware. Using it in the driver
could conflict with firmware.
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.16:
UAPI Changes:
- Allow empty drm leases for creating separate GEM namespaces.
Cross-subsystem Changes:
- Slightly rework dma_buf_poll.
- Add dma_resv_for_each_fence_unlocked to iterate, and use it inside
the lockless dma-resv functions.
Core Changes:
- Allow devm_drm_of_get_bridge to build without CONFIG_OF for compile testing.
- Add more DP2 headers.
- fix CONFIG_FB dependency in fb_helper.
- Add DRM_FORMAT_R8 to drm_format_info, and helpers for RGB332 and RGB888.
- Fix crash on a 0 or invalid EDID.
Driver Changes:
- Apply and revert DRM_MODESET_LOCK_ALL_BEGIN.
- Add mode_valid to ti-sn65dsi86 bridge.
- Support multiple syncobjs in v3d.
- Add R8, RGB332 and RGB888 pixel formats to GUD.
- Use devm_add_action_or_reset in dw-hdmi-cec.
Signed-off-by: Dave Airlie <[email protected]>
# gpg: Signature made Wed 06 Oct 2021 20:48:12 AEST
# gpg: using RSA key B97BD6A80CAC4981091AE547FE558C72A67013C3
# gpg: Good signature from "Maarten Lankhorst <[email protected]>" [expired]
# gpg: aka "Maarten Lankhorst <[email protected]>" [expired]
# gpg: aka "Maarten Lankhorst <[email protected]>" [expired]
# gpg: Note: This key has expired!
# Primary key fingerprint: B97B D6A8 0CAC 4981 091A E547 FE55 8C72 A670 13C3
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
adev_to_drm is used everywhere, so improve recent changes
when accessing drm_device pointer from amdgpu_device.
Signed-off-by: Guchun Chen <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Missing 4.1.2.
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Was missed when converting the driver over to IP based
initialization.
Tested-by: Harry Wentland <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Unify BO evicting functionality for possible memory
types in amdgpu_ttm.c.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On Aldebaran, GPU driver will handle bad page retirement
for GPU memory even though UMC is host managed. As a result,
register a bad page retirement handler on the mce notifier
chain to retire bad pages on Aldebaran.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check first if debugfs is initialized before creating
amdgpu debugfs files.
References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
ind_block_64b_no_128bcl means INDEP_64B && INDEP_128B &&
MAX_COMPRESSED_BLOCK_SIZE == 64B. Only used by gfx10.3.
ind_block_64b means INDEP_64B && !INDEP_128B &&
MAX_COMPRESSED_BLOCK_SIZE == 64B. Only used by gfx9 and gfx10.
Signed-off-by: Marek Olšák <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In current code, when a PCI error state pci_channel_io_normal is detectd,
it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI
driver will continue the execution of PCI resume callback report_resume by
pci_walk_bridge, and the callback will go into amdgpu_pci_resume
finally, where write lock is releasd unconditionally without acquiring
such lock first. In this case, a deadlock will happen when other threads
start to acquire the read lock.
To fix this, add a member in amdgpu_device strucutre to cache
pci_channel_state, and only continue the execution in amdgpu_pci_resume
when it's pci_channel_io_frozen.
Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|