Age | Commit message (Collapse) | Author | Files | Lines |
|
Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.
Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.
Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.
Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
increase retry times to wait host has enough time to complete reset.
Signed-off-by: Zhigang Luo <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
VF can't access FB when host is doing mode1 reset. Using sizeof to get
vf2pf info size, instead of reading it from vf2pf header stored in FB.
Signed-off-by: Zhigang Luo <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit e075e496f516bf92bc0cbaf94d64e8d4a6b58321, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1ba76cc4d96a8afefff5d1bc42fb1e1329c5da68.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <[email protected]>
|
|
This reverts commit 0323287de87d7e6e9c22c57d7440aa353a2298d0, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/89ac456805746b6d0c888f10c5120b11aacd3319.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <[email protected]>
|
|
This reverts commit 3166e7e6d935caaef07605a5c90773fbf9ffeaf4, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/a40e70a0abd3d841c23c107d452a43fdd70ef37a.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <[email protected]>
|
|
This reverts commit f6d2dc03fa8546b284dd8c1af027d9fac5725921, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/bd288a5943dab8609f2d1f2bf413595a61df727a.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <[email protected]>
|
|
Only TTM-based drivers use fbdev-generic. Rename it to fbdev-ttm and
change the symbol infix from _generic_ to _ttm_. Link the source file
into TTM helpers, so that it is only build if TTM-based drivers have
been selected. Select DRM_TTM_HELPER for loongson.
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch adds a missed handling of PL domain doorbell while
handling VRAM faults.
Cc: Christian Koenig <[email protected]>
Cc: Alex Deucher <[email protected]>
Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Reviewed-by: Christian Koenig <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Signed-off-by: Arvind Yadav <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.
Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.
Also rework the statistic handling so that we don't update the eviction
counter before the move.
v2: add missing NULL check
Signed-off-by: Christian König <[email protected]>
Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
CC: [email protected]
|
|
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Clear warning that field bp is uninitialized when
calling amdgpu_virt_ras_add_bps.
Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add sdma v7_0 ip block.
v2: squash in updates (Alex)
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Provide some lost ucode name shown via firmware ID.
v2: fix whitespace (Alex)
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add support for new SDMA firmware struct (V3) with PSP
front door load type.
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers
to improve SDMA reliability.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1: Add sdma v7_0 ip block support. (Likun)
v2: Move vmhub from ring_funcs to ring. (Hawking)
v3: Switch to AMDGPU_GFXHUB(0). (Hawking)
v4: Move microcode init into early_init. (Likun)
v5: Fix warnings (Alex)
v6: Squash in various fixes (Alex)
v7: Rebase (Alex)
v8: Rebase (Alex)
Signed-off-by: Likun Gao <[email protected]>
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add sdma firmware struct version 3 to support
sdma v7_0 firmware.
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The "instance" variable needs to be signed for the error handling to work.
Fixes: 8b2faf1a4f3b ("drm/amdgpu: add error handle to avoid out-of-bounds")
Reviewed-by: Bob Zhou <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add new members in sdma instance structure
for sdma v7_0 firmware.
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add gmc v12_0 ip block.
v2: Squash in updates (Alex)
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch adds a missed handling of PL domain doorbell while
handling VRAM faults.
Cc: Christian Koenig <[email protected]>
Cc: Alex Deucher <[email protected]>
Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Reviewed-by: Christian Koenig <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Signed-off-by: Arvind Yadav <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add gfx v12 pte/pde support to gmc common helper.
v2: squash in fixes (Alex)
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Likun Gao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
pte_is_pte is new flag introduced in gmc v12 that
needs to be set by default for pte.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Likun Gao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add initial support for GMC v12.
v1: Add gmc v12_0 ip block support.
v2: Switch to gfx.kiq array.
v3: Switch to vmhubs_mask.
v4: Switch to AMDGPU_MMHUB0(0) and AMDGPU_GFXHUB(0)
v5: Rebase (Alex)
v6: Squash in fixes for AGP handling, gfxhub init order,
vmhub index (Alex)
v7: Rebase (Alex)
v8: squash in ecc fix (Alex)
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Likun Gao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add gfx v12 pte/pde format change.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Likun Gao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add initial gfxhub v12 support.
v1: Add gfxhub v12_0 ip block support (Likun)
v2: Switch to AMDGPU_GFXHUB(0) (Hawking)
v3: Squash in keep default error response mode (Hawking)
Signed-off-by: Likun Gao <[email protected]>
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
mes schq engine require more waiting time for engine ready
before packet submission.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable mes to map legacy queue support.
v2: kiq_set_resources is required.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.
v6: user context should use interruptible call (Felix)
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove redundant function call.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove unused code.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Clear overflowed array index read warning by cast operation.
Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Clear resource leak warning that when the prepare fails,
the allocated amdgpu job object will never be released.
Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
because the ue valid mca count will only be cleared after gpu reset,
so only dump mca log on the first time to get mca bank after receive RAS interrupt.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
because SMU CE valid mca bank will be cleared after reading,
this patch adds mca cache at the driver level to ensure that the mca bank is not lost.
v2:
refine amdgpu_mca_init/fini/reset() function name.
v3:
add mca_cache.lock support
only add CE bank to mca bank cache.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
- Refine mca driver code.
- Centralize mca bank dispatch code logic.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add initial support for MMHUB 4.1.0.
v1: Add mmhub v4_1_0 ip block support.
v2: Switch to AMDGPU_MMHUB0(0).
v3: squash in fix for ip version check (Alex)
v4: squash in vm_contexts_disable fix (Alex)
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Likun Gao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system
memory then retry the allocation, this skips the KFD BOs from the same
process because KFD require all BOs are resident for user queues.
If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow
TTM evict KFD BOs from the same process, this will evict the user queues
first, and restore the queues later after contiguous VRAM allocation.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Define macro AMDGPU_MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist
length is unsigned int, and some users of it cast to a signed int, so
every segment of sg table is limited to size 2GB maximum.
For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM memory. To workaround the sg table segment
size limit, allocate multiple segments if contiguous size is bigger than
AMDGPU_MAX_SG_SEGMENT_SIZE.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
- remove unused callback functions.
- make part of mca functions static and refine the function order.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add common soc24 ip block.
v2: squash in updates (Alex)
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Adjust mes queue initialization before kgq/kcq initialization
to enable mes mapping legacy queue.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add mes11 map legacy queue packet submission.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.
Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.
Also rework the statistic handling so that we don't update the eviction
counter before the move.
v2: add missing NULL check
Signed-off-by: Christian König <[email protected]>
Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
CC: [email protected]
|
|
Add initial soc24 support.
v1: Add soc24 common ip block.
v2: Switch to new select_se_sh/enter_safe_mode
interface.
v3: squash in correct ext rev id, etc. (Alex)
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Likun Gao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask
VRAM buddy allocator to get contiguous VRAM.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Assign value to clock to fix the warning below:
"Using uninitialized value res. Field res.clock is uninitialized"
Signed-off-by: Ma Jun <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.10-2024-04-26:
amdgpu:
- Misc code cleanups and refactors
- Support setting reset method at runtime
- Report OD status
- SMU 14.0.1 fixes
- SDMA 4.4.2 fixes
- VPE fixes
- MES fixes
- Update BO eviction priorities
- UMSCH fixes
- Reset fixes
- Freesync fixes
- GFXIP 9.4.3 fixes
- SDMA 5.2 fixes
- MES UAF fix
- RAS updates
- Devcoredump updates for dumping IP state
- DSC fixes
- JPEG fix
- Fix VRAM memory accounting
- VCN 5.0 fixes
- MES fixes
- UMC 12.0 updates
- Modify contiguous flags handling
- Initial support for mapping kernel queues via MES
amdkfd:
- Fix rescheduling of restore worker
- VRAM accounting for SVM migrations
- mGPU fix
- Enable SQ watchpoint for gfx10
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas
has some fun with i915-gem conflicts.
Signed-off-by: Daniel Vetter <[email protected]>
|