| Age | Commit message (Collapse) | Author | Files | Lines |
|
The construct allocating only parts of the vma structure when
the userptr part is not needed is very fragile. A developer could
add additional fields below the userptr part, and the code could
easily attempt to access the userptr part even if its not persent.
So introduce xe_userptr_vma which subclasses struct xe_vma the
proper way, and accordingly modify a couple of interfaces.
This should also help if adding userptr helpers to drm_gpuvm.
v2:
- Fix documentation of to_userptr_vma() (Matthew Brost)
- Fix allocation and freeing of vmas to clearer distinguish
between the types.
Closes: https://lore.kernel.org/intel-xe/[email protected]/T/#u
Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs")
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 5bd24e78829ad569fa1c3ce9a05b59bb97b91f3d)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
The sparc build fails [1] due to CTX_VALID being redefined. Fix this by
using a better naming convention of LRC_VALID as this define is used in
setting bits in the lrc descriptor. To be uniform, change other define
with LRC prefix too.
[1] https://lore.kernel.org/all/[email protected]/
v2:
- s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas)
Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes")
Cc: Thomas Hellström <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 152ca51d8db03f08a71c25e999812e263839fdce)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
The error pointer macros are not aware of __user pointers and as a
consequence sparse warns.
Have the copy_mask() function return an integer instead of a __user
pointer.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 78366eed6853aa6a5deccb2eb182f9334d2bd208)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
These functions acquire and release the gt::mcr_lock. Annotate
accordingly.
Fix the corresponding sparse warning.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues")
Cc: Francois Dugast <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 97fd7a7e4e877676a2ab1a687ba958b70931abcc)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
The way exec ufences are coded only 1 ufence per IOCTL will be signaled.
It is possible to fix this but for current use cases 1 ufence per IOCTL
is sufficient. Enforce a limit of 1 ufence per IOCTL (both exec and bind
to be uniform).
v2:
- Add fixes tag (Thomas)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Mika Kahola <[email protected]>
Cc: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Brian Welty <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d1df9bfbf68c65418f30917f406b6d5bd597714e)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
If skip_guc_pc is set for a platform, C6 is disabled directly without
acquiring a mem_access reference, triggering an assertion inside
xe_gt_idle_disable_c6.
Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Cc: Rodrigo Vivi <[email protected]>
Cc: Vinay Belgaumkar <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 9f5971bdf78e0937206556534247243ad56cd735)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
trace_dma_fence_init() uses dma_fence_ops functions
like get_driver_name() and get_timeline_name() to generate trace
information but the Xe KMD implementation of those functions makes
use of xe_hw_fence_ctx that was being set after dma_fence_init().
So here just inverting the order to fix the crash.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: José Roberto de Souza <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c6878e47431c72168da08dfbc1496c09b2d3c246)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
The construct allocating only parts of the vma structure when
the userptr part is not needed is very fragile. A developer could
add additional fields below the userptr part, and the code could
easily attempt to access the userptr part even if its not persent.
So introduce xe_userptr_vma which subclasses struct xe_vma the
proper way, and accordingly modify a couple of interfaces.
This should also help if adding userptr helpers to drm_gpuvm.
v2:
- Fix documentation of to_userptr_vma() (Matthew Brost)
- Fix allocation and freeing of vmas to clearer distinguish
between the types.
Closes: https://lore.kernel.org/intel-xe/[email protected]/T/#u
Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs")
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The sparc build fails [1] due to CTX_VALID being redefined. Fix this by
using a better naming convention of LRC_VALID as this define is used in
setting bits in the lrc descriptor. To be uniform, change other define
with LRC prefix too.
[1] https://lore.kernel.org/all/[email protected]/
v2:
- s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas)
Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes")
Cc: Thomas Hellström <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Allows us to detect subsequent IH ring buffer overflows as well.
Cc: Joshua Ashton <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: [email protected]
Signed-off-by: Friedrich Vock <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Beside, current set function callbacks are empty with no real effect.
Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Saleemkhan Jamadar <[email protected]>
Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to set GC golden settings in driver from gfx 11.5.0 onwards.
Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Lang Yu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix a warning.
v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992] <TASK>
[ 41.708996] ? show_regs+0x6c/0x80
[ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709008] ? __warn+0x93/0x190
[ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709024] ? report_bug+0x1f9/0x210
[ 41.709035] ? handle_bug+0x46/0x80
[ 41.709041] ? exc_invalid_op+0x1d/0x80
[ 41.709048] ? asm_exc_invalid_op+0x1f/0x30
[ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709337] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu]
[ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[ 41.709945] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709949] ? tomoyo_file_ioctl+0x20/0x30
[ 41.709959] __x64_sys_ioctl+0x9c/0xd0
[ 41.709967] do_syscall_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'
Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.
To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.
This would ensure that i + 1 is always a valid index in the array.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12
Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang <[email protected]>
Cc: Aric Cyr <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Meenakshikumar Somasundaram <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Tom Chung <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a NULL check for the kzalloc call that allocates memory for
dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
if kzalloc failed to allocate memory and returned NULL, the code would
attempt to use the NULL pointer.
The fix is to check if kzalloc returns NULL, and if so, log an error
message and skip the rest of the current loop iteration with the
continue statement. This prevents the code from attempting to use the
NULL pointer.
Cc: Julia Lawall <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Alex Deucher <[email protected]>
Reported-by: Julia Lawall <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size")
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The same calls are made directly above, but conditional on the firmware
loading and validating successfully.
Cc: [email protected]
Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]
This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler
v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139
Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Ma Jun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
odm calculation is missing for pipe split policy determination
and cause Underflow/Corruption issue.
[how]
Add the odm calculation.
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Reviewed-by: Charlene Liu <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
On 14us for exit latency time causes underflow for 8K monitor with HDR on.
Increasing the latency to 28us fixes the underflow.
[How]
Increase the latency to 28us. This workaround should be sufficient
before we figure out why SR exit so long.
Reviewed-by: Chaitanya Dhere <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Nicholas Susanto <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
MAX_SURFACES is per stream, while MAX_PLANES is per asic. The
mpc_combine is an array that records all the planes per asic. Therefore
MAX_PLANES should be used as the array size. Using MAX_SURFACES causes
array overflow when there are more than 3 planes.
[how]
Use the MAX_PLANES for the mpc_combine array size.
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Reviewed-by: Rodrigo Siqueira <[email protected]>
Reviewed-by: Nevenko Stupar <[email protected]>
Reviewed-by: Chaitanya Dhere <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Wenjing Liu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Secondary DP2 display fails to light up in some instances
[How]
Clock needs to be on when DPSTREAMCLK*_EN =1. This change
moves dtbclk_p enable/disable point to make sure this is
the case
Reviewed-by: Charlene Liu <[email protected]>
Reviewed-by: Dmytro Laktyushkin <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Daniel Miess <[email protected]>
Signed-off-by: Dmytro Laktyushkin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
BIOS's integration info table not following the original order
which is phy instance is ext_displaypath's array index.
[how]
Move them to follow the original order.
Reviewed-by: Muhammad Ahmed <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Charlene Liu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
Originally, PMFW said min FCLK is 300Mhz, but min DCFCLK can be increased
to 400Mhz because min FCLK is now 600Mhz so FCLK >= 1.5 * DCFCLK hardware
requirement will still be satisfied. Increasing min DCFCLK addresses
underflow issues (underflow occurs when phantom pipe is turned on for some
Sub-Viewport configs).
[how]
Increasing DCFCLK by raising the min_dcfclk_mhz
Reviewed-by: Chaitanya Dhere <[email protected]>
Reviewed-by: Alvin Lee <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Sohaib Nadeem <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revert commit 885c71ad791c
("drm/amd/display: initialize all the dpm level's stutter latency")
Because it causes some regression
Reviewed-by: Muhammad Ahmed <[email protected]>
Acked-by: Tom Chung <[email protected]>
Signed-off-by: Charlene Liu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On GFX 9.4.3, for a given KFD node, fetch the correct drm device from
XCP manager when checking for cgroup permissions.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This instruction has no functional difference to S_ENDPGM
but allows performance counters to track save events correctly.
Signed-off-by: Jay Cornwall <[email protected]>
Reviewed-by: Laurent Morichetti <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Partial migration to system memory should use migrate.addr, not
prange->start as virtual address to allocate system memory page.
Fixes: a546a2768440 ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM")
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Xiaogang Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch is to eliminate interrupt warning below:
"[drm] Fence fallback timer expired on ring sdma0.0".
An early vm pt clearing job is sent to SDMA ahead of interrupt enabled.
And re-locating the drm client creation following after drm_dev_register
looks like a more proper flow.
v2: wrap the drm client creation
Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the W=1 warning -Wunused-but-set-variable.
Cc: Karol Herbst <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: [email protected]
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/8b133e7ec0e9aef728be301ac019c5ddcb3bbf51.1704908087.git.jani.nikula@intel.com
|
|
Fix the W=1 warning -Wunused-but-set-variable.
Cc: Karol Herbst <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: [email protected]
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/4d9f62fa6963acfd8b7d8f623799ba3a516e347d.1704908087.git.jani.nikula@intel.com
|
|
Allows us to detect subsequent IH ring buffer overflows as well.
Cc: Joshua Ashton <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: [email protected]
Signed-off-by: Friedrich Vock <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Beside, current set function callbacks are empty with no real effect.
Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Saleemkhan Jamadar <[email protected]>
Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Get UMC physical address from PSP in RAS error address coversion.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Convert mca address to physical address or vice versa via RAS TA.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to set GC golden settings in driver from gfx 11.5.0 onwards.
Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Lang Yu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix a warning.
v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992] <TASK>
[ 41.708996] ? show_regs+0x6c/0x80
[ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709008] ? __warn+0x93/0x190
[ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709024] ? report_bug+0x1f9/0x210
[ 41.709035] ? handle_bug+0x46/0x80
[ 41.709041] ? exc_invalid_op+0x1d/0x80
[ 41.709048] ? asm_exc_invalid_op+0x1f/0x30
[ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709337] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu]
[ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[ 41.709945] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709949] ? tomoyo_file_ioctl+0x20/0x30
[ 41.709959] __x64_sys_ioctl+0x9c/0xd0
[ 41.709967] do_syscall_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.
To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.
This would ensure that i + 1 is always a valid index in the array.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12
Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang <[email protected]>
Cc: Aric Cyr <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Meenakshikumar Somasundaram <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Tom Chung <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a NULL check for the kzalloc call that allocates memory for
dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
if kzalloc failed to allocate memory and returned NULL, the code would
attempt to use the NULL pointer.
The fix is to check if kzalloc returns NULL, and if so, log an error
message and skip the rest of the current loop iteration with the
continue statement. This prevents the code from attempting to use the
NULL pointer.
Cc: Julia Lawall <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Alex Deucher <[email protected]>
Reported-by: Julia Lawall <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size")
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'
Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Need to resume ras during gpu reset for
gfx v9_4_3 sriov
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Send RAS disable feature command in fini.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update boot time errors polling sequence to align with
the latest firmware change.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Frank Min <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The same calls are made directly above, but conditional on the firmware
loading and validating successfully.
Cc: [email protected]
Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
All GuC ABI definitions are unsigned and not defining as unsigned is
causing build errors [1].
[1] https://lore.kernel.org/all/[email protected]/
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Thomas Hellström <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
xe_assert() is intended to be used only for "impossible" situations that
should never be hit (and if they are hit it means there's a driver bug
somewhere); assertions are only compiled into debug builds.
Although we expect jobs submitted by the kernel to be well-behaved and
run without error, timeouts are a legitimate possibility for reasons
beyond our control (bad firmware, flaky hardware, etc.). We should use
a real WARN if we encounter these, even for non-debug builds, to ensure
the issue is being properly highlighted in bug reports and such.
Also give the WARNs more human-readable messages and move them below the
general notice-level message that gets printed for any kind of timeout
to make the errors a bit more understandable.
v2:
- Convert the VM / exec_queue_killed assertion as well. (MattB)
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]
This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler
v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139
Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Ma Jun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The error pointer macros are not aware of __user pointers and as a
consequence sparse warns.
Have the copy_mask() function return an integer instead of a __user
pointer.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
These functions acquire and release the gt::mcr_lock. Annotate
accordingly.
Fix the corresponding sparse warning.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues")
Cc: Francois Dugast <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
There are very few places that need to include anything from under
display/. Require the display/ prefix in #include directives, and drop
the subdirectory from the header search path.
Sort the include lists while at it.
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Acked-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|