Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix the following errors reported by checkpatch:
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: do not initialise globals to 0
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If the RLC firmware is invalid because of wrong header size,
the pointer to the rlc firmware is released in function
amdgpu_ucode_request. There will be a null pointer error
in subsequent use. So skip validation to fix it.
Signed-off-by: Ma Jun <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move ras capablity check to amdgpu_ras_check_supported.
Driver will query ras capablity through psp interace, or
vbios interface, or specific ip callbacks.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of traditional atomfirmware interfaces for RAS
capability, host driver can query ras capability from
psp starting from psp v13_0_6.
v2: drop redundant local variable from get_ras_capability.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: space prohibited before that '++' (ctx:WxB)
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
ERROR: trailing statements should be on next line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
spaces required around that '=' (ctx:VxV)
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: "foo* bar" should be "foo *bar"
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Theoretically, it would be possible for a buggy or malicious VBIOS to
overwrite past the bounds of the passed parameters (or its own
workspace); add bounds checking to prevent this from happening.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093
Signed-off-by: Alexander Richards <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Driver and firmware share the same ras block enum.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
use new ACA macro to instead of MCA
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Separate deferred error from UE and CE and log it
individually.
Signed-off-by: Candice Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Needs to do bad page retirement for deferred errors.
v2: Drop unused dev_info.
Signed-off-by: YiPeng Chai <[email protected]>
Signed-off-by: Candice Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add xgmi v6.4.0 ACA driver support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add mmhub v1.8 ACA driver support
v2:
use macro to define smn address value.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add sdma v4.4.2 ACA driver support
v2:
use macro to define smn address value.
v3: squash in fix for unbalanced irqs
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add gfx v9.4.3 ACA driver support
v2:
use macro to define smn address value.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
large bar
Some customer platforms do not enable mmconfig for various reasons,
such as bios bug, and therefore cannot access the GPU extend configuration
space through mmio.
When the system enters the d3cold state and resumes, the amdgpu driver
fails to resume because the extend configuration space registers of
GPU can't be restored. At this point, Usually we only see some failure
dmesg log printed by amdgpu driver, it is difficult to find the root
cause.
Therefor print a warnning message if the system can't access the
extended configuration space register when using large bar.
Signed-off-by: Ma Jun <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add umc v12.0 ACA driver support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add aca sysfs node support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add ACA error query interface
v2:
Add a new helper function to determine whether to use ACA or MCA.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It's used for more than just SR-IOV now, so move it to
amdgpu_gmc.c and rename it to better match the functionality and
update the comments in the code paths to better document
when each path is used and why. No functional change.
Reviewed-by: Shaoyun.liu <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
Some chips provide both average and input power. Previously
we just exposed average power, add a new query for input
power.
Example userspace:
https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check and report firmware boot status if it doesn't
reach steady status.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add ACA bank dump debugfs support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add ACA kernel hardware error log support.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check and report boot status if discovery failed.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add ras helper function to query boot time gpu
errors.
v2: use aqua_vanjaram smn addressing pattern
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
implement new RAS ACA driver code framework.
v2:
- rename aca_bank_set to aca_banks.
- rename aca_source_xxx to aca_handle_xxx.
v3:
Optimize some function implementation details. (from Hawking's suggestion)
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To allow using this helper for indirect access when
nbio funcs is not available. For instance, in ip
discovery phase.
v2: define macro for pcie_index/data/index_hi fallback.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Will replace it with new implementation to cover
boot fails in ip discovery phase.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
So kernel message has the device pcie bdf information,
which helps issue debugging especially in multiple GPU
system.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.
This patch automatically validates and fences dymanic DMABuf imports when
they are added to a compute VM. Revalidation after evictions is handled
in the VM code.
v2:
* Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
* Eliminated evicted_user state, use evicted state for VM BOs and user BOs
* Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
* Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into
amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
without KFD
v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the
reservation with its fences is shared with the export, as long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.
v5: Reintroduced separate evicted_user state to simplify the state machine
and CS error handling when amdgpu_vm_validate is called without a ticket.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer needed.
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.7.x
|
|
To bring debug options into effect in early initialization phase
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use debug_mask=0x8 param to help isolating data path issues
on new systems in early phase.
v2: rename the flag for explicitness (lijo)
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit c572abffe9f50c8ba33060865449313b3f588c35.
Will use debug module param instead of independent module param.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch to update regGL2C_CTRL4 in golden setting.
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: Tim Huang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.7.x
|
|
In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.
Using the function release_firmware() to release adev->pm.fw.
Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.
Cc: Monk Liu <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Lijo Lazar <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
vpe_u1_8_from_fraction()
The variables 'numerator' and 'denominator', are unsigned 16-bit integer
types, that can never be less than 0.
Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:62 vpe_u1_8_from_fraction() warn: unsigned 'numerator' is never less than zero.
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:63 vpe_u1_8_from_fraction() warn: unsigned 'denominator' is never less than zero.
Cc: Peyton Lee <[email protected]>
Cc: Lang Yu <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Peyton Lee <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The amdgpu_gmc_vram_checking() function in emulation checks whether
all of the memory range of shared system memory could be accessed by
GPU, from this aspect, -EIO is returned for error scenarios.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r'
Cc: Xiaojian Du <[email protected]>
Cc: Lijo Lazar <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Christian König <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
VM_L2_CNTL* should not be programmed on driver unload under SRIOV.
These regs are skipped during SRIOV driver init.
Signed-off-by: Victor Lu <[email protected]>
Reviewed-by: Vignesh Chander <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch to update ATHUB_MISC_CNTL offset for athub v3.3
v2: correct a typo (Tim)
v3: correct patch title (Lang)
Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Tim Huang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For backwards compatibility with userspace.
Fixes: 47f1724db4fe ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use a generic comment for AMDGPU_VM_RESERVED_VRAM size.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|