Age | Commit message (Collapse) | Author | Files | Lines |
|
A static checker pointed out, that bo_va->base.bo was already derefenced
earlier in the same scope. Therefore this check is unnecessary here.
Reported-by: Dan Carpenter <[email protected]>
Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Reviewed-by: Kent Russell <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.
Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.
Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.
Both is a complete no-go for driver unload so completely revert the
workaround for now.
This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04.
Signed-off-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().
Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_range() is inclusive. So a -1 has been added when needed.
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
otherwise drm_client_dev_unregister() would try to
kfree(&adev->kfd.client).
Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Flora Cui <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: space required after that ',' (ctx:VxV)
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Acked-by: Alex Deucher <[email protected]>
Acked-by: Dmitry Baryshkov <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Acked-by: Heiko Stuebner <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Acked-by: Jernej Skrabec <[email protected]>
Reviewed-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Reviewed-by: Neil Armstrong <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: open brace '{' following struct go on the same line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: need consistent spacing around '-' (ctx:WxV)
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: do not initialise globals to 0
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If the RLC firmware is invalid because of wrong header size,
the pointer to the rlc firmware is released in function
amdgpu_ucode_request. There will be a null pointer error
in subsequent use. So skip validation to fix it.
Signed-off-by: Ma Jun <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move ras capablity check to amdgpu_ras_check_supported.
Driver will query ras capablity through psp interace, or
vbios interface, or specific ip callbacks.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of traditional atomfirmware interfaces for RAS
capability, host driver can query ras capability from
psp starting from psp v13_0_6.
v2: drop redundant local variable from get_ras_capability.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: space prohibited before that '++' (ctx:WxB)
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
ERROR: trailing statements should be on next line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
spaces required around that '=' (ctx:VxV)
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: "foo* bar" should be "foo *bar"
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the following errors reported by checkpatch:
ERROR: that open brace { should be on the previous line
Signed-off-by: chenxuebing <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Theoretically, it would be possible for a buggy or malicious VBIOS to
overwrite past the bounds of the passed parameters (or its own
workspace); add bounds checking to prevent this from happening.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093
Signed-off-by: Alexander Richards <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Driver and firmware share the same ras block enum.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
use new ACA macro to instead of MCA
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Separate deferred error from UE and CE and log it
individually.
Signed-off-by: Candice Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Needs to do bad page retirement for deferred errors.
v2: Drop unused dev_info.
Signed-off-by: YiPeng Chai <[email protected]>
Signed-off-by: Candice Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add xgmi v6.4.0 ACA driver support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add mmhub v1.8 ACA driver support
v2:
use macro to define smn address value.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add sdma v4.4.2 ACA driver support
v2:
use macro to define smn address value.
v3: squash in fix for unbalanced irqs
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add gfx v9.4.3 ACA driver support
v2:
use macro to define smn address value.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
large bar
Some customer platforms do not enable mmconfig for various reasons,
such as bios bug, and therefore cannot access the GPU extend configuration
space through mmio.
When the system enters the d3cold state and resumes, the amdgpu driver
fails to resume because the extend configuration space registers of
GPU can't be restored. At this point, Usually we only see some failure
dmesg log printed by amdgpu driver, it is difficult to find the root
cause.
Therefor print a warnning message if the system can't access the
extended configuration space register when using large bar.
Signed-off-by: Ma Jun <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add umc v12.0 ACA driver support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add aca sysfs node support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add ACA error query interface
v2:
Add a new helper function to determine whether to use ACA or MCA.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It's used for more than just SR-IOV now, so move it to
amdgpu_gmc.c and rename it to better match the functionality and
update the comments in the code paths to better document
when each path is used and why. No functional change.
Reviewed-by: Shaoyun.liu <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
Some chips provide both average and input power. Previously
we just exposed average power, add a new query for input
power.
Example userspace:
https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check and report firmware boot status if it doesn't
reach steady status.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add ACA bank dump debugfs support
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add ACA kernel hardware error log support.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check and report boot status if discovery failed.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add ras helper function to query boot time gpu
errors.
v2: use aqua_vanjaram smn addressing pattern
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
implement new RAS ACA driver code framework.
v2:
- rename aca_bank_set to aca_banks.
- rename aca_source_xxx to aca_handle_xxx.
v3:
Optimize some function implementation details. (from Hawking's suggestion)
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To allow using this helper for indirect access when
nbio funcs is not available. For instance, in ip
discovery phase.
v2: define macro for pcie_index/data/index_hi fallback.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Will replace it with new implementation to cover
boot fails in ip discovery phase.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
So kernel message has the device pcie bdf information,
which helps issue debugging especially in multiple GPU
system.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.
This patch automatically validates and fences dymanic DMABuf imports when
they are added to a compute VM. Revalidation after evictions is handled
in the VM code.
v2:
* Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
* Eliminated evicted_user state, use evicted state for VM BOs and user BOs
* Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
* Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into
amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
without KFD
v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the
reservation with its fences is shared with the export, as long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.
v5: Reintroduced separate evicted_user state to simplify the state machine
and CS error handling when amdgpu_vm_validate is called without a ticket.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer needed.
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.7.x
|
|
To bring debug options into effect in early initialization phase
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use debug_mask=0x8 param to help isolating data path issues
on new systems in early phase.
v2: rename the flag for explicitness (lijo)
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|