|
We have a debugfs hook to directly call into i915_gem_shrink() with the
fs_reclaim acquire annotations to simulate hitting direct reclaim.
However we should also annotate this with memalloc_noreclaim, which will
set PF_MEMALLOC for us on the current context, to ensure we can't
re-enter direct reclaim (just like "real" direct reclaim does). This is
an issue now that ttm_bo_validate could potentially be called here,
which might try to allocate a tiny amount of memory to hold the new
ttm_resource struct, as per the below splat:
[ 2507.913844] WARNING: possible recursive locking detected
[ 2507.913848] 5.16.0-rc4+ #5 Tainted: G U
[ 2507.913853] --------------------------------------------
[ 2507.913856] gem_exec_captur/1825 is trying to acquire lock:
[ 2507.913861] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: kmem_cache_alloc_trace+0x30/0x390
[ 2507.913875]
but task is already holding lock:
[ 2507.913879] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915]
[ 2507.913962]
other info that might help us debug this:
[ 2507.913966] Possible unsafe locking scenario:
[ 2507.913970] CPU0
[ 2507.913973] ----
[ 2507.913975] lock(fs_reclaim);
[ 2507.913979] lock(fs_reclaim);
[ 2507.913983]
*** DEADLOCK ***
[ 2507.913988] May be due to missing lock nesting notation
[ 2507.913992] 4 locks held by gem_exec_captur/1825:
[ 2507.913997] #0: ffff888101f6e460 (sb_writers#17){..}-{0:0}, at: ksys_write+0xe9/0x1b0
[ 2507.914009] #1: ffff88812d99e2b8 (&attr->mutex){..}-{3:3}, at: simple_attr_write+0xbb/0x220
[ 2507.914019] #2: ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915]
[ 2507.914085] #3: ffff8881b4a11b20 (reservation_ww_class_mutex){..}-{3:3}, at: ww_mutex_trylock+0x43f/0xcb0
[ 2507.914097]
stack backtrace:
[ 2507.914102] CPU: 0 PID: 1825 Comm: gem_exec_captur Tainted: G U 5.16.0-rc4+ #5
[ 2507.914109] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[ 2507.914115] Call Trace:
[ 2507.914118] <TASK>
[ 2507.914121] dump_stack_lvl+0x59/0x73
[ 2507.914128] __lock_acquire.cold+0x227/0x3b0
[ 2507.914135] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 2507.914141] ? __lock_acquire+0x23ca/0x5000
[ 2507.914147] lock_acquire+0x19c/0x4b0
[ 2507.914152] ? kmem_cache_alloc_trace+0x30/0x390
[ 2507.914157] ? lock_release+0x690/0x690
[ 2507.914163] ? lock_is_held_type+0xe4/0x140
[ 2507.914170] ? ttm_sys_man_alloc+0x47/0xb0 [ttm]
[ 2507.914178] fs_reclaim_acquire+0x11a/0x160
[ 2507.914183] ? kmem_cache_alloc_trace+0x30/0x390
[ 2507.914188] kmem_cache_alloc_trace+0x30/0x390
[ 2507.914192] ? lock_release+0x37f/0x690
[ 2507.914198] ttm_sys_man_alloc+0x47/0xb0 [ttm]
[ 2507.914206] ttm_bo_pipeline_gutting+0x70/0x440 [ttm]
[ 2507.914214] ? ttm_mem_io_free+0x150/0x150 [ttm]
[ 2507.914221] ? lock_is_held_type+0xe4/0x140
[ 2507.914227] ttm_bo_validate+0x2fb/0x370 [ttm]
[ 2507.914234] ? lock_acquire+0x19c/0x4b0
[ 2507.914239] ? ttm_bo_bounce_temp_buffer.constprop.0+0xf0/0xf0 [ttm]
[ 2507.914246] ? lock_acquire+0x131/0x4b0
[ 2507.914251] ? lock_is_held_type+0xe4/0x140
[ 2507.914257] i915_ttm_shrinker_release_pages+0x2bc/0x490 [i915]
[ 2507.914339] ? i915_ttm_swap_notify+0x130/0x130 [i915]
[ 2507.914429] ? i915_gem_object_release_mmap_offset+0x32/0x250 [i915]
[ 2507.914529] i915_gem_shrink+0xb14/0x1290 [i915]
[ 2507.914616] ? ___i915_gem_object_make_shrinkable+0x3e0/0x3e0 [i915]
[ 2507.914698] ? _raw_spin_unlock_irqrestore+0x2d/0x60
[ 2507.914705] ? track_intel_runtime_pm_wakeref+0x180/0x230 [i915]
[ 2507.914777] i915_gem_shrink_all+0x4b/0x70 [i915]
[ 2507.914857] i915_drop_caches_set+0x227/0x2c0 [i915]
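The effect of the annotation can be sketched in plain userspace C. This is
only a model, not the kernel code: the model_* names and the flag value are
assumptions made for illustration, and the helpers merely mirror the
save/restore shape of memalloc_noreclaim_save() and
memalloc_noreclaim_restore().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; in the kernel this bit lives in current->flags. */
#define MODEL_PF_MEMALLOC 0x0800u

static unsigned int model_task_flags; /* stands in for current->flags */

/* Same shape as memalloc_noreclaim_save(): set the bit, return its old state. */
static unsigned int model_noreclaim_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC;

	model_task_flags |= MODEL_PF_MEMALLOC;
	return old;
}

/* Same shape as memalloc_noreclaim_restore(): put the old bit state back. */
static void model_noreclaim_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_MEMALLOC) == 0);
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC) | old;
}

/* An allocation may enter direct reclaim only while the bit is clear. */
static bool model_alloc_may_reclaim(void)
{
	return !(model_task_flags & MODEL_PF_MEMALLOC);
}

/*
 * Models the debugfs shrink path and reports whether a nested allocation
 * (such as the one for the new ttm_resource) could recurse into reclaim.
 */
static bool model_shrink_would_recurse(bool annotate)
{
	unsigned int old = 0;
	bool recurse;

	if (annotate)
		old = model_noreclaim_save();
	recurse = model_alloc_may_reclaim();
	if (annotate)
		model_noreclaim_restore(old);
	return recurse;
}
```

Without the annotation the nested allocation is free to recurse; with it,
the flag is set for the duration of the shrink and restored afterwards.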
Reported-by: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
ttm->num_pages is uint32_t which was causing very large buffers to
only populate a truncated size.
This fixes gem_create@create-clear igt test on large memory systems.
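As a rough illustration of how a 32-bit page count can truncate a byte
size, consider the sketch below. It assumes 4 KiB pages, and the helper
names are made up for this example; it is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* The shift is evaluated in 32 bits, so the high bits are lost. */
static uint64_t size_truncated(uint32_t num_pages)
{
	return (uint32_t)(num_pages << MODEL_PAGE_SHIFT);
}

/* Widening before the shift keeps the full byte size. */
static uint64_t size_widened(uint32_t num_pages)
{
	return (uint64_t)num_pages << MODEL_PAGE_SHIFT;
}
```

With 1 << 20 pages (4 GiB), the first form wraps to 0 while the second
yields the full 4294967296 bytes.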
Fixes: 7ae034590cea ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Robert Beckett <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
There are a few details specific to the GETFB2 IOCTL.
It's not immediately clear how user-space should check for the
number of planes. Suggest using the handles field or the pitches
field.
The modifier array is filled with zeroes, ie. DRM_FORMAT_MOD_LINEAR.
So explicitly tell user-space to not look at it unless the flag is
set.
Changes in v2 (Daniel):
- Mention that handles should be used to compute the number of planes,
and only refer to pitches as a fallback.
- Reword bit about undefined modifier.
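A minimal user-space sketch of the suggested checks follows. struct
fb2_info is a local stand-in holding only the fields discussed here (real
code would use struct drm_mode_fb_cmd2 from the uAPI header), and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the relevant fields of struct drm_mode_fb_cmd2. */
struct fb2_info {
	uint32_t flags;
	uint32_t handles[4];
	uint32_t pitches[4];
	uint64_t modifier[4];
};

#define FB2_MODIFIERS (1u << 1) /* same bit as DRM_MODE_FB_MODIFIERS */

/* Count planes from handles; fall back to pitches if no handle is set. */
static int fb2_num_planes(const struct fb2_info *fb)
{
	int i, n = 0;

	for (i = 0; i < 4 && fb->handles[i]; i++)
		n++;
	if (n)
		return n;
	for (i = 0; i < 4 && fb->pitches[i]; i++)
		n++;
	return n;
}

/* Only report a modifier when the flag says the array is meaningful. */
static int fb2_get_modifier(const struct fb2_info *fb, uint64_t *mod)
{
	assert(fb && mod);
	if (!(fb->flags & FB2_MODIFIERS))
		return 0;
	*mod = fb->modifier[0];
	return 1;
}
```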
Signed-off-by: Simon Ser <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Acked-by: Pekka Paalanen <[email protected]>
Acked-by: Daniel Stone <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This extends the previous sanity checking of device memory to read/write
all the memory on the device during the device probe, ala memtest86,
as an optional module parameter: i915.memtest=1. This is not expected to
be fast, but a reasonably thorough verification that the device memory
is accessible and doesn't return bit errors.
v2: Rebased.
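The general idea of a full read/write pass can be sketched in userspace C
as below. This is not the i915 implementation: an ordinary buffer stands in
for device memory, and the pattern values and the function name are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write each pattern to every byte, then read everything back.
 * Returns 0 on success, or -1 with *bad_offset set to the first mismatch.
 */
static int memtest_region(volatile uint8_t *base, size_t size,
			  size_t *bad_offset)
{
	static const uint8_t patterns[] = { 0x00, 0xff, 0x55, 0xaa }; /* assumed */
	size_t p, i;

	assert(base && bad_offset);
	for (p = 0; p < sizeof(patterns); p++) {
		for (i = 0; i < size; i++)
			base[i] = patterns[p];
		for (i = 0; i < size; i++) {
			if (base[i] != patterns[p]) {
				*bad_offset = i;
				return -1;
			}
		}
	}
	return 0;
}
```

The cost is linear in the region size per pattern, which is why such a pass
is better kept optional.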
Suggested-by: Matthew Auld <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Signed-off-by: Ramalingam C <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As we set up the memory regions for the device, give each a quick test to
verify that we can read and write to the full iomem range. This ensures
that our physical addressing for the device's memory is correct, and
some reassurance that the memory is functional.
v2: wrapper for memtest [Chris]
v3: Removed the unused ptr i915 [Chris]
v4: used the %pa for the resource_size_t.
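One way to picture a quick addressing check, as opposed to a full pass, is
to write distinct markers at both ends of the range and confirm that
neither overwrote the other. The sketch below is an illustration under
assumed names, marker values and page size, not the code added here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */

/*
 * Write a distinct marker into the first and the last page, then verify
 * both. If the two offsets aliased the same storage, the first marker
 * would have been overwritten by the second.
 */
static int quick_range_check(volatile uint32_t *base, size_t size)
{
	size_t first = 0;
	size_t last;

	assert(size >= 2 * MODEL_PAGE_SIZE && size % MODEL_PAGE_SIZE == 0);
	last = (size - MODEL_PAGE_SIZE) / sizeof(uint32_t);

	base[first] = 0x5a5a0000u;
	base[last] = 0xa5a50001u;

	if (base[first] != 0x5a5a0000u || base[last] != 0xa5a50001u)
		return -1;
	return 0;
}
```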
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Signed-off-by: Ramalingam C <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Remove the portion of stolen memory reserved for private use from driver
access.
Signed-off-by: Chris Wilson <[email protected]>
cc: Matthew Auld <[email protected]>
Signed-off-by: Ramalingam C <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Thomas Zimmermann requested a fixes backmerge, specifically also for
96c5f82ef0a1 ("drm/vc4: fix error code in vc4_create_object()")
Just a bunch of adjacent changes conflicts, even the big pile of them
in vc4.
Signed-off-by: Daniel Vetter <[email protected]>
|
|
intel_device_info.h references struct pci_dev but does not ensure that
the struct has been declared, causing build failures if something in
other headers changes so that the implicit dependency it is relying on
is no longer satisfied:
In file included from drivers/gpu/drm/i915/intel_device_info.h:32,
from drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h:11,
from drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c:11:
drivers/gpu/drm/i915/display/intel_display.h:643:39: error: 'struct pci_dev' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
643 | bool intel_modeset_probe_defer(struct pci_dev *pdev);
| ^~~~~~~
cc1: all warnings being treated as errors
Add a declaration of the struct to fix this.
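For reference, a forward declaration of this kind looks like the sketch
below. The function name and its placeholder body are hypothetical; only
the `struct pci_dev;` line reflects the kind of change described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Incomplete type: enough for declarations that only use pointers to it. */
struct pci_dev;

/* Hypothetical prototype in the style of the one in the error message. */
bool example_probe_defer(struct pci_dev *pdev);

/*
 * Placeholder body so the sketch links. It never dereferences pdev, which
 * an incomplete type would not permit anyway.
 */
bool example_probe_defer(struct pci_dev *pdev)
{
	return pdev == NULL;
}
```

Because the struct is now declared at file scope, the prototype no longer
introduces a new type local to the parameter list.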
Signed-off-by: Mark Brown <[email protected]>
Fixes: 94b541f53db1 ("drm/i915: Add intel_modeset_probe_defer() helper")
Reviewed-by: Jani Nikula <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
At the reset hook, call __drm_atomic_helper_plane_reset which is
called at the initialization of the plane and sets the default value of
rotation on all planes to DRM_MODE_ROTATE_0 which is equal to 1.
Tested on Jacuzzi (MTK).
Resolves IGT@kms_properties@plane-properties-{legacy,atomic}
Signed-off-by: Mark Yacoub <[email protected]>
Signed-off-by: Chun-Kuang Hu <[email protected]>
|
|
I am seeing some crash logs which imply that we are trying to use
crashdumper hw to read back GPU state when the GPU isn't initialized.
This doesn't go well (for example, GPU could be in 32b address mode
and ignoring the upper bits of buffer that it is trying to dump state
to).
I'm not *quite* sure how we get into this state in the first place,
but let's not make a bad situation worse by triggering iova fault
crashes.
While we're at it, also add the information about whether the GPU is
initialized to the devcore dump to make this easier to see in the
logs (which makes the WARN_ON() redundant and even harmful because
it fills up the small bit of dmesg we get with the crash report).
Signed-off-by: Rob Clark <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
|
|
Return the value directly instead of
storing it in another redundant variable.
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: chiminghao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This starts to make the formatted index much more manageable to the reader.
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Yann Dirson <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update smu_v13 to match smu_v12 and smu_v11 behavior where this is
fetched from debugfs rather than in kernel logs on every boot.
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This value does not get cached into adev->pm.fw_version during
startup for smu13 like it does for other SMU like smu12.
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'get_highest_allowed_voltage_level'
[-Wmissing-prototypes]
742 | unsigned int get_highest_allowed_voltage_level(uint32_t chip_family, uint32_t hw_internal_rev, uint32_t pci_revision_id)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: no previous prototype for 'rv1_vbios_smu_send_msg_with_param'
[-Wmissing-prototypes]
102 | int rv1_vbios_smu_send_msg_with_param(struct clk_mgr_internal *clk_mgr, unsigned int msg_id, unsigned int param)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Rewrite function signatures to make them more readable.
2. Get rid of unused functions in order to remove 'defined but not
used' warnings.
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Isabella Basso <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use the struct display_mode_lib pointer instead of passing lots of large
arrays as parameters by value.
Addresses this warning (resulting in failure to build a RHEL debug kernel
with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘UseMinimumDCFCLK’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:7478:1: warning: the frame size of 2128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
NOTE: AFAICT this function previously had no observable effect, since it
only modified parameters passed by value and doesn't return anything.
Now it may modify some values in struct display_mode_lib passed in by
reference.
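The difference between the two calling conventions can be shown with a
small standalone sketch. The struct, its size and the helper names are
invented for illustration and are much simpler than struct
display_mode_lib.

```c
#include <assert.h>

#define MODEL_NUM_STATES 256 /* hypothetical; 256 doubles is 2 KiB */

struct model_mode_lib {
	double dcfclk_per_state[MODEL_NUM_STATES];
};

/* By value: the callee works on its own 2 KiB copy, and edits are lost. */
static void raise_min_by_value(struct model_mode_lib lib, double min)
{
	for (int i = 0; i < MODEL_NUM_STATES; i++)
		if (lib.dcfclk_per_state[i] < min)
			lib.dcfclk_per_state[i] = min;
}

/* By pointer: only an address is passed, and edits reach the caller. */
static void raise_min_by_pointer(struct model_mode_lib *lib, double min)
{
	assert(lib);
	for (int i = 0; i < MODEL_NUM_STATES; i++)
		if (lib->dcfclk_per_state[i] < min)
			lib->dcfclk_per_state[i] = min;
}
```

The by-value variant both inflates stack usage and has no visible effect,
which matches the NOTE above.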
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
dml31_ModeSupportAndSystemConfigurationFull
Move code using the Pipe struct to a new helper function.
Works around[0] this warning (resulting in failure to build a RHEL debug
kernel with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘dml31_ModeSupportAndSystemConfigurationFull’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:5740:1: warning: the frame size of 2144 bytes is larger than 2048 bytes [-Wframe-larger-than=]
The culprit seems to be the Pipe struct, so pull the relevant block out
into its own sub-function. (This is porting
commit a62427ef9b55 ("drm/amd/display: Reduce stack size for dml21_ModeSupportAndSystemConfigurationFull")
from dml21 to dml31)
[0] AFAICT this doesn't actually reduce the total amount of stack which
can be used, just moves some of it from
dml31_ModeSupportAndSystemConfigurationFull to the new helper function,
so the former happens to no longer exceed the limit for a single
function.
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix the warning below:
warning: Cannot understand * \file amdgpu_ioc32.c
on line 2 - I thought it was a doc line
Changes since v1:
- As suggested by Alexander Deucher:
1. Reduce diff to minimum as this DOC section doesn't provide much
value.
Signed-off-by: Isabella Basso <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This fixes the warnings below, and also drops the display_count
variable, as it's unused.
In function 'svm_range_map_to_gpu':
warning: variable 'bo_va' set but not used [-Wunused-but-set-variable]
1172 | struct amdgpu_bo_va bo_va;
| ^~~~~
...
In function 'dcn201_update_clocks':
warning: variable 'enter_display_off' set but not used [-Wunused-but-set-variable]
132 | bool enter_display_off = false;
| ^~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Drop display_count variable.
- As suggested by Felix Kuehling:
1. Remove block surrounding amdgpu_xgmi_same_hive.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Isabella Basso <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This commit fixes the compile-time warning below:
warning: no previous prototype for ‘amdgpu_ras_mca_query_error_status’
[-Wmissing-prototypes]
Changes since v1:
- As suggested by Alexander Deucher:
1. Make function static instead of adding prototype.
Signed-off-by: Isabella Basso <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
`edp_stream` is only used when backend is enabled on eDP, don't
declare the variable outside that scope.
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There are a few places that this isn't checked that could potentially
be a NULL pointer access.
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For a userptr bo, if adev is not in IOMMU isolation mode, RAM is direct
mapped to the GPU, so multiple GPUs use the same system memory DMA mapping
address. They can share the original mem->bo in the attachment to reduce
DMA address array memory usage.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If the host and amdgpu IOMMU is not enabled, or the IOMMU is in passthrough
mode, set the adev->ram_is_direct_mapped flag, which will be used to
optimize memory usage for multi-GPU mappings.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In the DC driver, we have multiple acronyms that are not obvious most of
the time; the same idea is valid for amdgpu. This commit introduces a DC
and amdgpu glossary in order to make it easier to navigate through our
driver.
Changes since V3:
- Yann: Add new acronyms to amdgpu glossary
- Daniel: Add link between dc and amdgpu glossary
Changes since V2:
- Add MMHUB
Changes since V1:
- Yann: Divide glossary based on driver context.
- Alex: Make terms more consistent and update CPLIB
- Add new acronyms to the glossary
Reviewed-by: Yann Dirson <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This commit describes how DCN works by providing high-level diagrams
with an explanation of each component. In particular, it details the
Global Sync signals.
Change since V2:
- Add a comment about MMHUBBUB.
Reviewed-by: Yann Dirson <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Introduce how to collect DTN log from debugfs.
Reviewed-by: Yann Dirson <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Display core provides a feature that makes it easy for users to debug
Pipe Split. This commit introduces how to use such a debug option.
Reviewed-by: Yann Dirson <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Display core provides a feature that makes it easy for users to debug
Multiple planes by enabling a visual notification at the bottom of each
plane. This commit introduces how to use such a feature.
Reviewed-by: Yann Dirson <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Display core documentation is not well organized, and it is hard to find
information due to the lack of sections. This commit reorganizes the
documentation layout, and it is preparation work for future changes.
Changes since V1:
- Christian: Group amdgpu documentation together.
- Daniel: Drop redundant amdgpu prefix.
- Jani: Create index pages.
- Yann: Mirror display folder in the documentation.
Reviewed-by: Yann Dirson <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SMU firmware expects the driver to maintain error context
and not interact with SMU any more when SMU errors
occur. That will aid in debugging SMU firmware issues.
Add SMU debug option support for this request; it can be
enabled or disabled via the amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate the corresponding debug modes.
Currently, only one mode (HALT_ON_ERROR) is supported.
When enabled, it brings hardware to a kind of halt state
so that no one can touch it any more in the event of SMU
errors.
The driver interacts with SMU by sending messages. There
are three ways of sending messages to SMU in the current
implementation. Handle them respectively as follows:
1, smu_cmn_send_smc_msg_with_param() for normal timeout cases
Halt on any error.
2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases
Halt on errors apart from ETIME. Otherwise this way won't work.
Let the user handle ETIME error in such a case.
3, smu_cmn_send_msg_without_waiting() for no waiting cases
Halt on errors apart from ETIME. Otherwise second way won't work.
== Command Guide ==
1, enable HALT_ON_ERROR mode
# echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
2, disable HALT_ON_ERROR mode
# echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
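The halt policy for the three send paths can be summarised by the sketch
below. It is a standalone model, not the driver code: the enum, the macro
and the function name are assumptions, and only the 0x1 mask value and the
ETIME exception come from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SMU_DEBUG_HALT_ON_ERROR 0x1u /* the 0x1 written to debugfs */

/* Hypothetical labels for the three ways of sending a message. */
enum model_smu_send_path {
	SMU_SEND_AND_WAIT,   /* send with param, normal timeout */
	SMU_SEND_THEN_WAIT,  /* send without waiting, wait separately */
	SMU_SEND_NO_WAIT,    /* send without waiting at all */
};

/* Decide whether an error on the given path should halt the hardware. */
static bool smu_should_halt(uint32_t debug_mask,
			    enum model_smu_send_path path, int err)
{
	if (!(debug_mask & MODEL_SMU_DEBUG_HALT_ON_ERROR) || err == 0)
		return false;
	if (path == SMU_SEND_AND_WAIT)
		return true;        /* halt on any error */
	return err != -ETIME;       /* leave timeouts to the caller */
}
```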
v5:
- Use bit mask to allow more debug features.(Evan)
- Use WARN() instead of BUG().(Evan)
v4:
- Set to halt state instead of a simple hang.(Christian)
v3:
- Use debugfs_create_bool().(Christian)
- Put variable into smu_context struct.
- Don't resend command when timeout.
v2:
- Resend command when timeout.(Lijo)
- Use debugfs file instead of module parameter.
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It is useful to maintain error context when debugging
SW/FW issues. Introduce amdgpu_device_halt() for this
purpose. It will bring hardware to a kind of halt state,
so that no one can touch it any more.
Compared to a simple hang, the system will stay stable
at least for SSH access. Then it should be trivial to
inspect the hardware state and see what's going on.
v2:
- Set adev->no_hw_access earlier to avoid potential crashes.(Christian)
Suggested-by: Christian Koenig <[email protected]>
Suggested-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Christian Koenig <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
in case they are not available in early phase
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Leave this bit as hardware default setting
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
should count on GC IP base address
Signed-off-by: Le Ma <[email protected]>
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Read and authenticate the IP discovery binary obtained from
VRAM first; if it is not valid, read and authenticate
the one obtained from file.
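The fallback order can be sketched as below. The validity check is a toy
byte-sum stand-in for real authentication, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum discovery_source { DISCOVERY_NONE, DISCOVERY_VRAM, DISCOVERY_FILE };

/*
 * Toy validity check: a non-empty blob whose bytes sum to zero mod 256.
 * Real authentication would verify a signature instead.
 */
static bool model_blob_valid(const uint8_t *blob, size_t len)
{
	uint8_t sum = 0;

	if (!blob || len == 0)
		return false;
	for (size_t i = 0; i < len; i++)
		sum += blob[i];
	return sum == 0;
}

/* Prefer the VRAM copy; use the file copy only if VRAM fails the check. */
static enum discovery_source pick_discovery_blob(const uint8_t *vram,
						 size_t vram_len,
						 const uint8_t *file,
						 size_t file_len)
{
	if (model_blob_valid(vram, vram_len))
		return DISCOVERY_VRAM;
	if (model_blob_valid(file, file_len))
		return DISCOVERY_FILE;
	return DISCOVERY_NONE;
}
```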
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To be used to check ip discovery binary signature
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add _from_vram in the function name to differentiate
the one used to read from file
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To be used when ip_discovery binary is not carried by vbios
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The guest OS sets up VCN instance 1, which is disabled, as an enabled instance and
executes initialization work on it, but this causes VCN ib ring test failure
on the disabled VCN instance during modprobe:
amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110).
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to
vcn_config
v3: modify VCN's revision in SR-IOV and bare-metal
Fixes: baf3f8f3740625 ("drm/amdgpu: handle SRIOV VCN revision parsing")
Signed-off-by: Leslie Shi <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix following warning in SRIOV during modprobe:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
Signed-off-by: Leslie Shi <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch reverts the following:
commit 48733b224fa7ba ("drm/amdkfd: add Navi2x to GWS init conditions")
Disable GWS usage in default settings for now due to FW bugs.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Initialize GWS on Navi2x with mec2_fw_version >= 0x42.
Signed-off-by: Graham Sider <[email protected]>
Reviewed-and-tested-by: Jonathan Kim <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We found some headaches on ASICs that don't need that,
so remove it for them.
Suggested-by: Lijo Lazar <[email protected]>
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Currently, we don't see any necessity to power on/off
SDMA in SMU hw_init/fini(). It makes more sense in SDMA
hw_init/fini().
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Hawaii support is mostly untested these days. ROCm user mode also
depends on custom firmware for AQL packet processing, that was never
pushed upstream due to quality regressions in graphics driver testing.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If an existing SVM range overlaps an svm_range_set_attr call, we would
normally split it in order to update only the overlapping part.
However, if the attributes of the existing range would not be changed
splitting it is unnecessary.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The existing function doesn't compare the access bitmaps and flags.
This can result in failure to update those attributes in existing
ranges when all other attributes remained unchanged.
Because the access and flags attributes modify only some bits in the
respective bitmaps, we cannot compare them directly. Instead we need to
check whether applying the attributes to a particular range would
change the bitmaps.
A PREFETCH_LOC attribute must always trigger a migration, even if the
attribute value remains unchanged. E.g. if some pages were migrated due
to a CPU page fault, a prefetch must still be executed to migrate pages
back to VRAM.
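A rough sketch of the "would applying this change anything" test follows.
The request struct and the function names are hypothetical simplifications
that use a single 32-bit flags word rather than the real attribute list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Would setting `set` and clearing `clear` alter `current` at all? */
static bool bits_would_change(uint32_t current, uint32_t set, uint32_t clear)
{
	return ((current | set) & ~clear) != current;
}

/* Hypothetical summary of one set_attr request. */
struct model_attr_request {
	uint32_t flags_set;
	uint32_t flags_clear;
	bool has_prefetch_loc;
};

static bool range_needs_update(uint32_t range_flags,
			       const struct model_attr_request *req)
{
	assert(req);
	if (req->has_prefetch_loc)
		return true; /* always migrate, even if the value is unchanged */
	return bits_would_change(range_flags, req->flags_set,
				 req->flags_clear);
}
```

Comparing the request against the range directly would not work here,
since the request only names the bits to touch, not the resulting bitmap.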
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add null-pointer check after the last svm_range_new call. This was
originally reported by Zhou Qingyang <[email protected]> based on a
static analyzer.
To avoid duplicating the unwinding code from svm_range_handle_overlap,
I merged the two functions into one.
Signed-off-by: Felix Kuehling <[email protected]>
Cc: Zhou Qingyang <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove the not-unique timestamp WARNING, as interrupts with the same
timestamp happen on some chips.
Draining faults needs to wait for the processed_timestamp to be truly greater
than the checkpoint, or for the ring to be empty, to be sure no stale faults
are handled.
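The completion condition can be written as a small predicate. This sketch
uses plain 64-bit comparison and ignores timestamp wraparound, which real
code would have to handle; the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Draining is done once the ring is empty, or once the processed timestamp
 * is strictly greater than the checkpoint. An equal timestamp is not
 * enough, since several interrupts may carry the same timestamp.
 */
static bool drain_complete(uint64_t processed_ts, uint64_t checkpoint_ts,
			   bool ring_empty)
{
	return ring_empty || processed_ts > checkpoint_ts;
}
```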
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|