Age | Commit message (Collapse) | Author | Files | Lines |
|
-v2: 1. rename variable to redue confuse
2. optimize the code
-v3: move new define out of the middle of the code
-v4: squash in minmax error fix (Luben)
When applications try to allocate large system (more than > 128GB),
"stall cpu" is reported.
for such large system memory, walk_page_range takes more than 20s usually.
The warning message can be removed when splitting hmm range into smaller
ones which is not more 64GB for each walk_page_range.
[ 164.437617] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1753: amdgpu: create BO VA 0x7f63c7a00000 size 0x2f16000000 domain CPU
[ 164.488847] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1785: amdgpu: creating userptr BO for user_addr = 7f63c7a00000
[ 185.439116] rcu: INFO: rcu_sched self-detected stall on CPU
[ 185.439125] rcu: 8-....: (20999 ticks this GP) idle=e22/1/0x4000000000000000 softirq=2242/2242 fqs=5249
[ 185.439137] (t=21000 jiffies g=6325 q=1215)
[ 185.439141] NMI backtrace for cpu 8
[ 185.439143] CPU: 8 PID: 3470 Comm: kfdtest Kdump: loaded Tainted: G O 5.12.0-0_fbk5_zion_rc1_5697_g2c723fb88626 #1
[ 185.439147] Hardware name: HPE ProLiant XL675d Gen10 Plus/ProLiant XL675d Gen10 Plus, BIOS A47 11/06/2020
[ 185.439150] Call Trace:
[ 185.439153] <IRQ>
[ 185.439157] dump_stack+0x64/0x7c
[ 185.439163] nmi_cpu_backtrace.cold.7+0x30/0x65
[ 185.439165] ? lapic_can_unplug_cpu+0x70/0x70
[ 185.439170] nmi_trigger_cpumask_backtrace+0xf9/0x100
[ 185.439174] rcu_dump_cpu_stacks+0xc5/0xf5
[ 185.439178] rcu_sched_clock_irq.cold.97+0x112/0x38c
[ 185.439182] ? tick_sched_handle.isra.21+0x50/0x50
[ 185.439185] update_process_times+0x8c/0xc0
[ 185.439189] tick_sched_timer+0x63/0x70
[ 185.439192] __hrtimer_run_queues+0xff/0x250
[ 185.439195] hrtimer_interrupt+0xf4/0x200
[ 185.439199] __sysvec_apic_timer_interrupt+0x51/0xd0
[ 185.439201] sysvec_apic_timer_interrupt+0x69/0x90
[ 185.439206] </IRQ>
[ 185.439207] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 185.439211] RIP: 0010:clear_page_rep+0x7/0x10
[ 185.439214] Code: e8 fe 7c 51 00 44 89 e2 48 89 ee 48 89 df e8 60 ff ff ff c6 03 00 5b 5d 41 5c c3 cc cc cc cc cc cc cc cc b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00
[ 185.439218] RSP: 0018:ffffc9000f58f818 EFLAGS: 00000246
[ 185.439220] RAX: 0000000000000000 RBX: 0000000000000881 RCX: 000000000000005c
[ 185.439223] RDX: 0000000000100dca RSI: 0000000000000000 RDI: ffff88a59e0e5d20
[ 185.439225] RBP: ffffea0096783940 R08: ffff888118c35280 R09: ffffea0096783940
[ 185.439227] R10: ffff888000000000 R11: 0000160000000000 R12: ffffea0096783980
[ 185.439228] R13: ffffea0096783940 R14: ffff88b07fdfdd00 R15: 0000000000000000
[ 185.439232] prep_new_page+0x81/0xc0
[ 185.439236] get_page_from_freelist+0x13be/0x16f0
[ 185.439240] ? release_pages+0x16a/0x4a0
[ 185.439244] __alloc_pages_nodemask+0x1ae/0x340
[ 185.439247] alloc_pages_vma+0x74/0x1e0
[ 185.439251] __handle_mm_fault+0xafe/0x1360
[ 185.439255] handle_mm_fault+0xc3/0x280
[ 185.439257] hmm_vma_fault.isra.22+0x49/0x90
[ 185.439261] __walk_page_range+0x692/0x9b0
[ 185.439265] walk_page_range+0x9b/0x120
[ 185.439269] hmm_range_fault+0x4f/0x90
[ 185.439274] amdgpu_hmm_range_get_pages+0x24f/0x260 [amdgpu]
[ 185.439463] amdgpu_ttm_tt_get_user_pages+0xc2/0x190 [amdgpu]
[ 185.439603] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x49f/0x7a0 [amdgpu]
[ 185.439774] kfd_ioctl_alloc_memory_of_gpu+0xfb/0x410 [amdgpu]
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-11-23:
amdgpu:
- DCN 3.1.4 fixes
- DP MST DSC deadlock fixes
- HMM userptr fixes
- Fix Aldebaran CU occupancy reporting
- GFX11 fixes
- PSP suspend/resume fix
- DCE12 KASAN fix
- DCN 3.2.x fixes
- Rotated cursor fix
- SMU 13.x fix
- DELL platform suspend/resume fixes
- VCN4 SR-IOV fix
- Display regression fix for polled connectors
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc7:
- Another amdgpu gang submit fix.
- Use dma_fence_unwrap_for_each when importing sync files.
- Fix race in dma_heap_add().
- Fix use of uninitialized memory in logo.
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Linux 6.1-rc6
This is needed for drm-misc-next and tegra.
Signed-off-by: Dave Airlie <[email protected]>
|
|
when the edid is read"
This partially reverts 20543be93ca45968f344261c1a997177e51bd7e1.
Calling drm_connector_update_edid_property() in
amdgpu_connector_free_edid() causes a noticeable pause in
the system every 10 seconds on polled outputs so revert this
part of the change.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2257
Cc: Claudio Suarez <[email protected]>
Acked-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Configure related settings to enable it.
Signed-off-by: Tao Zhou <[email protected]>
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
- in device_resume, sriov configure interrupt should be in full access,
so release_full_gpu should be done after kfd_resume.
- remove the previous workaround solution for sriov.
Fixes: ec4927d463cb ("drm/amdgpu: fix for suspend/resume sequence under sriov")
Signed-off-by: Shikang Fan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If CONFIG_DRM_AMDGPU=y and CONFIG_DRM_AMD_DC is not set,
gcc complained about unused-function :
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1705:13: error: ‘amdgpu_discovery_set_sriov_display’ defined but not used [-Werror=unused-function]
static void amdgpu_discovery_set_sriov_display(struct amdgpu_device *adev)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
To fix this error, use CONFIG_DRM_AMD_DC to wrap
the definition of amdgpu_discovery_set_sriov_display().
Fixes: 25263da37693 ("drm/amdgpu: rework SR-IOV virtual display handling")
Signed-off-by: Ren Zhijie <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
[ 754.862560] refcount_t: underflow; use-after-free.
[ 754.862898] Call Trace:
[ 754.862903] <TASK>
[ 754.862913] amdgpu_job_free_cb+0xc2/0xe1 [amdgpu]
[ 754.863543] drm_sched_main.cold+0x34/0x39 [amd_sched]
[How]
The fw_fence may be not init, check whether dma_fence_init
is performed before job free
Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fixes: f7ba887f606b ("drm/amdgpu: Adjust logic around GTT size (v3)")
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: ZhenGuo Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
root cause that S2A need to use deduct offset flag.
after setting this flag, vcn0 doorbell value works.
so return it as before
Signed-off-by: Jane Jian <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().
So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.
Fixes: 3f12acc8d6d4 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We can reuse the same buffers on resume.
v2: squash in S4 fix from Shikai
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Reviewed-by: Christian König <[email protected]>
Tested-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
If mes enabled, reserve VM invalidation engine 5 for firmware.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
root cause that S2A need to use deduct offset flag.
after setting this flag, vcn0 doorbell value works.
so return it as before
Signed-off-by: Jane Jian <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
when the edid is read"
This partially reverts 20543be93ca45968f344261c1a997177e51bd7e1.
Calling drm_connector_update_edid_property() in
amdgpu_connector_free_edid() causes a noticeable pause in
the system every 10 seconds on polled outputs so revert this
part of the change.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2257
Cc: Claudio Suarez <[email protected]>
Acked-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
[Why]
[ 754.862560] refcount_t: underflow; use-after-free.
[ 754.862898] Call Trace:
[ 754.862903] <TASK>
[ 754.862913] amdgpu_job_free_cb+0xc2/0xe1 [amdgpu]
[ 754.863543] drm_sched_main.cold+0x34/0x39 [amd_sched]
[How]
The fw_fence may be not init, check whether dma_fence_init
is performed before job free
Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We can reuse the same buffers on resume.
v2: squash in S4 fix from Shikai
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Reviewed-by: Christian König <[email protected]>
Tested-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.2-2022-11-18:
amdgpu:
- SR-IOV fixes
- Clean up DC checks
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- Don't enable degamma on asics which don't support it
- IP discovery fixes
- BACO fixes
- Fix vbios allocation handling when vkms is enabled
- Drop buggy tdr advanced mode GPU reset handling
- Fix the build when DCN is not set in kconfig
- MST DSC fixes
- Userptr fixes
- FRU and RAS EEPROM fixes
- VCN 4.x RAS support
- Aldrebaran CU occupancy reporting fix
- PSP ring cleanup
amdkfd:
- Memory limit fix
- Enable cooperative launch on gfx 10.3
amd-drm-next-6.2-2022-11-11:
amdgpu:
- SMU 13.x updates
- GPUVM TLB race fix
- DCN 3.1.4 updates
- DCN 3.2.x updates
- PSR fixes
- Kerneldoc fix
- Vega10 fan fix
- GPUVM locking fixes in error pathes
- BACO fix for Beige Goby
- EEPROM I2C address cleanup
- GFXOFF fix
- Fix DC memory leak in error pathes
- Flexible array updates
- Mtype fix for GPUVM PTEs
- Move Kconfig into amdgpu directory
- SR-IOV updates
- Fix possible memory leak in CS IOCTL error path
amdkfd:
- Fix possible memory overrun
- CRIU fixes
radeon:
- ACPI ref count fix
- HDA audio notifier support
- Move Kconfig into radeon directory
UAPI:
- Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI.
These are used internally in the driver to align location based memory coherency
requirements from memory allocated in the KFD with how we manage GPUVM PTEs. They
are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now.
They are just used internally in the kernel driver for now for existing KFD memory
allocations. So a change to the UAPI header, but no functional change in the UAPI.
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Airlie <[email protected]>
|
|
If mes enabled, reserve VM invalidation engine 5 for firmware.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.0.x
|
|
Allow user to know number of compute units (CU) that are in use at any
given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy
that computes CU occupancy.
Signed-off-by: Ramesh Errabolu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
The basic problem here is that it's not allowed to page fault while
holding the reservation lock.
So it can happen that multiple processes try to validate an userptr
at the same time.
Work around that by putting the HMM range object into the mutex
protected bo list for now.
v2: make sure range is set to NULL in case of an error
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
CC: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
Since switching to HMM we always need that because we no longer grab
references to the pages.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
CC: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
Otherwise it can happen that not all gang members can get a VMID
assigned and we deadlock.
Signed-off-by: Christian König <[email protected]>
Tested-by: Timur Kristóf <[email protected]>
Acked-by: Timur Kristóf <[email protected]>
Fixes: 68ce8b242242 ("drm/amdgpu: add gang submit backend v2")
Reviewed-by: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-11-16:
amdgpu:
- Fix a possible memory leak in ganng submit error path
- DP tunneling fixes
- DCN 3.1 page flip fix
- DCN 3.2.x fixes
- DCN 3.1.4 fixes
- Don't expose degamma on hardware that doesn't support it
- BACO fixes for SMU 11.x
- BACO fixes for SMU 13.x
- Virtual display fix for devices with no display hardware
amdkfd:
- Memory limit regression fix
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
All of the IP specific versions are the same now, so
we can just use a common function.
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This matches what we do for psp 3.1 and makes ring_init
common for all PSP versions.
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allow user to know number of compute units (CU) that are in use at any
given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy
that computes CU occupancy.
Signed-off-by: Ramesh Errabolu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Register related irq handler.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Register irq handler.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Initialize JPEG RAS structure and add error query interface.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Initialize VCN RAS structure and add RAS status query function.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Make the code reusable.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
So the code can be reused.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set support flag for VCN/JPEG 4.0.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The patch is enabling mode-1 reset for RAS recovery in fatal error mode.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add support for RAS table at I2C EEPROM address of 0x40000, since on some
ASICs it is not at 0, but at 0x40000.
Cc: Alex Deucher <[email protected]>
Cc: Kent Russell <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Tested-by: Kent Russell <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Don't assume FRU MCU memory locations for the FRU data fields, or their sizes,
instead read and interpret the IPMI data, as stipulated in the IPMI spec
version 1.0 rev 1.2.
Extract the Product Name, Product Part/Model Number, and the Product Serial
Number by interpreting the IPMI data.
Check the checksums of the stored IPMI data to make sure we don't read and
give corrupted data back the the user.
Eliminate small I2C reads, and instead read the whole Product Info Area in one
go, and then extract the information we're seeking from it.
Eliminates a whole function, making this file smaller.
v2: Clarify changes in the commit message.
Cc: Alex Deucher <[email protected]>
Cc: Kent Russell <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Tested-by: Kent Russell <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set the new correct default FRU MCU I2C address for newer ASICs, so that we
can correctly read the Product Name, Product Part/Model Number and Serial
Number.
On newer ASICs, the FRU MCU was moved to I2C address 0x58.
Cc: Alex Deucher <[email protected]>
Cc: Kent Russell <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Tested-by: Kent Russell <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allow non-standard EEPROM I2C address of 0x58, where the Device Type
Identifier is 1011b, where we form 1011000b = 0x58 I2C address, as on some
ASICs the FRU data lives there.
Cc: Alex Deucher <[email protected]>
Cc: Kent Russell <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Tested-by: Kent Russell <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc6:
- Fix error handling in vc4_atomic_commit_tail()
- Set bpc for logictechno panels.
- Fix potential memory leak in drm_dev_init()
- Fix potential null-ptr-deref in drm_vblank_destroy_worker()
- Set lima's clkname corrrectly when regulator is missing.
- Small amdgpu fix to gang submission.
- Revert hiding unregistered connectors from userspace, as it breaks on DP-MST.
- Add workaround for DP++ dual mode adaptors that don't support
i2c subaddressing.
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Remove unused parameters and cleanup dead code.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Clean that up a bit, no functional change.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The basic problem here is that it's not allowed to page fault while
holding the reservation lock.
So it can happen that multiple processes try to validate an userptr
at the same time.
Work around that by putting the HMM range object into the mutex
protected bo list for now.
v2: make sure range is set to NULL in case of an error
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
CC: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
Since switching to HMM we always need that because we no longer grab
references to the pages.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
CC: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 6.2:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- atomic-helper: Add begin_fb_access and end_fb_access hooks
- fb-helper: Rework to move fb emulation into helpers
- scheduler: rework entity flush, kill and fini
- ttm: Optimize pool allocations
Driver Changes:
- amdgpu: scheduler rework
- hdlcd: Switch to DRM-managed resources
- ingenic: Fix registration error path
- lcdif: FIFO threshold tuning
- meson: Fix return type of cvbs' mode_valid
- ofdrm: multiple fixes (kconfig, types, endianness)
- sun4i: A100 and D1 support
- panel:
- New Panel: Jadard JD9365DA-H3
Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20221110083612.g63eaocoaa554soh@houat
|
|
The state of VRAM is unreliable due to a PCI event like AER, link reset
or DPC.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Re-submitting IBs by the kernel has many problems because pre-
requisite state is not automatically re-created as well. In
other words neither binary semaphores nor things like ring
buffer pointers are in the state they should be when the
hardware starts to work on the IBs again.
Additional to that even after more than 5 years of
developing this feature it is still not stable and we have
massively problems getting the reference counts right.
As discussed with user space developers this behavior is not
helpful in the first place. For graphics and multimedia
workloads it makes much more sense to either completely
re-create the context or at least re-submitting the IBs
from userspace.
For compute use cases re-submitting is also not very
helpful since userspace must rely on the accuracy of
the result.
Because of this we stop this practice and instead just
properly note that the fence submission was canceled. The
only use case we keep the re-submission for now is SRIOV
and function level resets.
v2: as suggested by Sshaoyun stop resubmitting jobs even for SRIOV
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75.
This feature basically re-submits one job after another to
figure out which one was the one causing a hang.
This is obviously incompatible with gang-submit which requires
that multiple jobs run at the same time. It's also absolutely
not helpful to crash the hardware multiple times if a clean
recovery is desired.
For testing and debugging environments we should rather disable
recovery alltogether to be able to inspect the state with a hw
debugger.
Additional to that the sw implementation is clearly buggy and causes
reference count issues for the hardware fence.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
recovery is completed
Amdgpu_ras_set_error_query_ready is called at the start of
amdgpu_device_gpu_recover to disable query ras error, but the
code behind only enables query ras error in full reset path,
but not in soft reset path, emergency restart path and skip
the hardware reset path.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|