blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-06-14	drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2	Christian König	1	-2/+0
	This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-14	drm/amdgpu: Skip coredump during resets for debug	Lijo Lazar	1	-0/+1
	Skip scheduling coredump when gpu reset is intentionally triggered through debugfs. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14	drm/amdgpu: add reset source in various cases	Eric Huang	1	-0/+1
	To fullfill the reset event description. Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-22	drm/amdgpu: Simplify the allocation of fence slab caches	Kunwu Chan	1	-3/+1
	Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Reviewed-by: Christian König <[email protected]> Signed-off-by: Kunwu Chan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-11-17	drm/amdgpu: add amdgpu runpm usage trace for separate funcs	Prike Liang	1	-0/+2
	Add trace for amdgpu runpm separate funcs usage and this will help debugging on the case of runpm usage missed to dereference. In the normal case the runpm usage count referred by one kind of functionality pairwise and usage should be changed from 1 to 0, otherwise there will be an issue in the amdgpu runpm usage dereference. Signed-off-by: Prike Liang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20	drm/amdgpu: Use function for IP version check	Lijo Lazar	1	-1/+2
	Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-16	drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix	Tim Huang	1	-2/+39
	GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are disabled when S0ix suspend but fails to re-enable when resume because of the GFX is in GFXOFF. [ 203.349571] [drm] Fence fallback timer expired on ring sdma0 [ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0 [ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0 For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers to configure the fence driver interrupts for rings that belong to GFX. The interrupts configuration will be restored by GFXOFF exit. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-15	drm/amdgpu: mark force completed fences with -ECANCELED	Christian König	1	-0/+1
	When we force complete fences we should mark them as canceled. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-15	drm/amdgpu: add amdgpu_error_* debugfs file	Christian König	1	-0/+24
	This allows us to insert some error codes into the bottom of the pipeline on an engine. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: remove unnecessary (void*) conversions	Su Hui	1	-1/+1
	No need cast (void) to (struct amdgpu_device ). Signed-off-by: Su Hui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged	Guchun Chen	1	-1/+2
	When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_device_halt. So amdgpu_irq_put for irqs stored in fence driver should not be called any more, otherwise, below calltrace will arrive. [ 139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu] [ 139.114655] Call Trace: [ 139.114655] <TASK> [ 139.114657] amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu] [ 139.114836] amdgpu_device_fini_hw+0xb6/0x350 [amdgpu] [ 139.114955] amdgpu_driver_unload_kms+0x51/0x70 [amdgpu] [ 139.115075] amdgpu_pci_remove+0x63/0x160 [amdgpu] [ 139.115193] ? __pm_runtime_resume+0x64/0x90 [ 139.115195] pci_device_remove+0x3a/0xb0 [ 139.115197] device_remove+0x43/0x70 [ 139.115198] device_release_driver_internal+0xbd/0x140 Signed-off-by: Guchun Chen <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amdgpu: improve wait logic at fence polling	Alex Sierra	1	-7/+4
	Accomplish this by reading the seq number right away instead of sleep for 5us. There are certain cases where the fence is ready almost immediately. Sleep number granularity was also reduced as the majority of the kiq tlb flush takes between 2us to 6us. Signed-off-by: Alex Sierra <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09	drm/amd/amdgpu: Fix errors & warnings in amdgpu _bios, _cs, _dma_buf, _fence.c	Srinivasan Shanmugam	1	-6/+8
	The following checkpatch errors & warning is removed. ERROR: else should follow close brace '}' ERROR: trailing statements should be on next line WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Possible repeated word: 'Fences' WARNING: Missing a blank line after declarations WARNING: braces {} are not necessary for single statement blocks WARNING: Comparisons should place the constant on the right side of the test WARNING: printk() should include KERN_<LEVEL> facility level Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-23	drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs	YuBiao Wang	1	-0/+9
	[Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang. [How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete. Signed-off-by: YuBiao Wang <[email protected]> Acked-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-02-08	drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini	Guilherme G. Piccoli	1	-1/+7
	Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully. Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops: amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part. Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1]. [0] https://lore.kernel.org/amd-gfx/[email protected]/ [1] https://lore.kernel.org/amd-gfx/[email protected]/ Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <[email protected]> Cc: Guchun Chen <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Mario Limonciello <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2022-12-02	drm/amdgpu: MCBP based on DRM scheduler (v9)	Jiadong.Zhu	1	-0/+54
	Trigger Mid-Command Buffer Preemption according to the priority of the software rings and the hw fence signalling condition. The muxer saves the locations of the indirect buffer frames from the software ring together with the fence sequence number in its fifo queue, and pops out those records when the fences are signalled. The locations are used to resubmit packages in preemption scenarios by coping the chunks from the software ring. v2: Update comment style. v3: Fix conflict caused by previous modifications. v4: Remove unnecessary prints. v5: Fix corner cases for resubmission cases. v6: Refactor functions for resubmission, calling fence_process in irq handler. v7: Solve conflict for removing amdgpu_sw_ring.c. v8: Add time threshold to judge if preemption request is needed. v9: Correct comment spelling. Set fence emit timestamp before rsu assignment. Cc: Christian Koenig <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Michel Dänzer <[email protected]> Signed-off-by: Jiadong.Zhu <[email protected]> Acked-by: Luben Tuikov <[email protected]> Acked-by: Huang Rui <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-09-27	drm/amdgpu: Remove fence_process in count_emitted	Jiadong.Zhu	1	-1/+0
	The function amdgpu_fence_count_emitted used in work_hander should not call amdgpu_fence_process which must be used in irq handler. Reviewed-by: Christian König <[email protected]> Signed-off-by: Jiadong.Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-07-25	drm/amd: Fix typo 'the the' in comment	Slark Xiao	1	-1/+1
	Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-07-13	drm/amdgpu: support reset flag set for gpu reset	Likun Gao	1	-1/+8
	Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-06-28	drm/amdgpu: Follow up change to previous drm scheduler change.	Andrey Grodzovsky	1	-1/+6
	Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced. Also since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signed. Signed-off-by: Andrey Grodzovsky <[email protected]> Tested-by: Yiqing Yao <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-06-28	drm/amdgpu: Prevent race between late signaled fences and GPU reset.	Andrey Grodzovsky	1	-0/+18
	Problem: After we start handling timed out jobs we assume there fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process from a late EOP interrupt. Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line. v2: Switch from irq_get/put to full enable/disable_irq for amdgpu Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-06-28	drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences	Andrey Grodzovsky	1	-1/+3
	This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-06-10	drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to ↵	Andrey Grodzovsky	1	-1/+1
	amdgpu_device_gpu_recover We removed the wrapper that was queueing the recover function into reset domain queue who was using this name. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-06-10	drm/amdgpu: Add work_struct for GPU reset from debugfs	Andrey Grodzovsky	1	-2/+17
	We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-05-04	drm/amdgpu: assign the cpu/gpu address of fence from ring	Jack Xiao	1	-2/+2
	assign the cpu/gpu address of fence for the normal or mes ring from ring structure. Signed-off-by: Jack Xiao <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-02-09	drm/amdgpu: Move scheduler init to after XGMI is ready	Andrey Grodzovsky	1	-38/+5
	Before we initialize schedulers we must know which reset domain are we in - for single device there iis a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://www.spinics.net/lists/amd-gfx/msg74112.html
2022-01-09	Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"	Len Brown	1	-8/+0
	This reverts commit f7d6779df642720e22bffd449e683bb8690bd3bf. This bisected regression has impacted suspend-resume stability since 5.15-rc1. It regressed -stable via 5.14.10. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215315 Fixes: f7d6779df64 ("drm/amdgpu: stop scheduler when calling hw_fini (v2)") Cc: Guchun Chen <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Cc: <[email protected]> # 5.14+ Signed-off-by: Len Brown <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-12-17	drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence	Huang Rui	1	-39/+87
	The job embedded fence donesn't initialize the flags at dma_fence_init(). Then we will go a wrong way in amdgpu_fence_get_timeline_name callback and trigger a null pointer panic once we enabled the trace event here. So introduce new amdgpu_fence object to indicate the job embedded fence. [ 156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0 [ 156.131804] #PF: supervisor read access in kernel mode [ 156.131811] #PF: error_code(0x0000) - not-present page [ 156.131817] PGD 0 P4D 0 [ 156.131824] Oops: 0000 [#1] PREEMPT SMP PTI [ 156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G OE 5.16.0-rc1-custom #1 [ 156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016 [ 156.131848] RIP: 0010:strlen+0x0/0x20 [ 156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206 [ 156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b [ 156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0 [ 156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000 [ 156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0 [ 156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007 [ 156.131914] FS: 0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000 [ 156.131923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0 [ 156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 156.131949] Call Trace: [ 156.131953] <TASK> [ 156.131957] ? trace_event_raw_event_dma_fence+0xcc/0x200 [ 156.131973] ? ring_buffer_unlock_commit+0x23/0x130 [ 156.131982] dma_fence_init+0x92/0xb0 [ 156.131993] amdgpu_fence_emit+0x10d/0x2b0 [amdgpu] [ 156.132302] amdgpu_ib_schedule+0x2f9/0x580 [amdgpu] [ 156.132586] amdgpu_job_run+0xed/0x220 [amdgpu] v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot) Signed-off-by: Huang Rui <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-08	drm/amdgpu: use adev_to_drm for consistency when accessing drm_device	Guchun Chen	1	-1/+1
	adev_to_drm is used everywhere, so improve recent changes when accessing drm_device pointer from amdgpu_device. Signed-off-by: Guchun Chen <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-14	Merge drm/drm-next into drm-misc-next	Maxime Ripard	1	-22/+80
	Kickstart new drm-misc-next cycle. Signed-off-by: Maxime Ripard <[email protected]>
2021-09-02	dma-buf: nuke DMA_FENCE_TRACE macros v2	Christian König	1	-9/+1
	Only the DRM GPU scheduler, radeon and amdgpu where using them and they depend on a non existing config option to actually emit some code. v2: keep the signal path as is for now Signed-off-by: Christian König <[email protected]> Acked-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-08-31	drm/amdgpu: stop scheduler when calling hw_fini (v2)	Guchun Chen	1	-0/+8
	This gurantees no more work on the ring can be submitted to hardware in suspend/resume case, otherwise a potential race will occur and the ring will get no chance to stay empty before suspend. v2: Call drm_sched_resubmit_job before drm_sched_start to restart jobs from the pending list. Suggested-by: Andrey Grodzovsky <[email protected]> Suggested-by: Christian König <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-16	drm/amd/amdgpu embed hw_fence into amdgpu_job	Jack Zhang	1	-19/+67
	Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <[email protected]> Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05	drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)	Guchun Chen	1	-5/+7
	In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop scheduler in s3 test, otherwise, fence related failure will arrive after resume. To fix this and for a better clean up, move drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and should never be called in hw_fini. v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to keep sw_init and sw_fini paired. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668 Fixes: 8d35a2596164c1 ("drm/amdgpu: adjust fence driver enable sequence") Suggested-by: Christian König <[email protected]> Tested-by: Mike Lothian <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-30	Merge tag 'amd-drm-next-5.15-2021-07-29' of ↵	Dave Airlie	1	-39/+5
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.15-2021-07-29: amdgpu: - VCN/JPEG power down sequencing fixes - Various navi pcie link handling fixes - Clockgating fixes - Yellow Carp fixes - Beige Goby fixes - Misc code cleanups - S0ix fixes - SMU i2c bus rework - EEPROM handling rework - PSP ucode handling cleanup - SMU error handling rework - AMD HDMI freesync fixes - USB PD firmware update rework - MMIO based vram access rework - Misc display fixes - Backlight fixes - Add initial Cyan Skillfish support - Overclocking fixes suspend/resume amdkfd: - Sysfs leak fix - Add counters for vm faults and migration - GPUVM TLB optimizations radeon: - Misc fixes Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-28	drm/amdgpu: adjust fence driver enable sequence	Likun Gao	1	-39/+5
	Fence driver was enabled per ring when sw init on per IP block before. Change to enable all the fence driver at the same time after amdgpu_device_ip_init finished. Rename some function related to fence to make it reasonable for read. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01	drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr	Boris Brezillon	1	-1/+1
	Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization. v5: * Add a new paragraph to the timedout_job() method v3: * New patch v4: * Actually use the timeout_wq to queue the timeout work Suggested-by: Daniel Vetter <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Lucas Stach <[email protected]> Acked-by: Daniel Vetter <[email protected]> Acked-by: Christian König <[email protected]> Cc: Qiang Yu <[email protected]> Cc: Emma Anholt <[email protected]> Cc: Alex Deucher <[email protected]> Cc: "Christian König" <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-22	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann	1	-0/+1
	Backmerging from drm/drm-next to the patches for AMD devices for v5.14. Signed-off-by: Thomas Zimmermann <[email protected]>
2021-05-19	drm/amdgpu: Fix hang on device removal.	Andrey Grodzovsky	1	-6/+10
	If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-19	drm/amdgpu: Split amdgpu_device_fini into early and late	Andrey Grodzovsky	1	-1/+14
	Some of the stuff in amdgpu_device_fini such as HW interrupts disable and pending fences finilization must be done right away on pci_remove while most of the stuff which relates to finilizing and releasing driver data structures can be kept until drm_driver.release hook is called, i.e. when the last device reference is dropped. v4: Change functions prefix early->hw and late->sw Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-04-20	drm/amd/amdgpu/amdgpu_fence: Provide description for 'sched_score'	Lee Jones	1	-0/+1
	Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:444: warning: Function parameter or member 'sched_score' not described in 'amdgpu_fence_driver_init_ring' Cc: Alex Deucher <[email protected]> Cc: "Christian König" <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Christian König <[email protected]> Signed-off-by: Lee Jones <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09	drm/amdgpu: add the sched_score to amdgpu_ring_init	Christian König	1	-24/+26
	Allow separate ring to share the same scheduler score. No functional change. Signed-off-by: Christian König <[email protected]> Reviewed-and-Tested-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-03-26	Merge tag 'amd-drm-next-5.13-2021-03-23' of ↵	Daniel Vetter	1	-29/+22
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.13-2021-03-23: amdgpu: - Debugfs cleanup - Various cleanups and spelling fixes - Flexible array cleanups - Initial AMD Freesync HDMI - Display fixes - 10bpc dithering improvements - Display ASSR support - Clean up and unify powerplay and swsmu interfaces - Vangogh fixes - Add SMU gfx busy queues for RV/PCO - PCIE DPM fixes - S0ix fixes - GPU metrics data fixes - DCN secure display support - Backlight type override - Add initial support for Aldebaran - RAS fixes - Prime fixes for A+A systems - Reset fixes - Initial resource cursor support - Drop legacy IO BAR requirements - Various power fixes amdkfd: - MMU notifier fixes - APU fixes radeon: - Debugfs cleanups - Flexible array cleanups UAPI: - amdgpu: Add a new INFO ioctl interface to query video capabilities rather than hardcoding them in userspace. This allows us to provide fine grained asic capabilities (e.g., if a particular part is bandwidth limited, we can limit the capabilities). Proposed userspace: https://gitlab.freedesktop.org/leoliu/drm/-/commits/info_video_caps https://gitlab.freedesktop.org/leoliu/mesa/-/commits/info_video_caps - amdkfd: bump the driver version. There was a problem with reporting some RAS features on older versions of the driver. Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/7cdd63475c36bb9f49bb960f90f9a8cdb7e80a21 Danvet: A bunch of conflicts all over, but it seems to compile ... I did put the call to dc_allow_idle_optimizations() on a single line since it looked a bit too jarring to be left alone. Signed-off-by: Daniel Vetter <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-03-05	drm/amdgpu: Fix some unload driver issues	Emily Deng	1	-2/+3
	When unloading driver after killing some applications, it will hit sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid the job submit after fence driver fini, call ttm_bo_lock_delayed_workqueue before fence driver fini. And also put drm_sched_fini before waiting fence. Signed-off-by: Emily Deng <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-02-18	drm/amdgpu: do not use drm middle layer for debugfs	Nirmoy Das	1	-27/+19
	Use debugfs API directly instead of drm middle layer. This also includes following debugfs file output changes: 1 amdgpu_evict_vram/amdgpu_evict_gtt output will not contain any braces. e.g. (0) --> 0 2 amdgpu_gpu_recover output will print return value of amdgpu_device_gpu_recover() instead of not so important "gpu recover" message. v2: * checkpatch.pl: use '0444' instead of S_IRUGO. * remove S_IFREG from mode. * remove mode variable. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-02-05	drm/scheduler: provide scheduler score externally	Christian König	1	-1/+1
	Allow multiple schedulers to share the load balancing score. This is useful when one engine has different hw rings. Signed-off-by: Christian König <[email protected]> Reviewed-and-Tested-by: Leo Liu <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-11-13	drm/amd/amdgpu/amdgpu_fence: Fix some issues pertaining to function ↵	Lee Jones	1	-5/+6
	documentation Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:139: warning: Function parameter or member 'flags' not described in 'amdgpu_fence_emit' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:197: warning: Function parameter or member 'timeout' not described in 'amdgpu_fence_emit_polling' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:302: warning: Function parameter or member 't' not described in 'amdgpu_fence_fallback' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:302: warning: Excess function parameter 'work' description in 'amdgpu_fence_fallback' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:320: warning: Excess function parameter 'adev' description in 'amdgpu_fence_wait_empty' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:649: warning: Function parameter or member 'f' not described in 'amdgpu_fence_enable_signaling' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:649: warning: Excess function parameter 'fence' description in 'amdgpu_fence_enable_signaling' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:684: warning: Function parameter or member 'f' not described in 'amdgpu_fence_release' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:684: warning: Excess function parameter 'fence' description in 'amdgpu_fence_release' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:749: warning: Function parameter or member 'm' not described in 'amdgpu_debugfs_gpu_recover' drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:749: warning: Function parameter or member 'data' not described in 'amdgpu_debugfs_gpu_recover' Cc: Alex Deucher <[email protected]> Cc: "Christian König" <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Lee Jones <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-24	drm/amdgpu: Get DRM dev from adev by inline-f	Luben Tuikov	1	-3/+3
	Add a static inline adev_to_drm() to obtain the DRM device pointer from an amdgpu_device pointer. Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-24	drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)	Luben Tuikov	1	-2/+2
	Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-07-14	drm/amdgpu: use ARRAY_SIZE() to add amdgpu debugfs files	Xiaojie Yuan	1	-2/+4
	to easily add new debugfs file w/o changing the hardcoded list count. Signed-off-by: Xiaojie Yuan <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>