blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-11-04	drm/amd/display: parse umc_info or vram_info based on ASIC	Aurabindo Pillai	1	-1/+3
	An upstream bug report suggests that there are production dGPUs that are older than DCN401 but still have a umc_info in VBIOS tables with the same version as expected for a DCN401 product. Hence, reading this tables should be guarded with a version check. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678 Reviewed-by: Dillon Varone <[email protected]> Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Zaeem Mohamed <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 2551b4a321a68134360b860113dd460133e856e5) Fixes: 00c391102abc ("drm/amd/display: Add misc DC changes for DCN401") Cc: [email protected] # 6.11.x
2024-11-04	drm/amd/display: Fix brightness level not retained over reboot	Tom Chung	1	-0/+15
	[Why] During boot up and resume the DC layer will reset the panel brightness to fix a flicker issue. It will cause the dm->actual_brightness is not the current panel brightness level. (the dm->brightness is the correct panel level) [How] Set the backlight level after do the set mode. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") Reported-by: Mark Herbert <[email protected]> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3655 Reviewed-by: Sun peng Li <[email protected]> Signed-off-by: Tom Chung <[email protected]> Signed-off-by: Zaeem Mohamed <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7875afafba84817b791be6d2282b836695146060) Cc: [email protected]
2024-11-04	drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout	Nirmoy Das	1	-0/+2
	Flush the g2h worker explicitly if TLB timeout happens which is observed on LNL and that points to the recent scheduling issue with E-cores on LNL. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is E core scheduling fix. v2: Add platform check(Himal) v3: Remove gfx platform check as the issue related to cpu platform(John) Use the common WA macro(John) and print when the flush resolves timeout(Matt B) v4: Remove the resolves log and do the flush before taking pending_lock(Matt A) Cc: Badal Nilawar <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Matthew Auld <[email protected]> Cc: John Harrison <[email protected]> Cc: Himal Prasad Ghimiray <[email protected]> Cc: Lucas De Marchi <[email protected]> Cc: [email protected] # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687 Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit e1f6fa55664a0eeb0a641f497e1adfcf6672e995) Signed-off-by: Lucas De Marchi <[email protected]>
2024-11-04	drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout	Nirmoy Das	1	-0/+7
	Flush xe ordered_wq in case of ufence timeout which is observed on LNL and that points to recent scheduling issue with E-cores. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is a E-core scheduling fix for LNL. v2: Add platform check(Himal) s/__flush_workqueue/flush_workqueue(Jani) v3: Remove gfx platform check as the issue related to cpu platform(John) v4: Use the Common macro(John) and print when the flush resolves timeout(Matt B) Cc: Badal Nilawar <[email protected]> Cc: Matthew Auld <[email protected]> Cc: John Harrison <[email protected]> Cc: Himal Prasad Ghimiray <[email protected]> Cc: Lucas De Marchi <[email protected]> Cc: [email protected] # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754 Suggested-by: Matthew Brost <[email protected]> Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 38c4c8722bd74452280951edc44c23de47612001) Signed-off-by: Lucas De Marchi <[email protected]>
2024-11-04	drm/xe: Move LNL scheduling WA to xe_device.h	Nirmoy Das	2	-10/+15
	Move LNL scheduling WA to xe_device.h so this can be used in other places without needing keep the same comment about removal of this WA in the future. The WA, which flushes work or workqueues, is now wrapped in macros and can be reused wherever needed. Cc: Badal Nilawar <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Himal Prasad Ghimiray <[email protected]> Cc: Lucas De Marchi <[email protected]> cc: [email protected] # v6.11+ Suggested-by: John Harrison <[email protected]> Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit cbe006a6492c01a0058912ae15d473f4c149896c) Signed-off-by: Lucas De Marchi <[email protected]>
2024-11-04	drm/xe: Use the filelist from drm for ccs_mode change	Balasubramani Vivekanandan	3	-23/+5
	Drop the exclusive client count tracking and use the filelist from the drm to track the active clients. This also ensures the clients created internally by the driver won't block changing the ccs mode. Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") Signed-off-by: Balasubramani Vivekanandan <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 1c35f1ed1fe3c649f8c16214d0d3dd828b5265d9) Signed-off-by: Lucas De Marchi <[email protected]>
2024-11-04	drm/xe: Set mask bits for CCS_MODE register	Balasubramani Vivekanandan	2	-1/+7
	CCS_MODE register requires setting mask bits from Xe2+ platforms. Set the mask bits unconditionally, as those bits are unused for older platforms. Signed-off-by: Balasubramani Vivekanandan <[email protected]> Cc: [email protected] # v6.11+ Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 23ea2c7572d4735ef66beb1e4feb8ae510b78247) [ Fix conflict with mmio refactors ] Signed-off-by: Lucas De Marchi <[email protected]>
2024-11-04	drm/imagination: Break an object reference loop	Brendan King	4	-4/+56
	When remaining resources are being cleaned up on driver close, outstanding VM mappings may result in resources being leaked, due to an object reference loop, as shown below, with each object (or set of objects) referencing the object below it: PVR GEM Object GPU scheduler "finished" fence GPU scheduler “scheduled” fence PVR driver “done” fence PVR Context PVR VM Context PVR VM Mappings PVR GEM Object The reference that the PVR VM Context has on the VM mappings is a soft one, in the sense that the freeing of outstanding VM mappings is done as part of VM context destruction; no reference counts are involved, as is the case for all the other references in the loop. To break the reference loop during cleanup, free the outstanding VM mappings before destroying the PVR Context associated with the VM context. Signed-off-by: Brendan King <[email protected]> Signed-off-by: Matt Coster <[email protected]> Reviewed-by: Frank Binns <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-11-04	drm/imagination: Add a per-file PVR context list	Brendan King	4	-0/+30
	This adds a linked list of VM contexts which is needed for the next patch to be able to correctly track VM contexts for destruction on file close. It is only safe for VM contexts to be removed from the list and destroyed when not in interrupt context. Signed-off-by: Brendan King <[email protected]> Signed-off-by: Matt Coster <[email protected]> Reviewed-by: Frank Binns <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-11-02	Merge tag 'drm-xe-fixes-2024-10-31' of ↵	Dave Airlie	5	-35/+72
	https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix missing HPD interrupt enabling, bringing one PM refactor with it (Imre / Maarten) - Workaround LNL GGTT invalidation not being visible to GuC (Matthew Brost) - Avoid getting jobs stuck without a protecting timeout (Matthew Brost) Signed-off-by: Dave Airlie <[email protected]> From: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/tsbftadm7owyizzdaqnqu7u4tqggxgeqeztlfvmj5fryxlfomi@5m5bfv2zvzmw
2024-11-01	Merge tag 'mediatek-drm-fixes-20241028' of ↵	Dave Airlie	11	-32/+179
	https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes - 20241028 1. Fix degradation problem of alpha blending 2. Fix color format MACROs in OVL 3. Fix get efuse issue for MT8188 DPTX 4. Fix potential NULL dereference in mtk_crtc_destroy() 5. Correct dpi power-domains property 6. Add split subschema property constraints Signed-off-by: Dave Airlie <[email protected]> From: Chun-Kuang Hu <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-11-01	Merge tag 'amd-drm-fixes-6.12-2024-10-31' of ↵	Dave Airlie	3	-4/+7
	https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-10-31: amdgpu: - DCN 3.5 fix - Vangogh SMU KASAN fix - SMU 13 profile reporting fix Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-31	drm/xe: Don't short circuit TDR on jobs not started	Matthew Brost	1	-6/+12
	Short circuiting TDR on jobs not started is an optimization which is not required. On LNL we are facing an issue where jobs do not get scheduled by the GuC if it misses a GGTT page update. When this occurs let the TDR fire, toggle the scheduling which may get the job unstuck, and print a warning message. If the TDR fires twice on job that hasn't started, timeout the job. v2: - Add warning message (Paulo) - Add fixes tag (Paulo) - Timeout job which hasn't started after TDR firing twice v3: - Include local change v4: - Short circuit check_timeout on job not started - use warn level rather than notice (Paulo) Fixes: 7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out") Cc: [email protected] Cc: Paulo Zanoni <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 35d25a4a0012e690ef0cc4c5440231176db595cc) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-31	drm/xe: Add mmio read before GGTT invalidate	Matthew Brost	1	-0/+10
	On LNL without a mmio read before a GGTT invalidate the GuC can incorrectly read the GGTT scratch page upon next access leading to jobs not getting scheduled. A mmio read before a GGTT invalidate seems to fix this. Since a GGTT invalidate is not a hot code path, blindly do a mmio read before each GGTT invalidate. Cc: John Harrison <[email protected]> Cc: Daniele Ceraolo Spurio <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: Lucas De Marchi <[email protected]> Cc: [email protected] Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Paulo Zanoni <[email protected]> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3164 Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 5a710196883e0ac019ac6df2a6d79c16ad3c32fa) [ Fix conflict with mmio vs gt argument ] Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-31	drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()	Jinjie Ruan	1	-4/+4
	modprobe drm_hdmi_state_helper_test and then rmmod it, the following memory leak occurs. The `mode` allocated in drm_mode_duplicate() called by drm_display_mode_from_cea_vic() is not freed, which cause the memory leak: unreferenced object 0xffffff80ccd18100 (size 128): comm "kunit_try_catch", pid 1851, jiffies 4295059695 hex dump (first 32 bytes): 57 62 00 00 80 02 90 02 f0 02 20 03 00 00 e0 01 Wb........ ..... ea 01 ec 01 0d 02 00 00 0a 00 00 00 00 00 00 00 ................ backtrace (crc c2f1aa95): [<000000000f10b11b>] kmemleak_alloc+0x34/0x40 [<000000001cd4cf73>] __kmalloc_cache_noprof+0x26c/0x2f4 [<00000000f1f3cffa>] drm_mode_duplicate+0x44/0x19c [<000000008cbeef13>] drm_display_mode_from_cea_vic+0x88/0x98 [<0000000019daaacf>] 0xffffffedc11ae69c [<000000000aad0f85>] kunit_try_run_case+0x13c/0x3ac [<00000000a9210bac>] kunit_generic_run_threadfn_adapter+0x80/0xec [<000000000a0b2e9e>] kthread+0x2e8/0x374 [<00000000bd668858>] ret_from_fork+0x10/0x20 ...... Free `mode` by using drm_kunit_display_mode_from_cea_vic() to fix it. Cc: [email protected] Fixes: 4af70f19e559 ("drm/tests: Add RGB Quantization tests") Acked-by: Maxime Ripard <[email protected]> Signed-off-by: Jinjie Ruan <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Maxime Ripard <[email protected]>
2024-10-31	drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()	Jinjie Ruan	1	-12/+12
	modprobe drm_connector_test and then rmmod drm_connector_test, the following memory leak occurs. The `mode` allocated in drm_mode_duplicate() called by drm_display_mode_from_cea_vic() is not freed, which cause the memory leak: unreferenced object 0xffffff80cb0ee400 (size 128): comm "kunit_try_catch", pid 1948, jiffies 4294950339 hex dump (first 32 bytes): 14 44 02 00 80 07 d8 07 04 08 98 08 00 00 38 04 .D............8. 3c 04 41 04 65 04 00 00 05 00 00 00 00 00 00 00 <.A.e........... backtrace (crc 90e9585c): [<00000000ec42e3d7>] kmemleak_alloc+0x34/0x40 [<00000000d0ef055a>] __kmalloc_cache_noprof+0x26c/0x2f4 [<00000000c2062161>] drm_mode_duplicate+0x44/0x19c [<00000000f96c74aa>] drm_display_mode_from_cea_vic+0x88/0x98 [<00000000d8f2c8b4>] 0xffffffdc982a4868 [<000000005d164dbc>] kunit_try_run_case+0x13c/0x3ac [<000000006fb23398>] kunit_generic_run_threadfn_adapter+0x80/0xec [<000000006ea56ca0>] kthread+0x2e8/0x374 [<000000000676063f>] ret_from_fork+0x10/0x20 ...... Free `mode` by using drm_kunit_display_mode_from_cea_vic() to fix it. Cc: [email protected] Fixes: abb6f74973e2 ("drm/tests: Add HDMI TDMS character rate tests") Acked-by: Maxime Ripard <[email protected]> Signed-off-by: Jinjie Ruan <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Maxime Ripard <[email protected]>
2024-10-31	drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()	Jinjie Ruan	1	-0/+42
	As Maxime suggested, add a new helper drm_kunit_display_mode_from_cea_vic(), it can replace the direct call of drm_display_mode_from_cea_vic(), and it will help solving the `mode` memory leaks. Acked-by: Maxime Ripard <[email protected]> Suggested-by: Maxime Ripard <[email protected]> Signed-off-by: Jinjie Ruan <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Maxime Ripard <[email protected]>
2024-10-30	drm/panthor: Report group as timedout when we fail to properly suspend	Boris Brezillon	1	-4/+11
	If we don't do that, the group is considered usable by userspace, but all further GROUP_SUBMIT will fail with -EINVAL. Changes in v3: - Add R-bs Changes in v2: - New patch Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-30	drm/panthor: Fail job creation when the group is dead	Boris Brezillon	1	-0/+5
	Userspace can use GROUP_SUBMIT errors as a trigger to check the group state and recreate the group if it became unusable. Make sure we report an error when the group became unusable. Changes in v3: - None Changes in v2: - Add R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-30	drm/panthor: Fix firmware initialization on systems with a page size > 4k	Boris Brezillon	4	-8/+24
	The system and GPU MMU page size might differ, which becomes a problem for FW sections that need to be mapped at explicit addresses since our PAGE_SIZE alignment might cover a VA range that's expected to be used for another section. Make sure we never map more than we need. Changes in v3: - Add R-bs Changes in v2: - Plan for per-VM page sizes so the MCU VM and user VM can have different pages sizes Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block") Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-29	drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume	Imre Deak	1	-0/+1
	Atm the display HPD interrupts that got disabled during runtime suspend, are re-enabled only if d3cold is enabled. Fix things by also re-enabling the interrupts if d3cold is disabled. Cc: Rodrigo Vivi <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Signed-off-by: Imre Deak <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit bbc4a30de095f0349d3c278500345a1b620d495e) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-29	drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling	Imre Deak	1	-5/+14
	For clarity separate the d3cold and non-d3cold runtime PM handling. The only change in behavior is disabling polling later during runtime resume. This shouldn't make a difference, since the poll disabling is handled from a work, which could run at any point wrt. the runtime resume handler. The work will also require a runtime PM reference, syncing it with the resume handler. Cc: Rodrigo Vivi <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Signed-off-by: Imre Deak <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit a4de6beb83fc5adee788518350247c629568901e) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-29	drm/xe: Remove runtime argument from display s/r functions	Maarten Lankhorst	3	-28/+39
	The previous change ensures that pm_suspend is only called when suspending or resuming. This ensures no further bugs like those in the previous commit. Signed-off-by: Maarten Lankhorst <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Reviewed-by: Vinod Govindapillai <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit f90491d4b64e302e940133103d3d9908e70e454f) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-28	drm/amdgpu/smu13: fix profile reporting	Alex Deucher	1	-3/+3
	The following 3 commits landed in parallel: commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization") commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D") While everything is set correctly, this caused the profile to be reported incorrectly because both the powersave and fullscreen3d bits were set in the mask and when the driver prints the profile, it looks for the first bit set. Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ecfe9b237687a55d596fff0650ccc8cc455edd3f) Cc: [email protected]
2024-10-28	drm/amd/pm: Vangogh: Fix kernel memory out of bounds write	Tvrtko Ursulin	1	-1/+3
	KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. v2: * Drop impossible v3_0 case. (Mario) Signed-off-by: Tvrtko Ursulin <[email protected]> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <[email protected]> Cc: Evan Quan <[email protected]> Cc: Wenyou Yang <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703) Cc: [email protected] # v6.6+
2024-10-28	Revert "drm/amd/display: update DML2 policy ↵	Ovidiu Bunea	1	-0/+1
	EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Ovidiu Bunea <[email protected]> Signed-off-by: Tom Chung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: [email protected]
2024-10-28	drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM	Matthew Brost	1	-2/+3
	drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler work queue with WQ_MEM_RECLAIM to ensure forward progress during reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward progress during reclaim. v2: - Fixes tags (Philipp) - Reword commit message (Philipp) Cc: Luben Tuikov <[email protected]> Cc: Danilo Krummrich <[email protected]> Cc: Philipp Stanner <[email protected]> Cc: [email protected] Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq") Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread") Signed-off-by: Matthew Brost <[email protected]> Acked-by: Nirmoy Das <[email protected]> Reviewed-by: Philipp Stanner <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]>
2024-10-25	Merge drm/drm-fixes into drm-misc-fixes	Thomas Zimmermann	15	-106/+199
	Backmerging to get the latest fixes from upstream. Signed-off-by: Thomas Zimmermann <[email protected]>
2024-10-25	Merge tag 'drm-xe-fixes-2024-10-24-1' of ↵	Dave Airlie	5	-7/+42
	https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Increase invalidation timeout to avoid errors in some hosts (Shuicheng) - Flush worker on timeout (Badal) - Better handling for force wake failure (Shuicheng) - Improve argument check on user fence creation (Nirmoy) - Don't restart parallel queues multiple times on GT reset (Nirmoy) Signed-off-by: Dave Airlie <[email protected]> From: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/trlkoiewtc4x2cyhsxmj3atayyq4zwto4iryea5pvya2ymc3yp@fdx5nhwmiyem
2024-10-25	Merge tag 'drm-misc-fixes-2024-10-24' of ↵	Dave Airlie	2	-1/+3
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: bridge: - aux: Fix assignment of OF node - tc358767: Add missing of_node_put() in error path Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-25	Merge tag 'drm-intel-fixes-2024-10-24' of ↵	Dave Airlie	1	-2/+1
	https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix DRM_I915_GVT_KVMGT dependencies in Kconfig Signed-off-by: Dave Airlie <[email protected]> From: Joonas Lahtinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-24	drm/xe: Don't restart parallel queues multiple times on GT reset	Nirmoy Das	1	-2/+12
	In case of parallel submissions multiple GuC id will point to the same exec queue and on GT reset such exec queues will get restarted multiple times which is not desirable. v2: don't use exec_queue_enabled() which could race, do the same for xe_guc_submit_stop (Matt B) Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2295 Cc: Jonathan Cavitt <[email protected]> Cc: Himal Prasad Ghimiray <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Tejas Upadhyay <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Nirmoy Das <[email protected]> (cherry picked from commit c8b0acd6d8745fd7e6450f5acc38f0227bd253b3) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-24	drm/xe/ufence: Prefetch ufence addr to catch bogus address	Nirmoy Das	1	-1/+2
	access_ok() only checks for addr overflow so also try to read the addr to catch invalid addr sent from userspace. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630 Cc: Francois Dugast <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Matthew Brost <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Nirmoy Das <[email protected]> (cherry picked from commit 9408c4508483ffc60811e910a93d6425b8e63928) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-24	drm/xe: Handle unreliable MMIO reads during forcewake	Shuicheng Lin	1	-3/+9
	In some cases, when the driver attempts to read an MMIO register, the hardware may return 0xFFFFFFFF. The current force wake path code treats this as a valid response, as it only checks the BIT. However, 0xFFFFFFFF should be considered an invalid value, indicating a potential issue. To address this, we should add a log entry to highlight this condition and return failure. The force wake failure log level is changed from notice to err to match the failure return value. v2 (Matt Brost): - set ret value (-EIO) to kick the error to upper layers v3 (Rodrigo): - add commit message for the log level promotion from notice to err v4: - update reviewed info Suggested-by: Alex Zuo <[email protected]> Signed-off-by: Shuicheng Lin <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Michal Wajdeczko <[email protected]> Reviewed-by: Himal Prasad Ghimiray <[email protected]> Acked-by: Badal Nilawar <[email protected]> Cc: Anshuman Gupta <[email protected]> Cc: Matt Roper <[email protected]> Cc: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit a9fbeabe7226a3bf90f82d0e28a02c18e3c67447) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-24	drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout	Badal Nilawar	1	-0/+18
	In case if g2h worker doesn't get opportunity to within specified timeout delay then flush the g2h worker explicitly. v2: - Describe change in the comment and add TODO (Matt B/John H) - Add xe_gt_warn on fence done after G2H flush (John H) v3: - Updated the comment with root cause - Clean up xe_gt_warn message (John H) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902 Signed-off-by: Badal Nilawar <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Matthew Auld <[email protected]> Cc: John Harrison <[email protected]> Cc: Himal Prasad Ghimiray <[email protected]> Reviewed-by: Himal Prasad Ghimiray <[email protected]> Acked-by: Matthew Brost <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit e5152723380404acb8175e0777b1cea57f319a01) Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-24	drm/xe: Enlarge the invalidation timeout from 150 to 500	Shuicheng Lin	1	-1/+1
	There are error messages like below that are occurring during stress testing: "[ 31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout". Previously it was hitting this 3 out of 1000 executions of warm reboot. After raising it to 500, 1000 warm reboot executions passed and it didn't fail. Due to the way xe_mmio_wait32() is implemented, the timeout is able to expire early when the register matches the expected value due to the wait increments starting small. So, the larger timeout value should have no effect during normal use cases. v2 (Jonathan): - rework the commit message v3 (Lucas): - add conclusive message for the fail rate and test case v4: - add suggested-by Suggested-by: Jia Yao <[email protected]> Signed-off-by: Shuicheng Lin <[email protected]> Cc: Lucas De Marchi <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Nirmoy Das <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Tested-by: Zongyao Bai <[email protected]> Reviewed-by: Nirmoy Das <[email protected]> Signed-off-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 2eb460ab9f4bc5b575f52568d17936da0af681d8) [ Fix conflict with gt->mmio ] Signed-off-by: Lucas De Marchi <[email protected]>
2024-10-24	drm/tegra: Fix NULL vs IS_ERR() check in probe()	Dan Carpenter	1	-2/+2
	The iommu_paging_domain_alloc() function doesn't return NULL pointers, it returns error pointers. Update the check to match. Fixes: 45c690aea8ee ("drm/tegra: Use iommu_paging_domain_alloc()") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Lu Baolu <[email protected]> Signed-off-by: Thierry Reding <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-10-23	drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()	Dan Carpenter	1	-2/+1
	In mtk_crtc_create(), if the call to mbox_request_channel() fails then we set the "mtk_crtc->cmdq_client.chan" pointer to NULL. In that situation, we do not call cmdq_pkt_create(). During the cleanup, we need to check if the "mtk_crtc->cmdq_client.chan" is NULL first before calling cmdq_pkt_destroy(). Calling cmdq_pkt_destroy() is unnecessary if we didn't call cmdq_pkt_create() and it will result in a NULL pointer dereference. Fixes: 7627122fd1c0 ("drm/mediatek: Add cmdq_handle in mtk_crtc") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: CK Hu <[email protected]> Link: https://patchwork.kernel.org/project/dri-devel/patch/[email protected]/ Signed-off-by: Chun-Kuang Hu <[email protected]>
2024-10-23	drm/mediatek: Fix get efuse issue for MT8188 DPTX	Liankun Yang	1	-1/+84
	Update efuse data for MT8188 displayport. The DP monitor can not display when DUT connected to USB-c to DP dongle. Analysis view is invalid DP efuse data. Fixes: 350c3fe907fb ("drm/mediatek: dp: Add support MT8188 dp/edp function") Reviewed-by: Matthias Brugger <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Signed-off-by: Liankun Yang <[email protected]> Reviewed-by: Fei Shao <[email protected]> Tested-by: Fei Shao <[email protected]> Reviewed-by: CK Hu <[email protected]> Link: https://patchwork.kernel.org/project/dri-devel/patch/[email protected]/ Signed-off-by: Chun-Kuang Hu <[email protected]>
2024-10-22	drm/amdgpu: handle default profile on on devices without fullscreen 3D	Alex Deucher	1	-1/+10
	Some devices do not support fullscreen 3D. v2: Make the check generic. Fixes: ec1aab7816b0 ("drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs") Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Kenneth Feng <[email protected]> Cc: Lijo Lazar <[email protected]> (cherry picked from commit 1cdd67510e54e3832f14a885dbf5858584558650)
2024-10-22	drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too	Mario Limonciello	1	-0/+2
	Stuart Hayhurst has found that both at bootup and fullscreen VA-API video is leading to black screens for around 1 second and kernel WARNING [1] traces when calling dmub_psr_enable() with Parade 08-01 TCON. These symptoms all go away with PSR-SU disabled for this TCON, so disable it for now while DMUB traces [2] from the failure can be analyzed and the failure state properly root caused. Cc: Marc Rossi <[email protected]> Cc: Hamza Mahfooz <[email protected]> Link: https://gitlab.freedesktop.org/drm/amd/uploads/a832dd515b571ee171b3e3b566e99a13/dmesg.log [1] Link: https://gitlab.freedesktop.org/drm/amd/uploads/8f13ff3b00963c833e23e68aa8116959/output.log [2] Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2645 Reviewed-by: Leo Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit afb634a6823d8d9db23c5fb04f79c5549349628b) Cc: [email protected]
2024-10-22	drm/amdgpu: fix random data corruption for sdma 7	Frank Min	1	-1/+8
	There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 75400f8d6e36afc88d59db8a1f3e4b7d90d836ad) Cc: [email protected] # 6.11.x
2024-10-22	drm/amd/display: temp w/a for DP Link Layer compliance	Aurabindo Pillai	1	-0/+13
	[Why&How] Disabling P-State support on full updates for DCN401 results in introducing additional communication with SMU. A UCLK hard min message to SMU takes 4 seconds to go through, which was due to DCN not allowing pstate switch, which was caused by incorrect value for TTU watermark before blanking the HUBP prior to DPG on for servicing the test request. Fix the issue temporarily by disallowing pstate changes for compliance test while test request handler is reworked for a proper fix. Fixes: 67ea53a4bd9d ("drm/amd/display: Disable DCN401 UCLK P-State support on full updates") Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Dillon Varone <[email protected]> Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Wayne Lin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 8a79f7cdbb41bb0ddfd4d7662b4428d4a9d5306d) Cc: [email protected]
2024-10-22	drm/amd/display: temp w/a for dGPU to enter idle optimizations	Aurabindo Pillai	1	-1/+2
	[Why&How] vblank immediate disable currently does not work for all asics. On DCN401, the vblank interrupts never stop coming, and hence we never get a chance to trigger idle optimizations. Add a workaround to enable immediate disable only on APUs for now. This adds a 2-frame delay for triggering idle optimization, which is a negligible overhead. Fixes: 58a261bfc967 ("drm/amd/display: use a more lax vblank enable policy for older ASICs") Fixes: e45b6716de4b ("drm/amd/display: use a more lax vblank enable policy for DCN35+") Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Wayne Lin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 9b47278cec98e9894adf39229e91aaf4ab9140c5) Cc: [email protected]
2024-10-22	drm/amd/pm: update deep sleep status on smu v14.0.2/3	Kenneth Feng	1	-1/+6
	disable deep sleep during the compute workload for the potential performance loss on smu v14.0.2/3 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7d9af459f43436452103babb960fd0ecb13c714e)
2024-10-22	drm/amd/pm: update overdrive function on smu v14.0.2/3	Kenneth Feng	1	-1/+1
	update overdrive function on smu v14.0.2/3 Signed-off-by: Kenneth Feng <[email protected]> Acked-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit dcf822fca599e4cbc582801222d519b4da82fab5)
2024-10-22	drm/amd/pm: update the driver-fw interface file for smu v14.0.2/3	Kenneth Feng	3	-89/+102
	update the driver-fw interface file for smu v14.0.2/3 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 0642c95efbdc09efb34dd9f1ac642daa0daa9c2c)
2024-10-22	drm/amd: Guard against bad data for ATIF ACPI method	Mario Limonciello	1	-3/+12
	If a BIOS provides bad data in response to an ATIF method call this causes a NULL pointer dereference in the caller. ``` ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) ? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2)) ? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1)) ? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642) ? exc_page_fault (arch/x86/mm/fault.c:1542) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu ``` It has been encountered on at least one system, so guard for it. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Acked-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit c9b7c809b89f24e9372a4e7f02d64c950b07fdee) Cc: [email protected]
2024-10-22	drm/mediatek: Fix color format MACROs in OVL	Hsin-Te Yuan	1	-2/+2
	In commit 9f428b95ac89 ("drm/mediatek: Add new color format MACROs in OVL"), some new color formats are defined in the MACROs to make the switch statement more concise. That commit was intended to be a no-op cleanup. However, there are typos in these formats MACROs, which cause the return value to be incorrect. Fix the typos to ensure the return value remains unchanged. Fixes: 9f428b95ac89 ("drm/mediatek: Add new color format MACROs in OVL") Reviewed-by: Douglas Anderson <[email protected]> Reviewed-by: Matthias Brugger <[email protected]> Signed-off-by: Hsin-Te Yuan <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: CK Hu <[email protected]> Link: https://patchwork.kernel.org/project/dri-devel/patch/[email protected]/ Signed-off-by: Chun-Kuang Hu <[email protected]>
2024-10-22	drm/mediatek: Add blend_modes to mtk_plane_init() for different SoCs	Jason-JH.Lin	10	-10/+46
	Since some SoCs support premultiplied pixel formats but some do not, the blend_modes parameter is added to mtk_plane_init(), which is obtained from the mtk_ddp_comp_get_blend_modes function implemented in different blending supported components. The blending supported components can use driver data to set the blend mode capabilities for different SoCs. Signed-off-by: Jason-JH.Lin <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: CK Hu <[email protected]> Link: https://patchwork.kernel.org/project/dri-devel/patch/[email protected]/ Signed-off-by: Chun-Kuang Hu <[email protected]>