<feed xmlns='http://www.w3.org/2005/Atom'>
<title>blaster4385/linux-IllusionX/drivers/gpu/drm/xe, branch v6.12.1</title>
<subtitle>Linux kernel with personal config changes for arch linux</subtitle>
<id>https://git.tablaster.dev/blaster4385/linux-IllusionX/atom?h=v6.12.1</id>
<link rel='self' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/atom?h=v6.12.1'/>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/'/>
<updated>2024-11-13T19:37:22Z</updated>
<entry>
<title>drm/xe/oa: Fix "Missing outer runtime PM protection" warning</title>
<updated>2024-11-13T19:37:22Z</updated>
<author>
<name>Ashutosh Dixit</name>
<email>ashutosh.dixit@intel.com</email>
</author>
<published>2024-11-09T03:20:03Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=c0403e4ceecaefbeaf78263dffcd3e3f06a19f6b'/>
<id>urn:sha1:c0403e4ceecaefbeaf78263dffcd3e3f06a19f6b</id>
<content type='text'>
Fix the following drm_WARN:

[953.586396] xe 0000:00:02.0: [drm] Missing outer runtime PM protection
...
&lt;4&gt; [953.587090]  ? xe_pm_runtime_get_noresume+0x8d/0xa0 [xe]
&lt;4&gt; [953.587208]  guc_exec_queue_add_msg+0x28/0x130 [xe]
&lt;4&gt; [953.587319]  guc_exec_queue_fini+0x3a/0x40 [xe]
&lt;4&gt; [953.587425]  xe_exec_queue_destroy+0xb3/0xf0 [xe]
&lt;4&gt; [953.587515]  xe_oa_release+0x9c/0xc0 [xe]

Suggested-by: John Harrison &lt;john.c.harrison@intel.com&gt;
Suggested-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241109032003.3093811-1-ashutosh.dixit@intel.com
(cherry picked from commit b107c63d2953907908fd0cafb0e543b3c3167b75)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: handle flat ccs during hibernation on igpu</title>
<updated>2024-11-13T15:58:58Z</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2024-11-12T16:28:28Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=be7eeaba2a11d7c16a9dc034a25f224f1343f303'/>
<id>urn:sha1:be7eeaba2a11d7c16a9dc034a25f224f1343f303</id>
<content type='text'>
Starting from LNL, CCS has moved over to flat CCS model where there is
now dedicated memory reserved for storing compression state. On
platforms like LNL this reserved memory lives inside graphics stolen
memory, which is not treated like normal RAM and is therefore skipped by
the core kernel when creating the hibernation image. Currently if
something was compressed and we enter hibernation all the corresponding
CCS state is lost on such HW, resulting in corrupted memory. To fix this
evict user buffers from TT -&gt; SYSTEM to ensure we take a snapshot of the
raw CCS state when entering hibernation, where upon resuming we can
restore the raw CCS state back when next validating the buffer. This has
been confirmed to fix display corruption on LNL when coming back from
hibernation.

Fixes: cbdc52c11c9b ("drm/xe/xe2: Support flat ccs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3409
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v6.8+
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241112162827.116523-2-matthew.auld@intel.com
(cherry picked from commit c8b3c6db941299d7cc31bd9befed3518fdebaf68)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: improve hibernation on igpu</title>
<updated>2024-11-13T15:58:31Z</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2024-11-01T17:01:57Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=46f1f4b0f3c2a2dff9887de7c66ccc7ef482bd83'/>
<id>urn:sha1:46f1f4b0f3c2a2dff9887de7c66ccc7ef482bd83</id>
<content type='text'>
The GGTT looks to be stored inside stolen memory on igpu which is not
treated as normal RAM.  The core kernel skips this memory range when
creating the hibernation image, therefore when coming back from
hibernation the GGTT programming is lost. This seems to cause issues
with broken resume where GuC FW fails to load:

[drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
[drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
[drm] *ERROR* GT0: firmware signature verification failed
[drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.

Current GGTT users are kernel internal and tracked as pinned, so it
should be possible to hook into the existing save/restore logic that we
use for dgpu, where the actual evict is skipped but on restore we
importantly restore the GGTT programming.  This has been confirmed to
fix hibernation on at least ADL and MTL, though likely all igpu
platforms are affected.

This also means we have a hole in our testing, where the existing s4
tests only really test the driver hooks, and don't go as far as actually
rebooting and restoring from the hibernation image and in turn powering
down RAM (and therefore losing the contents of stolen).

v2 (Brost)
 - Remove extra newline and drop unnecessary parentheses.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v6.8+
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241101170156.213490-2-matthew.auld@intel.com
(cherry picked from commit f2a6b8e396666d97ada8e8759dfb6a69d8df6380)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Restore system memory GGTT mappings</title>
<updated>2024-11-13T15:58:22Z</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-10-31T18:22:57Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=dd886a63d6e2ce5c16e662c07547c067ad7d91f5'/>
<id>urn:sha1:dd886a63d6e2ce5c16e662c07547c067ad7d91f5</id>
<content type='text'>
GGTT mappings reside on the device and this state is lost during suspend
/ d3cold thus this state must be restored resume regardless if the BO is
in system memory or VRAM.

v2:
 - Unnecessary parentheses around bo-&gt;placements[0] (Checkpatch)

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241031182257.2949579-1-matthew.brost@intel.com
(cherry picked from commit a19d1db9a3fa89fabd7c83544b84f393ee9b851f)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Ensure all locks released in exec IOCTL</title>
<updated>2024-11-13T15:49:14Z</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-11-06T22:49:44Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=ce0d6970231903f43572a6998020fdc8b3a8f455'/>
<id>urn:sha1:ce0d6970231903f43572a6998020fdc8b3a8f455</id>
<content type='text'>
In couple of places the wrong error handling goto was used to release
locks. Fix these to ensure all locks dropped on exec IOCTL errors.

Cc: Francois Dugast &lt;francois.dugast@intel.com&gt;
Fixes: d16ef1a18e39 ("drm/xe/exec: Switch hw engine group execution mode upon job submission")
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241106224944.30130-1-matthew.brost@intel.com
(cherry picked from commit 9e7aacd8402b88394e6a83cb242901fde77a1773)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Stop accumulating LRC timestamp on job_free</title>
<updated>2024-11-05T23:40:13Z</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@intel.com</email>
</author>
<published>2024-11-04T14:38:12Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=514447a1219021298329ce586536598c3b4b2dc0'/>
<id>urn:sha1:514447a1219021298329ce586536598c3b4b2dc0</id>
<content type='text'>
The exec queue timestamp is only really useful when it's being queried
through the fdinfo. There's no need to update it so often, on every
job_free. Tracing a simple app like vkcube running shows an update
rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot
of pcie transactions.

The update on job_free() is used to cover a gap: if exec
queue is created and destroyed rapidly, before a new query, the
timestamp still needs to be accumulated and accounted for in the xef.

Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to
accumulate exec queue runtime") couldn't do it on the exec_queue_fini
since the xef could be gone at that point. However since commit
ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is
refcounted and the exec queue always holds a reference, making this safe
now.

Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when
client stats are captured") by reducing the frequency in which the
update is needed.

Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured")
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Jonathan Cavitt &lt;jonathan.cavitt@intel.com&gt;
Reviewed-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
(cherry picked from commit 83db047d9425d9a649f01573797558eff0f632e1)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/pf: Fix potential GGTT allocation leak</title>
<updated>2024-11-05T23:40:12Z</updated>
<author>
<name>Michal Wajdeczko</name>
<email>michal.wajdeczko@intel.com</email>
</author>
<published>2024-11-04T14:49:01Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=a353c78459f4d116216393cc29032ef5fe1472d2'/>
<id>urn:sha1:a353c78459f4d116216393cc29032ef5fe1472d2</id>
<content type='text'>
In unlikely event that we fail during sending the new VF GGTT
configuration to the GuC, we will free only the GGTT node data
struct but will miss to release the actual GGTT allocation.

This will later lead to list corruption, GGTT space leak and
finally risking crash when unloading the driver:

 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-EIO)
 [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT

 [ ] list_add corruption. next-&gt;prev should be prev (ffff88813cfcd628), but was 0000000000000000. (next=ffff88813cfe2028).
 [ ] RIP: 0010:__list_add_valid_or_report+0x6b/0xb0
 [ ] Call Trace:
 [ ]  drm_mm_insert_node_in_range+0x2c0/0x4e0
 [ ]  xe_ggtt_node_insert+0x46/0x70 [xe]
 [ ]  pf_provision_vf_ggtt+0x7f5/0xa70 [xe]
 [ ]  xe_gt_sriov_pf_config_set_ggtt+0x5e/0x770 [xe]
 [ ]  ggtt_set+0x4b/0x70 [xe]
 [ ]  simple_attr_write_xsigned.constprop.0.isra.0+0xb0/0x110

 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-ENOSPC)
 [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT

 [ ] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP NOPTI
 [ ] RIP: 0010:drm_mm_remove_node+0x1b7/0x390
 [ ] Call Trace:
 [ ]  &lt;TASK&gt;
 [ ]  ? die_addr+0x2e/0x80
 [ ]  ? exc_general_protection+0x1a1/0x3e0
 [ ]  ? asm_exc_general_protection+0x22/0x30
 [ ]  ? drm_mm_remove_node+0x1b7/0x390
 [ ]  ggtt_node_remove+0xa5/0xf0 [xe]
 [ ]  xe_ggtt_node_remove+0x35/0x70 [xe]
 [ ]  xe_ttm_bo_destroy+0x123/0x220 [xe]
 [ ]  intel_user_framebuffer_destroy+0x44/0x70 [xe]
 [ ]  intel_plane_destroy_state+0x3b/0xc0 [xe]
 [ ]  drm_atomic_state_default_clear+0x1cd/0x2f0
 [ ]  intel_atomic_state_clear+0x9/0x20 [xe]
 [ ]  __drm_atomic_state_free+0x1d/0xb0

Fix that by using pf_release_ggtt() on the error path, which now
works regardless if the node has GGTT allocation or not.

Fixes: 34e804220f69 ("drm/xe: Make xe_ggtt_node struct independent")
Signed-off-by: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241104144901.1903-1-michal.wajdeczko@intel.com
(cherry picked from commit 43b1dd2b550f0861ce80fbfffd5881b1b26272b1)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL</title>
<updated>2024-11-05T23:40:12Z</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-11-05T04:35:24Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=64a2b6ed4bfd890a0e91955dd8ef8422a3944ed9'/>
<id>urn:sha1:64a2b6ed4bfd890a0e91955dd8ef8422a3944ed9</id>
<content type='text'>
Upon failure all locks need to be dropped before returning to the user.

Fixes: 58480c1c912f ("drm/xe: Skip VMAs pin when requesting signal to the last XE_EXEC")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-3-matthew.brost@intel.com
(cherry picked from commit 7d1a4258e602ffdce529f56686925034c1b3b095)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Fix possible exec queue leak in exec IOCTL</title>
<updated>2024-11-05T23:40:12Z</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-11-05T04:35:23Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=af797b831d8975cb4610f396dcb7f03f4b9908e7'/>
<id>urn:sha1:af797b831d8975cb4610f396dcb7f03f4b9908e7</id>
<content type='text'>
In a couple of places after an exec queue is looked up the exec IOCTL
returns on input errors without dropping the exec queue ref. Fix this
ensuring the exec queue ref is dropped on input error.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-2-matthew.brost@intel.com
(cherry picked from commit 07064a200b40ac2195cb6b7b779897d9377e5e6f)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout</title>
<updated>2024-11-04T16:12:30Z</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@intel.com</email>
</author>
<published>2024-10-29T12:01:17Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=1491efb39acee3848b61fcb3e5cc4be8de304352'/>
<id>urn:sha1:1491efb39acee3848b61fcb3e5cc4be8de304352</id>
<content type='text'>
Flush the g2h worker explicitly if TLB timeout happens which is
observed on LNL and that points to the recent scheduling issue with
E-cores on LNL.

This is similar to the recent fix:
commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
response timeout") and should be removed once there is E core
scheduling fix.

v2: Add platform check(Himal)
v3: Remove gfx platform check as the issue related to cpu
    platform(John)
    Use the common WA macro(John) and print when the flush
    resolves timeout(Matt B)
v4: Remove the resolves log and do the flush before taking
    pending_lock(Matt A)

Cc: Badal Nilawar &lt;badal.nilawar@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Cc: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Cc: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Cc: stable@vger.kernel.org # v6.11+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687
Signed-off-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-3-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
(cherry picked from commit e1f6fa55664a0eeb0a641f497e1adfcf6672e995)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
</feed>
