path: root/drivers/gpu
Age  Commit message  Author  Files  Lines
2024-04-30drm/vmwgfx: Remove duplicate vmwgfx_vkms.h headerJiapeng Chong1-1/+0
./drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c: vmwgfx_vkms.h is included more than once. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8772 Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Zack Rusin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-04-30drm/vmwgfx: Fix invalid reads in fence signaled eventsZack Rusin1-1/+1
Correctly set the length of the drm_event to the size of the structure that's actually used. The length of the drm_event was set to the parent structure instead of to the drm_vmw_event_fence which is supposed to be read. drm_read uses the length parameter to copy the event to the user space thus resulting in out-of-bounds reads. Signed-off-by: Zack Rusin <[email protected]> Fixes: 8b7de6aa8468 ("vmwgfx: Rework fence event action") Reported-by: [email protected] # ZDI-CAN-23566 Cc: David Airlie <[email protected]> CC: Daniel Vetter <[email protected]> Cc: Zack Rusin <[email protected]> Cc: Broadcom internal kernel review list <[email protected]> Cc: [email protected] Cc: [email protected] Cc: <[email protected]> # v3.4+ Reviewed-by: Maaz Mombasawala <[email protected]> Reviewed-by: Martin Krastev <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
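The length bug described in this message can be illustrated with a small user-space sketch. This is not the vmwgfx code: the structure names, the wrapper layout, and the copy_event() helper are hypothetical stand-ins chosen to show why the length field must describe only the payload that is meant to be read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an event header; layout is illustrative only. */
struct event_header {
    uint32_t type;
    uint32_t length;            /* number of bytes a reader will copy out */
};

/* Plays the role of the event that user space is supposed to read. */
struct fence_event {
    struct event_header base;
    uint64_t user_data;
    uint32_t tv_sec;
    uint32_t tv_usec;
};

/* Hypothetical kernel-side wrapper (the "parent" structure). */
struct pending_fence_event {
    struct fence_event event;
    void *driver_private[4];    /* must never be copied to the reader */
};

/* Set length to the size of the event itself, not of the wrapper. */
static void init_fence_event(struct pending_fence_event *p, uint64_t user_data)
{
    memset(p, 0, sizeof(*p));
    p->event.base.type = 1;
    p->event.base.length = sizeof(p->event);    /* not sizeof(*p) */
    p->event.user_data = user_data;
}

/* Mimics a read(): copies `length` bytes starting at the event. */
static size_t copy_event(const struct pending_fence_event *p,
                         void *dst, size_t dst_size)
{
    size_t n = p->event.base.length;

    if (n > dst_size)
        return 0;
    memcpy(dst, &p->event, n);
    return n;
}
```

Had the length been set to sizeof(*p), the copy in this sketch would have run past the event and into driver_private, which is the class of out-of-bounds read the message describes.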
2024-04-30drm/nouveau/gsp: Use the sg allocator for level 2 of radix3Lyude Paul2-27/+54
Currently we allocate all 3 levels of radix3 page tables using nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of the relevant memory. This can end up failing in scenarios where the system has very high memory fragmentation, and we can't find enough contiguous memory to allocate level 2 of the page table. Currently, this can result in runtime PM issues on systems where memory fragmentation is high - as we'll fail to allocate the page table for our suspend/resume buffer: kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted 6.8.6-201.ChopperV6.fc39.x86_64 #1 Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 Workqueue: pm pm_runtime_work Call Trace: <TASK> dump_stack_lvl+0x64/0x80 warn_alloc+0x165/0x1e0 ? __alloc_pages_direct_compact+0xb3/0x2b0 __alloc_pages_slowpath.constprop.0+0xd7d/0xde0 __alloc_pages+0x32d/0x350 __dma_direct_alloc_pages.isra.0+0x16a/0x2b0 dma_direct_alloc+0x70/0x270 nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau] r535_gsp_fini+0x1d4/0x350 [nouveau] nvkm_subdev_fini+0x67/0x150 [nouveau] nvkm_device_fini+0x95/0x1e0 [nouveau] nvkm_udevice_fini+0x53/0x70 [nouveau] nvkm_object_fini+0xb9/0x240 [nouveau] nvkm_object_fini+0x75/0x240 [nouveau] nouveau_do_suspend+0xf5/0x280 [nouveau] nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] pci_pm_runtime_suspend+0x67/0x1e0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 __rpm_callback+0x41/0x170 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_callback+0x5d/0x70 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_suspend+0x120/0x6a0 pm_runtime_work+0x98/0xb0 process_one_work+0x171/0x340 worker_thread+0x27b/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? 
__pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 Luckily, we don't actually need to allocate coherent memory for the page table thanks to being able to pass the GPU a radix3 page table for suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg allocator for level 2. We continue using coherent allocations for lvl0 and 1, since they only take a single page. V2: * Don't forget to actually jump to the next scatterlist when we reach the end of the scatterlist we're currently on when writing out the page table for level 2 Signed-off-by: Lyude Paul <[email protected]> Cc: [email protected] Reviewed-by: Ben Skeggs <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
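The V2 note about jumping to the next scatterlist can be sketched in plain C. This is only an illustration under simplifying assumptions: struct table_chunk is a hypothetical stand-in for one scatterlist segment of the level 2 table, the data being mapped is assumed contiguous, and the page size is fixed at 4 KiB. It is not the nouveau implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u   /* assumed page size for this sketch */

/* Hypothetical stand-in for one discontiguous piece of the table. */
struct table_chunk {
    uint64_t *entries;  /* storage for this piece of the table */
    size_t    nentries; /* how many entries fit in this piece */
};

/*
 * Write one entry per data page into a table that is split across several
 * chunks. When the current chunk is full, move on to the next one instead
 * of writing past its end. Returns the number of entries written.
 */
static size_t write_entries(struct table_chunk *chunks, size_t nchunks,
                            uint64_t first_addr, size_t npages)
{
    size_t c = 0, idx = 0, written = 0;

    while (written < npages && c < nchunks) {
        if (idx == chunks[c].nentries) {
            c++;        /* end of this chunk: jump to the next one */
            idx = 0;
            continue;
        }
        chunks[c].entries[idx++] = first_addr + (uint64_t)written * PAGE_SZ;
        written++;
    }
    return written;
}
```

A short return value tells the caller that the chunks did not provide enough room for all pages.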
2024-04-30drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()Lyude Paul1-7/+12
Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a BUG() on startup: kernel BUG at include/linux/scatterlist.h:187! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30 Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019 RIP: 0010:sg_init_one+0x85/0xa0 Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54 24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b 0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00 RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000 RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508 R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018 FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0 Call Trace: <TASK> ? die+0x36/0x90 ? do_trap+0xdd/0x100 ? sg_init_one+0x85/0xa0 ? do_error_trap+0x65/0x80 ? sg_init_one+0x85/0xa0 ? exc_invalid_op+0x50/0x70 ? sg_init_one+0x85/0xa0 ? asm_exc_invalid_op+0x1a/0x20 ? sg_init_one+0x85/0xa0 nvkm_firmware_ctor+0x14a/0x250 [nouveau] nvkm_falcon_fw_ctor+0x42/0x70 [nouveau] ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau] r535_gsp_oneinit+0xb3/0x15f0 [nouveau] ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? nvkm_udevice_new+0x95/0x140 [nouveau] ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? ktime_get+0x47/0xb0 ? srso_return_thunk+0x5/0x5f nvkm_subdev_oneinit_+0x4f/0x120 [nouveau] nvkm_subdev_init_+0x39/0x140 [nouveau] ? 
srso_return_thunk+0x5/0x5f nvkm_subdev_init+0x44/0x90 [nouveau] nvkm_device_init+0x166/0x2e0 [nouveau] nvkm_udevice_init+0x47/0x70 [nouveau] nvkm_object_init+0x41/0x1c0 [nouveau] nvkm_ioctl_new+0x16a/0x290 [nouveau] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau] nvkm_ioctl+0x126/0x290 [nouveau] nvif_object_ctor+0x112/0x190 [nouveau] nvif_device_ctor+0x23/0x60 [nouveau] nouveau_cli_init+0x164/0x640 [nouveau] nouveau_drm_device_init+0x97/0x9e0 [nouveau] ? srso_return_thunk+0x5/0x5f ? pci_update_current_state+0x72/0xb0 ? srso_return_thunk+0x5/0x5f nouveau_drm_probe+0x12c/0x280 [nouveau] ? srso_return_thunk+0x5/0x5f local_pci_probe+0x45/0xa0 pci_device_probe+0xc7/0x270 really_probe+0xe6/0x3a0 __driver_probe_device+0x87/0x160 driver_probe_device+0x1f/0xc0 __driver_attach+0xec/0x1f0 ? __pfx___driver_attach+0x10/0x10 bus_for_each_dev+0x88/0xd0 bus_add_driver+0x116/0x220 driver_register+0x59/0x100 ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau] do_one_initcall+0x5b/0x320 do_init_module+0x60/0x250 init_module_from_file+0x86/0xc0 idempotent_init_module+0x120/0x2b0 __x64_sys_finit_module+0x5e/0xb0 do_syscall_64+0x83/0x160 ? 
srso_return_thunk+0x5/0x5f entry_SYSCALL_64_after_hwframe+0x71/0x79 RIP: 0033:0x7feeb5cc20cd Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035 RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310 R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0 R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80 </TASK> We hit this when trying to initialize firmware of type NVKM_FIRMWARE_IMG_DMA because we allocate our memory with dma_alloc_coherent, and DMA allocations can't be turned back into memory pages - which a scatterlist needs in order to map them. So, fix this by allocating the memory with vmalloc() instead. V2: * Fixup explanation as the prior one was bogus Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-04-30drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer()Jocelyn Falempe1-0/+3
plane->state and plane->state->fb can be NULL, so add a check before dereferencing them. Found by testing with the imx driver. Fixes: 879b3b6511fe ("drm/fb_dma: Add generic get_scanout_buffer() for drm_panic") Signed-off-by: Jocelyn Falempe <[email protected]> Reviewed-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
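A minimal sketch of the guard this message describes, using hypothetical stand-in structures rather than the real DRM types. The choice of -ENODEV as the error code is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-ins; the real DRM structures carry far more state. */
struct fb          { unsigned int width, height; };
struct plane_state { struct fb *fb; };
struct plane       { struct plane_state *state; };

/*
 * Report the scanout dimensions, refusing to dereference a plane that has
 * no state or no framebuffer attached yet.
 */
static int get_scanout_size(const struct plane *plane,
                            unsigned int *width, unsigned int *height)
{
    if (!plane->state || !plane->state->fb)
        return -ENODEV;     /* assumed error code for this sketch */

    *width  = plane->state->fb->width;
    *height = plane->state->fb->height;
    return 0;
}
```

Checking plane->state first matters because the second test dereferences it.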
2024-04-30nouveau: Add missing break statementChaitanya Kumar Borah1-0/+1
Add the missing break statement that causes the following build error CC [M] drivers/gpu/drm/i915/display/intel_display_device.o ../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c: In function ‘build_registry’: ../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1266:3: error: label at end of compound statement 1266 | default: | ^~~~~~~ CC [M] drivers/gpu/drm/amd/amdgpu/gfx_v10_0.o HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_reg.h CC [M] drivers/gpu/drm/amd/amdgpu/imu_v11_0.o make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.o] Error 1 make[7]: *** Waiting for unfinished jobs.... Fixes: b58a0bc904ff ("nouveau: add command-line GSP-RM registry support") Closes: https://lore.kernel.org/all/[email protected]/T/#m3c9acebac754f2e74a85b76c858c093bb1aacaf0 Closes: https://lore.kernel.org/all/CA+G9fYu7Ug0K8h9QJT0WbtWh_LL9Juc+VC0WMU_Z_vSSPDNymg@mail.gmail.com/ Tested-by: Nícolas F. R. A. Prado <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Signed-off-by: Chaitanya Kumar Borah <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
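The compiler error quoted above arises because C standards before C23 require a label, including default:, to be followed by a statement. A small sketch with hypothetical enum and function names shows the shape of the fix:

```c
/* Hypothetical registry entry types, used only for this illustration. */
enum reg_type { REG_DWORD, REG_STRING };

/* Return the payload size for an entry, or 0 for unknown types. */
static int entry_size(enum reg_type type, int string_len)
{
    int size = 0;

    switch (type) {
    case REG_DWORD:
        size = 4;
        break;
    case REG_STRING:
        size = string_len;
        break;
    default:
        break;  /* without a statement here, pre-C23 compilers reject the label */
    }
    return size;
}
```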
2024-04-30drm/amd/pm: fix uninitialized variable warnings for vega10_hwmgrTim Huang2-13/+39
Clear warnings about using an uninitialized variable when the driver fails to get a valid value from the SMU. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix uninitialized scalar variable warningTim Huang1-0/+2
Clear warning that field bp is uninitialized when calling amdgpu_virt_ras_add_bps. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/pm: fix the Out-of-bounds read warningJesse Zhang1-2/+3
Using index i - 1U may go beyond the valid element index range of mc_data[] when i = 0. Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
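Assuming the index is unsigned, 0 - 1U wraps around to a very large value instead of becoming negative, so the lookup lands far outside the array. A guarded accessor might look like the following sketch; prev_entry() and its parameters are hypothetical names, not taken from the driver.

```c
#include <stddef.h>

/*
 * Fetch the element before position i. Position 0 has no predecessor, and
 * with unsigned arithmetic i - 1 would wrap instead of going negative, so
 * that case is rejected explicitly. Returns 0 on success, -1 otherwise.
 */
static int prev_entry(const unsigned int *mc_data, size_t count, size_t i,
                      unsigned int *out)
{
    if (i == 0 || i > count)
        return -1;

    *out = mc_data[i - 1];
    return 0;
}
```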
2024-04-30drm/amd/pm: fix uninitialized variable warning for smu_v13Tim Huang5-92/+51
Clear the warning about using an uninitialized variable when dpm is not enabled, and reuse the SMU13 code to get the boot frequency. Signed-off-by: Tim Huang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/pm: Fix negative array index readJesse Zhang1-6/+21
Avoid using negative values of clk_index as an index into the array pptable->DpmDescriptor. V2: fix clk_index return check (Tim Huang) Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
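The pattern behind this fix can be sketched as follows: a mapping helper returns a negative errno for clocks that have no table slot, and the caller checks for that before indexing. All names here (clk_to_index(), get_dpm_levels(), the enum values) are hypothetical and only illustrate the check.

```c
#include <errno.h>
#include <stddef.h>

#define NUM_DPM_CLKS 3

enum clk_type { CLK_GFX, CLK_MEM, CLK_SOC, CLK_UNSUPPORTED };

/* Hypothetical mapping helper: negative errno when no table slot exists. */
static int clk_to_index(enum clk_type clk)
{
    switch (clk) {
    case CLK_GFX: return 0;
    case CLK_MEM: return 1;
    case CLK_SOC: return 2;
    default:      return -EINVAL;
    }
}

/* Look up a descriptor entry, propagating the mapping error unchanged. */
static int get_dpm_levels(const unsigned int descriptor[NUM_DPM_CLKS],
                          enum clk_type clk, unsigned int *levels)
{
    int clk_index = clk_to_index(clk);

    if (clk_index < 0)
        return clk_index;   /* never index the table with a negative value */

    *levels = descriptor[clk_index];
    return 0;
}
```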
2024-04-30drm/amdgpu/discovery: add sdma v7_0 ip blockLikun Gao1-0/+5
Add sdma v7_0 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: provide more ucode name shown via idLikun Gao1-0/+24
Provide some missing ucode names shown via firmware ID. v2: fix whitespace (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: support SDMA v3 struct fw front door loadLikun Gao4-0/+19
Add support for new SDMA firmware struct (V3) with PSP front door load type. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/sdma7: set sdma hang watchdogJack Xiao1-0/+7
Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add sdma v7_0 ip block support (v7)Likun Gao4-1/+1668
v1: Add sdma v7_0 ip block support. (Likun) v2: Move vmhub from ring_funcs to ring. (Hawking) v3: Switch to AMDGPU_GFXHUB(0). (Hawking) v4: Move microcode init into early_init. (Likun) v5: Fix warnings (Alex) v6: Squash in various fixes (Alex) v7: Rebase (Alex) v8: Rebase (Alex) Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/display: Add MSF panel to DPCD 0x317 patch listTobias Jakobi1-0/+1
This 8.4 inch panel is integrated in the Ayaneo Kun handheld device. The panel resolution is 2560×1600, i.e. it has portrait dimensions. Decoding the EDID shows: Manufacturer: MSF Model: 4099 Display Product Name: 'TV080WUM-NL0 ' Judging from the product name this might be a clone of a BOE panel, but with larger dimensions. Panel frequently shows non-functional backlight control. Adding some debug prints to update_connector_ext_caps() shows that sometimes the OLED bit of ext_caps is set, and then the driver assumes that backlight is controlled via AUX. Forcing backlight control to PWM via amdgpu.backlight=0 restores backlight operation. Signed-off-by: Tobias Jakobi <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/display: Remove duplicate dcn401/dcn401_clk_mgr.h headerJiapeng Chong1-1/+0
./drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c: dcn401/dcn401_clk_mgr.h is included more than once. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8885 Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add sdma fw v3 structureLikun Gao2-0/+15
Add sdma firmware struct version 3 to support sdma v7_0 firmware. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/display: Remove duplicate spl/dc_spl_types.h headerJiapeng Chong1-2/+0
./drivers/gpu/drm/amd/display/dc/inc/hw/transform.h: spl/dc_spl_types.h is included more than once. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8884 Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq()Dan Carpenter1-1/+1
The "instance" variable needs to be signed for the error handling to work. Fixes: 8b2faf1a4f3b ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
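Why the variable must be signed can be shown with a short sketch. If the result of a helper that returns a negative errno were stored in an unsigned variable, a check such as instance < 0 could never be true and the error value would be used as an array index. The names and IRQ client IDs below are hypothetical.

```c
#include <errno.h>

#define NUM_INSTANCES 2

static int trap_counts[NUM_INSTANCES];

/* Hypothetical helper: maps an IRQ client ID to an instance or -EINVAL. */
static int irq_id_to_instance(unsigned int client_id)
{
    switch (client_id) {
    case 10: return 0;
    case 11: return 1;
    default: return -EINVAL;
    }
}

static int process_trap_irq(unsigned int client_id)
{
    int instance;   /* signed, so the negative error value survives */

    instance = irq_id_to_instance(client_id);
    if (instance < 0)
        return instance;

    trap_counts[instance]++;
    return 0;
}
```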
2024-04-30drm/amdgpu: Add new members for sdma v7_0 fwLikun Gao1-0/+4
Add new members in sdma instance structure for sdma v7_0 firmware. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: add gfx12 mqd structuresLikun Gao1-0/+1188
Add memory queue descriptors for gfx12. v2: squash in sdma updates (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/discovery: Add gmc v12_0 ip blockLikun Gao1-0/+5
Add gmc v12_0 ip block. v2: Squash in updates (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix doorbell regressionShashank Sharma1-1/+1
This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: support gfx v12 specific pte/pde fieldsHawking Zhang5-12/+18
Add gfx v12 pte/pde support to gmc common helper. v2: squash in fixes (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Set pte_is_pte flag in gmc v12 gartHawking Zhang1-1/+2
pte_is_pte is a new flag introduced in gmc v12 that needs to be set by default for pte. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add gmc v12_0 ip block support (v7)Hawking Zhang3-1/+1031
Add initial support for GMC v12. v1: Add gmc v12_0 ip block support. v2: Switch to gfx.kiq array. v3: Switch to vmhubs_mask. v4: Switch to AMDGPU_MMHUB0(0) and AMDGPU_GFXHUB(0) v5: Rebase (Alex) v6: Squash in fixes for AGP handling, gfxhub init order, vmhub index (Alex) v7: Rebase (Alex) v8: squash in ecc fix (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add gfx v12 pte/pde format changeHawking Zhang1-0/+13
Add gfx v12 pte/pde format change. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add gfxhub v12_0 ip block support (v3)Likun Gao3-1/+531
Add initial gfxhub v12 support. v1: Add gfxhub v12_0 ip block support (Likun) v2: Switch to AMDGPU_GFXHUB(0) (Hawking) v3: Squash in keep default error response mode (Hawking) Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/mes11: increase waiting time for engine readyJack Xiao1-1/+1
The mes schq engine requires more waiting time to become ready before packet submission. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdkfd: Flush the process wq before creating a kfd_processLancelot SIX1-0/+8
There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/pm: fix warning using uninitialized value of max_vid_stepJesse Zhang1-1/+4
Check the return of pp_atomfwctrl_get_Voltage_table_v4 as it may fail to initialize max_vid_step. V2: change the check condition (Tim Huang) Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/gfx: enable mes to map legacy queue supportJack Xiao1-9/+41
Enable mes to map legacy queue support. v2: kiq_set_resources is required. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdkfd: Evict BO itself for contiguous allocationPhilip Yang1-1/+18
If the BO pages pinned for RDMA are not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/display: re-indent dpp401_dscl_program_isharp()Dan Carpenter1-63/+30
Smatch complains because some lines are indented more than they should be. I went a bit crazy re-indenting this. ;) The comments were not useful except as a marker of things which are left to implement so I deleted most of them except for the TODO. I introduced a "data" pointer so that I could replace "scl_data->dscl_prog_data." with just "data->" and shorten the lines a bit. It's more readable without the line breaks. I also tried to align it so you can see what is changing on each line. Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/pm: fix uninitialized variable warning for smu8_hwmgrTim Huang1-3/+12
Clear warnings about using the uninitialized value level when the driver fails to get the value from the SMU. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/pm: fix uninitialized variable warningJesse Zhang4-16/+37
Check the return of function smum_send_msg_to_smc as it may fail to initialize the variable. Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amd/pm: fix uninitialized variable warningJesse Zhang1-1/+1
Check the return of function smum_send_msg_to_smc as it may fail to initialize the variable. Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/pm: Check the return value of smum_send_msg_to_smcMa Jun1-2/+6
Check the return value of smum_send_msg_to_smc, otherwise we might use an uninitialized variable "now". Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
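This commit and the uninitialized-variable fixes around it share one pattern: a query that reports its result through an out-parameter may fail without writing it, so the caller has to check the return value before using the result. A minimal sketch, with send_msg_get_value() as a hypothetical stand-in for the SMU message call and made-up values:

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a firmware query. On failure it returns a
 * negative errno and leaves *value untouched.
 */
static int send_msg_get_value(int simulate_failure, unsigned int *value)
{
    if (simulate_failure)
        return -EIO;

    *value = 30000;
    return 0;
}

/* Read a clock in MHz, bailing out before `now` is used on failure. */
static int read_clock_mhz(int simulate_failure, unsigned int *mhz)
{
    unsigned int now;
    int ret;

    ret = send_msg_get_value(simulate_failure, &now);
    if (ret)
        return ret;     /* `now` was never written, so do not read it */

    *mhz = now / 100;
    return 0;
}
```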
2024-04-30drm/amdgpu: Remove redundant function callYiPeng Chai1-16/+6
Remove redundant function call. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Remove unused codeYiPeng Chai1-71/+0
Remove unused code. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix overflowed array index read warningTim Huang1-1/+2
Clear overflowed array index read warning by cast operation. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix potential resource leak warningTim Huang1-0/+5
Clear the resource leak warning: when the prepare step fails, the allocated amdgpu job object is never released. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
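The leak described here follows a common shape: an object is allocated, a later preparation step fails, and the error path returns without freeing it. The sketch below uses hypothetical job helpers and a live_jobs counter purely to make the leak observable; it does not reflect the amdgpu job API.

```c
#include <errno.h>
#include <stdlib.h>

struct job { int prepared; };

static int live_jobs;   /* counts allocated jobs that are not yet freed */

static struct job *job_alloc(void)
{
    struct job *job = calloc(1, sizeof(*job));

    if (job)
        live_jobs++;
    return job;
}

static void job_free(struct job *job)
{
    if (job) {
        free(job);
        live_jobs--;
    }
}

static int job_prepare(struct job *job, int should_fail)
{
    if (should_fail)
        return -EBUSY;
    job->prepared = 1;
    return 0;
}

static int submit(int prepare_fails)
{
    struct job *job = job_alloc();
    int ret;

    if (!job)
        return -ENOMEM;

    ret = job_prepare(job, prepare_fails);
    if (ret) {
        job_free(job);  /* release the job on the error path too */
        return ret;
    }

    /* A real driver would hand the job to a scheduler; the sketch frees it. */
    job_free(job);
    return 0;
}
```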
2024-04-30drm/amdgpu: avoid dumping mca bank log multiple times during ras ISRYang Wang2-0/+28
Because the UE valid mca count is only cleared after a gpu reset, only dump the mca log the first time the mca bank is fetched after receiving a RAS interrupt. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: add MCA smu cache supportYang Wang3-7/+116
v1: because the SMU CE valid mca bank is cleared after reading, this patch adds an mca cache at the driver level to ensure that the mca bank is not lost. v2: refine amdgpu_mca_init/fini/reset() function name. v3: add mca_cache.lock support; only add CE banks to the mca bank cache. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: add amdgpu MCA bank dispatch function supportYang Wang1-42/+55
- Refine mca driver code. - Centralize mca bank dispatch code logic. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add mmhub v4_1_0 ip block support (v4)Hawking Zhang3-1/+683
Add initial support for MMHUB 4.1.0. v1: Add mmhub v4_1_0 ip block support. v2: Switch to AMDGPU_MMHUB0(0). v3: squash in fix for ip version check (Alex) v4: squash in vm_contexts_disable fix (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Evict BOs from same process for contiguous allocationPhilip Yang1-1/+2
When TTM fails to allocate VRAM, it tries to evict BOs from VRAM to system memory and then retries the allocation. This skips the KFD BOs from the same process, because KFD requires all BOs to be resident for user queues. If TTM is allocating contiguous VRAM with the TTM_PL_FLAG_CONTIGUOUS flag, allow TTM to evict KFD BOs from the same process; this evicts the user queues first and restores the queues later, after the contiguous VRAM allocation. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Handle sg size limit for contiguous allocationPhilip Yang1-6/+6
Define the macro AMDGPU_MAX_SG_SEGMENT_SIZE as 2GB: the struct scatterlist length is an unsigned int, and some users of it cast it to a signed int, so every segment of an sg table is limited to a maximum size of 2GB. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To work around the sg table segment size limit, allocate multiple segments if the contiguous size is bigger than AMDGPU_MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
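The segment splitting described above can be sketched independently of the kernel's scatterlist API. In this illustration struct segment and split_into_segments() are hypothetical names, and the 2 GiB limit is written out as a local macro rather than taken from the driver headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed per-segment limit for this sketch: 2 GiB. */
#define MAX_SG_SEGMENT_SIZE (2ULL << 30)

struct segment {
    uint64_t addr;
    uint64_t len;
};

/*
 * Describe one contiguous range as a series of segments, none larger than
 * MAX_SG_SEGMENT_SIZE. Returns the number of segments written to `out`.
 */
static size_t split_into_segments(uint64_t addr, uint64_t size,
                                  struct segment *out, size_t max_out)
{
    size_t n = 0;

    while (size > 0 && n < max_out) {
        uint64_t len = size < MAX_SG_SEGMENT_SIZE ? size : MAX_SG_SEGMENT_SIZE;

        out[n].addr = addr;
        out[n].len = len;
        n++;

        addr += len;
        size -= len;
    }
    return n;
}
```

The range itself stays contiguous; only its description is broken into pieces that fit the length field.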