aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-09-26drm/amdgpu: Restore partition mode after resetLijo Lazar4-8/+28
On a full device reset, PSP FW gets unloaded. Hence restore the partition mode by placing a new request. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Acked-by: Alex Deucher <[email protected]> Tested-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enableCong Liu1-0/+1
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function. The leak occurs when the function sends a command to the firmware to enable or disable a RAS feature for a GFX block. If the command fails, the kfree() function is not called to free the info memory. Fixes: 9f051d6ff13f ("drm/amdgpu: Free ras cmd input buffer properly") Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Cong Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20Revert "drm/amdgpu: Report vbios version instead of PN"Lijo Lazar1-1/+1
This reverts commit 7748ce5b69581325cae40c2134088820f0957902. vbios_version sysfs node is used to identify Part Number also. Revert to the same so that it doesn't break scripts/software which parse this. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Add more fields to IP versionLijo Lazar3-13/+35
Include subrevision and variant fileds also to IP version. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: print channel index for UMC bad pageTao Zhou1-4/+6
Print channel index for UMC v12. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: update IP count INFO querySathishkumar S1-29/+61
update the query to return the number of functional instances where there is more than an instance of the requested type and for others continue to return one. v2: count must reflect the actual number of engines (Alex) v3: fix wrong number of engines for vcn (Alex) Signed-off-by: Sathishkumar S <[email protected]> Reviewed-by: Leo Liu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Fix false positive error logStanley.Yang1-5/+5
It should first check block ras obj whether be set, it should return 0 directly if block ras obj or hw_ops is not set. If block doesn't support RAS just return 0 is fine. Changed from V1: return 0 directly if block ras obj or hw ops is not set Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu/jpeg: skip set pg for sriovVignesh Chander2-11/+12
Host handles PG. Signed-off-by: Vignesh Chander <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Rework coredump to use memory dynamicallyAndré Almeida2-28/+49
Instead of storing coredump information inside amdgpu_device struct, move if to a proper separated struct and allocate it dynamically. This will make it easier to further expand the logged information. Signed-off-by: André Almeida <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enableCong Liu1-0/+1
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function. The leak occurs when the function sends a command to the firmware to enable or disable a RAS feature for a GFX block. If the command fails, the kfree() function is not called to free the info memory. Fixes: 9f051d6ff13f ("drm/amdgpu: Free ras cmd input buffer properly") Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Cong Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Fix vbios version string searchLijo Lazar1-1/+18
Search for vbios version string in STRING_OFFSET-ATOM_ROM_HEADER region first. If those offsets are not populated, use the hardcoded region. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20Revert "drm/amdgpu: Report vbios version instead of PN"Lijo Lazar1-1/+1
This reverts commit 7748ce5b69581325cae40c2134088820f0957902. vbios_version sysfs node is used to identify Part Number also. Revert to the same so that it doesn't break scripts/software which parse this. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Add EXT_COHERENT memory allocation flagsDavid Francis5-1/+9
These flags (for GEM and SVM allocations) allocate memory that allows for system-scope atomic semantics. On GFX943 these flags cause caches to be avoided on non-local memory. On all other ASICs they are identical in functionality to the equivalent COHERENT flags. Corresponding Thunk patch is at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 Reviewed-by: David Yat Sin <[email protected]> Signed-off-by: David Francis <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: add amdgpu mca debug sysfs supportYang Wang3-0/+120
add amdgpu mca debug sysfs support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: add VPE IP discovery info to HW IP info queryAlex Deucher1-0/+4
Add missing IP discovery info. Reviewed-by: Christian König <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: add amdgpu smu mca dump feature supportYang Wang2-0/+129
add amdgpu smu mca dump feature support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amd: Enable seamless boot by default on newer ASICsMario Limonciello1-11/+4
Seamless boot can technically be supported as far back as DCN1 but to avoid regressions on older hardware, enable it for DCN3 and later. If users report using the module parameter that it works on older ASICs as well, this can be adjusted. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amd: Add a module parameter for seamless bootMario Limonciello3-3/+26
The module parameter can be used to test more easily enabling seamless boot support on additional ASICs. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amd: Add HDP flush during jpeg initTimmy Tsai1-0/+3
During jpeg init, CPU writes to frame buffer which can be cached by HDP, occasionally causing invalid header to be sent to MMSCH. Perform HDP flush after writing to frame buffer before continuing with jpeg init sequence. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Timmy Tsai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amd: Move seamless boot check out of displayMario Limonciello2-0/+22
This will allow base driver to dictate whether seamless should be enabled. No intended functional changes. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amd: Drop special case for yellow carp without discoveryMario Limonciello1-6/+0
`amdgpu_gmc_get_vbios_allocations` has a special case for how to bring up yellow carp when amdgpu discovery is turned off. As this ASIC ships with discovery turned on, it's generally dead code and worse it causes `adev->mman.keep_stolen_vga_memory` to not be initialized for yellow carp. Remove it. Signed-off-by: Mario Limonciello <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Use function for IP version checkLijo Lazar77-461/+550
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm: Update file owner during useTvrtko Ursulin1-2/+4
With the typical model where the display server opens the file descriptor and then hands it over to the client(*), we were showing stale data in debugfs. Fix it by updating the drm_file->pid on ioctl access from a different process. The field is also made RCU protected to allow for lockless readers. Update side is protected with dev->filelist_mutex. Before: $ cat /sys/kernel/debug/dri/0/clients command pid dev master a uid magic Xorg 2344 0 y y 0 0 Xorg 2344 0 n y 0 2 Xorg 2344 0 n y 0 3 Xorg 2344 0 n y 0 4 After: $ cat /sys/kernel/debug/dri/0/clients command tgid dev master a uid magic Xorg 830 0 y y 0 0 xfce4-session 880 0 n y 0 1 xfwm4 943 0 n y 0 2 neverball 1095 0 n y 0 3 *) More detailed and historically accurate description of various handover implementation kindly provided by Emil Velikov: """ The traditional model, the server was the orchestrator managing the primary device node. From the fd, to the master status and authentication. But looking at the fd alone, this has varied across the years. IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open fd(s) and reuse those whenever needed, DRI2 the client was responsible for open() themselves and with DRI3 the fd was passed to the client. Around the inception of DRI3 and systemd-logind, the latter became another possible orchestrator. Whereby Xorg and Wayland compositors could ask it for the fd. For various reasons (hysterical and genuine ones) Xorg has a fallback path going the open(), whereas Wayland compositors are moving to solely relying on logind... some never had fallback even. Over the past few years, more projects have emerged which provide functionality similar (be that on API level, Dbus, or otherwise) to systemd-logind. """ v2: * Fixed typo in commit text and added a fine historical explanation from Emil. Signed-off-by: Tvrtko Ursulin <[email protected]> Cc: "Christian König" <[email protected]> Cc: Daniel Vetter <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Rob Clark <[email protected]> Tested-by: Rob Clark <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
2023-09-15Merge tag 'amd-drm-fixes-6.6-2023-09-13' of ↵Dave Airlie19-98/+69
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-09-13: amdgpu: - GC 9.4.3 fixes - Fix white screen issues with S/G display on system with >= 64G of ram - Replay fixes - SMU 13.0.6 fixes - AUX backlight fix - NBIO 4.3 SR-IOV fixes for HDP - RAS fixes - DP MST resume fix - Fix segfault on systems with no vbios - DPIA fixes amdkfd: - CWSR grace period fix - Unaligned doorbell fix - CRIU fix for GFX11 - Add missing TLB flush on gfx10 and newer Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-09-14Merge tag 'drm-misc-fixes-2023-09-07' of ↵Daniel Vetter1-1/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes One doc fix for drm/connector, one fix for amdgpu for an crash when VRAM usage is high, and one fix in gm12u320 to fix the timeout units in the code Signed-off-by: Daniel Vetter <[email protected]> From: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
2023-09-12drm/amdgpu: add remap_hdp_registers callback for nbio 7.11Alex Deucher1-0/+9
Implement support for remapping the HDP aperture registers for NBIO 7.11. Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-12drm/amdgpu: add vcn_doorbell_range callback for nbio 7.11Alex Deucher1-0/+22
Implement support for setting up the VCN doorbell range for NBIO 7.11. Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-12Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann145-2415/+4352
Forwarding to v6.6-rc1. Signed-off-by: Thomas Zimmermann <[email protected]>
2023-09-11drm/amdgpu: Handle null atom context in VBIOS info ioctlDavid Francis1-6/+11
On some APU systems, there is no atom context and so the atom_context struct is null. Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl to handle this case, returning all zeroes. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: David Francis <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: fallback to old RAS error message for aqua_vanjaramHawking Zhang1-2/+4
So driver doesn't generate incorrect message until the new format is settled down for aqua_vanjaram Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOVAlex Deucher1-0/+3
Needed for HDP flush to work correctly. Reviewed-by: Timmy Tsai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu/soc21: don't remap HDP registers for SR-IOVAlex Deucher1-1/+1
This matches the behavior for soc15 and nv. Acked-by: Christian König <[email protected]> Reviewed-by: Timmy Tsai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"Hamza Mahfooz2-27/+0
This reverts commit 70e64c4d522b732e31c6475a3be2349de337d321. Since, we now have an actual fix for this issue, we can get rid of this workaround as it can cause pin failures if enough VRAM isn't carved out by the BIOS. Cc: [email protected] # 6.1+ Acked-by: Harry Wentland <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3Mukul Joshi10-52/+45
Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdkfd: Fix reg offset for setting CWSR grace periodMukul Joshi4-10/+5
This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Handle null atom context in VBIOS info ioctlDavid Francis1-6/+11
On some APU systems, there is no atom context and so the atom_context struct is null. Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl to handle this case, returning all zeroes. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: David Francis <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Create an option to disable soft recoveryAndré Almeida3-1/+13
Create a module option to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes easier to force device resets for testing and debugging purposes. Signed-off-by: André Almeida <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Merge debug module parametersAndré Almeida5-23/+43
Merge all developer debug options available as separated module parameters in one, making it obvious that are for developers. Drop the obsolete module options in favor of the new ones. Signed-off-by: André Almeida <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: add type conversion for gc infoYifan Zhang1-3/+3
gc info usage misses type conversion. Acked-by: Alex Deucher <[email protected]> Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Li Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUESMukul Joshi3-7/+7
Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES to conform with the naming convention followed in amdgpu_gfx.h. No functional change. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: print more address info of UMC bad pageTao Zhou1-3/+9
Print out row, column and bank value of UMC error address for UMC v12. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: fallback to old RAS error message for aqua_vanjaramHawking Zhang1-2/+4
So driver doesn't generate incorrect message until the new format is settled down for aqua_vanjaram Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOVAlex Deucher1-0/+3
Needed for HDP flush to work correctly. Reviewed-by: Timmy Tsai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu/soc21: don't remap HDP registers for SR-IOVAlex Deucher1-1/+1
This matches the behavior for soc15 and nv. Acked-by: Christian König <[email protected]> Reviewed-by: Timmy Tsai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"Hamza Mahfooz2-27/+0
This reverts commit 70e64c4d522b732e31c6475a3be2349de337d321. Since, we now have an actual fix for this issue, we can get rid of this workaround as it can cause pin failures if enough VRAM isn't carved out by the BIOS. Cc: [email protected] # 6.1+ Acked-by: Harry Wentland <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: add channel index table for UMC v12Tao Zhou3-0/+20
Get UMC phyical channel index according to node id, umc instance and channel instance. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: add address conversion for UMC v12Tao Zhou3-1/+159
Convert MCA error address to physical address and find out all pages in one physical row. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Use default reset method handlerLijo Lazar1-9/+7
When reset method is not passed in reset context, look for the handler for default reset method. On Aldebaran, default reset method for SOCs connected to CPU over XGMI is MODE2. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Tested-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3Mukul Joshi10-52/+45
Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-11drm/amd: Fix the flag setting code for interrupt requestMa Jun1-19/+26
[1] Remove the irq flags setting code since pci_alloc_irq_vectors() handles these flags. [2] Free the msi vectors in case of error. Signed-off-by: Ma Jun <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>