diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-11-01 06:28:35 -1000 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-11-01 06:28:35 -1000 |
commit | 7d461b291e65938f15f56fe58da2303b07578a76 (patch) | |
tree | 015dd7c2f1743dd70be52787dd9aff33822bc938 /drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | |
parent | 8bc9e6515183935fa0cccaf67455c439afe4982b (diff) | |
parent | 631808095a82e6b6f8410a95f8b12b8d0d38b161 (diff) |
Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- AMD adds some more upcoming HW platforms
- Intel made Meteorlake stable and started adding Lunarlake
- nouveau has a bunch of display rework in prepartion for the NVIDIA
GSP firmware support
- msm adds a7xx support
- habanalabs has finished migration to accel subsystem
Detail summary:
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver"
* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
drm/amdgpu: Remove duplicate fdinfo fields
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
drm/amdgpu: Identify data parity error corrected in replay mode
drm/amdgpu: Fix typo in IP discovery parsing
drm/amd/display: fix S/G display enablement
drm/amdxcp: fix amdxcp unloads incompletely
drm/amd/amdgpu: fix the GPU power print error in pm info
drm/amdgpu: Use pcie domain of xcc acpi objects
drm/amd: check num of link levels when update pcie param
drm/amdgpu: Add a read to GFX v9.4.3 ring test
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
drm/amdgpu: get RAS poison status from DF v4_6_2
drm/amdgpu: Use discovery table's subrevision
drm/amd/display: 3.2.256
drm/amd/display: add interface to query SubVP status
drm/amd/display: Read before writing Backlight Mode Set Register
drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
...
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c')
-rw-r--r-- | drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 472 |
1 files changed, 346 insertions, 126 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b8356699f23..d5f78179b2b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -32,8 +32,6 @@ #include <linux/slab.h> #include <linux/iommu.h> #include <linux/pci.h> -#include <linux/devcoredump.h> -#include <generated/utsrelease.h> #include <linux/pci-p2pdma.h> #include <linux/apple-gmux.h> @@ -162,6 +160,74 @@ static ssize_t amdgpu_device_get_pcie_replay_count(struct device *dev, static DEVICE_ATTR(pcie_replay_count, 0444, amdgpu_device_get_pcie_replay_count, NULL); +/** + * DOC: board_info + * + * The amdgpu driver provides a sysfs API for giving board related information. + * It provides the form factor information in the format + * + * type : form factor + * + * Possible form factor values + * + * - "cem" - PCIE CEM card + * - "oam" - Open Compute Accelerator Module + * - "unknown" - Not known + * + */ + +static ssize_t amdgpu_device_get_board_info(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct drm_device *ddev = dev_get_drvdata(dev); + struct amdgpu_device *adev = drm_to_adev(ddev); + enum amdgpu_pkg_type pkg_type = AMDGPU_PKG_TYPE_CEM; + const char *pkg; + + if (adev->smuio.funcs && adev->smuio.funcs->get_pkg_type) + pkg_type = adev->smuio.funcs->get_pkg_type(adev); + + switch (pkg_type) { + case AMDGPU_PKG_TYPE_CEM: + pkg = "cem"; + break; + case AMDGPU_PKG_TYPE_OAM: + pkg = "oam"; + break; + default: + pkg = "unknown"; + break; + } + + return sysfs_emit(buf, "%s : %s\n", "type", pkg); +} + +static DEVICE_ATTR(board_info, 0444, amdgpu_device_get_board_info, NULL); + +static struct attribute *amdgpu_board_attrs[] = { + &dev_attr_board_info.attr, + NULL, +}; + +static umode_t amdgpu_board_attrs_is_visible(struct kobject *kobj, + struct attribute *attr, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct drm_device *ddev = dev_get_drvdata(dev); + struct amdgpu_device *adev = drm_to_adev(ddev); + + if (adev->flags & AMD_IS_APU) + return 0; + + return attr->mode; +} + +static const struct attribute_group amdgpu_board_attrs_group = { + .attrs = amdgpu_board_attrs, + .is_visible = amdgpu_board_attrs_is_visible +}; + static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev); @@ -507,6 +573,7 @@ void amdgpu_device_wreg(struct amdgpu_device *adev, * @adev: amdgpu_device pointer * @reg: mmio/rlc register * @v: value to write + * @xcc_id: xcc accelerated compute core id * * this function is invoked only for the debugfs register access */ @@ -571,7 +638,7 @@ u32 amdgpu_device_indirect_rreg_ext(struct amdgpu_device *adev, pcie_index = adev->nbio.funcs->get_pcie_index_offset(adev); pcie_data = adev->nbio.funcs->get_pcie_data_offset(adev); - if (adev->nbio.funcs->get_pcie_index_hi_offset) + if ((reg_addr >> 32) && (adev->nbio.funcs->get_pcie_index_hi_offset)) pcie_index_hi = adev->nbio.funcs->get_pcie_index_hi_offset(adev); else pcie_index_hi = 0; @@ -638,6 +705,56 @@ u64 amdgpu_device_indirect_rreg64(struct amdgpu_device *adev, return r; } +u64 amdgpu_device_indirect_rreg64_ext(struct amdgpu_device *adev, + u64 reg_addr) +{ + unsigned long flags, pcie_index, pcie_data; + unsigned long pcie_index_hi = 0; + void __iomem *pcie_index_offset; + void __iomem *pcie_index_hi_offset; + void __iomem *pcie_data_offset; + u64 r; + + pcie_index = adev->nbio.funcs->get_pcie_index_offset(adev); + pcie_data = adev->nbio.funcs->get_pcie_data_offset(adev); + if ((reg_addr >> 32) && (adev->nbio.funcs->get_pcie_index_hi_offset)) + pcie_index_hi = adev->nbio.funcs->get_pcie_index_hi_offset(adev); + + spin_lock_irqsave(&adev->pcie_idx_lock, flags); + pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4; + pcie_data_offset = (void __iomem *)adev->rmmio + pcie_data * 4; + if (pcie_index_hi != 0) + pcie_index_hi_offset = (void __iomem *)adev->rmmio + + pcie_index_hi * 4; + + /* read low 32 bits */ + writel(reg_addr, pcie_index_offset); + readl(pcie_index_offset); + if (pcie_index_hi != 0) { + writel((reg_addr >> 32) & 0xff, pcie_index_hi_offset); + readl(pcie_index_hi_offset); + } + r = readl(pcie_data_offset); + /* read high 32 bits */ + writel(reg_addr + 4, pcie_index_offset); + readl(pcie_index_offset); + if (pcie_index_hi != 0) { + writel((reg_addr >> 32) & 0xff, pcie_index_hi_offset); + readl(pcie_index_hi_offset); + } + r |= ((u64)readl(pcie_data_offset) << 32); + + /* clear the high bits */ + if (pcie_index_hi != 0) { + writel(0, pcie_index_hi_offset); + readl(pcie_index_hi_offset); + } + + spin_unlock_irqrestore(&adev->pcie_idx_lock, flags); + + return r; +} + /** * amdgpu_device_indirect_wreg - write an indirect register address * @@ -677,7 +794,7 @@ void amdgpu_device_indirect_wreg_ext(struct amdgpu_device *adev, pcie_index = adev->nbio.funcs->get_pcie_index_offset(adev); pcie_data = adev->nbio.funcs->get_pcie_data_offset(adev); - if (adev->nbio.funcs->get_pcie_index_hi_offset) + if ((reg_addr >> 32) && (adev->nbio.funcs->get_pcie_index_hi_offset)) pcie_index_hi = adev->nbio.funcs->get_pcie_index_hi_offset(adev); else pcie_index_hi = 0; @@ -742,6 +859,55 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device *adev, spin_unlock_irqrestore(&adev->pcie_idx_lock, flags); } +void amdgpu_device_indirect_wreg64_ext(struct amdgpu_device *adev, + u64 reg_addr, u64 reg_data) +{ + unsigned long flags, pcie_index, pcie_data; + unsigned long pcie_index_hi = 0; + void __iomem *pcie_index_offset; + void __iomem *pcie_index_hi_offset; + void __iomem *pcie_data_offset; + + pcie_index = adev->nbio.funcs->get_pcie_index_offset(adev); + pcie_data = adev->nbio.funcs->get_pcie_data_offset(adev); + if ((reg_addr >> 32) && (adev->nbio.funcs->get_pcie_index_hi_offset)) + pcie_index_hi = adev->nbio.funcs->get_pcie_index_hi_offset(adev); + + spin_lock_irqsave(&adev->pcie_idx_lock, flags); + pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4; + pcie_data_offset = (void __iomem *)adev->rmmio + pcie_data * 4; + if (pcie_index_hi != 0) + pcie_index_hi_offset = (void __iomem *)adev->rmmio + + pcie_index_hi * 4; + + /* write low 32 bits */ + writel(reg_addr, pcie_index_offset); + readl(pcie_index_offset); + if (pcie_index_hi != 0) { + writel((reg_addr >> 32) & 0xff, pcie_index_hi_offset); + readl(pcie_index_hi_offset); + } + writel((u32)(reg_data & 0xffffffffULL), pcie_data_offset); + readl(pcie_data_offset); + /* write high 32 bits */ + writel(reg_addr + 4, pcie_index_offset); + readl(pcie_index_offset); + if (pcie_index_hi != 0) { + writel((reg_addr >> 32) & 0xff, pcie_index_hi_offset); + readl(pcie_index_hi_offset); + } + writel((u32)(reg_data >> 32), pcie_data_offset); + readl(pcie_data_offset); + + /* clear the high bits */ + if (pcie_index_hi != 0) { + writel(0, pcie_index_hi_offset); + readl(pcie_index_hi_offset); + } + + spin_unlock_irqrestore(&adev->pcie_idx_lock, flags); +} + /** * amdgpu_device_get_rev_id - query device rev_id * @@ -819,6 +985,13 @@ static uint64_t amdgpu_invalid_rreg64(struct amdgpu_device *adev, uint32_t reg) return 0; } +static uint64_t amdgpu_invalid_rreg64_ext(struct amdgpu_device *adev, uint64_t reg) +{ + DRM_ERROR("Invalid callback to read register 0x%llX\n", reg); + BUG(); + return 0; +} + /** * amdgpu_invalid_wreg64 - dummy reg write function * @@ -836,6 +1009,13 @@ static void amdgpu_invalid_wreg64(struct amdgpu_device *adev, uint32_t reg, uint BUG(); } +static void amdgpu_invalid_wreg64_ext(struct amdgpu_device *adev, uint64_t reg, uint64_t v) +{ + DRM_ERROR("Invalid callback to write 64 bit register 0x%llX with 0x%08llX\n", + reg, v); + BUG(); +} + /** * amdgpu_block_invalid_rreg - dummy reg read function * @@ -889,8 +1069,8 @@ static int amdgpu_device_asic_init(struct amdgpu_device *adev) amdgpu_asic_pre_asic_init(adev); - if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 3) || - adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) { + if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || + amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(11, 0, 0)) { amdgpu_psp_wait_for_bootloader(adev); ret = amdgpu_atomfirmware_asic_init(adev, true); return ret; @@ -1245,14 +1425,45 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev) } /* - * Intel hosts such as Raptor Lake and Sapphire Rapids don't support dynamic - * speed switching. Until we have confirmation from Intel that a specific host - * supports it, it's safer that we keep it disabled for all. + * Check whether seamless boot is supported. + * + * So far we only support seamless boot on DCE 3.0 or later. + * If users report that it works on older ASICS as well, we may + * loosen this. + */ +bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev) +{ + switch (amdgpu_seamless) { + case -1: + break; + case 1: + return true; + case 0: + return false; + default: + DRM_ERROR("Invalid value for amdgpu.seamless: %d\n", + amdgpu_seamless); + return false; + } + + if (!(adev->flags & AMD_IS_APU)) + return false; + + if (adev->mman.keep_stolen_vga_memory) + return false; + + return adev->ip_versions[DCE_HWIP][0] >= IP_VERSION(3, 0, 0); +} + +/* + * Intel hosts such as Rocket Lake, Alder Lake, Raptor Lake and Sapphire Rapids + * don't support dynamic speed switching. Until we have confirmation from Intel + * that a specific host supports it, it's safer that we keep it disabled for all. * * https://edc.intel.com/content/www/us/en/design/products/platforms/details/raptor-lake-s/13th-generation-core-processors-datasheet-volume-1-of-2/005/pci-express-support/ * https://gitlab.freedesktop.org/drm/amd/-/issues/2663 */ -bool amdgpu_device_pcie_dynamic_switching_supported(void) +static bool amdgpu_device_pcie_dynamic_switching_supported(void) { #if IS_ENABLED(CONFIG_X86) struct cpuinfo_x86 *c = &cpu_data(0); @@ -1285,20 +1496,13 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev) default: return false; } + if (adev->flags & AMD_IS_APU) + return false; + if (!(adev->pm.pp_feature & PP_PCIE_DPM_MASK)) + return false; return pcie_aspm_enabled(adev->pdev); } -bool amdgpu_device_aspm_support_quirk(void) -{ -#if IS_ENABLED(CONFIG_X86) - struct cpuinfo_x86 *c = &cpu_data(0); - - return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE); -#else - return true; -#endif -} - /* if we get transitioned to only one device, take VGA back */ /** * amdgpu_device_vga_set_decode - enable/disable vga decode @@ -1547,6 +1751,7 @@ static void amdgpu_switcheroo_set_state(struct pci_dev *pdev, } else { pr_info("switched off\n"); dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; + amdgpu_device_prepare(dev); amdgpu_device_suspend(dev, true); amdgpu_device_cache_pci_state(pdev); /* Shut down the device */ @@ -2103,6 +2308,8 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev) adev->pm.pp_feature &= ~PP_GFXOFF_MASK; if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID) adev->pm.pp_feature &= ~PP_OVERDRIVE_MASK; + if (!amdgpu_device_pcie_dynamic_switching_supported()) + adev->pm.pp_feature &= ~PP_PCIE_DPM_MASK; total = true; for (i = 0; i < adev->num_ip_blocks; i++) { @@ -2280,6 +2487,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) } r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, + DRM_SCHED_PRIORITY_COUNT, ring->num_hw_submission, 0, timeout, adev->reset_domain->wq, ring->sched_score, ring->name, @@ -2449,6 +2657,9 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) if (r) goto init_failed; + if (adev->mman.buffer_funcs_ring->sched.ready) + amdgpu_ttm_set_buffer_funcs_status(adev, true); + /* Don't init kfd if whole hive need to be reset during init */ if (!adev->gmc.xgmi.pending_reset) { kgd2kfd_init_zone_device(adev); @@ -2731,7 +2942,7 @@ static void amdgpu_device_smu_fini_early(struct amdgpu_device *adev) { int i, r; - if (adev->ip_versions[GC_HWIP][0] > IP_VERSION(9, 0, 0)) + if (amdgpu_ip_version(adev, GC_HWIP, 0) > IP_VERSION(9, 0, 0)) return; for (i = 0; i < adev->num_ip_blocks; i++) { @@ -2984,8 +3195,10 @@ static int amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev) /* SDMA 5.x+ is part of GFX power domain so it's covered by GFXOFF */ if (adev->in_s0ix && - (adev->ip_versions[SDMA0_HWIP][0] >= IP_VERSION(5, 0, 0)) && - (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_SDMA)) + (amdgpu_ip_version(adev, SDMA0_HWIP, 0) >= + IP_VERSION(5, 0, 0)) && + (adev->ip_blocks[i].version->type == + AMD_IP_BLOCK_TYPE_SDMA)) continue; /* Once swPSP provides the IMU, RLC FW binaries to TOS during cold-boot. @@ -3044,6 +3257,8 @@ int amdgpu_device_ip_suspend(struct amdgpu_device *adev) amdgpu_virt_request_full_gpu(adev, false); } + amdgpu_ttm_set_buffer_funcs_status(adev, false); + r = amdgpu_device_ip_suspend_phase1(adev); if (r) return r; @@ -3233,6 +3448,9 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev) r = amdgpu_device_ip_resume_phase2(adev); + if (adev->mman.buffer_funcs_ring->sched.ready) + amdgpu_ttm_set_buffer_funcs_status(adev, true); + return r; } @@ -3362,9 +3580,7 @@ static void amdgpu_device_xgmi_reset_func(struct work_struct *__work) if (adev->asic_reset_res) goto fail; - if (adev->mmhub.ras && adev->mmhub.ras->ras_block.hw_ops && - adev->mmhub.ras->ras_block.hw_ops->reset_ras_error_count) - adev->mmhub.ras->ras_block.hw_ops->reset_ras_error_count(adev); + amdgpu_ras_reset_error_count(adev, AMDGPU_RAS_BLOCK__MMHUB); } else { task_barrier_full(&hive->tb); @@ -3476,8 +3692,8 @@ static void amdgpu_device_set_mcbp(struct amdgpu_device *adev) adev->gfx.mcbp = true; else if (amdgpu_mcbp == 0) adev->gfx.mcbp = false; - else if ((adev->ip_versions[GC_HWIP][0] >= IP_VERSION(9, 0, 0)) && - (adev->ip_versions[GC_HWIP][0] < IP_VERSION(10, 0, 0)) && + else if ((amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 0, 0)) && + (amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(10, 0, 0)) && adev->gfx.num_gfx_rings) adev->gfx.mcbp = true; @@ -3542,6 +3758,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, adev->pciep_wreg = &amdgpu_invalid_wreg; adev->pcie_rreg64 = &amdgpu_invalid_rreg64; adev->pcie_wreg64 = &amdgpu_invalid_wreg64; + adev->pcie_rreg64_ext = &amdgpu_invalid_rreg64_ext; + adev->pcie_wreg64_ext = &amdgpu_invalid_wreg64_ext; adev->uvd_ctx_rreg = &amdgpu_invalid_rreg; adev->uvd_ctx_wreg = &amdgpu_invalid_wreg; adev->didt_rreg = &amdgpu_invalid_rreg; @@ -3597,6 +3815,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_LIST_HEAD(&adev->ras_list); + INIT_LIST_HEAD(&adev->pm.od_kobj_list); + INIT_DELAYED_WORK(&adev->delayed_init_work, amdgpu_device_delayed_init_work_handler); INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work, @@ -3693,7 +3913,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, * internal path natively support atomics, set have_atomics_support to true. */ } else if ((adev->flags & AMD_IS_APU) && - (adev->ip_versions[GC_HWIP][0] > IP_VERSION(9, 0, 0))) { + (amdgpu_ip_version(adev, GC_HWIP, 0) > + IP_VERSION(9, 0, 0))) { adev->have_atomics_support = true; } else { adev->have_atomics_support = @@ -3833,22 +4054,6 @@ fence_driver_init: /* Get a log2 for easy divisions. */ adev->mm_stats.log2_max_MBps = ilog2(max(1u, max_MBps)); - r = amdgpu_atombios_sysfs_init(adev); - if (r) - drm_err(&adev->ddev, - "registering atombios sysfs failed (%d).\n", r); - - r = amdgpu_pm_sysfs_init(adev); - if (r) - DRM_ERROR("registering pm sysfs failed (%d).\n", r); - - r = amdgpu_ucode_sysfs_init(adev); - if (r) { - adev->ucode_sysfs_en = false; - DRM_ERROR("Creating firmware sysfs failed (%d).\n", r); - } else - adev->ucode_sysfs_en = true; - /* * Register gpu instance before amdgpu_device_enable_mgpu_fan_boost. * Otherwise the mgpu fan boost feature will be skipped due to the @@ -3877,10 +4082,36 @@ fence_driver_init: flush_delayed_work(&adev->delayed_init_work); } + /* + * Place those sysfs registering after `late_init`. As some of those + * operations performed in `late_init` might affect the sysfs + * interfaces creating. + */ + r = amdgpu_atombios_sysfs_init(adev); + if (r) + drm_err(&adev->ddev, + "registering atombios sysfs failed (%d).\n", r); + + r = amdgpu_pm_sysfs_init(adev); + if (r) + DRM_ERROR("registering pm sysfs failed (%d).\n", r); + + r = amdgpu_ucode_sysfs_init(adev); + if (r) { + adev->ucode_sysfs_en = false; + DRM_ERROR("Creating firmware sysfs failed (%d).\n", r); + } else + adev->ucode_sysfs_en = true; + r = sysfs_create_files(&adev->dev->kobj, amdgpu_dev_attributes); if (r) dev_err(adev->dev, "Could not create amdgpu device attr\n"); + r = devm_device_add_group(adev->dev, &amdgpu_board_attrs_group); + if (r) + dev_err(adev->dev, + "Could not create amdgpu board attributes\n"); + amdgpu_fru_sysfs_init(adev); if (IS_ENABLED(CONFIG_PERF_EVENTS)) @@ -4007,6 +4238,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) /* disable ras feature must before hw fini */ amdgpu_ras_pre_fini(adev); + amdgpu_ttm_set_buffer_funcs_status(adev, false); + amdgpu_device_ip_fini_early(adev); amdgpu_irq_fini_hw(adev); @@ -4044,6 +4277,9 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev) kfree(adev->bios); adev->bios = NULL; + kfree(adev->fru_info); + adev->fru_info = NULL; + px = amdgpu_device_supports_px(adev_to_drm(adev)); if (px || (!pci_is_thunderbolt_attached(adev->pdev) && @@ -4103,6 +4339,41 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) * Suspend & resume. */ /** + * amdgpu_device_prepare - prepare for device suspend + * + * @dev: drm dev pointer + * + * Prepare to put the hw in the suspend state (all asics). + * Returns 0 for success or an error on failure. + * Called at driver suspend. + */ +int amdgpu_device_prepare(struct drm_device *dev) +{ + struct amdgpu_device *adev = drm_to_adev(dev); + int i, r; + + if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) + return 0; + + /* Evict the majority of BOs before starting suspend sequence */ + r = amdgpu_device_evict_resources(adev); + if (r) + return r; + + for (i = 0; i < adev->num_ip_blocks; i++) { + if (!adev->ip_blocks[i].status.valid) + continue; + if (!adev->ip_blocks[i].version->funcs->prepare_suspend) + continue; + r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev); + if (r) + return r; + } + + return 0; +} + +/** * amdgpu_device_suspend - initiate device suspend * * @dev: drm dev pointer @@ -4122,11 +4393,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) adev->in_suspend = true; - /* Evict the majority of BOs before grabbing the full access */ - r = amdgpu_device_evict_resources(adev); - if (r) - return r; - if (amdgpu_sriov_vf(adev)) { amdgpu_virt_fini_data_exchange(adev); r = amdgpu_virt_request_full_gpu(adev, false); @@ -4145,6 +4411,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) amdgpu_ras_suspend(adev); + amdgpu_ttm_set_buffer_funcs_status(adev, false); + amdgpu_device_ip_suspend_phase1(adev); if (!adev->in_s0ix) @@ -4786,67 +5054,16 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev) lockdep_assert_held(&adev->reset_domain->sem); - for (i = 0; i < adev->num_regs; i++) { - adev->reset_dump_reg_value[i] = RREG32(adev->reset_dump_reg_list[i]); - trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list[i], - adev->reset_dump_reg_value[i]); - } - - return 0; -} + for (i = 0; i < adev->reset_info.num_regs; i++) { + adev->reset_info.reset_dump_reg_value[i] = + RREG32(adev->reset_info.reset_dump_reg_list[i]); -#ifdef CONFIG_DEV_COREDUMP -static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, - size_t count, void *data, size_t datalen) -{ - struct drm_printer p; - struct amdgpu_device *adev = data; - struct drm_print_iterator iter; - int i; - - iter.data = buffer; - iter.offset = 0; - iter.start = offset; - iter.remain = count; - - p = drm_coredump_printer(&iter); - - drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); - drm_printf(&p, "kernel: " UTS_RELEASE "\n"); - drm_printf(&p, "module: " KBUILD_MODNAME "\n"); - drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec); - if (adev->reset_task_info.pid) - drm_printf(&p, "process_name: %s PID: %d\n", - adev->reset_task_info.process_name, - adev->reset_task_info.pid); - - if (adev->reset_vram_lost) - drm_printf(&p, "VRAM is lost due to GPU reset!\n"); - if (adev->num_regs) { - drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); - - for (i = 0; i < adev->num_regs; i++) - drm_printf(&p, "0x%08x: 0x%08x\n", - adev->reset_dump_reg_list[i], - adev->reset_dump_reg_value[i]); + trace_amdgpu_reset_reg_dumps(adev->reset_info.reset_dump_reg_list[i], + adev->reset_info.reset_dump_reg_value[i]); } - return count - iter.remain; -} - -static void amdgpu_devcoredump_free(void *data) -{ -} - -static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev) -{ - struct drm_device *dev = adev_to_drm(adev); - - ktime_get_ts64(&adev->reset_time); - dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT, - amdgpu_devcoredump_read, amdgpu_devcoredump_free); + return 0; } -#endif int amdgpu_do_asic_reset(struct list_head *device_list_handle, struct amdgpu_reset_context *reset_context) @@ -4895,7 +5112,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, if (r) { dev_err(tmp_adev->dev, "ASIC reset failed with error, %d for drm dev, %s", r, adev_to_drm(tmp_adev)->unique); - break; + goto out; } } @@ -4914,9 +5131,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, if (!r && amdgpu_ras_intr_triggered()) { list_for_each_entry(tmp_adev, device_list_handle, reset_list) { - if (tmp_adev->mmhub.ras && tmp_adev->mmhub.ras->ras_block.hw_ops && - tmp_adev->mmhub.ras->ras_block.hw_ops->reset_ras_error_count) - tmp_adev->mmhub.ras->ras_block.hw_ops->reset_ras_error_count(tmp_adev); + amdgpu_ras_reset_error_count(tmp_adev, AMDGPU_RAS_BLOCK__MMHUB); } amdgpu_ras_intr_cleared(); @@ -4948,15 +5163,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, goto out; vram_lost = amdgpu_device_check_vram_lost(tmp_adev); -#ifdef CONFIG_DEV_COREDUMP - tmp_adev->reset_vram_lost = vram_lost; - memset(&tmp_adev->reset_task_info, 0, - sizeof(tmp_adev->reset_task_info)); - if (reset_context->job && reset_context->job->vm) - tmp_adev->reset_task_info = - reset_context->job->vm->task_info; - amdgpu_reset_capture_coredumpm(tmp_adev); -#endif + + amdgpu_coredump(tmp_adev, vram_lost, reset_context); + if (vram_lost) { DRM_INFO("VRAM is lost due to GPU reset!\n"); amdgpu_inc_vram_lost(tmp_adev); @@ -4966,10 +5175,18 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, if (r) return r; + r = amdgpu_xcp_restore_partition_mode( + tmp_adev->xcp_mgr); + if (r) + goto out; + r = amdgpu_device_ip_resume_phase2(tmp_adev); if (r) goto out; + if (tmp_adev->mman.buffer_funcs_ring->sched.ready) + amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true); + if (vram_lost) amdgpu_device_fill_reset_magic(tmp_adev); @@ -5183,7 +5400,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, * Flush RAM to disk so that after reboot * the user can read log and see why the system rebooted. */ - if (need_emergency_restart && amdgpu_ras_get_context(adev)->reboot) { + if (need_emergency_restart && amdgpu_ras_get_context(adev) && + amdgpu_ras_get_context(adev)->reboot) { DRM_WARN("Emergency reboot."); ksys_sync_helper(); @@ -5321,8 +5539,9 @@ retry: /* Rest of adevs pre asic reset from XGMI hive. */ adev->asic_reset_res = r; /* Aldebaran and gfx_11_0_3 support ras in SRIOV, so need resume ras during reset */ - if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 2) || - adev->ip_versions[GC_HWIP][0] == IP_VERSION(11, 0, 3)) + if (amdgpu_ip_version(adev, GC_HWIP, 0) == + IP_VERSION(9, 4, 2) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(11, 0, 3)) amdgpu_ras_resume(adev); } else { r = amdgpu_do_asic_reset(device_list_handle, reset_context); @@ -5347,7 +5566,8 @@ skip_hw_reset: drm_sched_start(&ring->sched, true); } - if (adev->enable_mes && adev->ip_versions[GC_HWIP][0] != IP_VERSION(11, 0, 3)) + if (adev->enable_mes && + amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(11, 0, 3)) amdgpu_mes_self_test(tmp_adev); if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled) @@ -6024,7 +6244,7 @@ bool amdgpu_device_has_display_hardware(struct amdgpu_device *adev) return true; default: /* IP discovery */ - if (!adev->ip_versions[DCE_HWIP][0] || + if (!amdgpu_ip_version(adev, DCE_HWIP, 0) || (adev->harvest_ip_mask & AMD_HARVEST_IP_DMU_MASK)) return false; return true; |