path: root/drivers/gpu/drm/amd/amdgpu
Age | Commit message | Author | Files | Lines
2019-03-20 | drm/amdkfd/sriov:Put the pre and post reset in exclusive mode v2 | Wentao Lou | 1 | -0/+3
add amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov. Signed-off-by: Wentao Lou <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-20 | drm/amdgpu: Wait for newly allocated PTs to be idle | Felix Kuehling | 1 | -7/+13
When page tables are updated by the CPU, synchronize with the allocation and initialization of newly allocated page tables. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-20 | drm/amdgpu: more descriptive message if HMM not enabled | Philip Yang | 1 | -0/+2
With an old kernel config file, CONFIG_ZONE_DEVICE is not selected, so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver error message "Failed to register MMU notifier" is unclear. Print a more descriptive message that tells the user how to fix the missing kernel config option. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109808 Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: support userptr cross VMAs case with HMM | Philip Yang | 1 | -35/+91
A userptr may cross two VMAs if a forked child process (one that does not call exec after fork) mallocs a buffer, frees it, and then mallocs a larger buffer: the kernel creates a new VMA adjacent to the old VMA that was cloned from the parent process, so some pages of the userptr are in the first VMA and the rest are in the second. HMM expects a range to cover only one VMA, so loop over all VMAs in the address range and create multiple ranges to handle this case. See is_mergeable_anon_vma in mm/mmap.c for details. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
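The splitting step described in this message can be illustrated with a small userspace sketch. It is not the driver code: `toy_vma`, `toy_range` and `split_by_vma` are hypothetical names, and a sorted array of address intervals stands in for the kernel's VMA lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open address interval [start, end). */
struct toy_vma {
    unsigned long start;
    unsigned long end;
};

/* One sub-range of the user pointer that lies entirely inside a single VMA. */
struct toy_range {
    unsigned long start;
    unsigned long end;
};

/*
 * Split [start, end) into one sub-range per VMA it touches.
 * `vmas` must be sorted by address and must not overlap.
 * Returns the number of ranges written, or -1 if part of the span is not
 * covered by any VMA or if `out` is too small.
 */
static int split_by_vma(const struct toy_vma *vmas, size_t nvmas,
                        unsigned long start, unsigned long end,
                        struct toy_range *out, size_t max_out)
{
    size_t n = 0;
    unsigned long cur = start;

    assert(start < end);

    for (size_t i = 0; i < nvmas && cur < end; i++) {
        if (vmas[i].end <= cur)
            continue;               /* VMA lies entirely below the cursor */
        if (vmas[i].start > cur)
            return -1;              /* hole in the address space */
        if (n == max_out)
            return -1;              /* caller's array is too small */

        out[n].start = cur;
        out[n].end = vmas[i].end < end ? vmas[i].end : end;
        cur = out[n].end;
        n++;
    }
    return cur == end ? (int)n : -1;
}
```

For two adjacent intervals [0x1000, 0x3000) and [0x3000, 0x6000), a buffer at [0x2000, 0x5000) yields two sub-ranges that meet at 0x3000, which is the "multiple ranges" case the commit handles.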
2019-03-19 | drm/amdkfd: support concurrent userptr update for HMM | Philip Yang | 1 | -6/+19
Userptr restore may race with a concurrent userptr invalidation after hmm_vma_fault adds the range to the hmm->ranges list. In that case, call hmm_vma_range_done to remove the range from the hmm->ranges list first, then reschedule the restore worker. Otherwise hmm_vma_fault will add the same range to the list again, which creates a loop in the list because range->next points to the range itself. Add the function untrack_invalid_user_pages to reduce code duplication. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: stop evicting busy PDs/PTs | Christian König | 1 | -0/+7
Otherwise we won't be able to cleanly handle page faults. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: wait for VM to become idle during flush | Christian König | 5 | -7/+22
Make sure that not only the entities are flushed, but that we also wait for the HW to finish all processing. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: remove non-sense NULL ptr check | Christian König | 1 | -10/+0
It's a bug having a dead pointer in the IDR, silently returning is the worst we can do. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: remove chash | Christian König | 2 | -119/+0
Remove the chash implementation for now since it isn't used any more. Signed-off-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: use ring/hash for fault handling on GMC9 v3 | Christian König | 3 | -57/+92
Further testing showed that the idea with the chash doesn't work as expected. Especially we can't predict when we can remove the entries from the hash again. So replace the chash with a ring buffer/hash mix where entries in the container age automatically based on their timestamp. v2: use ring buffer / hash mix v3: check the timeout to make sure all entries age Signed-off-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> (v2) Signed-off-by: Alex Deucher <[email protected]>
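The ring buffer/hash mix mentioned here can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the driver's implementation: all names, the ring and hash sizes, the modulo hash and the `TOY_TIMEOUT` value are hypothetical, and timestamps are assumed to be non-zero and strictly increasing between calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SIZE 16u          /* hypothetical ring capacity */
#define TOY_HASH_SIZE 16u          /* hypothetical number of hash buckets */
#define TOY_NO_ENTRY  UINT32_MAX   /* marks an empty bucket or end of chain */
#define TOY_TIMEOUT   100u         /* hypothetical age limit in timestamp ticks */

struct toy_fault {
    uint64_t key;
    uint64_t timestamp;            /* 0 means the slot was never used */
    uint32_t next;                 /* older entry of the same bucket */
};

struct toy_filter {
    struct toy_fault ring[TOY_RING_SIZE];
    uint32_t hash[TOY_HASH_SIZE];  /* newest ring index per bucket */
    uint32_t last;                 /* ring slot that is overwritten next */
};

static void toy_filter_init(struct toy_filter *f)
{
    memset(f, 0, sizeof(*f));
    for (uint32_t i = 0; i < TOY_HASH_SIZE; i++)
        f->hash[i] = TOY_NO_ENTRY;
    for (uint32_t i = 0; i < TOY_RING_SIZE; i++)
        f->ring[i].next = TOY_NO_ENTRY;
}

/* Returns true if `key` was already recorded within the last TOY_TIMEOUT ticks. */
static bool toy_filter_fault(struct toy_filter *f, uint64_t key, uint64_t now)
{
    uint64_t cutoff = now > TOY_TIMEOUT ? now - TOY_TIMEOUT : 0;
    uint32_t bucket = (uint32_t)(key % TOY_HASH_SIZE);
    uint32_t idx = f->hash[bucket];
    uint64_t prev_stamp = UINT64_MAX;

    assert(now > 0);

    while (idx != TOY_NO_ENTRY) {
        const struct toy_fault *e = &f->ring[idx];

        /* Stop at aged-out entries and at slots reused by a newer fault. */
        if (e->timestamp <= cutoff || e->timestamp >= prev_stamp)
            break;
        if (e->key == key)
            return true;
        prev_stamp = e->timestamp;
        idx = e->next;
    }

    /* Not found: overwrite the oldest ring slot and make it the bucket head. */
    struct toy_fault *slot = &f->ring[f->last];
    slot->key = key;
    slot->timestamp = now;
    slot->next = f->hash[bucket];
    f->hash[bucket] = f->last;
    f->last = (f->last + 1) % TOY_RING_SIZE;
    return false;
}
```

Nothing is ever removed explicitly in this sketch. Entries stop matching once their timestamp falls behind the cutoff, and their slots are recycled when the ring wraps around, which is one way to get the automatic aging the message describes.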
2019-03-19 | drm/amdgpu: limit the number of IVs processed at once | Christian König | 2 | -1/+5
Only process a maximum of 32 IVs before writing back the RPTR. This improves hw handling when we get close to an overflow in the ring buffer. Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
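A minimal sketch of the batching idea, assuming a simple power-of-two ring with software copies of the read and write pointers. The structure, field names and ring size are hypothetical; only the batch limit of 32 comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IH_ENTRIES   256u   /* hypothetical ring size, power of two */
#define TOY_IH_MAX_BATCH 32u    /* batch limit named in the commit message */

struct toy_ih {
    uint32_t ring[TOY_IH_ENTRIES];
    uint32_t rptr;        /* consumer index kept by software */
    uint32_t wptr;        /* producer index, advanced by the hardware */
    uint32_t hw_rptr;     /* stand-in for the read-pointer register */
    uint32_t writebacks;  /* how often the read pointer was published */
    uint32_t handled;     /* how many entries were consumed */
};

static void toy_ih_process(struct toy_ih *ih)
{
    assert(ih->wptr < TOY_IH_ENTRIES && ih->rptr < TOY_IH_ENTRIES);

    while (ih->rptr != ih->wptr) {
        /* Consume at most TOY_IH_MAX_BATCH entries in one go. */
        for (uint32_t n = 0;
             n < TOY_IH_MAX_BATCH && ih->rptr != ih->wptr; n++) {
            ih->handled++;  /* stand-in for dispatching ring[rptr] */
            ih->rptr = (ih->rptr + 1) & (TOY_IH_ENTRIES - 1);
        }
        /* Publish progress so the producer can reuse the freed slots. */
        ih->hw_rptr = ih->rptr;
        ih->writebacks++;
    }
}
```

With 70 pending entries this publishes the read pointer three times (after 32, 64 and 70 entries) instead of once at the end, so ring slots are released earlier when the producer is close to overflowing.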
2019-03-19 | drm/amdgpu: enable IH ring 1&2 for Vega20 as well | Christian König | 1 | -17/+13
That doesn't seem to have any negative effects. Signed-off-by: Christian König <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: enable IH doorbell for ring 1&2 on Vega | Christian König | 2 | -24/+45
The doorbells should already be reserved, just enable them. Signed-off-by: Christian König <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: change Vega IH ring 1 config | Christian König | 1 | -0/+4
Disable overflow and enable full drain. This makes fault handling on ring 1 much more reliable since we don't generate back pressure any more. Signed-off-by: Christian König <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Only clear dumb buffers if ring is enabled | Nicholas Kazlauskas | 1 | -3/+10
The buffers should be cleared when possible but we also don't want buffer creation to fail in the rare case where the ring isn't ready during the call. This could happen during some suspend/resume sequences. Cc: Christian König <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Clear VRAM for DRM dumb_create buffers | Nicholas Kazlauskas | 1 | -1/+2
The dumb_create API isn't intended for high performance rendering and it's more useful for userspace (ie. IGT) to have them precleared. The bonus here is that we also won't needlessly leak whatever was previously in VRAM, but it also probably wasn't sensitive if it was going through this API. Signed-off-by: Nicholas Kazlauskas <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: fix semicolon.cocci warnings | kbuild test robot | 1 | -2/+2
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:405:2-3: Unneeded semicolon drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:435:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci CC: xinhui pan <[email protected]> Signed-off-by: kbuild test robot <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add new ras workflow control flags | xinhui pan | 3 | -1/+39
add ras post init function. Do some initialization after all IP have finished their late init. Add new member flags which will control the ras work flow. For now, vbios enable ras for us on boot. That might change in the future. So there should be a flag from vbios to tell us if ras is enabled or not on boot. Looks like there is no such info now. Other bits of the flags are reserved to control other parts of ras. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: let ras initialization a little noticeable | xinhui pan | 1 | -2/+7
add drm info output if ras initialized successfully. add ras atomfirmware sanity check. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Fix lockdep warning more gracely | xinhui pan | 1 | -6/+2
lockdep needs a static key. Previously we set the ignore bit to avoid the warning. Now call sysfs_attr_init to initialize the static key. Signed-off-by: xinhui pan <[email protected]> Reviewed-and-Tested-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Fix ras debugfs data parse | xinhui pan | 1 | -1/+1
Unzero char is accepted by sscanf, so when data is structure but unexpectedly return error invalid; Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add new member hw_supported | xinhui pan | 2 | -12/+33
Currently, it is not clear how ras is supported. Both software and hardware can set the supported. That is confusing. Fix it by adding new member hw_supported. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Fix warning when lockdep is enabled | xinhui pan | 1 | -0/+6
Set ignore bit to satisfy lockdep. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Fix NULL pointer when ta is missing | xinhui pan | 1 | -9/+15
Ta is optional, so check if ta firmware is loaded or not. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: fix ras parameter descriptions | Evan Quan | 1 | -4/+4
The modinfo output wrongly shows two parameters for each feature (see below). This patch fixes the incorrect output. parm: amdgpu_ras_enable:Enable RAS features on the GPU (0 = disable, 1 = enable, -1 = auto (default)) parm: ras_enable:int parm: amdgpu_ras_mask:Mask of RAS features to enable (default 0xffffffff), only valid when ras_enable == 1 parm: ras_mask:uint Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: xinhui pan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: export both supported and enabled ras features | xinhui pan | 1 | -2/+5
Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: lookup vbios table to check ecc capability | xinhui pan | 1 | -27/+13
Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: query sram ecc/ecc availability from atombios | Hawking Zhang | 1 | -205/+23
query sram ecc capability via amdgpu_atomfirmware_ecc_default_enabled query ecc availability via amdgpu_atomfirmware_sram_ecc_supported Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add atomfirmware helper function to query sram ecc caps | Hawking Zhang | 2 | -0/+31
The sram ecc capability can be read from the firmware_capability field in the firmwareinfo table. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add atomfirmware helper function to query ecc status | Hawking Zhang | 2 | -0/+32
The ecc default status (enabled or disabled) can be read from the umc_config field in the umc_info table. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: handle ras resume | xinhui pan | 3 | -3/+13
Suspend puts the irq, so resume needs to get the irq back. At the same time, skip the other ras initialization. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdkfd: add RAS ECC event support (v3) | Eric Huang | 5 | -0/+9
The RAS ECC event is combined with the GPU reset event, because ECC interrupts are caused by uncorrectable errors that trigger a GPU reset. v2: Fix misleading-indentation warning v3: fix build with CONFIG_HSA_AMD disabled Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add human readable debugfs control support (v2) | xinhui pan | 1 | -13/+102
Currently, the debugfs control node can't parse bash-like commands. Now add such support for any tester that uses scripts. v2: squash in fixes for input validation Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
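To illustrate what parsing a text command for such a control node might look like, here is a userspace sketch built on sscanf. The command grammar (`disable <block>`, `enable <block> <error>`, `inject <block> <error> <address> <value>`) is an assumption made for this example and is not stated in the commit message; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_op { TOY_OP_INVALID = -1, TOY_OP_DISABLE, TOY_OP_ENABLE, TOY_OP_INJECT };

struct toy_cmd {
    enum toy_op op;
    char block[32];             /* e.g. "umc"; the block names are assumed */
    char error[32];             /* e.g. "ue" or "ce"; also assumed */
    unsigned long long address;
    unsigned long long value;
};

/* Parse one text command. Returns 0 on success and -1 on malformed input. */
static int toy_parse_cmd(const char *line, struct toy_cmd *cmd)
{
    char op[16] = "";
    long long address = 0, value = 0;
    int n;

    assert(line && cmd);
    memset(cmd, 0, sizeof(*cmd));
    cmd->op = TOY_OP_INVALID;

    /* %lli accepts decimal, octal (leading 0) and hex (leading 0x) input. */
    n = sscanf(line, "%15s %31s %31s %lli %lli",
               op, cmd->block, cmd->error, &address, &value);

    if (n >= 2 && strcmp(op, "disable") == 0)
        cmd->op = TOY_OP_DISABLE;
    else if (n >= 3 && strcmp(op, "enable") == 0)
        cmd->op = TOY_OP_ENABLE;
    else if (n == 5 && strcmp(op, "inject") == 0)
        cmd->op = TOY_OP_INJECT;
    else
        return -1;

    cmd->address = (unsigned long long)address;
    cmd->value = (unsigned long long)value;
    return 0;
}
```

Under this assumed grammar a test script could write a line such as `inject umc ue 0x10 0x1` to the control node instead of assembling a binary structure. Bounding every `%s` conversion with a field width is the kind of input validation the v2 note alludes to, although the actual fixes are not described in the message.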
2019-03-19 | drm/amdgpu: skip gpu reset when ras error occured | xinhui pan | 1 | -0/+3
gpu reset is not stable on vega20 A1. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add ioctl query for enabled ras features (v2) | xinhui pan | 1 | -0/+10
Add a query for userspace to check which RAS features are enabled. v2: squash in warning fix Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2 | xinhui pan | 2 | -0/+19
Add AMDGPU_CTX_QUERY2_FLAGS_RAS_CE/UE which indicate if any error happened between previous query and this query. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
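The "error happened between the previous query and this query" semantics can be sketched by remembering the error counts seen at the last query. The flag values, structure and function below are hypothetical stand-ins for illustration and are not the uapi definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define TOY_QUERY_FLAG_RAS_CE (1u << 0)   /* correctable error seen */
#define TOY_QUERY_FLAG_RAS_UE (1u << 1)   /* uncorrectable error seen */

struct toy_ctx {
    uint64_t seen_ce;   /* correctable count at the previous query */
    uint64_t seen_ue;   /* uncorrectable count at the previous query */
};

/*
 * Report which error classes changed since the previous call and remember
 * the new counts, so that each error is reported exactly once per context.
 */
static uint32_t toy_query_ras_flags(struct toy_ctx *ctx,
                                    uint64_t ce_count, uint64_t ue_count)
{
    uint32_t flags = 0;

    assert(ctx);

    if (ce_count != ctx->seen_ce) {
        flags |= TOY_QUERY_FLAG_RAS_CE;
        ctx->seen_ce = ce_count;
    }
    if (ue_count != ctx->seen_ue) {
        flags |= TOY_QUERY_FLAG_RAS_UE;
        ctx->seen_ue = ue_count;
    }
    return flags;
}
```

Keeping the last-seen counts per context means that two contexts querying independently each observe a new error once, rather than the first query clearing it for everyone.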
2019-03-19 | drm/amdgpu: enable ras on gmc9 | xinhui pan | 2 | -0/+278
Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: enable ras on gfx9 (v2) | Feifei Xu | 2 | -0/+176
Register ecc interrupts and ecc interrupt handler on gfx9. Add ras support on gfx9 v2: squash in warning fix Signed-off-by: Feifei Xu <[email protected]> Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: enable ras on sdma4 | xinhui pan | 2 | -1/+187
register IH, enable ras features on sdma. create sysfs debugfs file for sdma. Signed-off-by: xinhui pan <[email protected]> Signed-off-by: Feifei Xu <[email protected]> Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: reserve bad pages during recovery | xinhui pan | 1 | -0/+5
Mark vram pages with errors as bad and prevent the driver from using them. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add debugfs ctrl node | xinhui pan | 2 | -10/+121
allow userspace enable/disable ras Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add amdgpu_ras.c to support ras (v2) | xinhui pan | 5 | -1/+1475
add obj management. add feature control. add debugfs infrastructure. add sysfs infrastructure. add IH infrastructure. add recovery infrastructure. It is a framework. Other IPs need call amdgpu_ras_xxx function instead of psp_ras_xxx functions. v2: squash in warning fixes Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add psp cmd submit timeout | xinhui pan | 1 | -1/+5
Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add psp v11 ras callback | xinhui pan | 1 | -0/+50
Add trigger_error and cure_posion. Acked-by: Hawking Zhang <[email protected]> Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add psp ras subsystem infrastructure (v2) | xinhui pan | 2 | -0/+230
Add ras fw loading, init, terminate. Add ras cmd submit helper. Add ras feature enable/disable common function. v2: squash in unused variable warning fix Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add psp ras callback func and macro | xinhui pan | 1 | -0/+11
Define the driver side interface for ras ta. Acked-by: Hawking Zhang <[email protected]> Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add ta_ras_if.h | xinhui pan | 1 | -0/+108
Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add module parameters for ras | xinhui pan | 2 | -0/+19
Allow RAS feature enable/disable via boot parameter. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: export ta fw info | xinhui pan | 1 | -0/+21
Output the ta fw, aka xgmi/ras, via debugfs. Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-03-19 | drm/amdgpu: add ta ras fw info (v2) | xinhui pan | 2 | -0/+11
Add ras fw part, xgmi and ras fw are combined together in ta binary. Reading the data from the info is not implemented yet. v2: squash in "drm/amdgpu: fix NULL pointer when ta is missing" Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>