aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdkfd
AgeCommit message (Collapse)AuthorFilesLines
2018-04-13drm/amdkfd: Remove vlaLaura Abbott2-3/+7
There's an ongoing effort to remove VLAs[1] from the kernel to eventually turn on -Wvla. Switch to a constant value that covers all hardware. [1] https://lkml.org/lkml/2018/3/7/621 Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Add sanity checks in IRQ handlersFelix Kuehling2-21/+39
Only accept interrupts from KFD VMIDs. Just checking for a PASID may not be enough because amdgpu started using PASIDs to map VM faults to processes. Warn if an IRQ doesn't have a valid PASID (indicating a firmware bug). Suggested-by: Shaoyun Liu <[email protected]> Suggested-by: Oak Zeng <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Remove queue node when destroy queue failedShaoyun Liu1-3/+7
HWS may hang in the middle of destroy queue, remove the queue from the process queue list so it won't be freed again in the future Signed-off-by: Shaoyun Liu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Locking PM mutex while allocating IB bufferBen Goz1-1/+6
Signed-off-by: Ben Goz <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Remove initialization of cp_hqd_ib_control on CIKFelix Kuehling1-4/+0
The initialization is not necessary. amd-kfd-staging and ROCm releases have worked without it for two years. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Fix signal handling performance againFelix Kuehling1-1/+1
It turns out that idr_for_each_entry is really slow compared to just iterating over the slots. Based on measurements the difference is estimated to be about a factor 64. That means using idr_for_each_entry is only worth it with very few allocated events. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Fix CP soft hang on APUsYong Zhao3-6/+4
The problem happens on Raven and Carrizo. The context save handler should not clear the high bits of PC_HI before extracting the bits of IB_STS. The bug is not relevant to VEGA10 until we enable demand paging. Signed-off-by: Jay Cornwall <[email protected]> Signed-off-by: Yong Zhao <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Separate trap handler assembly code and its hex valuesYong Zhao4-555/+575
Since the assembly code is inside "#if 0", it is ineffective. Despite that, during debugging, we need to change the assembly code, extract it into a separate file and compile the new file into hex values using sp3. That process also requires us to remove "#if 0" and modify lines starting with "#", so that sp3 can successfully compile the new file. With this change, all the above chore is no longer needed, and cwsr_trap_handler_gfx*.asm can be directly used by sp3 to generate its hex values. Signed-off-by: Yong Zhao <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Remove redundant include of amd-iommu.hFelix Kuehling1-3/+0
Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: use %px to print user space address instead of %pPhilip Yang2-5/+5
Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Use volatile MTYPE in default/alternate aperturesJay Cornwall1-1/+2
MTYPE_NC_NV (0) marks scalar/vector L1 cache lines as non-volatile. Cache lines loaded through these apertures are intended to be invalidated before (and sometimes during) a dispatch. The non-volatile qualifier prevents these cache lines from being distinguished from those loaded through the private aperture. Use MTYPE_NC (1) instead on both Gfx7 and Gfx8. This allows the compiler to use the BUFFER_WBINVL1_VOL instruction and is a precursor to automatic per-dispatch scalar/vector L1 volatile invalidation. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Reduce priority of context-saving waves before spin-waitJay Cornwall2-3/+15
Synchronization between context-saving wavefronts is achieved by sending a SAVEWAVE message to the SPI and then spin-waiting for a response. These spin-waiting wavefronts may inhibit the progress of other wavefronts in the context save handler, leading to the synchronization condition never being achieved. Before spin-waiting reduce the priority of each wavefront to guarantee foward progress in the others. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-05-01drm/amdkfd: Dump HQD of HIQOak Zeng1-0/+12
Signed-off-by: Oak Zeng <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-24drm/amdkfd: Integer overflows in ioctlDan Carpenter1-4/+4
args->n_devices is a u32 that comes from the user. The multiplication could overflow on 32 bit systems possibly leading to privilege escalation. Fixes: 5ec7e02854b3 ("drm/amdkfd: Add ioctls for GPUVM memory management") Signed-off-by: Dan Carpenter [email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Add Vega10 topology and device infoFelix Kuehling4-0/+55
* Report 64-bit doorbells as HSA_CAP_DOORBELL_TYPE_2_0 in topology * Report cache information in topology (duplicates GFXv8 info for now) * Add device info for Vega10 support in KFD Raven is not enabled at this time as it needs additional changes in DQM to work with a single SDMA engine. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Try to enable atomics for all GPUswelu1-14/+13
Report failure to enable atomics only on GPUs that require them. This allows GPUs that don't require atomics to function, but can benefit if they are available. This is the case for Vega10, which doesn't use atomics for basic functioning of the MEC, AQL and HWS microcode. So it can work without atomics. But shader programs can still use atomic instructions on systems that support PCIe atomics. Signed-off-by: welu <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Add GFXv9 CWSR trap handlerFelix Kuehling2-3/+1505
Signed-off-by: Shaoyun Liu <[email protected]> Signed-off-by: Jay Cornwall <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Support flat memory apertures for GFXv9Felix Kuehling1-28/+87
Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Remove limit on number of GPUs (follow-up)Felix Kuehling1-3/+1
This condition was missed in a previous commit with the same title. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-08drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queueFelix Kuehling7-7/+63
v2: Removed redundant 0x before %p. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Fix kernel queue rollback_packetFelix Kuehling1-1/+1
kq->queue->properties.write_ptr is a GPU address which can'd be derefenced in the kernel. Use kq->wptr_kernel instead, which is the kernel CPU address of the same buffer. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Fix goto usageFelix Kuehling1-6/+8
Missed a spot in previous cleanup commit: Remove gotos that do not feature any common cleanup, and use gotos instead of repeating cleanup commands. According to kernel.org: "The goto statement comes in handy when a function exits from multiple locations and some common work such as cleanup has to be done. If there is no cleanup needed then just return directly." Signed-off-by: Kent Russell <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Add SOC15 interrupt processing supportFelix Kuehling4-1/+134
Signed-off-by: Shaoyun Liu <[email protected]> Signed-off-by: Oak Zeng <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Add GFXv9 device queue managerFelix Kuehling6-2/+106
Signed-off-by: John Bridgman <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Add GFXv9 MQD managerFelix Kuehling5-1/+451
Signed-off-by: John Bridgman <[email protected]> Signed-off-by: Jay Cornwall <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Add GFXv9 PM4 packet writer functionsFelix Kuehling6-12/+937
Signed-off-by: Shaoyun Liu <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Move packet writer functions into ASIC-specific fileFelix Kuehling4-316/+420
This is in preparation for GFXv9 (Vega10) which uses incompatible PM4 packet formats from previous ASIC generations. Signed-off-by: Shaoyun Liu <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Implement doorbell allocation for SOC15Felix Kuehling6-17/+139
Allocate doorbells according to the doorbell routing information on SOC15 ASICs (Vega10 and later). On older ASICs we continue to use the queue_id as the doorbell ID to maintain compatibility with the Thunk. Signed-off-by: Shaoyun Liu <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Clean up KFD_MMAP_ offset handlingHarish Kasiviswanathan5-33/+59
Use bit-rotate for better clarity and remove _MASK from the #defines as these represent mmap types. Centralize all the parsing of the mmap offset in kfd_mmap and add device parameter to doorbell and reserved_mem map functions. Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits for encoding the the doorbell ID on Vega10. Signed-off-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-10drm/amdkfd: Make doorbell size ASIC-dependentFelix Kuehling3-26/+39
This prepares for GFXv9 (Vega10), which has 64-bit doorbells. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-04-02Merge tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds25-326/+2538
Pull drm updates from Dave Airlie: "Cannonlake and Vega12 support are probably the two major things. This pull lacks nouveau, Ben had some unforseen leave and a few other blockers so we'll see how things look or maybe leave it for this merge window. core: - Device links to handle sound/gpu pm dependency - Color encoding/range properties - Plane clipping into plane check helper - Backlight helpers - DP TP4 + HBR3 helper support amdgpu: - Vega12 support - Enable DC by default on all supported GPUs - Powerplay restructuring and cleanup - DC bandwidth calc updates - DC backlight on pre-DCE11 - TTM backing store dropping support - SR-IOV fixes - Adding "wattman" like functionality - DC crc support - Improved DC dual-link handling amdkfd: - GPUVM support for dGPU - KFD events for dGPU - Enable PCIe atomics for dGPUs - HSA process eviction support - Live-lock fixes for process eviction - VM page table allocation fix for large-bar systems panel: - Raydium RM68200 - AUO G104SN02 V2 - KEO TX31D200VM0BAA - ARM Versatile panels i915: - Cannonlake support enabled - AUX-F port support added - Icelake base enabling until internal milestone of forcewake support - Query uAPI interface (used for GPU topology information currently) - Compressed framebuffer support for sprites - kmem cache shrinking when GPU is idle - Avoid boosting GPU when waited item is being processed already - Avoid retraining LSPCON link unnecessarily - Decrease request signaling latency - Deprecation of I915_SET_COLORKEY_NONE - Kerneldoc and compiler warning cleanup for upcoming CI enforcements - Full range ycbcr toggling - HDCP support i915/gvt: - Big refactor for shadow ppgtt - KBL context save/restore via LRI cmd (Weinan) - Properly unmap dma for guest page (Changbin) vmwgfx: - Lots of various improvements etnaviv: - Use the drm gpu scheduler - prep work for GC7000L support vc4: - fix alpha blending - Expose perf counters to userspace pl111: - Bandwidth checking/limiting - Versatile panel support sun4i: - A83T HDMI support - A80 support - YUV plane support - H3/H5 HDMI support omapdrm: - HPD support for DVI connector - remove lots of static variables msm: - DSI updates from 10nm / SDM845 - fix for race condition with a3xx/a4xx fence completion irq - some refactoring/prep work for eventual a6xx support (ie. when we have a userspace) - a5xx debugfs enhancements - some mdp5 fixes/cleanups to prepare for eventually merging writeback - support (ie. when we have a userspace) tegra: - mmap() fixes for fbdev devices - Overlay plane for hw cursor fix - dma-buf cache maintenance support mali-dp: - YUV->RGB conversion support rockchip: - rk3399/chromebook fixes and improvements rcar-du: - LVDS support move to drm bridge - DT bindings for R8A77995 - Driver/DT support for R8A77970 tilcdc: - DRM panel support" * tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux: (1646 commits) drm/i915: Fix hibernation with ACPI S0 target state drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt drm/i915: Specify which engines to reset following semaphore/event lockups drm/i915/dp: Write to SET_POWER dpcd to enable MST hub. drm/amdkfd: Use ordered workqueue to restore processes drm/amdgpu: Fix acquiring VM on large-BAR systems drm/amd/pp: clean header file hwmgr.h drm/amd/pp: use mlck_table.count for array loop index limit drm: Fix uabi regression by allowing garbage mode->type from userspace drm/amdgpu: Add an ATPX quirk for hybrid laptop drm/amdgpu: fix spelling mistake: "asssert" -> "assert" drm/amd/pp: Add new asic support in pp_psm.c drm/amd/pp: Clean up powerplay code on Vega12 drm/amd/pp: Add smu irq handlers for legacy asics drm/amd/pp: Fix set wrong temperature range on smu7 drm/amdgpu: Don't change preferred domian when fallback GTT v5 drm/vmwgfx: Bump version patchlevel and date drm/vmwgfx: use monotonic event timestamps drm/vmwgfx: Unpin the screen object backup buffer when not used drm/vmwgfx: Stricter count of legacy surface device resources ...
2018-03-23drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_callsFelix Kuehling4-5/+49
These interfaces allow KGD to stop and resume all GPU user mode queue access to a process address space. This is needed for handling MMU notifiers of userptrs mapped for GPU access in KFD VMs. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-23drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifierFelix Kuehling3-3/+3
When an MMU notifier runs in memory reclaim context, it can deadlock trying to take locks that are already held in the thread causing the memory reclaim. The solution is to avoid memory reclaim while holding locks that are taken in MMU notifiers by using GFP_NOIO. This commit fixes memory allocations done while holding the dqm->lock which is needed in the MMU notifier (dqm->ops.evict_process_queues). Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-23drm/amdkfd: Use ordered workqueue to restore processesFelix Kuehling3-6/+32
Restoring multiple processes concurrently can lead to live-locks where each process prevents the other from validating all its BOs. v2: fix duplicate check of same variable Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-23drm/amdkfd: Deallocate SDMA queues correctlyFelix Kuehling1-6/+14
Deallocate SDMA queues during abnormal process termination and when queue creation fails after the SDMA allocation. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-23drm/amdkfd: Fix scratch memory with HWS enabledFelix Kuehling1-2/+1
Program sh_hidden_private_base_vmid correctly in the map-process PM4 packet. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Add module option for testing large-BAR functionalityFelix Kuehling4-0/+19
Simulate large-BAR system by exporting only visible memory. This limits the amount of available VRAM to the size of the BAR, but enables CPU access to VRAM. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Kmap event page for dGPUsFelix Kuehling3-2/+87
The events page must be accessible in user mode by the GPU and CPU as well as in kernel mode by the CPU. On dGPUs user mode virtual addresses are managed by the Thunk's GPU memory allocation code. Therefore we can't allocate the memory in kernel mode like we do on APUs. But KFD still needs to map the memory for kernel access. To facilitate this, the Thunk provides the buffer handle of the events page to KFD when creating the first event. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Add ioctls for GPUVM memory managementFelix Kuehling2-0/+385
v2: * Fix error handling after kfd_bind_process_to_device in kfd_ioctl_map_memory_to_gpu v3: * Add ioctl to acquire VM from a DRM FD v4: * Return number of successful map/unmap operations in failure cases * Facilitate partial retry after failed map/unmap * Added comments with parameter descriptions to new APIs * Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Add TC flush on VMID deallocation for HawaiiFelix Kuehling4-1/+95
On GFX7 the CP does not perform a TC flush when queues are unmapped. To avoid TC eviction from accessing an invalid VMID, flush it explicitly before releasing a VMID. v2: Fix unnecessary list_for_each_entry_safe v3: Moved allocation to kfd_process_device_init_vm Signed-off-by: Amber Lin <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Allocate CWSR trap handler memory for dGPUsFelix Kuehling1-10/+127
Add helpers for allocating GPUVM memory in kernel mode and use them to allocate memory for the CWSR trap handler. v2: Use dev instead of pdd->dev in kfd_process_free_gpuvm v3: * Cleaned up and simplified kfd_process_alloc_gpuvm * Moved allocation for dGPU to kfd_process_device_init_vm Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Add per-process IDR for buffer handlesFelix Kuehling2-0/+84
Also used for cleaning up on process termination. v2: Refactored cleanup on process termination Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Aperture setup for dGPUsFelix Kuehling2-8/+33
Set up the GPUVM aperture for SVM (shared virtual memory) that allows sharing a part of virtual address space between GPUs and CPUs. Report the size of the GPUVM aperture that is supported by KGD accurately. The low part of the GPUVM aperture is reserved for kernel use. This is for kernel-allocated buffers that are only accessed on the GPU: - CWSR trap handler - IB for submitting commands in user-mode context from kernel mode Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Remove limit on number of GPUsFelix Kuehling2-12/+104
Currently the number of GPUs is limited by aperture placement options available on GFX7 and GFX8 hardware. This limitation is not necessary. Scratch and LDS represent per-work-item and per-work-group storage respectively. Different work-items and work-groups use the same virtual address to access their own data. Work running on different GPUs is by definition in different work-groups (different dispatches, in fact). That means the same virtual addresses can be used for these apertures on different GPUs. Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the artificial limitation on the number of GPUs that can be supported. The new ioctl allows user mode to query the number of GPUs to allocate enough memory for all GPUs to be reported. This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Populate DRM render device minorOak Zeng2-0/+5
Populate DRM render device minor in kfd topology Signed-off-by: Oak Zeng <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Create KFD VMs on demandFelix Kuehling2-10/+53
Instead of creating all VMs on process creation, create them when a process is bound to a device. This will later allow registering an existing VM from a DRM render node FD at runtime, before the process is bound to the device. This way the render node VM can be used for KFD instead of creating our own redundant VM. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: fix uninitialized variable useArnd Bergmann1-1/+1
When CONFIG_ACPI is disabled, we never initialize the acpi_table structure in kfd_create_crat_image_virtual: drivers/gpu/drm/amd/amdkfd/kfd_crat.c: In function 'kfd_create_crat_image_virtual': drivers/gpu/drm/amd/amdkfd/kfd_crat.c:888:40: error: 'acpi_table' may be used uninitialized in this function [-Werror=maybe-uninitialized] The undefined behavior also happens for any other acpi_get_table() failure, but then the compiler can't warn about it. This adds an error check that prevents the structure from being used in error, avoiding both the undefined behavior and the warning about it. Fixes: 520b8fb755cc ("drm/amdkfd: Add topology support for CPUs") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-02-06drm/amdkfd: Implement KFD process eviction/restoreFelix Kuehling8-8/+547
When the TTM memory manager in KGD evicts BOs, all user mode queues potentially accessing these BOs must be evicted temporarily. Once user mode queues are evicted, the eviction fence is signaled, allowing the migration of the BO to proceed. A delayed worker is scheduled to restore all the BOs belonging to the evicted process and restart its queues. During suspend/resume of the GPU we also evict all processes to allow KGD to save BOs in system memory, since VRAM will be lost. v2: * Account for eviction when updating of q->is_active in MQD manager Signed-off-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-02-06drm/amdkfd: Add GPUVM virtual address space to PDDFelix Kuehling3-0/+66
Create/destroy the GPUVM context during PDD creation/destruction. Get VM page table base and program it during process registration (HWS) or VMID allocation (non-HWS). v2: * Used dev instead of pdd->dev in kfd_flush_tlb Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-02-06drm/amdkfd: Remove unaligned memory accessHarish Kasiviswanathan1-16/+9
Unaligned atomic operations can cause problems on some CPU architectures. Use simpler bitmask operations instead. Atomic bit manipulations are not necessary since dqm->lock is held during these operations. Signed-off-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>