Age | Commit message (Collapse) | Author | Files | Lines |
|
Support per-queue reset for GFX9. The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.
Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RESOURCES first to identify the user queue
candidates to reset.
Only signal reset events to processes that have had a queue reset.
If queue reset fails, fall back to GPU reset.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add log info for umc_v12_0.
v2:
Delete redundant logs.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not
have valid data
Signed-off-by: David Yat Sin <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.
v2: drop the now unused queue manager functions for gfx7/8 APUs
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Tested-by: Mike Lothian <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update user space last_event_age when event age is enabled.
It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set waiter's activated flag true when event age unmatchs with last_event_age.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add event_age tracking when receiving interrupt.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Exception handling for vmfaults should be raised with additional data.
Reported-by: Mukul Joshi <[email protected]>
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a debug operation that allows the debugger to send an exception
directly to runtime through a payload address.
For memory violations, normal vmfault signals will be applied to
notify runtime instead after passing in the saved exception data
when a memory violation was raised to the debugger.
For runtime exceptions, this will unblock the runtime enable
function which will be explained and implemented in a follow up
patch.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Introduce a new structure, kfd_node, which will now represent
a compute node. kfd_node is carved out of kfd_dev structure.
kfd_dev struct now will become the parent of kfd_node, and will
store common resources such as doorbells, GTT sub-alloctor etc.
kfd_node struct will store all resources specific to a compute
node, such as device queue manager, interrupt handling etc.
This is the first step in adding compute partition support in KFD.
v2: introduce kfd_node struct to gc v11 (Hawking)
v3: make reference to kfd_dev struct through kfd_node (Morris)
v4: use kfd_node instead for kfd isr/mqd functions (Morris)
v5: rebase (Alex)
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Hawking Zhang <[email protected]>
Signed-off-by: Morris Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Pull drm fixes from Dave Airlie:
"fbdev:
- fix uninit var in error path
shmem:
- revert unGPLing an export
i915:
- Don't use stolen memory or BAR mappings for ring buffers with LLC
- Add inverted backlight quirk for HP 14-r206nv
- Fix GSI offset for MCR lookups
- GVT fixes (memleak, debugfs attributes, kconfig, typos)
amdgpu:
- SMU 13 fixes
- Enable TMZ for GC 10.3.6
- Misc display fixes
- Buddy allocator fixes
- GC 11 fixes
- S0ix fix
- INFO IOCTL queries for GC 11
- VCN harvest fixes for SR-IOV
- UMC 8.10 RAS fixes
- Don't restrict bpc to 8
- NBIO 7.5 fix
- Allow freesync on PCon for more devices
amdkfd:
- SDMA fix
- Illegal memory access fix"
* tag 'drm-next-2023-03-03-1' of git://anongit.freedesktop.org/drm/drm: (45 commits)
drm/amdgpu/vcn: fix compilation issue with legacy gcc
drm/amd/display: Extend Freesync over PCon support for more devices
Revert "drm/amd/display: Do not set DRR on pipe commit"
drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
drm/amd/display: Ext displays with dock can't recognized after resume
drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
drm/amdgpu: remove unused variable ring
drm/amd/display: fix dm irq error message in gpu recover
drm/amd: Fix initialization for nbio 7.5.1
drm/amd/display: Don't restrict bpc to 8 bpc
drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable
drm/radeon: Fix eDP for single-display iMac11,2
drm/shmem-helper: Revert accidental non-GPL export
drm: omapdrm: Do not use helper unininitialized in omap_fbdev_init()
drm/amd/pm: downgrade log level upon SMU IF version mismatch
drm/amdgpu: Add ecc info query interface for umc v8_10
drm/amdgpu: Add convert_error_address function for umc v8_10
drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err
drm/amdgpu: change default behavior of bad_page_threshold parameter
drm/amdgpu: exclude duplicate pages from UMC RAS UE count
...
|
|
In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized; When copy_from_user() fails in the
kfd_wait_on_events() function, it will enter exception handling to
release the previously allocated memory of the waiter structure;
Due to the event field of the waiters structure being accessed
in the free_waiters() function, this results in illegal memory access
and system crash, here is the crash log:
localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:ffffaa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0000000000000282 RCX: 00000000002c0000
localhost kernel: RDX: ffff9e855eeacb80 RSI: 000000000000279c RDI: ffffe7088f6a21d0
localhost kernel: RBP: ffffe7088f6a21d0 R08: 00000000002c0000 R09: ffffaa53c362be64
localhost kernel: R10: ffffaa53c362bbd8 R11: 0000000000000001 R12: 0000000000000002
localhost kernel: R13: ffff9e7ead15d600 R14: 0000000000000000 R15: ffff9e7ead15d698
localhost kernel: FS: 0000152a3d111700(0000) GS:ffff9e855ee80000(0000) knlGS:0000000000000000
localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
localhost kernel: CR2: 0000152938000010 CR3: 000000044d7a4000 CR4: 00000000003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7
Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.
Signed-off-by: Qu Huang <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.
[[email protected]: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: Sebastian Reichel <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjun Roy <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Oskolkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Punit Agrawal <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
mutex_unlock before the exit label because all the error code paths that
jump there didn't take that lock. This fixes unbalanced locking errors
in case of restore errors.
Fixes: 40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
When kfd_ioctl_wait_events needs to restart due to a signal, we need to
update the timeout to account for the time already elapsed. We also need
to undo auto_reset of events that have signaled already, so that the
restarted ioctl will be able to count those signals again.
This fixes infinite hangs when kfd_ioctl_wait_events is interrupted by a
signal.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-and-tested-by: Xiaogang Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Removed an unused parameter from two functions and added kernel-doc
comments.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MEC firmware sometimes sends signal interrupts without a valid context ID
on end of pipe events that don't intend to signal any HSA signals.
This triggers the slow path in kfd_signal_event_interrupt that scans the
entire event page for signaled events. Detect these signals in the top
half interrupt handler to stop processing them as early as possible.
Because we now always treat event ID 0 as invalid, reserve that ID during
process initialization.
v2: Update firmware version checks to support more GPUs
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the waiters to the wait queue during initialization, while holding the
event spinlock. Otherwise the waiter will not get activated if the event
signals before being added to the wait queue.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang<[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If lookup_event_by_id() returns a NULL "ev" pointer then the
spin_lock(&ev->lock) will crash. This was detected by Smatch:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event()
error: we previously assumed 'ev' could be null (see line 639)
Fixes: 5273e82c5f47 ("drm/amdkfd: Improve concurrency of event handling")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The synchronize_rcu call in destroy_events can take several ms, which
noticeably slows down applications destroying many events. Use kfree_rcu
to free the event structure asynchronously and eliminate the
synchronize_rcu call in the user thread.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use rcu_read_lock to read p->event_idr concurrently with other readers
and writers. Use p->event_mutex only for creating and destroying events
and in kfd_wait_on_events.
Protect the contents of the kfd_event structure with a per-event
spinlock that can be taken inside the rcu_read_lock critical section.
This eliminates contention of p->event_mutex in set_event, which tends
to be on the critical path for dispatch latency even when busy waiting
is used. It also eliminates lock contention in event interrupt handlers.
Since the p->event_mutex is now used much less, the impact of requiring
it in kfd_wait_on_events should also be much smaller.
This should improve event handling latency for processes using multiple
GPUs concurrently.
v2: Reschedule the worker periodically to avoid soft lockup warnings
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Sean Keely <[email protected]> # v1
Tested-by: Sanjay Tripathi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion.
Signed-off-by: QintaoShen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When doing a restore on a different node, the gpu_id's on the restore
node may be different. But the user space application will still refer
use the original gpu_id's in the ioctl calls. Adding code to create a
gpu id mapping so that kfd can determine actual gpu_id during the user
ioctl's.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: David Yat Sin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add support to existing CRIU ioctl's to save and restore events during
criu checkpoint and restore.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: David Yat Sin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Switch to IP version checking instead of asic_type on various KFD
version checks.
Signed-off-by: Graham Sider <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The function kfd_lookup_process_by_pasid will increase the reference
count of kfd_process object, its caller should call kfd_unref_process to
decrease the reference count. Otherwise resource leakage will happen.
Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and
KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison data
consumed. Beside that, some applications maybe register SIGBUS signal
hander. These applications will handle poison data by themselves, exit
or re-create context to re-dispatch works.
Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".
No PASID type change in uapi although it defines PASID as __u64 in
some places.
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Acked-by: Joerg Roedel <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.
The change is generated using coccinelle with the following rule:
// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)
Signed-off-by: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ying Han <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
People are inclined to think of the previous pr_warn message as an
error, so use pre_debug instead.
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The new code uses straightforward bit shifts and thus has better readability.
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Since KFD pasid starts from 0x8000 (32768 in decimal), it is better
perceived as a hex number. Meanwhile, change the pasid type from
unsigned int to uint16_t to be consistent throughout the code.
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Renoir is the same with Raven, will enable iommu event in future.
v2: fix the checking (Thong)
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix some spacing issues, log output, uses of !=NULL/==NULL, unneeded
extra lines and clean up a declaration from =1 to =true for clarity
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
RAS ECC event will combine with GPU reset event, due to
ECC interrupts are caused by uncorrectable error that triggers
GPU reset.
v2: Fix misleading-indentation warning
v3: fix build with CONFIG_HSA_AMD disabled
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
memory_exception_data is already initialized for not-present faults.
It only needs to be overridden for permission faults.
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
On Raven multiple PPRs can be queued up by the hardware. When the
first of those requests is handled by the IOMMU driver, the memory
access succeeds. After that the application may be done with the
memory and unmap it. At that point the page table entries are
invalidated, but there are still outstanding duplicate PPRs for those
addresses. When the IOMMU driver processes those duplicate requests,
it finds invalid page table entries and triggers an invalid PPR fault.
As a workaround, don't signal invalid PPR faults on Raven to avoid
segfaulting applications that haven't done anything wrong. As a side
effect, real GPU memory access faults may go unnoticed by the
application.
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Lock KFD and evict existing queues on reset. Notify user mode by
signaling hw_exception events.
Signed-off-by: Shaoyun Liu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
1. Pre-GFX9 the amdgpu ISR saves the vm-fault status and address per
per-vmid. amdkfd needs to get the information from amdgpu through the
new get_vm_fault_info interface. On GFX9 and later, all the required
information is in the IH ring
2. amdkfd unmaps all queues from the faulting process and create new
run-list without the guilty process
3. amdkfd notifies the runtime of the vm fault trap via EVENT_TYPE_MEMORY
Signed-off-by: shaoyun liu <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Signed-off-by: Moses Reuben <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
It turns out that idr_for_each_entry is really slow compared to just
iterating over the slots. Based on measurements the difference is
estimated to be about a factor 64. That means using idr_for_each_entry
is only worth it with very few allocated events.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Use bit-rotate for better clarity and remove _MASK from the #defines as
these represent mmap types.
Centralize all the parsing of the mmap offset in kfd_mmap and add device
parameter to doorbell and reserved_mem map functions.
Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits
for encoding the the doorbell ID on Vega10.
Signed-off-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
The events page must be accessible in user mode by the GPU and CPU
as well as in kernel mode by the CPU. On dGPUs user mode virtual
addresses are managed by the Thunk's GPU memory allocation code.
Therefore we can't allocate the memory in kernel mode like we do
on APUs. But KFD still needs to map the memory for kernel access.
To facilitate this, the Thunk provides the buffer handle of the
events page to KFD when creating the first event.
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
ASIC information. Also allow building KFD without IOMMUv2 support.
This is still useful for dGPUs and prepares for enabling KFD on
architectures that don't support AMD IOMMUv2.
v2:
* Centralize IOMMUv2 code to avoid #ifdefs in too many places
v3:
* Imply AMD_IOMMU_V2 in Kconfig
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian Konig <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Use a reference counter instead of a lock to prevent process
destruction while functions running out of process context are using
the kfd_process structure. In many cases these functions don't need
the structure to be locked. In the few cases that really do need the
process lock, take it explicitly.
This helps simplify lock dependencies between the process lock and
other locks, particularly amdgpu and mm_struct locks. This will be
important when amdgpu calls back to amdkfd for memory evictions.
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
This allows increasing the KFD_SIGNAL_EVENT_LIMIT in kfd_ioctl.h
without breaking processes built with older kfd_ioctl.h versions.
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
This speeds up signal lookup when the IH ring entry includes a
valid context ID or partial context ID. Only if the context ID is
found to be invalid, fall back to an exhaustive search of all
signaled events.
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Signal slots are identical to event IDs.
Replace the used_slot_bitmap and events hash table with an IDR to
allocate and lookup event IDs and signal slots more efficiently.
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
The first event page is always big enough to handle all events.
Handling of multiple events pages is not supported by user mode, and
not necessary.
Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|