aboutsummaryrefslogtreecommitdiff
path: root/drivers/misc
AgeCommit message (Collapse)AuthorFilesLines
2022-09-18habanalabs/gaudi2: log critical events with no rate limitfarah kassabri1-4/+11
When we have a storm of errors of HBM ECC SERR we can reach a situation where driver start hard reset flow without logging the error cause that caused the hard reset due to logs rate limiting. Signed-off-by: farah kassabri <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: ignore EEPROM errors during bootOfir Bitton2-0/+14
EEPROM errors reported by firmware are basically warnings and should not fail the boot process. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: perform context switch flow only if neededOfir Bitton3-4/+9
Except Goya, none of our ASICs require context switch flow, hence we enable this flow only where it is needed. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: set command buffer host VA dynamicallyDafna Hirschfeld5-18/+18
Set the addresses for userspace command buffer dynamically instead of hard-coded. There is no reason for it to be hard-coded. Signed-off-by: Dafna Hirschfeld <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: trace DMA allocationsOhad Sharabi2-27/+62
This patch add tracepoints in the code for DMA allocation. The main purpose is to be able to cross data with the map operations and determine whether memory violation occurred, for example free DMA allocation before unmapping it from device memory. To achieve this the DMA alloc/free code flows were refactored so that a single DMA tracepoint will catch many flows. To get better understanding of what happened in the DMA allocations the real allocating function is added to the trace as well. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: trace MMU map/unmap pageOhad Sharabi1-0/+7
This patch utilize the defined tracepoint to trace the MMU's pages map/unmap operations. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: define trace eventsOhad Sharabi1-0/+3
This patch adds trace events for habanalabs driver to gain all the benefits such an infrastructure can supply. The following events were added: - MMU map/unmap: to be able to track driver's memory allocations - DMA alloc/free: to track our DMA allocation the above trace points in conjunction will help us map the device memory usage as well as to be able to track memory violations. Signed-off-by: Ohad Sharabi <[email protected]> Acked-by: Oded Gabbay <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi2: assigning PQFs for ARC f/w in PDMARajarama Manjukody Bhat2-5/+16
Assigning 3 PQFs in PDMA1 and 2 PQFs in PDMA0 for ARC firmware usage. Signed-off-by: Rajarama Manjukody Bhat <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: fix calculation of DRAM base address in PCIe BARTomer Tayar1-1/+5
The calculation of the device DRAM base address before setting the relevant PCIe BAR to point at it, has an assumption that this BAR is used to access only the DRAM, and thus the covered DRAM size is a power of 2. In future ASICs it is not necessarily true, so need to update the calculation to support also a non-power-of-2 size. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: if map page fails don't try to unmap itDafna Hirschfeld1-0/+2
The original code tried to unmap a page that was not mapped as part of the map page error path. Signed-off-by: Dafna Hirschfeld <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: select FW_LOADER in KconfigOded Gabbay1-0/+1
The driver is loading firmware to the device and we use the firmware loading functions from the FW_LOADER module. Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: add cdev index data memberOmer Shpigelman2-5/+9
Instead of recalculating the cdev index, store it in a dedicated data member. This data member is intended to be passed to other drivers using the auxiliary bus infra and hence this new data member is necessary in case that the calculation is changed in the future. Signed-off-by: Omer Shpigelman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: fix bug when setting va block sizeDafna Hirschfeld1-2/+2
the size of a block is always 'block->end - block->start + 1' Signed-off-by: Dafna Hirschfeld <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: expose device security status using info ioctlOfir Bitton1-0/+1
In order for the user to know if he is running on a secured device or not, we add it also to the hw_ip info ioctl. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: expose device security status through sysfsOfir Bitton1-0/+10
In order for the user to know if he is running on a secured device or not, a sysfs node is added. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: remove secured PCI IDsOfir Bitton1-6/+0
Secured PCI ID will not be supported in new asics because the security status can always be read from the f/w. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: fix H/W block handling for partial unmappingsTomer Tayar3-9/+21
Several munmap() calls can be done or a mapped H/W block that has a larger size than a page size. Releasing the object should be done only when all mapped range is unmapped. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: unify hwmon resources clean upDani Liberman5-50/+25
Since hwmon fini code is common for all asics, unified it to common function. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi2: new API to control engine cores running modeTal Cohen4-7/+117
The current flow of halting the engine cores is implemented by command buffers built by the user space and sent towards the Driver. This current flow is broken since the user space does not know when the cores actually halt as sending a workload is async op. Therefore the application can not free the memory that is mapped to the engine cores. This new API allows the user space to control the running mode. The API call is sync (returns after the cores are set to the requested mode). Signed-off-by: Tal Cohen <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: remove left-over code from bring-upOded Gabbay2-27/+21
There is some left-over code from the gaudi2 bring-up that wasn't removed so far. Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi2: change device f/w security checkfarah kassabri2-15/+8
On Gaudi2 the f/w always configures the PCIe iATU and allows access to scratchpad registers. Therefore, we can know if the f/w is secured by reading a status bit from the f/w registers. Signed-off-by: farah kassabri <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: move common function out of debugfs.cOded Gabbay2-25/+24
A common function that is called from multiple places can't be located in degugfs.c because that file is only compiled if debugfs is enabled in the kernel config file. This can lead to undefined symbol compilation error. Reported-by: kernel test robot <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: add a missing lock for in_reset indicationTomer Tayar1-0/+2
Add a missing lock in hl_device_resume() when it assigns a value to the 'in_reset' indication. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: fix vma fields assignments order in hl_hw_block_mmap()Tomer Tayar1-6/+5
In hl_hw_block_mmap(), the vma's 'vm_private_data' and 'vm_ops' fields are assigned before filling the content of the private data. In between there is a call to the ASIC hw_block_mmap() function, and if it fails, the vma close function will be called with a bad private data value. Fix the order of assignments to avoid this issue. In hl_hw_block_mmap() the vma's 'vm_private_data and vm_ops are assigned before setting the Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: avoid returning a valid handle if map_block() failsTomer Tayar1-4/+9
map_block() sets the block id handle even if get_hw_block_id() fails, and in this case it uses block id 0 which might be a valid id. Modify it to set the handle only if get_hw_block_id() succeeds. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: fix command submission sanity checkTal Cohen1-5/+9
When a CS is submitted, the ioctl handler checks the CS flags and performs a sanity check, according to its value. As new CS flags are added, the sanity check needs to be updated according to the new flags. Signed-off-by: Tal Cohen <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi: read div_sel value from firmwareOhad Sharabi1-2/+3
Even when running with unsecured f/w, we should read the PLL div_sel value from the f/w as this register is always privileged. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi: fix print format for div_selOhad Sharabi1-3/+1
Print format was for int (%d) while variable is u32. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi2: mark PCIE access error as fatalTomer Tayar1-0/+1
F/W events are enabled in a late phase of the device init, so an event for a PCIE access error during the init, can be received after the init is already done and considered as successful. A resulting device reset, which does the same H/W init, can end similarly with this event right after the reset is done and considered as successful, and a loop of this sequence can continue. To avoid it mark the PCIE access error as a fatal event, so after 2 consecutive events no more resets will be done. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: add uapi to retrieve engines statusDani Liberman2-2/+41
Currently, to get engines status, user needed to read debugfs file with root permissions. This new uapi allows user apace apps retrieve status, so for example, in case of failure, status can be retrieved immediately by the application itself which runs without root permissions. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: remove all kdma locksOded Gabbay5-34/+0
We don't use KDMA concurrently in the driver. The only use is through debugfs and we don't protect concurrent access through it. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: wrap macro arg with parenthesesOhad Sharabi1-2/+2
The macro argument <val> is cast-ed to u32 in some of the places. Because this arg can be some arithmetic computation (e.g. address + offset) the cast should be on the whole expression. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: fix spelling mistakesBharat Jauhari3-23/+22
Cosmetic commit, no logical changes. It just fixes the spelling mistakes. Signed-off-by: Bharat Jauhari <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi2: remove old interrupt mappingsOfir Bitton2-60/+0
Interrupt enumration has changed some time ago but the old mapping was accidentally left in the driver. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi: increase default cs timeout to 10 minutesOded Gabbay1-5/+18
In order to improve scalability and reduce host overhead, it is better to increase the default TDR timeout of Gaudi1 from 30 seconds to 10 minutes. This will allow the DL Framework (e.g. PyTorch, TensorFlow) to remove the host sync they are using now and improve overall performance on scaleout training. Note that one can always set the timeout to a custom value via a kernel module parameter given during driver load. Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: add return code field to module iteratorOhad Sharabi3-20/+32
Up until now the module iterator called void callback functions and so caller activating callback that may fail suffered from 2 issues: 1. The need to "plant" return called in the private data. This is a drawback since the iterator itself should not be aware of the private data of the caller. 2. Due to 1 even in a failure the iterator would keep iterating instead of break upon error. To overcome this an optional rc field added to the iterator context. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs/gaudi2: enable all MMU SPI/SEI interruptsTomer Tayar1-3/+11
Currently only part of the MMU SPI/SEI interrupts are enabled, although there is no real reason to not enable all. The only exception is "burst_fifo_full" which is expected for PMMU because it has a 2 entries FIFO, and thus is it not enabled for it. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: rename non_hard_reset to compute_resetOfir Bitton5-9/+9
In order to be more explicit we should use the term compute_reset for describing the reset in which only the compute engines gets reset. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: Fix spelling mistake "Scrubing" -> "Scrubbing"Colin Ian King1-1/+1
There is a spelling mistake in a dev_dbg message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: Simplify bool conversionYang Li1-1/+1
Fix the following coccicheck warning: ./drivers/misc/habanalabs/gaudi2/gaudi2.c:9727:48-53: WARNING: conversion to bool not needed here Signed-off-by: Yang Li <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-18habanalabs: removed seq_file parameter from is_idle asic functionsDani Liberman5-89/+151
Change is_idle functions so it would be more usable outside debugfs. Do this by replacing seq_file parameter with regular string. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2022-09-16Merge tag 'v6.0-rc5' into i2c/for-mergewindowWolfram Sang1-5/+9
Linux 6.0-rc5
2022-09-12mei: debugfs: add pxp mode to devstate in debugfsTomas Winkler1-1/+18
Add pxp mode devstate to debugfs to monitor pxp state machine progress. This is useful to debug issues in scenarios in which the pxp state needs to be re-initialized, like during power transitions such as suspend/resume. With this debugfs the state could be monitored to ensure that pxp is in the ready state. CC: Vitaly Lubart <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Signed-off-by: Alexander Usyskin <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: drop ready bits check after startAlexander Usyskin1-10/+0
The check that hardware and host ready bits are set after start is redundant and may fail and disable driver if there is back-to-back link reset issued right after start. This happens during pxp mode transitions when firmware undergo reset. Remove these checks to eliminate such failures. Signed-off-by: Alexander Usyskin <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: gsc: add transition to PXP mode in resume flowVitaly Lubart1-0/+11
Added transition to PXP mode in resume flow. CC: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Vitaly Lubart <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Signed-off-by: Alexander Usyskin <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: gsc: setup gsc extended operational memoryTomas Winkler6-4/+138
1. Retrieve extended operational memory physical pointers from the auxiliary device info. 2. Setup memory registers. 3. Notify firmware that the memory is ready by sending the memory ready command. 4. Disable PXP device if GSC is not in PXP mode. CC: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Signed-off-by: Alexander Usyskin <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: mkhi: add memory ready commandTomas Winkler1-0/+12
Add GSC memory ready command. The command indicates to the firmware that extend operation memory was setup and the firmware may enter PXP mode. CC: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Signed-off-by: Alexander Usyskin <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: bus: export common mkhi definitions into a separate headerVitaly Lubart2-30/+44
Exported common mkhi definitions from bus-fixup.c into a separate header file mkhi.h for other driver usage. Signed-off-by: Vitaly Lubart <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Signed-off-by: Alexander Usyskin <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: extend timeouts on slow devicesAlexander Usyskin12-43/+82
Parametrize operational timeouts in order to support slow firmware on some graphics devices. Signed-off-by: Alexander Usyskin <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2022-09-12mei: gsc: wait for reset thread on stopAlexander Usyskin1-1/+3
Wait for reset work to complete before initiating stop reset flow sequence. Signed-off-by: Alexander Usyskin <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Reviewed-by: Daniele Ceraolo Spurio <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>