path: root/drivers/accel/habanalabs/include
Age | Commit message | Author | Files | Lines
2024-06-23 | accel/habanalbs/gaudi2: reduce interrupt count to 128 | Ofir Bitton | 1 | -2/+2
Some systems allow a maximum number of 128 MSI-X interrupts. Hence we reduce the interrupt count to 128 instead of 512. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: add GAUDI2D revision support | Farah Kassabri | 1 | -1/+2
Gaudi2 with PCI revision ID with the value of '4' represents Gaudi2D device and should be detected and initialized as Gaudi2. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
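The revision-based detection described here can be pictured with a minimal C sketch. Only the revision values 3 (Gaudi2C) and 4 (Gaudi2D) come from this log; the enum, macro, and function names are hypothetical, and other revision IDs are deliberately left out of scope.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
enum gaudi2_variant {
	GAUDI2_VARIANT_UNKNOWN = 0,
	GAUDI2_VARIANT_C,
	GAUDI2_VARIANT_D,
};

/* Revision values taken from the commit messages in this log. */
#define REV_ID_GAUDI2C 3
#define REV_ID_GAUDI2D 4

/* Map a PCI revision ID to a device variant. */
static enum gaudi2_variant gaudi2_variant_from_rev(uint8_t rev_id)
{
	switch (rev_id) {
	case REV_ID_GAUDI2C:
		return GAUDI2_VARIANT_C;
	case REV_ID_GAUDI2D:
		return GAUDI2_VARIANT_D;
	default:
		/* Revisions not mentioned in this log are not modeled. */
		return GAUDI2_VARIANT_UNKNOWN;
	}
}
```

Both variants are then initialized through the common Gaudi2 path, which is why each commit touches only a line or two in this directory.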
2024-06-23 | accel/habanalabs/gaudi2: align interrupt names to table | Ariel Suller | 1 | -75/+75
When reporting TPC events, the dcore and the TPC index within that dcore should be reported and propagated, not the general TPC number. Signed-off-by: Ariel Suller <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: align embedded specs headers | Ofir Bitton | 2 | -20/+15
Align embedded headers to latest release. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: update interrupts related headers | Farah Kassabri | 1 | -47/+47
Align the interrupts related headers to latest release. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-02-26 | accel/habanalabs/gaudi2: move HMMU page tables to device memory | Farah Kassabri | 1 | -0/+2
Currently the HMMU page tables reside in the host memory, which will cause host access from the device for every page walk. This can affect PCIe bandwidth in certain scenarios. To prevent that problem, HMMU page tables will be moved to the device memory so the miss transaction will read the hops from there instead of going to the host. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-12-19 | accel/habanalabs/gaudi2: use correct registers to dump QM CQ info | Tomer Tayar | 1 | -6/+6
The QM CQ PTR_LO/PTR_HI/TSIZE registers are for pushing a CQ entry, and although they are updated by HW even when descriptors are fetched by PQ and CB addresses are fed into CQ, the correct registers to use when dumping the CQ info are the ones with the _STS suffix. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-12-19 | accel/habanalabs/gaudi2: get the correct QM CQ info upon an error | Tomer Tayar | 1 | -0/+1
Upon a QM error, the address/size from both the CQ and the ARC_CQ are printed, although the instruction that led to the error was received from only one of them. Moreover, in case of a QM undefined opcode, only one of these address/size sets will be captured based on the value of ARC_CQ_PTR. However, this value can be non-zero even if currently the CQ is used, in case the CQ/ARC_CQ are alternately used. Under the assumption of having a stop-on-error configuration, modify to use CP_STS.CUR_CQ field to get the relevant CQ for the QM error. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
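As a rough illustration of selecting the relevant queue from CP_STS.CUR_CQ rather than from ARC_CQ_PTR, consider the C sketch below. The bit position of the field, its polarity (set meaning ARC_CQ), and all struct and function names are assumptions made for the example; the real layout lives in the ASIC register headers.

```c
#include <stdint.h>

/* ASSUMPTION: CUR_CQ is modeled as bit 0, with 1 meaning ARC_CQ. */
#define CP_STS_CUR_CQ_MASK (1u << 0)

/* Hypothetical container for one queue's captured address and size. */
struct cq_info {
	uint64_t addr;
	uint32_t size;
};

/*
 * Pick the queue that fed the failing instruction, based on the CP
 * status word sampled at error time. This only makes sense under a
 * stop-on-error configuration, as the commit message notes.
 */
static const struct cq_info *pick_error_cq(uint32_t cp_sts,
					   const struct cq_info *cq,
					   const struct cq_info *arc_cq)
{
	return (cp_sts & CP_STS_CUR_CQ_MASK) ? arc_cq : cq;
}
```

Keying off a status field instead of a pointer value avoids the stale non-zero ARC_CQ_PTR case that arises when the two queues are used alternately.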
2023-12-19 | accel/habanalabs: add support for Gaudi2C device | Oded Gabbay | 1 | -0/+1
Gaudi2 with PCI revision ID with the value of '3' represents Gaudi2C device and should be detected and initialized as Gaudi2. Signed-off-by: Oded Gabbay <[email protected]>
2023-10-09 | accel/habanalabs/gaudi2: perform hard-reset upon PCIe AXI drain event | Tomer Tayar | 1 | -1/+1
Non-completed transactions from PCIe towards the device are handled by the AXI drain mechanism. This handling is at the PCIe level, but the transactions are still present in the device and consume some queue entries, and therefore the device must be reset. Modify to perform hard-reset upon PCIe AXI drain events. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-10-09 | accel/habanalabs/gaudi: remove unused structure definition | Oded Gabbay | 1 | -32/+0
struct gaudi_nic_status is not used anywhere in the code. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Ofir Bitton <[email protected]>
2023-10-09 | accel/habanalabs/gaudi2: handle eq health heartbeat check | farah kassabri | 1 | -6/+8
Add a mechanism for the FW EQ health check. This is done using two flows: the heartbeat mechanism, and a dedicated interrupt that indicates an EQ failure such as EQ full. This patch adds the EQ heartbeat implementation for the Gaudi2 ASIC. More info about the heartbeat mechanism: the heartbeat is expanded to monitor a new event that the FW sends upon receiving a heartbeat message. That way the driver can tell whether the EQ is working. Signed-off-by: farah kassabri <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
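The heartbeat flow described in this commit can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the driver's code: the state struct and both function names are hypothetical, and locking and timing are left out.

```c
#include <stdbool.h>

/* Hypothetical per-device state for the EQ heartbeat check. */
struct eq_hb_state {
	bool event_received; /* set when the FW's heartbeat event arrives on the EQ */
};

/* Called from the EQ handler when the FW's heartbeat event is seen. */
static void eq_on_heartbeat_event(struct eq_hb_state *s)
{
	s->event_received = true;
}

/*
 * Called on each heartbeat period. Returns true if the EQ delivered the
 * event since the previous check, then re-arms the flag for the next period.
 */
static bool eq_heartbeat_check(struct eq_hb_state *s)
{
	bool healthy = s->event_received;

	s->event_received = false;
	return healthy;
}
```

A check that returns false means the FW's event never made it through the EQ during the last period, which is the failure this mechanism is meant to expose.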
2023-10-09 | accel/habanalabs/gaudi2: print power-mode changes | Moti Haimovski | 2 | -0/+19
Print to kernel log any device power mode changes events reported by the FW. Signed-off-by: Moti Haimovski <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-10-09 | accel/habanalabs: move cpucp interface to linux/habanalabs | David Meriin | 2 | -2192/+0
The CPUCP interface is moved to a shared folder outside of accel as a pre-requisite to upstream the NIC drivers that will also include this file. Signed-off-by: David Meriin <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-10-09 | accel/habanalabs/gaudi2: include block id in ECC error reporting | Ofir Bitton | 1 | -1/+2
During ECC event handling, Memory wrapper id was mistakenly printed as block id. Fix the print and in addition fetch the actual block-id from firmware. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-10-09 | accel/habanalabs: handle f/w reserved dram space request | Dani Liberman | 1 | -0/+5
It is possible for FW to request reserved space in dram. If the device supports this option, it will retrieve the size from the f/w and will reserve it. Currently we add the common code infrastructure to support it. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-06-08 | accel/habanalabs: print qman data on error only for lower qman | Tomer Tayar | 1 | -0/+11
By default, the upper QMANs are not used, and instead the engine ARCs access the lower QMANs directly. Errors for upper QMANs are therefore not expected, and the debug print of the PQ entries is not needed. Modify the QMAN debug data print on errors to include only information for the lower QMAN. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-06-08 | accel/habanalabs: align to latest firmware specs | Oded Gabbay | 2 | -43/+16
Update the firmware common interface files with the latest version. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Ofir Bitton <[email protected]>
2023-06-08 | accel/habanalabs: set unused bit as reserved | Oded Gabbay | 1 | -1/+1
Get latest f/w gaudi2 interface file which marks unused bist_need_iatu_config bit in cold_rst_data structure as reserved bit. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Ofir Bitton <[email protected]>
2023-06-05 | accel/habanalabs: do soft-reset using cpucp packet | Dafna Hirschfeld | 1 | -0/+4
Whether the cpucp packet is used depends on the FW version. The cpucp method is preferable and saves scratchpad resources. Signed-off-by: Dafna Hirschfeld <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-04-08 | accel/habanalabs: fix handling of arc farm sei event | Dani Liberman | 1 | -1/+3
There is only a single EQ entry for the ARC farm SEI event, and it aggregates events from all four ARC farms. Fix the code to handle this event accordingly. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
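Because one EQ entry stands for all four ARC farms, a handler has to inspect each farm rather than assume a single source. The C sketch below shows that shape only; the status array stands in for per-farm register reads, and every name in it is hypothetical.

```c
#include <stdint.h>

#define NUM_ARC_FARMS 4 /* the commit message mentions four ARC farms */

/*
 * On the single aggregated event, walk every farm's (hypothetical) SEI
 * status word, acknowledge any pending cause bits by clearing them, and
 * return a bitmask of the farms that actually raised something.
 */
static unsigned int handle_arc_farm_sei(uint32_t sei_sts[NUM_ARC_FARMS])
{
	unsigned int farms_hit = 0;
	unsigned int i;

	for (i = 0; i < NUM_ARC_FARMS; i++) {
		if (sei_sts[i]) {
			farms_hit |= 1u << i;
			sei_sts[i] = 0; /* stand-in for a write-to-clear register */
		}
	}
	return farms_hit;
}
```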
2023-03-20 | accel/habanalabs: add handling for unexpected user event | Ofir Bitton | 1 | -0/+2
In order for the user to be aware of unexpected events in Gaudi2 that aren't assigned to a specific engine, we are adding the handling of this dedicated interrupt. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-03-20 | accel/habanalabs: regenerate gaudi2 ids_map_extended | Ohad Sharabi | 1 | -38/+38
Some event names have been modified or added. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-03-20 | accel/habanalabs: align to latest firmware specs | Oded Gabbay | 4 | -39/+26
Copy the most up-to-date interface files to the firmware. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Ofir Bitton <[email protected]>
2023-03-15 | accel/habanalabs: add uapi to stall/resume engine | Koby Elbaz | 1 | -0/+5
The user might want to stall/resume engines to perform power testing for various scenarios. Because our current HL_CS_FLAGS_ENGINE_CORE_COMMAND command only handles the engines' cores, we need to add another opcode for handling entire engine and not just its core. The user supplies an array, where each entry holds the engine's ID and the command to send to the engine. The size of the array is limited by the number of engines in the ASIC (only Gaudi2 is currently supported). Signed-off-by: Koby Elbaz <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
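The array-based request described here implies some validation on the driver side. The following C sketch shows one plausible form of it, following the commit message's wording that each entry holds an engine ID and a command. The struct layout, the command values, and the engine limit are all assumptions for illustration and do not reproduce the actual uapi.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENGINES 8 /* ASSUMPTION: placeholder for the ASIC's engine count */

/* Hypothetical command values. */
enum engine_cmd {
	ENGINE_CMD_STALL = 1,
	ENGINE_CMD_RESUME = 2,
};

/* Hypothetical array entry: which engine, and what to do with it. */
struct engine_entry {
	uint32_t engine_id;
	uint32_t command;
};

/* Reject arrays that are empty, oversized, or contain unknown IDs or commands. */
static int validate_engine_cmds(const struct engine_entry *arr, size_t n)
{
	size_t i;

	if (!arr || n == 0 || n > MAX_ENGINES)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		if (arr[i].engine_id >= MAX_ENGINES)
			return -EINVAL;
		if (arr[i].command != ENGINE_CMD_STALL &&
		    arr[i].command != ENGINE_CMD_RESUME)
			return -EINVAL;
	}
	return 0;
}
```

Bounding the array by the number of engines gives the driver a fixed upper limit on how much user data it has to copy and check.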
2023-03-15 | accel/habanalabs: modify events reset policy | Ohad Sharabi | 1 | -244/+244
The policy file of the events reset has been modified. This change is reflected in the autogenerated file. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Stanislaw Gruszka <[email protected]>
2023-03-15 | accel/habanalabs: get reset type indication from irq_map | Ohad Sharabi | 1 | -2644/+2650
When getting an event, add the ability to deduce the reset type from the IRQ map table instead of using hard reset regardless. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Stanislaw Gruszka <[email protected]>
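Deducing the reset type from a per-event table, instead of always hard-resetting, might look like the C sketch below. The table contents, event names, and field names are invented for the example, and falling back to a hard reset for unknown events is an assumption chosen to mirror the earlier behavior.

```c
#include <stddef.h>

enum reset_type {
	RESET_NONE,
	RESET_SOFT,
	RESET_HARD,
};

/* Hypothetical shape of one IRQ map entry. */
struct irq_map_entry {
	int valid;
	const char *name;
	enum reset_type reset;
};

/* Illustrative table; the real one is autogenerated and far larger. */
static const struct irq_map_entry irq_map[] = {
	[0] = { 1, "EXAMPLE_INFO_EVENT", RESET_NONE },
	[1] = { 1, "EXAMPLE_ENGINE_ERROR", RESET_SOFT },
	[2] = { 0, NULL, RESET_NONE },
	[3] = { 1, "EXAMPLE_FATAL_ERROR", RESET_HARD },
};

#define IRQ_MAP_SIZE (sizeof(irq_map) / sizeof(irq_map[0]))

/* Look up the reset policy for an event ID. */
static enum reset_type reset_type_for_event(size_t event_id)
{
	/* ASSUMPTION: unknown or invalid events keep the conservative hard reset. */
	if (event_id >= IRQ_MAP_SIZE || !irq_map[event_id].valid)
		return RESET_HARD;

	return irq_map[event_id].reset;
}
```

Keeping the policy in the table means a policy change shows up as a regenerated data file rather than as new handler logic.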
2023-01-26 | habanalabs/gaudi2: find decode error root cause | Koby Elbaz | 1 | -0/+157
When a decode error happens, we often don't know the exact root cause (the erroneous address that was accessed) and the exact engine that created the erroneous transaction. To find out, we need to go over all the relevant register blocks in the ASIC. Once we find the relevant engine, we print its details and the offending address. This helps tremendously when debugging an error that was created by running a user workload. Signed-off-by: Koby Elbaz <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-01-26 | habanalabs: Replace zero-length arrays with flexible-array members | Gustavo A. R. Silva | 1 | -2/+2
Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length arrays in a couple of structures with flex-array members. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [2]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Link: https://github.com/KSPP/linux/issues/78 Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Stanislaw Gruszka <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
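For readers unfamiliar with the distinction, here is a small user-space C sketch of a C99 flexible-array member. The struct and helper are hypothetical and unrelated to the structures this commit changed; the deprecated form would have declared `data[0]` instead of `data[]`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type with a trailing variable-length payload. */
struct msg {
	size_t len;
	unsigned char data[]; /* C99 flexible-array member; must be last */
};

/* Allocate the header and payload in one block and copy the payload in. */
static struct msg *msg_alloc(const void *src, size_t len)
{
	struct msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;

	m->len = len;
	memcpy(m->data, src, len);
	return m;
}
```

With `data[]` the compiler knows the array has no fixed size, which is what lets bounds-checking features such as FORTIFY_SOURCE treat the trailing array correctly.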
2023-01-26 | habanalabs/gaudi2: avoid reconfiguring the same PB registers | Koby Elbaz | 2 | -0/+1204
It appears that, within the sync manager security configuration, we reconfigure PB registers over and over without any need to do that. Signed-off-by: Koby Elbaz <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-01-26 | habanalabs: update f/w files | Oded Gabbay | 2 | -7/+127
Update common firmware files with the latest version. There is no functional change. Signed-off-by: Oded Gabbay <[email protected]>
2023-01-26 | habanalabs/gaudi2: update f/w files | Oded Gabbay | 2 | -16/+23
Update gaudi2 firmware files with the latest version. There is no functional change. Signed-off-by: Oded Gabbay <[email protected]>
2023-01-26 | habanalabs/gaudi2: update asic register files | Oded Gabbay | 10 | -155/+114
Update some register files with the latest h/w auto-generated files. There is no functional change. Signed-off-by: Oded Gabbay <[email protected]>
2023-01-26 | habanalabs: add uapi to flush inbound HBM transactions | Ohad Sharabi | 1 | -0/+2
When doing p2p with a NIC device, the NIC needs to make sure all the writes to the HBM (through the PCI bar of the Gaudi device) were flushed. It can be done by either the NIC or the host reading through the PCI bar. To support the host side, we supply a simple uapi to perform this flush through the driver, because the user can't create such a transaction by itself (the PCI bar isn't exposed to normal users). Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2023-01-26 | habanalabs: move driver to accel subsystem | Oded Gabbay | 378 | -0/+251969
Now that we have a subsystem for compute accelerators, move the habanalabs driver to it. This patch only moves the files and fixes the Makefiles. Future patches will change the existing code to register to the accel subsystem and expose the accel device char files instead of the habanalabs device char files. Update the MAINTAINERS file to reflect this change. Signed-off-by: Oded Gabbay <[email protected]>