path: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
Age  Commit message  (Author; files changed, lines -removed/+added)
2024-10-08  iommu/arm-smmu-v3: Convert comma to semicolon  (Chen Ni; 1 file, -1/+1)
Replace a comma between two expressions with a semicolon. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Fixes: e3b1be2e73db ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240923021557.3432068-1-nichen@iscas.ac.cn Signed-off-by: Will Deacon <will@kernel.org>
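A tiny illustration of the hazard (not the actual diff; the field names are made up): the comma operator chains expressions into one statement, which silently changes control flow in some contexts.

    /* 'if (en) a = 1, b = 2;' executes BOTH assignments under the if,
     * whereas 'a = 1; b = 2;' would run b = 2 unconditionally, so a
     * stray ',' can alter behavior even though it compiles cleanly. */
    if (enable)
            cfg->s1cdmax = 0,   /* ',' here is almost never what is meant */
            cfg->s1fmt = 0;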
2024-10-08  iommu/arm-smmu-v3: Fix last_sid_idx calculation for sid_bits==32  (Daniel Mentz; 1 file, -1/+1)
The function arm_smmu_init_strtab_2lvl uses the expression ((1 << smmu->sid_bits) - 1) to calculate the largest StreamID value. However, this fails for the maximum allowed value of SMMU_IDR1.SIDSIZE, which is 32. The C standard states: "If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." With smmu->sid_bits being 32, the prerequisites for undefined behavior are met. We observed that the value of (1 << 32) is 1 and not 0 as we initially expected. Similar bit shift operations in arm_smmu_init_strtab_linear seem not to be affected, because it appears unlikely for an SMMU to have SMMU_IDR1.SIDSIZE set to 32 but then not support 2-level Stream tables. This issue was found by Ryan Huang <tzukui@google.com> on our team. Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()") Signed-off-by: Daniel Mentz <danielmentz@google.com> Link: https://lore.kernel.org/r/20241002015357.1766934-1-danielmentz@google.com Signed-off-by: Will Deacon <will@kernel.org>
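A minimal, runnable sketch of the shift hazard and the conventional fix (widen to 64 bits before shifting); max_sid() is an illustrative helper, not the driver's code:

    #include <stdint.h>
    #include <stdio.h>

    /* 1 << 32 on a 32-bit type is undefined behavior (and was observed
     * to yield 1 here); shifting a 64-bit 1 by 32 is well defined. */
    static uint32_t max_sid(unsigned int sid_bits)
    {
            return (uint32_t)((1ULL << sid_bits) - 1);
    }

    int main(void)
    {
            printf("sid_bits=32 -> 0x%x\n", max_sid(32)); /* 0xffffffff */
            return 0;
    }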
2024-09-13  Merge branches 'fixes', 'arm/smmu', 'intel/vt-d', 'amd/amd-vi' and 'core' into next  (Joerg Roedel; 1 file, -239/+339)
2024-09-09  iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg  (Jason Gunthorpe; 1 file, -60/+55)
The members here are being used for both the linear and the 2 level case, with the meaning of each item slightly different in the two cases. Split it into a clean union where both cases have their own struct with their own logical names and correct types. Adjust all the users to detect linear/2lvl and use the right sub structure and types consistently. Remove CTXDESC_CD_DWORDS by changing the last places to use sizeof(struct arm_smmu_cd). Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
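A hedged sketch of the kind of union split described here; the member names are assumptions, not the verbatim upstream layout:

    struct arm_smmu_ctx_desc_cfg {
            union {
                    struct {
                            struct arm_smmu_cd *table;   /* flat CD array */
                            unsigned int num_ents;
                    } linear;
                    struct {
                            struct arm_smmu_cdtab_l1 *l1tab;   /* HW L1 table */
                            struct arm_smmu_cdtab_l2 **l2ptrs; /* CPU pointers */
                            unsigned int num_l1_ents;
                    } l2;
            };
            /* ...plus a flag recording which union member is in use... */
    };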
2024-09-09  iommu/arm-smmu-v3: Add types for each level of the CD table  (Jason Gunthorpe; 1 file, -21/+24)
Also add the indexing helpers arm_smmu_cdtab_l1/2_idx(). Remove CTXDESC_L1_DESC_DWORDS and CTXDESC_CD_DWORDS, replacing them all with type-specific calculations. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-09  iommu/arm-smmu-v3: Shrink the cdtab l1_desc array  (Jason Gunthorpe; 1 file, -24/+18)
The top of the 2 level CD table is (at most) 1024 entries big, and two high order allocations are required. One of __le64 which is programmed into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the CPU pointer (16k). There are two copies of the l2ptr_dma, one is stored in the struct arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to use. Instead of storing two copies just decode the value from the __le64. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
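A hedged one-liner showing the decode-instead-of-cache idea; the mask name follows the driver's CTXDESC_L1_DESC_* convention but is an assumption here:

    /* Recover the L2 table's DMA address from the __le64 the HW reads,
     * rather than keeping a second cached copy in the CPU-side struct: */
    dma_addr_t l2ptr_dma = le64_to_cpu(l1_desc->l2ptr) & CTXDESC_L1_DESC_L2PTR_MASK;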
2024-09-09  iommu/arm-smmu-v3: Do not use devm for the cd table allocations  (Jason Gunthorpe; 1 file, -16/+13)
The master->cd_table is entirely contained within the struct arm_smmu_master, which is guaranteed to be freed by the core code under arm_smmu_release_device(). There is no reason to use devm here; arm_smmu_free_cd_tables() is reliably called to free the CD-related memory. Remove it and save some memory. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-09  iommu/arm-smmu-v3: Remove strtab_base/cfg  (Jason Gunthorpe; 1 file, -28/+27)
These values can be computed from the other values already stored in the config. Move the calculation to arm_smmu_write_strtab() and do it directly before writing the registers. This moves all the logic to calculate the two registers from three functions into one, and saves an unimportant 16 bytes from the arm_smmu_device. Suggested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-09  iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg  (Jason Gunthorpe; 1 file, -41/+37)
The members here are being used for both the linear and the 2 level case, with the meaning of each item slightly different in the two cases. Split it into a clean union where both cases have their own struct with their own logical names and correct types. Adjust all the users to detect linear/2lvl and use the right sub structure and types consistently. Remove STRTAB_STE_DWORDS by changing the last places to use sizeof(struct arm_smmu_ste). Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-09  iommu/arm-smmu-v3: Add types for each level of the 2 level stream table  (Jason Gunthorpe; 1 file, -10/+11)
Add the types struct arm_smmu_strtab_l1 and l2 to represent the HW layout of the descriptors, and use them in most places; following patches will get the remaining places. The sizes of the l1 and l2 HW allocations are sizeof(struct arm_smmu_strtab_l1/2). This provides some more clarity than having raw __le64 *'s and sizes computed via macros. Remove STRTAB_L1_DESC_DWORDS. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-09  iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()  (Jason Gunthorpe; 1 file, -26/+19)
Don't open code the calculations of the indexes for each level; provide two functions to do that math and call them in all the places computing indexes. Calculate the L1 table size directly based on the max required index from the cap. Remove STRTAB_L1_SZ_SHIFT in favour of STRTAB_NUM_L2_STES, and use STRTAB_NUM_L2_STES to replace the remaining open-coded 1 << STRTAB_SPLIT. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
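Presumably the helper math reduces to a split of the StreamID, along these lines (a sketch; STRTAB_SPLIT == 8 per the "span is a constant of 8+1" note in a later entry of this log):

    #define STRTAB_SPLIT        8
    #define STRTAB_NUM_L2_STES  (1u << STRTAB_SPLIT)

    static inline unsigned int strtab_l1_idx(u32 sid)
    {
            return sid / STRTAB_NUM_L2_STES;  /* which L1 descriptor */
    }

    static inline unsigned int strtab_l2_idx(u32 sid)
    {
            return sid % STRTAB_NUM_L2_STES;  /* which STE within that L2 span */
    }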
2024-09-06  iommu/arm-smmu-v3: Use the new rb tree helpers  (Jason Gunthorpe; 1 file, -37/+31)
Since v5.12 the rbtree has gained some simplifying helpers aimed at letting rb tree users write less convoluted boilerplate code. Instead, the caller provides a single comparison function and the helpers generate the previously open-coded logic. Update smmu->streams to use rb_find_add() and rb_find(). Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-9fef8cdc2ff6+150d1-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
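A hedged sketch of the rb_find_add()/rb_find() pattern from <linux/rbtree.h>; the stream node's field names are assumptions:

    struct arm_smmu_stream {
            u32 id;                 /* StreamID, the sort key */
            struct rb_node node;
    };

    static int streams_cmp_key(const void *lhs, const struct rb_node *rhs)
    {
            u32 sid = *(const u32 *)lhs;
            const struct arm_smmu_stream *s =
                    rb_entry(rhs, struct arm_smmu_stream, node);

            return (sid < s->id) ? -1 : (sid > s->id) ? 1 : 0;
    }

    static int streams_cmp_node(struct rb_node *lhs, const struct rb_node *rhs)
    {
            return streams_cmp_key(&rb_entry(lhs, struct arm_smmu_stream, node)->id,
                                   rhs);
    }

    /* Insert, detecting a duplicate StreamID in one call: */
    if (rb_find_add(&stream->node, &smmu->streams, streams_cmp_node))
            return -EEXIST;

    /* Lookup by StreamID: */
    struct rb_node *n = rb_find(&sid, &smmu->streams, streams_cmp_key);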
2024-09-05  iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent  (Nicolin Chen; 1 file, -1/+8)
It's observed that, when the first 4GB of system memory was reserved, all VCMDQ allocations failed (even with the smallest qsz in the last attempt):
  arm-smmu-v3: found companion CMDQV device: NVDA200C:00
  arm-smmu-v3: option mask 0x10
  arm-smmu-v3: failed to allocate queue (0x8000 bytes) for vcmdq0
  acpi NVDA200C:00: tegra241_cmdqv: Falling back to standard SMMU CMDQ
  arm-smmu-v3: ias 48-bit, oas 48-bit (features 0x001e1fbf)
  arm-smmu-v3: allocated 524288 entries for cmdq
  arm-smmu-v3: allocated 524288 entries for evtq
  arm-smmu-v3: allocated 524288 entries for priq
This is because the 4GB reserved memory shifted the entire DMA zone from a lower 32-bit range (on a system without the 4GB carveout) to a higher range, while dev->coherent_dma_mask was still set to the default DMA_BIT_MASK(32). The dma_set_mask_and_coherent() call is done in arm_smmu_device_hw_probe() of the SMMU driver, so any DMA allocation from tegra241_cmdqv_probe() must wait until the coherent_dma_mask is correctly set. Move the vintf/vcmdq structure initialization routine into a different op, "init_structures", and call it at the end of arm_smmu_init_structures(), where the standard SMMU queues get allocated. Most of the impl_ops aren't ready until the vintf/vcmdq structures are initialized, so replace the full impl_ops with an init_ops in __tegra241_cmdqv_probe(), and switch to tegra241_cmdqv_impl_ops later in arm_smmu_init_structures(). Note that tegra241_cmdqv_impl_ops does not link to the new init_structures op after this switch, since there is no point in keeping it once it's done. Fixes: 918eb5c856f6 ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV") Reported-by: Matt Ochs <mochs@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/530993c3aafa1b0fc3d879b8119e13c629d12e2b.1725503154.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Match Stall behaviour for S2  (Mostafa Saleh; 1 file, -5/+3)
According to the spec (ARM IHI 0070 F.b), in "5.5 Fault configuration (A, R, S bits)": a STE with stage 2 translation enabled and STE.S2S == 0 is considered ILLEGAL if SMMU_IDR0.STALL_MODEL == 0b10. This is also described in the pseudocode "SteIllegal()":
  if STE.Config == '11x' then
      [..]
      if eff_idr0_stall_model == '10' && STE.S2S == '0' then
          // stall_model forcing stall, but S2S == 0
          return TRUE;
Which means S2S must be set when the stall model is "ARM_SMMU_FEAT_STALL_FORCE", but currently the driver ignores that. Although the driver could do the minimum and only set S2S for "ARM_SMMU_FEAT_STALL_FORCE", it is more consistent to match the S1 behaviour, which also sets it for "ARM_SMMU_FEAT_STALL" if the master has requested stalls. Also, since S2 stalls are now enabled, report them to the IOMMU layer; for VFIO devices this will fail anyway, as VFIO doesn't register an iopf handler. Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20240830110349.797399-2-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>
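A hedged sketch of the S2 STE change this implies; the macro and field names follow the driver's conventions, but the exact guard is an assumption:

    /* Mirror the S1 policy: advertise stall support in the stage-2 STE
     * whenever stalls are in use for this master, so STALL_FORCE HW
     * never sees an ILLEGAL S2 STE with S2S == 0. */
    if (master->stall_enabled)
            ste->data[2] |= cpu_to_le64(STRTAB_STE_2_S2S);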
2024-08-30  iommu/tegra241-cmdqv: Limit CMDs for VCMDQs of a guest owned VINTF  (Nicolin Chen; 1 file, -12/+16)
When VCMDQs are assigned to a VINTF owned by a guest (HYP_OWN bit unset), only TLB and ATC invalidation commands are supported by the VCMDQ HW. So, implement the new cmdq->supports_cmd op to scan the input cmd in order to make sure that it is supported by the selected queue. Note that the guest VM shouldn't have the HYP_OWN bit set regardless of whether the guest kernel driver writes it or not, i.e. the hypervisor running in the host OS should wire this bit to zero when trapping a write access to this VINTF_CONFIG register from a guest kernel. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/8160292337059b91271045800e5c62f7295e2c24.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Start a new batch if new command is not supported  (Nicolin Chen; 1 file, -2/+4)
The VCMDQ in the tegra241-cmdqv driver has a guest mode that supports only a few invalidation commands. A batch is initialized with a cmdq, so it has to confirm whether a new command is supported or not. Add a supports_cmd function pointer to the cmdq structure, where the vcmdq driver should hook a command scan function. Add an inline helper too so it can be used by both sides. If a new command is not supported, simply issue the existing batch and re-init it as a new batch. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/aafb24b881504f18c5d0c7c15f2134e40ad2c486.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
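A hedged sketch of that flow (names assumed from this and the neighbouring entries):

    /* Inline helper usable by both the batch code and the vcmdq driver;
     * a cmdq with no hook supports everything: */
    static inline bool arm_smmu_cmdq_supports_cmd(struct arm_smmu_cmdq *cmdq,
                                                  struct arm_smmu_cmdq_ent *ent)
    {
            return cmdq->supports_cmd ? cmdq->supports_cmd(ent) : true;
    }

    /* In the batch-add path: flush the current batch and start a fresh
     * one (possibly on a different cmdq) if this command isn't supported: */
    if (!arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd)) {
            arm_smmu_cmdq_batch_submit(smmu, cmds);
            arm_smmu_cmdq_batch_init(smmu, cmds, cmd);
    }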
2024-08-30  iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV  (Nate Watterson; 1 file, -1/+32)
NVIDIA's Tegra241 SoC has CMDQ-Virtualization (CMDQV) hardware, extending the standard ARM SMMU v3 IP to support multiple VCMDQs with virtualization capabilities. In terms of command queues, they are very much like a standard SMMU CMDQ (or ECMDQs), but only support CS_NONE in the CS field of CMD_SYNC. Add a new tegra241-cmdqv driver, insert its structure pointer into the existing arm_smmu_device, and add related function calls in the SMMUv3 driver to interact with the CMDQV driver. In the CMDQV driver, add a minimal part for the in-kernel support: reserve VINTF0 for in-kernel use, assign some of the VCMDQs to VINTF0, and select one VCMDQ based on the current CPU ID to execute supported commands. This multi-queue design for in-kernel use gives some limited improvement: up to 20% reduction of invalidation time was measured by a multi-threaded DMA unmap benchmark, compared to a single queue. The other part of the CMDQV driver will be user-space support that allows a hypervisor running on the host OS to talk to the driver for virtualization use cases, letting VMs use VCMDQs without trapping, i.e. no VM Exits. This is designed based on IOMMUFD, and its RFC series is also under review. It will provide a guest OS a bigger improvement: 70% to 90% reductions of TLB invalidation time were measured by DMA unmap tests running in a guest, compared to a nested SMMU CMDQ (with trapping). As the initial version, the CMDQV driver only supports ACPI configurations. Signed-off-by: Nate Watterson <nwatterson@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/dce50490b2c10b7254fb36aa73ed7ffd812b283a.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Add struct arm_smmu_impl_ops  (Jason Gunthorpe; 1 file, -1/+50)
Mimicking the arm-smmu (v2) driver, introduce a struct arm_smmu_impl_ops to accommodate impl routines. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/8fe9f3805568aabf771fc6706c116459016bf62d.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Add acpi_smmu_iort_probe_model for impl  (Nicolin Chen; 1 file, -5/+10)
For model-specific implementations, repurpose acpi_smmu_get_options() into a wider acpi_smmu_iort_probe_model(). A new model can be added to the list in this new function. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/79716299829aeab2e55b8c7932f2634b209bb4d5.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Add ARM_SMMU_OPT_TEGRA241_CMDQV  (Nicolin Chen; 1 file, -1/+15)
The CMDQV extension in NVIDIA Tegra241 SoC only supports CS_NONE in the CS field of CMD_SYNC. Add a new SMMU option to accommodate that. Suggested-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/a3cb9bb2429fbae4a59f7ef517614d226763d717.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Make symbols public for CONFIG_TEGRA241_CMDQV  (Nicolin Chen; 1 file, -10/+8)
The symbols __arm_smmu_cmdq_skip_err(), arm_smmu_init_one_queue(), and arm_smmu_cmdq_init() need to be used by the tegra241-cmdqv compilation unit in a following patch. Remove the static and put prototypes in the header. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/c4f2aa5f5f40a2e7c68b132c6d3171d6403de57a.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Pass in cmdq pointer to arm_smmu_cmdq_init  (Nicolin Chen; 1 file, -3/+3)
Pass the cmdq pointer in so that this function can be used by cmdqs other than &smmu->cmdq. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/e11a3c0bde172c9652c2946f12bc2ceed4c3a355.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Pass in cmdq pointer to arm_smmu_cmdq_build_sync_cmd  (Nicolin Chen; 1 file, -4/+6)
The CMDQV extension on the NVIDIA Tegra241 SoC only supports CS_NONE in the CS field of CMD_SYNC, vs. the standard SMMU CMDQ. Pass in the cmdq pointer directly, so the function can identify a different cmdq implementation. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/723288287997b6dfbcd2a904d2c11e9b23f82250.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30  iommu/arm-smmu-v3: Issue a batch of commands to the same cmdq  (Nicolin Chen; 1 file, -18/+30)
The driver calls the arm_smmu_get_cmdq() helper in different places, which has been fine so far since the helper always returns the single SMMU CMDQ. However, with the NVIDIA CMDQV extension or SMMU ECMDQ, there can be multiple cmdqs in the system to select one from, and either case requires a batch of commands to be issued to the same cmdq. Thus, a cmdq has to be decided in the higher-level callers. Add a cmdq pointer to the arm_smmu_cmdq_batch structure, decide the cmdq when initializing the batch, and pass the pointer down to the bottom function. Update __arm_smmu_cmdq_issue_cmd() accordingly for single-command issuers. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/2cbf5ddefb6ea611e48d67c642271bd24421eb21.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
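A hedged sketch of the batch structure after this change (the cmds/num members are assumptions about the pre-existing layout):

    struct arm_smmu_cmdq_batch {
            u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
            struct arm_smmu_cmdq *cmdq; /* chosen at batch init; every command
                                         * in this batch is issued to it */
            int num;
    };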
2024-08-30  iommu: Allow ATS to work on VFs when the PF uses IDENTITY  (Jason Gunthorpe; 1 file, -0/+6)
PCI ATS has a global Smallest Translation Unit (STU) field that is located in the PF but shared by all of the VFs. The expectation is that the STU will be set to the root port's global STU capability, which is driven by the IO page table configuration of the iommu HW. Today it becomes set when the iommu driver first enables ATS. Thus, to enable ATS on the VF, the PF must have already had the correct STU programmed, even if ATS is off on the PF. Unfortunately the PF only programs the STU when the PF enables ATS. The iommu drivers tend to leave ATS disabled when IDENTITY translation is being used. Thus we can get into a state where the PF is set up to use IDENTITY with the DMA API while the VF would like to use VFIO with a PAGING domain and have ATS turned on. This fails because the PF never loaded a PAGING domain, so it never set up the STU, and the VF can't do it. The simplest solution is to have the iommu driver set the ATS STU when it probes the device. This way the ATS STU is loaded immediately at boot time on all PFs and there is no issue when a VF comes to use it. Add a new call, pci_prepare_ats(), which should be called by iommu drivers in their probe_device() op for every PCI device if the iommu driver supports ATS. This will set up the STU based on whatever page size capability the iommu HW has. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-0fb4d2ab6770+7e706-ats_vf_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
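A hedged sketch of the probe-time hook; pci_prepare_ats() takes the device and a page shift, here derived from the IOMMU's smallest supported IO page size (the derivation is an assumption, not the verbatim driver code):

    /* In the iommu driver's probe_device() op: */
    if (dev_is_pci(dev))
            pci_prepare_ats(to_pci_dev(dev),
                            __ffs(smmu->pgsize_bitmap)); /* STU = smallest IO page shift */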
2024-08-23  iommu: Handle iommu faults for a bad iopf setup  (Pranjal Shrivastava; 1 file, -1/+1)
The iommu_report_device_fault() function was updated to return void on the assumption that drivers only need to call it for reporting an iopf. This implementation causes the following problems:
  1. The drivers rely on the core code to call their page_response op; however, when a fault is received and no fault-capable domain is attached / iopf_param is NULL, ops->page_response is NOT called, causing the device to stall in case the fault type was PAGE_REQ.
  2. The arm_smmu_v3 driver relies on the returned value to log errors; returning void from iommu_report_device_fault() causes these events to be missed while logging.
Modify iommu_report_device_fault() to return -EINVAL for cases where no fault-capable domain is attached or iopf_param is NULL, and to call back into the driver (ops->page_response) in case the fault type was IOMMU_FAULT_PAGE_REQ. The returned value can be used by the drivers to log the fault/event as needed. Reported-by: Kunkun Jiang <jiangkunkun@huawei.com> Closes: https://lore.kernel.org/all/6147caf0-b9a0-30ca-795e-a1aa502a5c51@huawei.com/ Fixes: 3dfa64aecbaf ("iommu: Make iommu_report_device_fault() return void") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240816104906.1010626-1-praan@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-16  iommu/arm-smmu-v3: Fix a NULL vs IS_ERR() check  (Dan Carpenter; 1 file, -2/+2)
The arm_smmu_domain_alloc() function returns error pointers on error. It doesn't return NULL. Update the error checking to match. Fixes: 52acd7d8a413 ("iommu/arm-smmu-v3: Add support for domain_alloc_user fn") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9208cd0d-8105-40df-93e9-bdcdf0d55eec@stanley.mountain Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping  (Kunkun Jiang; 1 file, -0/+15)
If the io-pgtable quirk flag indicates support for hardware update of dirty state, enable the HA/HD bits in the SMMU CD and also set the DBM bit in the page descriptor. Now report the dirty page tracking capability of SMMUv3, and select IOMMUFD_DRIVER for ARM_SMMU_V3 if IOMMUFD is enabled. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-6-shameerali.kolothum.thodi@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03  iommu/arm-smmu-v3: Add support for dirty tracking in domain alloc  (Joao Martins; 1 file, -23/+61)
This provides all the infrastructure to enable dirty tracking when the hardware has the capability and the domain alloc request asks for it. Also, add a device_iommu_capable() check in the iommufd core for IOMMU_CAP_DIRTY_TRACKING before we request a user domain with dirty tracking support. Please note, we still report no support for IOMMU_CAP_DIRTY_TRACKING, as it is finally enabled in a subsequent patch. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-5-shameerali.kolothum.thodi@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
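A hedged sketch of that gate in the iommufd core (the flag/error pairing is an assumption):

    /* Refuse a dirty-tracking user domain when the device's IOMMU
     * cannot track dirty pages: */
    if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
        !device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING))
            return ERR_PTR(-EOPNOTSUPP);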
2024-07-03  iommu/arm-smmu-v3: Add feature detection for HTTU  (Jean-Philippe Brucker; 1 file, -0/+32)
Probe support for Hardware Translation Table Update (HTTU), which essentially enables hardware update of the access and dirty flags, if the SMMU supports it and the kernel was built with HTTU support. Probe and set the smmu::features bits for Hardware Dirty and Hardware Access. This is in preparation for enabling it in the context descriptors of the stage 1 format. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-3-shameerali.kolothum.thodi@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03  iommu/arm-smmu-v3: Add support for domain_alloc_user fn  (Shameer Kolothum; 1 file, -2/+31)
This will be used by iommufd for allocating user-managed domains, and is also required when we add support for iommufd-based dirty tracking. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-2-shameerali.kolothum.thodi@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Shrink the strtab l1_desc array  (Jason Gunthorpe; 1 file, -7/+6)
The top of the 2-level stream table is (at most) 128k entries big, and two high order allocations are required: one of __le64, which is programmed into the HW (1M), and one of struct arm_smmu_strtab_l1_desc, which holds the CPU pointer (3M). There is no reason to store the l2ptr_dma, as nothing reads it; devm stores a copy of it and the DMA memory will be freed via devm mechanisms. span is a constant of 8+1. Remove both fields. This removes 16 bytes from each arm_smmu_strtab_l1_desc and saves up to 2M of memory per iommu instance. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/2-v2-318ed5f6983b+198f-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Do not zero the strtab twice  (Jason Gunthorpe; 1 file, -20/+6)
dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc (the list of DMA addresses for the L2 entries) is already zero'd. arm_smmu_init_l1_strtab() goes through and calls arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it again. Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from arm_smmu_init_strtab_2lvl to allocate the companion struct. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/1-v2-318ed5f6983b+198f-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Allow setting a S1 domain to a PASID  (Jason Gunthorpe; 1 file, -1/+40)
The SVA cleanup made the SSID logic entirely general, so all we need to do is call it with the correct cd table entry for a S1 domain. This is slightly tricky because of the ASID and how the locking works; the simple fix is to just update the ASID once we get the right locks. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/14-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Allow a PASID to be set when RID is IDENTITY/BLOCKED  (Jason Gunthorpe; 1 file, -2/+47)
If the STE doesn't point to the CD table, we can upgrade it by reprogramming the STE with the appropriate S1DSS. We may also need to turn on ATS at the same time. Keep track of whether the installed STE points at the cd_table, and of the ATS state, to trigger this path. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/13-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Allow IDENTITY/BLOCKED to be set while PASID is used  (Jason Gunthorpe; 1 file, -13/+47)
The HW supports this; use the S1DSS bits to configure the behavior of SSID=0, which is the RID's translation. If SSIDs are currently being used in the CD table, then just update the S1DSS bits in the STE, remove the master_domain, and leave ATS alone. For iommufd the driver design has a small problem: all the unused CD table entries are set with V=0, which will generate an event if VFIO userspace tries to use the CD entry. This patch extends this problem to include the RID as well, if PASID is being used. For BLOCKED with used PASIDs, the F_STREAM_DISABLED (STRTAB_STE_1_S1DSS_TERMINATE) event is generated on untagged traffic, and a substream CD table entry with V=0 (removed pasid) will generate C_BAD_CD. Arguably there is no advantage to using S1DSS over the CD entry 0 with V=0. As we don't yet support PASID in iommufd, this is a problem to resolve later, possibly by using EPD0 for unused CD table entries instead of V=0, and not using S1DSS for BLOCKED. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/11-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Put the SVA mmu notifier in the smmu_domain  (Jason Gunthorpe; 1 file, -54/+15)
This removes all the notifier de-duplication logic in the driver and relies on the core code to de-duplicate and allocate only one SVA domain per mm per smmu instance. This naturally gives a 1:1 relationship between SVA domain and mmu notifier. It is a significant simplification of the flow, as we end up with a single struct arm_smmu_domain for each MM, and the invalidation can then be shifted to properly use the masters list like S1/S2 do. Remove all of the previous mmu_notifier, bond, shared cd, and cd refcount logic entirely. The logic here is tightly wound together with the unused BTM support. Since the BTM logic requires holding all the iommu_domains in a global ASID xarray, it conflicts with the design of having a single SVA domain per PASID, as multiple SMMU instances will need to have different domains. Following patches resolve this by making the ASID xarray per-instance instead of global. However, converting the BTM code over to this methodology requires many changes. Thus, since ARM_SMMU_FEAT_BTM is never enabled, remove the parts of the BTM support for ASID sharing that interact with SVA as well. A followup series is already working on fully enabling the BTM support; it requires iommufd's VIOMMU feature to bring in the KVM's VMID as well. It will come with an already written patch to bring back the ASID sharing using a per-instance ASID xarray. https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/ https://lore.kernel.org/linux-iommu/26-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia.com/ Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Michael Shavit <mshavit@google.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Keep track of arm_smmu_master_domain for SVA  (Jason Gunthorpe; 1 file, -2/+28)
Fill in the smmu_domain->devices list in the new struct arm_smmu_domain that SVA allocates. Keep track of every SSID and master that is using the domain, reusing the logic for the RID attach. This is the first step to making the SVA invalidation follow the same design as S1/S2 invalidation. At present nothing will read this list. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Make SVA allocate a normal arm_smmu_domain  (Jason Gunthorpe; 1 file, -8/+19)
Currently the SVA domain is a naked struct iommu_domain; allocate a struct arm_smmu_domain instead. This is necessary to be able to use the struct arm_smmu_master_domain mechanism. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Michael Shavit <mshavit@google.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Thread SSID through the arm_smmu_attach_*() interface  (Jason Gunthorpe; 1 file, -9/+17)
Allow creating and managing arm_smmu_master_domains with a non-zero SSID through the arm_smmu_attach_*() family of functions. This triggers ATC invalidation for the correct SSID in PASID cases and tracks the per-attachment SSID in the struct arm_smmu_master_domain. Generalize arm_smmu_attach_remove() to be able to remove SSIDs as well, by ensuring the ATC for the PASID is flushed properly. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Do not use master->sva_enable to restrict attaches  (Jason Gunthorpe; 1 file, -12/+12)
We no longer need master->sva_enable to control what attaches are allowed. Instead we can tell whether the attach is legal based on the current configuration of the master. Keep track of the number of valid CD entries for SSIDs in the cd_table, and of whether the cd_table has been installed in the STE directly, so we know what the configuration is. The attach logic then becomes:
  - SVA bind: check if the CD table is installed
  - RID attach of S2: block if SSIDs are used
  - RID attach of IDENTITY/BLOCKING: block if SSIDs are used
arm_smmu_set_pasid() is already checking if it is possible to set up a CD entry; at this point in the series that means the RID path has already set a STE pointing at the CD table. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Add ssid to struct arm_smmu_master_domain  (Jason Gunthorpe; 1 file, -9/+33)
Prepare to allow a S1 domain to be attached to a PASID as well. Keep track of the SSID the domain is using on each master in the arm_smmu_master_domain. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Michael Shavit <mshavit@google.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
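Piecing together this entry and its neighbours, the per-attachment list element presumably looks something like this (a sketch; the field names are assumptions):

    struct arm_smmu_master_domain {
            struct list_head devices_elm;   /* entry in smmu_domain->devices */
            struct arm_smmu_master *master; /* may appear twice, w/ different SSIDs */
            ioasid_t ssid;                  /* SSID this attachment uses */
    };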
2024-07-02  iommu/arm-smmu-v3: Make changing domains be hitless for ATS  (Jason Gunthorpe; 1 file, -66/+171)
The core code allows the domain to be changed on the fly without a forced stop in BLOCKED/IDENTITY. In this flow the driver should just continually maintain the ATS with no change while the STE is updated. ATS relies on a linked list, smmu_domain->devices, to keep track of which masters have the domain programmed, but this list is also used by arm_smmu_share_asid(), unrelated to ATS. Create two new functions to encapsulate this combined logic:
  arm_smmu_attach_prepare()
  <caller generates and sets the STE>
  arm_smmu_attach_commit()
The two functions can sequence both enabling and disabling ATS across the STE store. Have every update of the STE use this sequence. Installing a S1/S2 domain always enables ATS if the PCIe device supports it. The enable flow is now ordered differently to allow it to be hitless:
  1) Add the master to the new smmu_domain->devices list
  2) Program the STE
  3) Enable ATS at PCIe
  4) Remove the master from the old smmu_domain
This flow ensures that invalidations to either domain will generate an ATC invalidation to the device while the STE is being switched; thus we no longer need to turn off ATS for correctness. The disable flow is the reverse:
  1) Disable ATS at PCIe
  2) Program the STE
  3) Invalidate the ATC
  4) Remove the master from the old smmu_domain
Move the nr_ats_masters adjustments to be close to the list manipulations. It is a count of the number of ATS-enabled masters currently in the list. It is adjusted strictly before and after the STE/CD are revised, and done under the list's spin_lock. This is part of the bigger picture to allow changing the RID domain while a PASID is in use. If an SVA PASID is relying on ATS to function, then changing the RID domain cannot just temporarily toggle ATS off without also wrecking the SVA PASID. The new infrastructure here is organized so that the PASID attach/detach flows will make use of it as well in following patches. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Michael Shavit <mshavit@google.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
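A hedged sketch of the call sequence an attach path would follow (argument lists are assumptions, not the exact driver signatures):

    struct arm_smmu_attach_state state = { .master = master };

    /* 1) Join the new domain's devices list, so its invalidations
     *    already reach this master's ATC: */
    ret = arm_smmu_attach_prepare(&state, new_domain);
    if (ret)
            return ret;

    /* 2) Install the new STE while both domains cover the ATC: */
    arm_smmu_install_ste_for_dev(master, &ste);

    /* 3) Enable (or disable) ATS as required, and
     * 4) leave the old domain's devices list: */
    arm_smmu_attach_commit(&state);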
2024-07-02  iommu/arm-smmu-v3: Make smmu_domain->devices into an allocated list  (Jason Gunthorpe; 1 file, -5/+34)
The next patch will need to store the same master twice (with different SSIDs), so allocate memory for each list element. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Michael Shavit <mshavit@google.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Start building a generic PASID layer  (Jason Gunthorpe; 1 file, -2/+30)
Add arm_smmu_set_pasid()/arm_smmu_remove_pasid(), which are to be used by callers that have already constructed the arm_smmu_cd they wish to program. These functions encapsulate the shared logic to set up a CD entry that will be shared by the SVA and S1 domain cases. Prior fixes had already moved most of this logic up into __arm_smmu_sva_bind(); move it to its final home. Following patches will relieve some of the remaining SVA restrictions:
  - The RID domain is a S1 domain and has already set up the STE to point to the CD table
  - The programmed PASID is the mm_get_enqcmd_pasid()
  - Nothing changes while SVA is running (sva_enable)
SVA invalidation will still iterate over the S1 domain's master list; later patches will resolve that. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-02  iommu/arm-smmu-v3: Convert to domain_alloc_sva()  (Jason Gunthorpe; 1 file, -9/+1)
This allows the driver to receive the mm and, always, a device during allocation. Later patches need this to properly set up the notifier when the domain is first allocated. Remove ops->domain_alloc(), as SVA was its only remaining purpose. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Michael Shavit <mshavit@google.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-06-05  iommu/arm-smmu-v3: Avoid uninitialized asid in case of error  (Mostafa Saleh; 1 file, -1/+1)
The static checker complains about the ASID possibly being used uninitialized. This only happens in case of error, and the value would be ignored anyway. A simple fix is to just initialize the local variable to zero; this path will only be reached on the first attach to a domain, where the CD is already initialized to zero. This avoids having to bloat the function with an error path. Closes: https://lore.kernel.org/linux-iommu/849e3d77-0a3c-43c4-878d-a0e061c8cd61@moroto.mountain/T/#u Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Mostafa Saleh <smostafa@google.com> Fixes: 04905c17f648 ("iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240604185218.2602058-1-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-05-13  Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'core' and 'x86/vt-d' into next  (Joerg Roedel; 1 file, -275/+293)
2024-05-10  iommu/arm-smmu-v3: Make the kunit into a module  (Jason Gunthorpe; 1 file, -0/+8)
It turns out kconfig has problems ensuring the SMMU module and the KUNIT module are consistently y/m to allow linking: it will permit KUNIT to be a module while SMMU is built in. Also, Fedora apparently enables kunit on production kernels. So, put the entire kunit suite in its own module using the VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT machinery. This keeps it out of vmlinux on Fedora and makes the kconfig work in the normal way. There is no cost if kunit is disabled. Fixes: 56e1a4cc2588 ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry") Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/all/aeea8546-5bce-4c51-b506-5d2008e52fef@leemhuis.info Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/0-v1-24cba6c0f404+2ae-smmu_kunit_module_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-05-01  iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry  (Jason Gunthorpe; 1 file, -23/+20)
Add tests for some of the more common STE update operations that we expect to see, as well as some artificial STE updates, to test the edges of arm_smmu_write_entry. These also serve as a record of which common operations are expected to be hitless, and how many syncs they require. arm_smmu_write_entry implements a generic algorithm that updates an STE/CD to any other arbitrary STE/CD configuration. The update requires a sequence of write+sync operations with some invariants that must be held true after each sync. arm_smmu_write_entry lends itself well to unit testing since the function's interaction with the STE/CD is already abstracted by input callbacks that we can hook to introspect into the sequence of operations. We can use these hooks to guarantee that invariants are held throughout the entire update operation. Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>