iommu/io-pgtable-arm: Optimise non-coherent unmap - blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux


author	Ashish Mhetre <amhetre@nvidia.com>	2024-08-06 10:51:35 +0000
committer	Joerg Roedel <jroedel@suse.de>	2024-08-30 14:29:32 +0200
commit	84b2baf427968c1b2e3ae3b7afcb0118cdee0915 (patch)
tree	e85ebaec3fbbd28f607faa07aaffbb2ab69c12f7 /fs
parent	6c17c7d5936e6af6a5bda9f9de98a5e2ee6e8a6f (diff)

iommu/io-pgtable-arm: Optimise non-coherent unmap

The current __arm_lpae_unmap() function calls dma_sync() on individual PTEs after clearing them. Overall unmap performance can be improved by around 25% for large buffer sizes by combining the syncs for adjacent leaf entries.

Optimize the unmap time by clearing all the leaf entries and issuing a single dma_sync() for them.

Below is a detailed analysis of the average unmap latency (in us) with and without this optimization, obtained by running dma_map_benchmark for different buffer sizes.

          UnMap Latency (us)
Size      Without         With            % gain with
          optimization    optimization    optimization
4KB       3               3               0
8KB       4               3.8             5
16KB      6.1             5.4             11.48
32KB      10.2            8.5             16.67
64KB      18.5            14.9            19.46
128KB     35              27.5            21.43
256KB     67.5            52.2            22.67
512KB     127.9           97.2            24.00
1MB       248.6           187.4           24.62
2MB       65.5            65.5            0
4MB       119.2           119             0.17

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240806105135.218089-1-amhetre@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

Diffstat (limited to 'fs')

0 files changed, 0 insertions, 0 deletions

