aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/shmem_utils.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-25fix missing vmalloc.h includesKent Overstreet1-0/+1
Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/[email protected]/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [[email protected]: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arc build] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Tested-by: Kees Cook <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alex Gaynor <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Benno Lossin <[email protected]> Cc: "Björn Roy Baron" <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Gary Guo <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-10drm/i915/gt: Simplify shmem_create_from_object map_type selectionJonathan Cavitt1-2/+1
The object pin created for shmem_create_from_object is just a single use mapping with the sole purpose of reading the contents of the whole object in bulk. And the whole source object is also even a throw-away. Ergo, the additional logic required by i915_coherent_map_type can be safely dropped and simplified. Suggested-by: Tvrtko Ursulin <[email protected]> Signed-off-by: Jonathan Cavitt <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Signed-off-by: Andi Shyti <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-12-20drm/i915: Use helper func to find out map typeNirmoy Das1-2/+5
Use i915_coherent_map_type() function to find out map_type of the shmem obj. v2: handle non-llc platform(Matt) Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Andrzej Hajda <[email protected]> Signed-off-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-02-25drm/i915/gt: Add helper for shmem copy to iosys_mapLucas De Marchi1-0/+32
Add a variant of shmem_read() that takes a iosys_map pointer rather than a plain pointer as argument. It's mostly a copy __shmem_rw() but adapting the api and removing the write support since there's currently only need to use iosys_map as destination. Reworking __shmem_rw() to share the implementation was tempting, but finding a good balance between reuse and clarity pushed towards a little code duplication. Since the function is small, just add the similar function with a copy/paste/adapt approach. v2: Add an offset as argument and instead of using a map iterator, use the offset to keep track of where we are writing data to. Cc: Matt Roper <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: Maarten Lankhorst <[email protected]> Signed-off-by: Lucas De Marchi <[email protected]> Reviewed-by: Matt Atwood <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-04-27drm/i915/dg1: Fix mapping type for default state objectVenkata Ramana Nayana1-1/+3
Use I915_MAP_WC when default state object is allocated in LMEM. Signed-off-by: Venkata Ramana Nayana <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Signed-off-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-03-24drm/i915: Fix ww locking in shmem_create_from_objectMaarten Lankhorst1-1/+1
Quick fix, just use the unlocked version. Signed-off-by: Maarten Lankhorst <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-01-20drm/i915/gem: Move stolen node into GEM object unionChris Wilson1-1/+1
The obj->stolen is currently used to identify an object allocated from stolen memory. This dates back to when there were just 1.5 types of objects, an object backed by shmemfs and an object backed by shmemfs with a contiguous physical address. Now that we have several different types of objects, we no longer want to treat stolen objects as a special case. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-12-07drm/i915: fix size_t greater or equal to zero comparisonColin Ian King1-1/+1
Currently the check that the unsigned size_t variable i is >= 0 is always true because the unsigned variable will never be negative, causing the loop to run forever. Fix this by changing the pre-decrement check to a zero check on i followed by a decrement of i. Addresses-Coverity: ("Unsigned compared against 0") Fixes: bfed6708d6c9 ("drm/i915: use vmap in shmem_pin_map") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-11-27drm/i915/gt: Retain default context state across shrinkingVenkata Ramana Nayana1-2/+5
As we use a shmemfs file to hold the context state, when not in use it may be swapped out, such as across suspend. Since we wrote into the shmemfs without marking the pages as dirty, the contents may be dropped instead of being written back to swap. On re-using the shmemfs file, such as creating a new context after resume, the contents of that file were likely garbage and so the new context could then hang the GPU. Simply mark the page as being written when copying into the shmemfs file, and it the new contents will be retained across swapout. Fixes: be1cb55a07bf ("drm/i915/gt: Keep a no-frills swappable copy of the default context state") Cc: Sudeep Dutt <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ramalingam C <[email protected]> Signed-off-by: CQ Tang <[email protected]> Signed-off-by: Venkata Ramana Nayana <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Cc: <[email protected]> # v5.8+ Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-10-18drm/i915: use vmap in shmem_pin_mapChristoph Hellwig1-58/+18
shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and manual pte setup. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here (and could be added to vmap using a flag if actually required). Switch to use vmap, and use vfree to free both the vmalloc mapping and the page array, as well as dropping the references to each page. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-29drm/i915/gt: Keep a no-frills swappable copy of the default context stateChris Wilson1-0/+173
We need to keep the default context state around to instantiate new contexts (aka golden rendercontext), and we also keep it pinned while the engine is active so that we can quickly reset a hanging context. However, the default contexts are large enough to merit keeping in swappable memory as opposed to kernel memory, so we store them inside shmemfs. Currently, we use the normal GEM objects to create the default context image, but we can throw away all but the shmemfs file. This greatly simplifies the tricky power management code which wants to run underneath the normal GT locking, and we definitely do not want to use any high level objects that may appear to recurse back into the GT. Though perhaps the primary advantage of the complex GEM object is that we aggressively cache the mapping, but here we are recreating the vm_area everytime time we unpark. At the worst, we add a lightweight cache, but first find a microbenchmark that is impacted. Having started to create some utility functions to make working with shmemfs objects easier, we can start putting them to wider use, where GEM objects are overkill, such as storing persistent error state. Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Ramalingam C <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]