Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
|
Patch series "Memory allocation profiling", v6.
Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.
Example output:
root@moria-kvm:~# sort -rn /proc/allocinfo
127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
56373248 4737 mm/slub.c:2259 func:alloc_slab_page
14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
3940352 962 mm/memory.c:4214 func:alloc_anon_folio
2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
...
Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation
sysctl:
/proc/sys/vm/mem_profiling
Runtime info:
/proc/allocinfo
Notes:
[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:
kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)
(6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%)
(7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%)
Memory overhead:
Kernel size:
text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183
Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)
Total overhead is 0.2% of total memory.
Benchmarks:
Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
baseline disabled profiling enabled profiling
avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023)
stdev 0.0137 0.0188 0.0077
hackbench -l 10000
baseline disabled profiling enabled profiling
avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859)
stdev 0.0933 0.0286 0.0489
stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
[2] https://lore.kernel.org/all/[email protected]/
This patch (of 37):
The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.
[[email protected]: fix arch/alpha/lib/memcpy.c]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix arch/x86/mm/numa_32.c]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: a few places were depending on sizes.h]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix mm/kasan/hw_tags.c]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix arc build]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Tested-by: Kees Cook <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: "Björn Roy Baron" <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In V3D, the conclusion of a job is indicated by a IRQ. When a job
finishes, then we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.
For example, on `gpu_stats_show()`, we could think about a scenario where
`v3d->queue[queue].start_ns != 0`, then an interrupt happens, we update
the value of `v3d->queue[queue].start_ns` to 0, we come back to
`gpu_stats_show()` to calculate `active_runtime` and now,
`active_runtime = timestamp`.
In this simple example, the user would see a spike in the queue usage,
that didn't match reality.
In order to address this issue properly, use a seqcount to protect read
and write sections of the code.
Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Create a function to decouple the stats calculation from the printing.
This will be useful in the next step when we add a seqcount to protect
the stats.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.
Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better express the semantics of
the variable, as we are only accounting jobs that have been completed.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment
`enabled_ns` twice").
Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call for `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.
Co-developed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
introduced the calculation of global GPU stats. For the regards, it used
the already existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d:
Implement show_fdinfo() callback for GPU usage stats"). While adding
global GPU stats calculation ability, the author forgot to delete the
existing one.
Currently, the value of `enabled_ns` is incremented twice by the end of
the job, when it should be added just once. Therefore, delete the
leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage
stats on sysfs").
Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, the V3D driver uses PAGE_SHIFT over the assumption that
PAGE_SHIFT = 12, as the PAGE_SIZE = 4KB. But, the RPi 5 is using
PAGE_SIZE = 16KB, so the MMU PAGE_SHIFT is different than the system's
PAGE_SHIFT.
Enable V3D to be used in system's with any PAGE_SIZE by making sure that
everything MMU-related uses the MMU page shift.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.9:
UAPI Changes:
virtio:
- add Venus capset defines
Cross-subsystem Changes:
Core Changes:
- fix drm_fixp2int_ceil()
- documentation fixes
- clean ups
- allow DRM_MM_DEBUG with DRM=m
- build fixes for debugfs support
- EDID cleanups
- sched: error-handling fixes
- ttm: add tests
Driver Changes:
bridge:
- ite-6505: fix DP link-training bug
- samsung-dsim: fix error checking in probe
- tc358767: fix regmap usage
efifb:
- use copy of global screen_info state
hisilicon:
- fix EDID includes
mgag200:
- improve ioremap usage
- convert to struct drm_edid
nouveau:
- disp: use kmemdup()
- fix EDID includes
- documentation fixes
panel:
- ltk050h3146w: error-handling fixes
- panel-edp: support delay between power-on and enable; use put_sync in
unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0,
BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings
- panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings
- panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings
qaic:
- fixes to BO handling
- make use of DRM managed release
- fix order of remove operations
rockchip:
- analogix_dp: get encoder port from DT
- inno_hdmi: support HDMI for RK3128
- lvds: error-handling fixes
simplefb:
- fix logging
ssd130x:
- support SSD133x plus DT bindings
tegra:
- fix error handling
tilcdc:
- make use of DRM managed release
v3d:
- show memory stats in debugfs
vc4:
- fix error handling in plane prepare_fb
- fix framebuffer test in plane helpers
vesafb:
- use copy of global screen_info state
virtio:
- cleanups
vkms:
- fix OOB access when programming the LUT
- Kconfig improvements
vmwgfx:
- unmap surface before changing plane state
- fix memory leak in error handling
- documentation fixes
Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20240111154902.GA8448@linux-uq9g
|
|
Dump the contents of the DRM MM allocator of the V3D driver. This will
help us to debug the VA ranges allocated.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, if `v3d_job_init()` fails (e.g. in the IGT test "bad-in-sync",
where we submit an invalid in-sync to the IOCTL), then we end up with
the following NULL pointer dereference:
[ 34.146279] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
[ 34.146301] Mem abort info:
[ 34.146306] ESR = 0x0000000096000005
[ 34.146315] EC = 0x25: DABT (current EL), IL = 32 bits
[ 34.146322] SET = 0, FnV = 0
[ 34.146328] EA = 0, S1PTW = 0
[ 34.146334] FSC = 0x05: level 1 translation fault
[ 34.146340] Data abort info:
[ 34.146345] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 34.146351] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 34.146357] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 34.146366] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001232e6000
[ 34.146375] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 34.146399] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 34.146406] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash aes_neon_bs aes_neon_blk algif_skcipher af_alg bnep hid_logitech_hidpp brcmfmac_wcc brcmfmac brcmutil hci_uart vc4 btbcm cfg80211 bluetooth bcm2835_v4l2(C) snd_soc_hdmi_codec binfmt_misc cec drm_display_helper hid_logitech_dj bcm2835_mmal_vchiq(C) drm_dma_helper drm_kms_helper videobuf2_v4l2 raspberrypi_hwmon ecdh_generic videobuf2_vmalloc videobuf2_memops ecc videobuf2_common rfkill videodev libaes snd_soc_core dwc2 i2c_brcmstb snd_pcm_dmaengine snd_bcm2835(C) i2c_bcm2835 pwm_bcm2835 snd_pcm mc v3d snd_timer snd gpu_sched drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio i2c_dev drm fuse dm_mod drm_panel_orientation_quirks backlight configfs ip_tables x_tables ipv6
[ 34.146556] CPU: 1 PID: 1890 Comm: v3d_submit_csd Tainted: G C 6.7.0-rc3-g49ddab089611 #68
[ 34.146563] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[ 34.146569] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 34.146575] pc : drm_sched_job_cleanup+0x3c/0x190 [gpu_sched]
[ 34.146611] lr : v3d_submit_csd_ioctl+0x1b4/0x460 [v3d]
[ 34.146653] sp : ffffffc083cbbb80
[ 34.146658] x29: ffffffc083cbbb90 x28: ffffff81035afc00 x27: ffffffe77a641168
[ 34.146668] x26: ffffff81056a8000 x25: 0000000000000058 x24: 0000000000000000
[ 34.146677] x23: ffffff81065e2000 x22: ffffff81035afe00 x21: ffffffc083cbbcf0
[ 34.146686] x20: ffffff81035afe00 x19: 00000000ffffffea x18: 0000000000000000
[ 34.146694] x17: 0000000000000000 x16: ffffffe7989e34b0 x15: 0000000000000000
[ 34.146703] x14: 0000000004000004 x13: ffffff81035afe80 x12: ffffffc083cb8000
[ 34.146711] x11: cc57e05dfbe5ef00 x10: cc57e05dfbe5ef00 x9 : ffffffe77a64131c
[ 34.146719] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
[ 34.146727] x5 : 0000000000000040 x4 : ffffff81fefb03f0 x3 : ffffffc083cbba40
[ 34.146736] x2 : ffffff81056a8000 x1 : ffffffe7989e35e8 x0 : 0000000000000000
[ 34.146745] Call trace:
[ 34.146748] drm_sched_job_cleanup+0x3c/0x190 [gpu_sched]
[ 34.146768] v3d_submit_csd_ioctl+0x1b4/0x460 [v3d]
[ 34.146791] drm_ioctl_kernel+0xe0/0x120 [drm]
[ 34.147029] drm_ioctl+0x264/0x408 [drm]
[ 34.147135] __arm64_sys_ioctl+0x9c/0xe0
[ 34.147152] invoke_syscall+0x4c/0x118
[ 34.147162] el0_svc_common+0xb8/0xf0
[ 34.147168] do_el0_svc+0x28/0x40
[ 34.147174] el0_svc+0x38/0x88
[ 34.147184] el0t_64_sync_handler+0x84/0x100
[ 34.147191] el0t_64_sync+0x190/0x198
[ 34.147201] Code: aa0003f4 f90007e8 f9401008 aa0803e0 (b8478c09)
[ 34.147210] ---[ end trace 0000000000000000 ]---
This happens because we are calling `drm_sched_job_cleanup()` twice:
once at `v3d_job_init()` and again when we call `v3d_job_cleanup()`.
To mitigate this issue, we can return to the same approach that we used
to use before 464c61e76de8: deallocate the job after `v3d_job_init()`
fails and assign it to NULL. Then, when we call `v3d_job_cleanup()`, job
is NULL and the function returns.
Fixes: 464c61e76de8 ("drm/v3d: Decouple job allocation from job initiation")
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
RPi 4 uses V3D 4.2, which is currently not supported by the register
definition stated at `v3d_core_reg_defs`. We should be able to support
V3D 4.2, therefore, change the maximum version of the register
definition to 42, not 41.
Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D 7.x")
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Smatch warns:
drivers/gpu/drm/v3d/v3d_submit.c:1222 v3d_submit_cpu_ioctl()
warn: missing error code 'ret'
When there is no job type or job is submitted with wrong number of BOs
it is an error path, ret is zero at this point which is incorrect
return.
Fix this by changing it to -EINVAL.
Fixes: aafc1a2bea67 ("drm/v3d: Add a CPU job submission")
Signed-off-by: Harshit Mogalapalli <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A copy performance query job is a job that copy the complete
or partial result of a query to a buffer. In order to copy the result of
a performance query to a buffer, we need to get the values from the
performance monitors.
So, create a user extension for the CPU job that enables the creation
of a copy performance query job. This user extension will allow the creation
of a CPU job that copy the results of a performance query to a BO with the
possibility to indicate the availability with a availability bit.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A reset performance query job is a job that resets the
performance queries by resetting the values of the perfmons. Moreover,
we also reset the syncobjs related to the availability of the query.
So, create a user extension for the CPU job that enables the creation
of a reset performance job. This user extension will allow the creation of
a CPU job that resets the perfmons values and resets the availability syncobj.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A copy timestamp query job is a job that copy the complete
or partial result of a query to a buffer. As V3D doesn't provide any
mechanism to obtain a timestamp from the GPU, it is a job that needs
CPU intervention.
So, create a user extension for the CPU job that enables the creation
of a copy timestamp query job. This user extension will allow the creation
of a CPU job that copy the results of a timestamp query to a BO with the
possibility to indicate the timestamp availability with a availability bit.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A reset timestamp job is a job that resets the timestamp
queries based on the value offset of the first query. As V3D doesn't
provide any mechanism to obtain a timestamp from the GPU, it is a job
that needs CPU intervention.
So, create a user extension for the CPU job that enables the creation
of a reset timestamp job. This user extension will allow the creation of
a CPU job that resets the timestamp value in the timestamp BO and resets
the availability syncobj.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. A timestamp query job is a job that calculates the
query timestamp and updates the query availability by signaling a
syncobj. As V3D doesn't provide any mechanism to obtain a timestamp
from the GPU, it is a job that needs CPU intervention.
So, create a user extension for the CPU job that enables the creation
of a timestamp query job. This user extension will allow the creation of
a CPU job that performs the timestamp query calculation and updates the
timestamp BO with the proper value.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A CPU job is a type of job that performs operations that requires CPU
intervention. An indirect CSD job is a job that, when executed in the
queue, will map the indirect buffer, read the dispatch parameters, and
submit a regular dispatch. Therefore, it is a job that needs CPU
intervention.
So, create a user extension for the CPU job that enables the creation
of an indirect CSD. This user extension will allow the creation of a CSD
job linked to a CPU job. The CPU job will wait for the indirect CSD job
dependencies and, once they are signaled, it will update the CSD job
parameters.
Co-developed-by: Melissa Wen <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
For the indirect CSD CPU job, we will need to access the internal
contents of the BO with the dispatch parameters. Therefore, create
methods to allow the mapping and unmapping of the BO.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Detach CSD job setup from CSD submission ioctl to reuse it in CPU
submission ioctl for indirect CSD job.
Signed-off-by: Melissa Wen <[email protected]>
Co-developed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Create tracepoints to track the three major events of a CPU job
lifetime:
1. Submission of a `v3d_submit_cpu` IOCTL
2. Beginning of the execution of a CPU job
3. Ending of the execution of a CPU job
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, v3d_get_extensions() only parses multisync data and assigns
it to the `struct v3d_submit_ext`. But, to implement the CPU job with
user extensions, we want v3d_get_extensions() to be able to parse CPU
job data and assign it to the `struct v3d_cpu_job`.
Therefore, allow the function v3d_get_extensions() to use `struct v3d_cpu_job *`
as a parameter. If the `struct v3d_cpu_job *` is assigned to NULL, it means
that the job is a GPU job and CPU job extensions should be rejected.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Create a new type of job, a CPU job. A CPU job is a type of job that
performs operations that requires CPU intervention. The overall idea is
to use user extensions to enable different types of CPU job, allowing the
CPU job to perform different operations according to the type of user
extension. The user extension ID identify the type of CPU job that must
be dealt.
Having a CPU job is interesting for synchronization purposes as a CPU
job has a queue like any other V3D job and can be synchoronized by the
multisync extension.
Signed-off-by: Melissa Wen <[email protected]>
Co-developed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We want to allow the IOCTLs to allocate the job without initiating it.
This will be useful for the CPU job submission IOCTL, as the CPU job has
the need to use information from the user extensions. Currently, the
user extensions are parsed before the job allocation, making it
impossible to fill the CPU job when parsing the user extensions.
Therefore, decouple the job allocation from the job initiation.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, two multisync extensions can be added to the same job and
only the last multisync extension will be used. To avoid this
vulnerability, don't allow two multisync extensions in the same job.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Instead of checking if the job is NULL every time we call the function,
check it inside the function.
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We will include a new job submission type, the CPU job submission. For
readability and maintability, separate the job submission IOCTLs and
related operations from v3d_gem.c.
Minor fix in the CSD submission kernel doc:
CSD (texture formatting) -> CSD (compute shader).
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
IOCTLs related to BO operations reside on the file v3d_bo.c. The wait BO
ioctl is the only IOCTL regarding BOs that is placed in a different file.
So, move it to the v3d_bo.c file.
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
v3d_mmu_get_offset header was added but the function was never defined.
Just remove it.
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.
This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.
However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.
In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.
Signed-off-by: Danilo Krummrich <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The previous patch exposed the accumulated amount of active time per
client for each V3D queue. But this doesn't provide a global notion of
the GPU usage.
Therefore, provide the accumulated amount of active time for each V3D
queue (BIN, RENDER, CSD, TFU and CACHE_CLEAN), considering all the jobs
submitted to the queue, independent of the client.
This data is exposed through the sysfs interface, so that if the
interface is queried at two different points of time the usage percentage
of each of the queues can be calculated.
Co-developed-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Acked-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch exposes the accumulated amount of active time per client
through the fdinfo infrastructure. The amount of active time is exposed
for each V3D queue: BIN, RENDER, CSD, TFU and CACHE_CLEAN.
In order to calculate the amount of active time per client, a CPU clock
is used through the function local_clock(). The point where the jobs has
started is marked and is finally compared with the time that the job had
finished.
Moreover, the number of jobs submitted to each queue is also exposed on
fdinfo through the identifier "v3d-jobs-<queue>".
Co-developed-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Acked-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This is required to get the V3D module to load with Raspberry Pi 5.
Signed-off-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Stefan Wahren <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This patch updates a number of register addresses that have
been changed in Raspberry Pi 5 (V3D 7.1) and updates the
code to use the corresponding registers and addresses based
on the actual V3D version.
Signed-off-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.
A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Luben Tuikov <[email protected]>
|
|
Currently, we are only warning the user if the BIN or RENDER jobs don't
finish before we unregister V3D. We must wait for all jobs to finish
before unregistering. Therefore, warn the user if TFU or CSD jobs
are not done by the time the driver is unregistered.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Abhinav Kumar <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Christian König <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct v3d_perfmon.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Emma Anholt <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A proposed update to clang's -Wconstant-logical-operand to warn when the
left hand side is a constant shows the following instance in
nsecs_to_jiffies_timeout() when NSEC_PER_SEC is not a multiple of HZ,
such as CONFIG_HZ=300:
In file included from drivers/gpu/drm/v3d/v3d_debugfs.c:12:
drivers/gpu/drm/v3d/v3d_drv.h:343:24: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
343 | if (NSEC_PER_SEC % HZ &&
| ~~~~~~~~~~~~~~~~~ ^
drivers/gpu/drm/v3d/v3d_drv.h:343:24: note: use '&' for a bitwise operation
343 | if (NSEC_PER_SEC % HZ &&
| ^~
| &
drivers/gpu/drm/v3d/v3d_drv.h:343:24: note: remove constant to silence this warning
1 warning generated.
Turn this into an explicit comparison against zero to make the
expression a boolean to make it clear this should be a logical check,
not a bitwise one.
Link: https://reviews.llvm.org/D142609
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20230718-nsecs_to_jiffies_timeout-constant-logical-operand-v1-1-36ed8fc8faea@kernel.org
|
|
Clear all assignments of struct drm_driver's fd/handle callbacks to
drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These
functions are called by default. Add a TODO item to convert vmwgfx
to the defaults as well.
v2:
* remove TODO item (Zack)
* also update amdgpu's amdgpu_partition_driver
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Simon Ser <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Jeffrey Hugo <[email protected]> # qaic
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
All drivers initialize this field with drm_gem_prime_mmap(). Call
the function directly and remove the field. Simplifies the code and
resolves a long-standing TODO item.
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As v3d_job_add_deps() performs the same steps as
drm_sched_job_add_syncobj_dependency(), replace the open-coded
implementation in v3d in order to simply use the DRM function.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As v3d_submit_tfu_ioctl() performs the same steps as
drm_gem_object_lookup(), replace the open-code implementation in v3d
with its DRM core equivalent.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Replace the use of drm_debugfs_create_files() with the new
drm_debugfs_add_files() function, which centers the debugfs files
management on the drm_device instead of drm_minor.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As v3d_lookup_bos() performs the same steps as drm_gem_objects_lookup(),
replace the explicit code in v3d to simply use the DRM function.
Signed-off-by: Melissa Wen <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
When v3d_lookup_bos fails to `allocate validated BO pointers`,
job->bo_count was already set to args->bo_count, but job->bo points to
NULL. In this scenario, we must verify that job->bo is not NULL before
iterating on it to proper clean up a job. Also, drm_gem_object_put
already checks that the object passed is not NULL, doing the job->bo[i]
checker redundant.
Signed-off-by: Melissa Wen <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
v3d_perfmon_open_file() instantiates a mutex for a particular file
instance, but it never destroys it by calling mutex_destroy() in
v3d_perfmon_close_file().
Similarly, v3d_perfmon_create_ioctl() instantiates a mutex for a
particular perfmon, but it never destroys it by calling mutex_destroy()
in v3d_perfmon_destroy_ioctl().
So, add the missing mutex_destroy on both cases.
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|