Age | Commit message (Collapse) | Author | Files | Lines |
|
[why]
Enabling higher compiler warning levels results in many issues that can
be trivially resolved as well as some potentially critical issues.
[how]
Fix all compiler warnings found with various compilers and higher
warning levels. Primarily, potentially uninitialized variables and
unreachable code.
Reviewed-by: Leo Li <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Aric Cyr <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
update for odm
[why]
Move built test pattern as part of pipe resource update for odm to ensure we rebuild
test pattern params every time we have an ODM update
Reviewed-by: George Shen <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Wenjing Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Future implementations will require a distinction between AC power and
DC power (wall power and battery power, respectively). To accomplish this,
adding a power mode parameter to certain dc interfaces, and adding a
separate DML2 instance for DC mode validation. Default behaviour unchanged.
Reviewed-by: Jun Lei <[email protected]>
Reviewed-by: Aric Cyr <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Joshua Aberback <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
When the DML2 wrapper explicitly accesses context->dml2, that creates a
dependency on where dc saves the DML object. This dependency makes it
harder to have multiple co-existing DML objects, which we would like to
have for upcoming functionality.
[How]
- make all DML21 interfaces take in a DML2 object as parameter
- remove all references to context->dml2, use parameter instead
Reviewed-by: Jun Lei <[email protected]>
Reviewed-by: Aric Cyr <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Joshua Aberback <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Description]
- Add logging for first DMUB inbox message that timed out to diagnostic
data
- It is useful to track the first failed message for debug purposes
because once DMUB becomes hung (typically on a message), it will
remain hung and all subsequent messages. In these cases we're
interested in knowing which is the first message that failed.
Reviewed-by: Josip Pavic <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Alvin Lee <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
corruption
[Why and how]
External display has corruption because no root clock control function. Add the function pointer to fix the issue.
Reviewed-by: Daniel Miess <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Xi (Alex) Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
The threshold is no longer useful for blocking suboptimal power states
for DCN35 based on real measurement.
[How]
Reduce to the minimum threshold duration, 1us.
Reviewed-by: Gabe Teeger <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Description]
Add extra logging for DCSURF_FLIP_CNTL, DCHUBP_CNTL,
OTG_MASTER_EN, and OTG_DOUBLE_BUFFER_CONTROL for more
debuggability for a system crash.
Reviewed-by: Samson Tam <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Alvin Lee <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why && How]
OTG can be disabled before setting dpms on. Add check to skip wait
when setting AV mute if OTG is disabled.
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Leo (Hanghong) Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[WHY]
dynamic memory safety error detector (KASAN) catches and generates error
messages "BUG: KASAN: slab-out-of-bounds" as writeback connector does not
support certain features which are not initialized.
[HOW]
Skip them when connector type is DRM_MODE_CONNECTOR_WRITEBACK.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3199
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
HPO can be power gated unconditionally for
DCN35.
[How]
Set disable flag to false.
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Duncan Ma <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why & How]
Enable root clock optimization for HDMISTREAMCLK and only
disable it when it's actively being used.
Reviewed-by: Charlene Liu <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Daniel Miess <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why/How]
Some issues may require a trace of the previous SMU messages from DC to
understand the context and aid in debugging. Actual logging to be
implemented when needed.
Reviewed-by: Josip Pavic <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: George Shen <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
As per programming guide, we need to
enable the virtual pixel clock via DTBCLK
DTO and ungate the clock before we begin
programming OPP/OPTC control registers.
Otherwise, the double-buffered registers
will be left pending until the clocks are enabled.
[how]
Move the DTBCLK DTO programming up to
where we do the legacy DP DTO programming.
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Sung Joon Kim <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
There is an ambiguity in subvp pipe topology log. The log doesn't show
subvp relation to main stream and it is not clear that certain stream
is an internal stream for subvp pipes.
[how]
Separate subvp pipe topology logging from main pipe topology. Log main
stream indices instead of the internal stream for subvp pipes.
The following is a sample log showing 2 streams with subvp enabled on
both:
pipe topology update
________________________
| plane0 slice0 stream0|
|DPP1----OPP1----OTG1----|
| plane0 slice0 stream1|
|DPP0----OPP0----OTG0----|
| (phantom pipes) |
| plane0 slice0 stream0|
|DPP3----OPP3----OTG3----|
| plane0 slice0 stream1|
|DPP2----OPP2----OTG2----|
|________________________|
Reviewed-by: Alvin Lee <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Wenjing Liu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[WHY&HOW]
Update dmub and driver interface for future FAMS revisions.
Reviewed-by: Anthony Koo <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Apply this rule to all newer asics in sriov case.
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.
Moved the check to amdgpu_device_init() to ensure it is done after
amdgpu_device_ip_early_init() where the IP versions are discovered.
Signed-off-by: Danijel Slivka <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable pipe1 support starting from SIENNA CICHLID asic
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Arunpravin Paneer Selvam <[email protected]>
Signed-off-by: ZhenGuo Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use correct ref/mask for differnent gfx ring pipe.
This should fix the gfx hang issue after enabling gfx pipe1.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: ZhenGuo Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
The function to count the number of valid connectors does not
guarantee that the first n indices are valid, only that there
exist n valid indices. When invalid indices are present, this
results in later valid connectors being missed, as processing
would end after checking n indices.
[How]
- count valid indices separately from total indices examined
- add explicit definition of MAX_LINKS
Reviewed-by: Dillon Varone <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Joshua Aberback <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why] Mst slot nums equals to pbn / pbn_div.
Today, pbn_div refers to dm_mst_get_pbn_divider ->
dc_link_bandwidth_kbps. In dp_link_bandwidth_kbps,
which includes effect of FEC overhead already. As
result, we should not include effect of FEC overhead
again while calculating pbn by kpbs_to_peak_pbn
(stream_kbps).
[How] Include FEC overhead within dp_link_bandwidth_kbps.
Remove FEC overhead from kbps_to_peak_pbn.
Reviewed-by: Wayne Lin <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Hersen Wu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SDMA_CNTL is not set in some cases, driver configures it by itself.
v2: simplify code
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Dmub provides several Replay residency calculation methods,
but current interface only supports either ALPM or PHY mode
[How]
Modify the interface for supporting different types
of Replay residency calculation.
Reviewed-by: Robin Chen <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Leon Huang <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
With root clock optimization now enabled for DCN35 there
are still RCO registers still not being toggled
[How]
Add in logic to toggle RCO registers for DPPCLK,
DPSTREAMCLK and DSCCLK
Reviewed-by: Charlene Liu <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Daniel Miess <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
There could be cases that we are transition from MPC to ODM combine.
In this case if we map pipes before unmapping MPC pipes, we might
temporarly run out of pipes. The change reorders pipe resource
allocation. So we unmapping pipes before mapping new pipes.
Reviewed-by: Dillon Varone <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Wenjing Liu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
In two display configuration, switching between subvp and non-subvp
may cause underflow because it moves an existing pipe between
displays
[How]
Create helper function for applying pipe split flags
Apply pipe split flags prior to deciding on subvp
During subvp check, do not merge pipes, so it can retain previous
pipe configuration
Add check for prev odm pipe in subvp check
For single display subvp case, use same odm policy for phantom pipes
as main subvp pipe
Reviewed-by: Alvin Lee <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Samson Tam <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why] Mode pbn is not calculated correctly because timing pixel encoding is
not checked within convert_dc_color_depth_into_bpc.
[How] Get mode kbps from dc_bandwidth_in_kbps_from_timing, then calculate
pbn by kbps_to_peak_pbn.
Reviewed-by: Wayne Lin <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Hersen Wu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why & how]
Modified definitions of 1 function and 2 structs to remove warnings on
certain specific compiler configurations due to redefinition.
Reviewed-by: Martin Leung <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Mounika Adhuri <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch to add smu 14.0.1 support
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
update ppsmc.h pmfw.h and driver_if.h for smu v14_0_1
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: lima1002 <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed
Signed-off-by: shaoyunl <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary
Signed-off-by: shaoyunl <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
While doing multiple S4 stress tests, GC/RLC/PMFW get into
an invalid state resulting into hard hangs.
Adding a GFX reset as workaround just before sending the
MP1_UNLOAD message avoids this failure.
Signed-off-by: Tim Huang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
refine function signature of amdgpu_aca_get_error_data();
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is a new DCN veriosn 3.5.1 need to load
Signed-off-by: Li Ma <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For SOC21 ASICs, there is an issue in re-enabling PM features if a
suspend got aborted. In such cases, reset the device during resume
phase. This is a workaround till a proper solution is finalized.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Add FW information of all the IP's in the devcoredump.
Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Otherwise the old one will be used during GPU reset.
That's not expected.
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Downgrade to debug information when IBs are skipped. Also, use dev_* to
identify the device.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1:
add a new enumeration type to identify device attribute node,
this method is relatively more efficient compared with 'strcmp' in
update_attr() function.
v2:
rename device_attr_type to device_attr_id.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Ma Jun <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Two misc-next in one.
drm-misc-next for v6.10-rc1:
The deal of a lifetime! You get ALL of the previous
drm-misc-next-2024-03-21-1 tag!!
But WAIT, there's MORE!
Cross-subsystem Changes:
- Assorted DT binding updates.
Core Changes:
- Clarify how optional wait_hpd_asserted is.
- Shuffle Kconfig names around.
Driver Changes:
- Assorted build fixes for panthor, imagination,
- Add AUO B120XAN01.0 panels.
- Assorted small fixes to panthor, panfrost.
drm-misc-next for v6.10:
UAPI Changes:
- Move some nouveau magic constants to uapi.
Cross-subsystem Changes:
- Move drm-misc to gitlab and freedesktop hosting.
- Add entries for panfrost.
Core Changes:
- Improve placement for TTM bo's in idle/busy handling.
- Improve drm/bridge init ordering.
- Add CONFIG_DRM_WERROR, and use W=1 for drm.
- Assorted documentation updates.
- Make more (drm and driver) headers self-contained and add header
guards.
- Grab reservation lock in pin/unpin callbacks.
- Fix reservation lock handling for vmap.
- Add edp and edid panel matching, use it to fix a nearly identical
panel.
Driver Changes:
- Add drm/panthor driver and assorted fixes.
- Assorted small fixes to xlnx, panel-edp, tidss, ci, nouveau,
panel and bridge drivers.
- Add Samsung s6e3fa7, BOE NT116WHM-N44, CMN N116BCA-EA1,
CrystalClear CMT430B19N00, Startek KD050HDFIA020-C020A,
powertip PH128800T006-ZHC01 panels.
- Fix console for omapdrm.
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Backmerging to get v6.9-rc2 changes into drm-misc-next.
Signed-off-by: Thomas Zimmermann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
- Fix unselectable choice entry in MIPS Kconfig, and forbid this
structure
- Remove unused include/asm-generic/export.h
- Fix a NULL pointer dereference bug in modpost
- Enable -Woverride-init warning consistently with W=1
- Drop KCSAN flags from *.mod.c files
* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Fix typo HEIGTH to HEIGHT
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
kbuild: make -Woverride-init warnings more consistent
modpost: do not make find_tosym() return NULL
export.h: remove include/asm-generic/export.h
kconfig: do not reparent the menu inside a choice block
MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
|
|
The -Woverride-init warn about code that may be intentional or not,
but the inintentional ones tend to be real bugs, so there is a bit of
disagreement on whether this warning option should be enabled by default
and we have multiple settings in scripts/Makefile.extrawarn as well as
individual subsystems.
Older versions of clang only supported -Wno-initializer-overrides with
the same meaning as gcc's -Woverride-init, though all supported versions
now work with both. Because of this difference, an earlier cleanup of
mine accidentally turned the clang warning off for W=1 builds and only
left it on for W=2, while it's still enabled for gcc with W=1.
There is also one driver that only turns the warning off for newer
versions of gcc but not other compilers, and some but not all the
Makefiles still use a cc-disable-warning conditional that is no
longer needed with supported compilers here.
Address all of the above by removing the special cases for clang
and always turning the warning off unconditionally where it got
in the way, using the syntax that is supported by both compilers.
Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Acked-by: Andrew Jeffery <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HDMI_HELPER to depend on it.
Reviewed-by: Jani Nikula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maxime Ripard <[email protected]>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HDCP_HELPER to depend on it.
Reviewed-by: Jani Nikula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maxime Ripard <[email protected]>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_DP_HELPER to depend on it.
Reviewed-by: Jani Nikula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maxime Ripard <[email protected]>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HELPER to depend on it.
Reviewed-by: Jani Nikula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maxime Ripard <[email protected]>
|
|
An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):
[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082] do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082] kernel_init+0x15/0x1a0
[ 1318.027242][ T1082] ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082] do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082] kernel_init+0x15/0x1a0
[ 1318.031168][ T1082] ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082] __might_fault+0x58/0x80
[ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082] full_proxy_read+0x55/0x80
[ 1318.034487][ T1082] vfs_read+0xa7/0x360
[ 1318.034788][ T1082] ksys_read+0x70/0xf0
[ 1318.035085][ T1082] do_syscall_64+0x94/0x180
[ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082] Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082] CPU0 CPU1
[ 1318.038101][ T1082] ---- ----
[ 1318.038350][ T1082] lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082] lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082] lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082] rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082] *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082] <TASK>
[ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082] check_noncircular+0x145/0x160
[ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082] ? __might_fault+0x40/0x80
[ 1318.043580][ T1082] ? __might_fault+0x40/0x80
[ 1318.043856][ T1082] __might_fault+0x58/0x80
[ 1318.044131][ T1082] ? __might_fault+0x40/0x80
[ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082] full_proxy_read+0x55/0x80
[ 1318.045042][ T1082] vfs_read+0xa7/0x360
[ 1318.045333][ T1082] ksys_read+0x70/0xf0
[ 1318.045623][ T1082] do_syscall_64+0x94/0x180
[ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082] </TASK>
amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.
Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.
Fixes: 445d85e3c1df ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: [email protected] # v6.5+
Reviewed-by: Shashank Sharma <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Share same codes with 4.0.5 and enable collaborate mode for VPE.
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|