Age | Commit message | Author | Files | Lines
2024-06-19 | bcachefs: Fix bch2_trans_put() | Kent Overstreet | 1 | -3/+8
reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: set_worker_desc() for delete_dead_snapshots | Kent Overstreet | 1 | -0/+2
this is long running - help users see what's going on Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Fix bch2_sb_downgrade_update() | Kent Overstreet | 1 | -1/+1
Missing enum conversion Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Handle cached data LRU wraparound | Kent Overstreet | 1 | -5/+41
We only have 48 bits for the LRU time field, which is insufficient to prevent wraparound. Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Guard against overflowing LRU_TIME_BITS | Kent Overstreet | 6 | -12/+32
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
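The kind of guard described above can be sketched as follows. This is a minimal illustration, not the bcachefs code: only `LRU_TIME_BITS` comes from the commit title, while `LRU_TIME_MAX`, `lru_time_fits()`, `lru_pack()` and the 16-bit-id-plus-48-bit-time layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width of the LRU time field, as stated in the commit message. */
#define LRU_TIME_BITS 48
/* Hypothetical helper constant: largest time value that fits in the field. */
#define LRU_TIME_MAX ((UINT64_C(1) << LRU_TIME_BITS) - 1)

/* Hypothetical guard: true if the time value fits in 48 bits. */
static inline bool lru_time_fits(uint64_t time)
{
	return time <= LRU_TIME_MAX;
}

/*
 * Hypothetical packing (an assumption for this sketch): a 16-bit LRU id in
 * the upper bits and the 48-bit time in the lower bits. Without the guard,
 * an oversized time would silently spill into the id bits.
 */
static inline uint64_t lru_pack(uint16_t lru_id, uint64_t time)
{
	assert(lru_time_fits(time));
	return ((uint64_t)lru_id << LRU_TIME_BITS) | time;
}
```

A caller would check `lru_time_fits()` first and take an error or wraparound path when it returns false, rather than packing a value that overflows the field.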
2024-06-19 | bcachefs: delete_dead_snapshots() doesn't need to go RW | Kent Overstreet | 1 | -7/+0
We've been moving away from going RW lazily; if we want to go RW we do that in set_may_go_rw(), and if we didn't go RW we don't need to delete dead snapshots. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Fix early init error path in journal code | Kent Overstreet | 1 | -0/+3
We shouldn't be running the journal shutdown sequence if we never fully initialized the journal. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Check for invalid btree IDs | Kent Overstreet | 2 | -2/+12
We can only handle btree IDs up to 62, since the btree id (plus the type for interior btree nodes) has to fit into a 64 bit bitmask - check for invalid ones to avoid invalid shifts later. Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Fix btree ID bitmasks | Kent Overstreet | 2 | -10/+11
these should be 64 bit bitmasks, not 32 bit. Signed-off-by: Kent Overstreet <[email protected]>
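The two btree ID fixes above (range check plus 64-bit masks) can be illustrated with a small sketch. The helper names `btree_id_is_valid()` and `btree_id_bit()` and the constant `BTREE_ID_LIMIT` are hypothetical; only the limit of 62 and the 64-bit mask width come from the commit messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest ID the commit message says can be handled (hypothetical name). */
#define BTREE_ID_LIMIT 62u

/* Hypothetical validity check, done before any shift is attempted. */
static inline bool btree_id_is_valid(unsigned int id)
{
	return id <= BTREE_ID_LIMIT;
}

/*
 * Hypothetical mask helper. The shift is done on a 64-bit value; with a
 * 32-bit "1U << id" the shift would be undefined for id >= 32, which is
 * the class of bug a 32-bit bitmask invites.
 */
static inline uint64_t btree_id_bit(unsigned int id)
{
	assert(btree_id_is_valid(id));
	return UINT64_C(1) << id;
}
```

Validating the ID once at the point where it enters the system (for example when read from disk) keeps every later shift well defined.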
2024-06-19 | bcachefs: Fix shift overflow in read_one_super() | Kent Overstreet | 1 | -3/+4
Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Fix a locking bug in the do_discard_fast() path | Kent Overstreet | 1 | -1/+1
We can't discard a bucket while it's still open; this needs the bucket_is_open_safe() version, which takes the open_buckets lock. Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Fix array-index-out-of-bounds | Kent Overstreet | 3 | -3/+8
We use 0 size arrays as markers, but ubsan doesn't know that - cast them to a pointer to fix the splat. Also, make sure this code gets tested a bit more. Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | bcachefs: Fix initialization order for srcu barrier | Kent Overstreet | 1 | -1/+1
btree_iter_init() needs to happen before key_cache_init(), to initialize btree_trans_barrier Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-06-19 | drm/amdgpu: init TA fw for psp v14 | Likun Gao | 1 | -0/+5
Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amdgpu: cleanup MES11 command submission | Christian König | 1 | -28/+48
The approach of having a separate WB slot for each submission doesn't really work well and, for example, breaks GPU reset. Use a status query packet for the fence update instead; since those should always succeed, we can use the fence of the original packet to signal the state of the operation. While at it, clean up the coding style. Fixes: eef016ba8986 ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amdgpu: fix UBSAN warning in kv_dpm.c | Alex Deucher | 1 | -0/+2
Adds bounds check for sumo_vid_mapping_entry. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3392 Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-06-19 | drm/radeon: fix UBSAN warning in kv_dpm.c | Alex Deucher | 1 | -0/+2
Adds bounds check for sumo_vid_mapping_entry. Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-06-19 | fortify: Do not special-case 0-sized destinations | Kees Cook | 2 | -8/+3
All fake flexible arrays should have been removed now, so remove the special casing that was avoiding checking them. If a destination claims to be 0 sized, believe it. This is especially important for cases where __counted_by is in use and may have a 0 element count. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-06-19 | exec: Add KUnit test for bprm_stack_limits() | Kees Cook | 4 | -0/+136
Since bprm_stack_limits() operates with very limited side-effects, add it as the first exec.c KUnit test. Add to Kconfig and adjust MAINTAINERS file to include it. Tested on 64-bit UML: $ tools/testing/kunit/kunit.py run exec Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Kees Cook <[email protected]>
2024-06-19 | cifs: fix typo in module parameter enable_gcm_256 | Steve French | 1 | -1/+1
enable_gcm_256 (which allows the server to require the strongest encryption) is enabled by default, but the modinfo description incorrectly showed it disabled by default. Fix the typo. Cc: [email protected] Fixes: fee742b50289 ("smb3.1.1: enable negotiating stronger encryption by default") Signed-off-by: Steve French <[email protected]>
2024-06-19 | binfmt_elf: Honor PT_LOAD alignment for static PIE | Kees Cook | 2 | -6/+38
The p_align values in PT_LOAD were ignored for static PIE executables (i.e. ET_DYN without PT_INTERP). This is because there is no way to request a non-fixed mmap region with a specific alignment. ET_DYN with PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf performs the ASLR itself, which means it can also apply alignment. For the mmap region, the address selection happens deep within the vm_mmap() implementation (when the requested address is 0). The earlier attempt to implement this: commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") commit 925346c129da ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") did not take into account the different base address origins, and were eventually reverted: aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") In order to get the correct alignment from an mmap base, binfmt_elf must perform a 0-address load first, then tear down the mapping and perform alignment on the resulting address. Since this is slightly more overhead, only do this when it is needed (i.e. the alignment is not the default ELF alignment). This does, however, have the benefit of being able to use MAP_FIXED_NOREPLACE, to avoid potential collisions. With this fixed, enable the static PIE self tests again. Reported-by: H.J. Lu <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
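The "align the address that the 0-address load returned" step can be sketched as plain arithmetic. This is an illustration only: `align_load_bias()` is a hypothetical helper, and rounding *down* to the alignment boundary is an assumption of this sketch, since the message above does not say which direction the kernel rounds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the address a probe mapping landed at and the
 * requested p_align (a power of two), return an aligned load address.
 * Rounding down is an assumption made for this sketch.
 */
static inline uint64_t align_load_bias(uint64_t probed_addr, uint64_t p_align)
{
	/* p_align must be a non-zero power of two for the mask to be valid. */
	assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
	return probed_addr & ~(p_align - 1);
}
```

In the flow the message describes, the probe mapping would be torn down before the aligned address is used, and the final mapping would be requested at that exact address with MAP_FIXED_NOREPLACE so that a collision fails instead of silently replacing an existing mapping.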
2024-06-19 | binfmt_elf: Calculate total_size earlier | Kees Cook | 1 | -25/+27
In preparation to support PT_LOAD with large p_align values on non-PT_INTERP ET_DYN executables (i.e. "static pie"), we'll need to use the total_size details earlier. Move this separately now to make the next patch more readable. As total_size and load_bias are currently calculated separately, this has no behavioral impact. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-06-19 | selftests/exec: Build both static and non-static load_address tests | Kees Cook | 2 | -20/+66
After commit 4d1cd3b2c5c1 ("tools/testing/selftests/exec: fix link error"), the load address alignment tests tried to build statically. This was silently ignored in some cases. However, after attempting to further fix the build by switching to "-static-pie", the test started failing. This appears to be due to non-PT_INTERP ET_DYN execs ("static PIE") not doing alignment correctly, which remains unfixed[1]. See commit aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") for more details. Provide rules to build both static and non-static PIE binaries, improve debug reporting, and perform several test steps instead of a single all-or-nothing test. However, do not actually enable static-pie tests; alignment specification is only supported for ET_DYN with PT_INTERP ("regular PIE"). Link: https://bugzilla.kernel.org/show_bug.cgi?id=215275 [1] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-06-19 | x86/alternatives: Make FineIBT mode Kconfig selectable | Kees Cook | 3 | -5/+14
Since FineIBT performs checking at the destination, it is weaker against attacks that can construct arbitrary executable memory contents. As such, some system builders want to run with FineIBT disabled by default. Allow the "cfi=kcfi" boot param mode to be selectable through Kconfig via the newly introduced CONFIG_CFI_AUTO_DEFAULT. Reviewed-by: Sami Tolvanen <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-06-19 | vfs: link_path_walk: do '.' and '..' detection while hashing | Linus Torvalds | 1 | -22/+46
Instead of loading the name again to detect '.' and '..', just use the fact that we already had the masked last word available when we created the name hash. Which is exactly what we'd then test for. Dealing with big-endian word ordering needs a bit of care, particularly since we have the byte-at-a-time loop as a fallback that doesn't do BE word loads. But not a big deal. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | vfs: link_path_walk: clarify and improve name hashing interface | Linus Torvalds | 1 | -8/+12
Now that we clearly only care about the length of the name we just parsed, we can simplify and clarify the interface to "name_hash()", and move the actual nd->last field setting in there. That makes everything simpler, and this way we don't mix the hash and the length together only to then immediately unmix them again. We still eventually want the combined mixed "hashlen" for when we look things up in the dentry cache, but inside link_path_walk() it's simpler and clearer to just deal with the path component length. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | vfs: link_path_walk: simplify name hash flow | Linus Torvalds | 1 | -16/+13
This is one of those hot functions in path walking, and it's doing things in just the wrong order that causes slightly unnecessary extra work. Move the name pointer update and the setting of 'nd->last' up a bit, so that the (unlikely) filesystem-specific hashing can run on them in place, instead of having to set up a copy on the stack and copy things back and forth. Because even when the hashing is not run, it causes the stack frame of the function to be bigger to hold the unnecessary temporary copy. This also means that we never then reference the full "hashlen" field after calculating it, and can clarify the code with just using the length part. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | arm64: word-at-a-time: improve byte count calculations for LE | Linus Torvalds | 1 | -8/+3
Do the same optimization as x86-64: do __ffs() on the intermediate value that found whether there is a zero byte, before we've actually computed the final byte mask.

The logic is:

 has_zero():

    Check if the word has a zero byte in it, which indicates the end of the loop, and prepare a value to be used for the rest of the sequence.

    The standard LE implementation just creates a word that has the high bit set in each byte of the word that was zero.

    Example: 0xaa00bbccdd00eeff -> 0x0080000000800000

 prep_zero_mask():

    Possibly do more prep to then clean up the initial fast result from has_zero, so that it can be combined with another zero mask with a simple logical "or" to create a final mask.

    This is only used on big-endian machines that use a different algorithm, and is a no-op here.

 create_zero_mask():

    This is "step 1" of creating the count and the mask, and is meant for any common operations between the two.

    In the old implementation, this actually created the zero mask, that was then used for masking and for counting the number of bits in the mask.

    In the new implementation, this is a no-op.

 count_zero():

    This takes the mask bits, and counts the number of bytes before the first zero byte.

    In the old implementation, it counted the number of bits in the final byte mask (which was the same as the C standard "find last set bit" that uses the silly "starts at one" counting) and shifted the value down by three.

    In the new implementation, we know the intermediate mask isn't zero, and it just does "find first set" with the sane semantics without any off-by-one issues, and again shifts by three (which also masks off the bit offset in the zero byte itself).

    Example: 0x0080000000800000 -> 2

 zero_bytemask():

    This takes the mask bits, and turns it into an actual byte mask of the bytes preceding the first zero byte.

    In the old implementation, this was a no-op, because the work had already been done by create_zero_mask().

    In the new implementation, this does what create_zero_mask() used to do.

    Example: 0x0080000000800000 -> 0x000000000000ffff

The difference between the old and the new implementation is that "count_zero()" ends up scheduling better because it is being done on a value that is available earlier (before the final mask).

But more importantly, it can be implemented without the insane semantics of the standard bit finding helpers that have the off-by-one issue and have to special-case the zero mask situation.

On arm64, the new "count_zero()" ends up just "rbit + clz" plus the shift right that then ends up being subsumed by the "add to final length".

Signed-off-by: Linus Torvalds <[email protected]>
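The little-endian steps described above can be tried out in user space with the following sketch. It is not the kernel's header: the function names mirror the ones in the message, but the signatures are simplified, `__builtin_ctzll()` (a GCC/Clang builtin) stands in for `__ffs()`, and the constants `ONE_BITS` and `HIGH_BITS` are names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_BITS  UINT64_C(0x0101010101010101)
#define HIGH_BITS UINT64_C(0x8080808080808080)

/*
 * Non-zero if the word contains a zero byte. The lowest set bit marks the
 * first zero byte; bits above it can be spurious in general (a 0x01 byte
 * directly after a zero byte is also flagged), so only the lowest set bit
 * is relied upon below.
 */
static inline uint64_t has_zero(uint64_t word)
{
	return (word - ONE_BITS) & ~word & HIGH_BITS;
}

/* Number of bytes before the first zero byte: find-first-set, then >> 3. */
static inline unsigned int count_zero(uint64_t bits)
{
	assert(bits != 0);	/* guaranteed by the has_zero() check */
	return (unsigned int)__builtin_ctzll(bits) >> 3;
}

/* Byte mask covering the bytes that precede the first zero byte. */
static inline uint64_t zero_bytemask(uint64_t bits)
{
	bits = (bits - 1) & ~bits;	/* all bits below the lowest set bit */
	return bits >> 7;
}
```

Feeding the message's example word 0xaa00bbccdd00eeff through `has_zero()`, `count_zero()` and `zero_bytemask()` reproduces the three example values quoted in the message.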
2024-06-19 | x86-64: word-at-a-time: improve byte count calculations | Linus Torvalds | 1 | -34/+23
This switches x86-64 over to using 'tzcount' instead of the integer multiply trick to turn the bytemask information into actual byte counts. We even had a comment saying that a fast bit count instruction is better than a multiply, but x86 bit counting has traditionally been "questionably fast", and so avoiding it was the right thing back in the days. Now, on any half-way modern core, using bit counting is cheaper and smaller than the large constant multiply, so let's just switch over.

Note that as part of switching over to counting bits, we also do it at a different point. We used to create the byte count from the final byte mask, but once you use the 'tzcount' instruction (aka 'bsf' on older CPUs), you can actually count the leading zeroes using a value we have available earlier. In fact, we can just use the very first mask of bits that tells us whether we have any zero bytes at all. The zero bytes in the word will have the high bit set, so just doing 'tzcount' on that value and dividing by 8 will give the number of bytes that precede the first NUL character, which is exactly what we want.

Note also that the input value to the tzcount is by definition not zero, since that is the condition that we already used to check the whole "do we have any zero bytes at all". So we don't need to worry about the legacy instruction behavior of pre-lzcount days when 'bsf' didn't have a result for zero input.

The 32-bit code continues to use the simple bit op trick that is faster even on newer cores, but particularly on the older 32-bit-only ones.

Signed-off-by: Linus Torvalds <[email protected]>
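To make the two approaches concrete, here is a user-space sketch that computes the byte count both ways and can be used to check that they agree. The function names are chosen for this example, `__builtin_ctzll()` (a GCC/Clang builtin) stands in for the 'tzcount' instruction, and the multiplier is a constant that makes the multiply trick work for byte masks of 0 to 7 bytes; treat it as an illustration rather than a quotation of the removed kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* High bit set in the first zero byte (higher bits may be spurious). */
static inline uint64_t zero_bits(uint64_t word)
{
	return (word - UINT64_C(0x0101010101010101)) & ~word &
	       UINT64_C(0x8080808080808080);
}

/* Final byte mask: 0xff in every byte that precedes the first zero byte. */
static inline uint64_t byte_mask(uint64_t bits)
{
	return ((bits - 1) & ~bits) >> 7;
}

/* Old style: turn the final byte mask into a count with one multiply. */
static inline unsigned int count_bytes_multiply(uint64_t mask)
{
	return (unsigned int)((mask * UINT64_C(0x0001020304050608)) >> 56);
}

/* New style: count trailing zero bits of the early value, divide by 8. */
static inline unsigned int count_bytes_tzcnt(uint64_t bits)
{
	assert(bits != 0);	/* already established by the zero-byte check */
	return (unsigned int)__builtin_ctzll(bits) >> 3;
}
```

Note that `count_bytes_tzcnt()` consumes the output of `zero_bits()` directly, while `count_bytes_multiply()` has to wait for `byte_mask()`; that dependency difference is the scheduling point the message makes.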
2024-06-19 | runtime constants: add x86 architecture support | Linus Torvalds | 2 | -0/+64
This implements the runtime constant infrastructure for x86, allowing the dcache d_hash() function to be generated using a constant for the hash table address, followed by a shift of the hash index by a constant. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | runtime constants: add default dummy infrastructure | Linus Torvalds | 4 | -1/+34
This adds the initial dummy support for 'runtime constants' for when an architecture doesn't actually support an implementation of fixing up said runtime constants. This ends up being the fallback to just using the variables as regular __ro_after_init variables, and changes the dcache d_hash() function to use this model. Signed-off-by: Linus Torvalds <[email protected]>
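A rough user-space sketch of what such a fallback amounts to: the "constant" accessors simply read ordinary variables that are set once during initialization. All names here (`sketch_const_ptr`, `sketch_const_shift_right_32`, `d_hash_sketch`, the 16-bucket table) are hypothetical and chosen for this example; they are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback flavor: the "runtime constant" is just the variable itself. */
#define sketch_const_ptr(sym)			(sym)
#define sketch_const_shift_right_32(val, sym)	((uint32_t)(val) >> (sym))

struct bucket {
	int unused;
};

/* Hypothetical 16-bucket table; both variables are written once at init. */
static struct bucket table_storage[16];
static struct bucket *hash_table = table_storage;
static unsigned int hash_shift = 32 - 4;	/* keep the top 4 hash bits */

/*
 * Hash in the low 32 bits of hashlen, length in the high 32 bits: the cast
 * to uint32_t inside the shift macro drops the length part.
 */
static inline struct bucket *d_hash_sketch(uint64_t hashlen)
{
	return sketch_const_ptr(hash_table) +
	       sketch_const_shift_right_32(hashlen, hash_shift);
}
```

An architecture-specific version would keep the same call sites but patch the pointer and shift count into the instruction stream as immediates, which is why the accessors are macros rather than plain variable reads.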
2024-06-19 | vfs: dcache: move hashlen_hash() from callers into d_hash() | Linus Torvalds | 1 | -4/+4
Both __d_lookup_rcu() and __d_lookup_rcu_op_compare() have the full 'name_hash' value of the qstr that they want to look up, and mask it off to just the low 32-bit hash before calling down to d_hash(). Other callers just load the 32-bit hash and pass it as the argument. If we move the masking into d_hash() itself, it simplifies the two callers that currently do the masking, and is a no-op for the other cases. It doesn't actually change the generated code since the compiler will inline d_hash() and see that the end result is the same. [ Technically, since the parse tree changes, the code generation may not be 100% the same, and for me on x86-64, this does result in gcc switching the operands around for one 'cmpl' instruction. So not necessarily the exact same code generation, but equivalent ] However, this does encapsulate the 'd_hash()' operation more, and makes the shift operation in particular be a "shift 32 bits right, return full word". Which matches the instruction semantics on both x86-64 and arm64 better, since a 32-bit shift will clear the upper bits. That makes the next step of introducing a "shift by runtime constant" more obvious and generates the shift with no extraneous type masking. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | arm64: start using 'asm goto' for put_user() | Linus Torvalds | 2 | -34/+39
This generates noticeably better code since we don't need to test the error register etc, the exception just jumps to the error handling directly. Unlike get_user(), there's no need to worry about old compilers. All supported compilers support the regular non-output 'asm goto', as pointed out by Nathan Chancellor. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | arm64: start using 'asm goto' for get_user() when available | Linus Torvalds | 3 | -30/+104
This generates noticeably better code with compilers that support it, since we don't need to test the error register etc, the exception just jumps to the error handling directly. Note that this also marks SW_TTBR0_PAN incompatible with KCSAN support, since KCSAN wants to save and restore the user access state. KCSAN and SW_TTBR0_PAN were probably always incompatible, but it became obvious only when implementing the unsafe user access functions. At that point the default empty user_access_save/restore() functions weren't provided by the default fallback functions. Signed-off-by: Linus Torvalds <[email protected]>
2024-06-19 | arm64: dts: rockchip: fix PMIC interrupt pin on ROCK Pi E | FUKAUMI Naoki | 1 | -2/+2
use GPIO0_A2 as interrupt pin for PMIC. GPIO2_A6 was used for pre-production board. Fixes: b918e81f2145 ("arm64: dts: rockchip: rk3328: Add Radxa ROCK Pi E") Signed-off-by: FUKAUMI Naoki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Heiko Stuebner <[email protected]>
2024-06-19 | hte: tegra-194: add missing MODULE_DESCRIPTION() macro | Jeff Johnson | 1 | -0/+1
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hte/hte-tegra194-test.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <[email protected]> Acked-by: Dipen Patel <[email protected]> Signed-off-by: Dipen Patel <[email protected]>
2024-06-19 | cpufreq: intel_pstate: Update Lunar Lake hybrid scaling factor | Srinivas Pandruvada | 1 | -0/+2
Change hybrid scaling factor for Lunar Lake. Scaling factor is 1.15 for P-cores compared to E-cores. Signed-off-by: Srinivas Pandruvada <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-06-19 | cpufreq: intel_pstate: Update Arrow Lake hybrid scaling factor | Srinivas Pandruvada | 1 | -0/+1
Arrow Lake uses the same scaling factor as Meteor Lake, so reuse the same scaling factor. Signed-off-by: Srinivas Pandruvada <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-06-19 | drm/etnaviv: don't disable TS on MMUv2 core when moving the linear window | Lucas Stach | 1 | -2/+5
On MMUv2 cores the linear window is only relevant when starting the FE, before the MMU has been activated. Once the MMU is active, all accesses are translated with no way to bypass the MMU via the linear window. Thus TS ignoring the linear window offset is not an issue on cores with MMUv2 present and there is no need to disable TS when we need to move the linear window. Signed-off-by: Lucas Stach <[email protected]> Tested-by: Joao Paulo Goncalves <[email protected]>
2024-06-19 | drm/etnaviv: Read some FE registers twice | Derek Foreman | 1 | -0/+8
On some hardware (such as the GC7000 rev 6009), these registers need to be read twice to return the correct value. Hide that in gpu_read(). Signed-off-by: Derek Foreman <[email protected]> Signed-off-by: Lucas Stach <[email protected]>
2024-06-19 | drm/amd/display: Disable CONFIG_DRM_AMD_DC_FP for RISC-V with clang | Nathan Chancellor | 1 | -1/+1
Commit 77acc6b55ae4 ("riscv: add support for kernel-mode FPU") and commit a28e4b672f04 ("drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT") enabled support for CONFIG_DRM_AMD_DC_FP with RISC-V. Unfortunately, this exposed -Wframe-larger-than warnings (which become fatal with CONFIG_WERROR=y) when building ARCH=riscv allmodconfig with clang:

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:58:13: error: stack frame size (2448) exceeds limit (2048) in 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation' [-Werror,-Wframe-larger-than]
     58 | static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation(
        |             ^
  1 error generated.

Many functions in this file use a large number of parameters, which must be passed on the stack at a certain point due to register exhaustion, which can cause high stack usage when inlining and issues with stack slot analysis get involved. While the compiler can and should do better (as GCC uses less than half the amount of stack space for the same function), it is not as simple a fix as adjusting the functions not to take a large number of parameters.

Unfortunately, modifying these files to avoid the problem is a difficult-to-justify approach because any revisions to the files in the kernel tree never make it back to the original source (so copies of the code for newer hardware revisions just reintroduce the issue) and the files are hard to read/modify due to being "gcc-parsable HW gospel, coming straight from HW engineers".

Avoid building the problematic code for RISC-V by modifying the existing condition for arm64 that exists for the same reason. Factor out the logical not to make the condition a little more readable naturally.

Fixes: a28e4b672f04 ("drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT") Reported-by: Palmer Dabbelt <[email protected]> Closes: https://lore.kernel.org/[email protected]/ Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amd/display: Attempt to avoid empty TUs when endpoint is DPIA | Michael Strauss | 3 | -1/+75
[WHY] Empty SST TUs are illegal to transmit over a USB4 DP tunnel. Current policy is to configure stream encoder to pack 2 pixels per pclk even when ODM combine is not in use, allowing seamless dynamic ODM reconfiguration. However, in extreme edge cases where average pixel count per TU is less than 2, this can lead to unexpected empty TU generation during compliance testing. For example, VIC 1 with a 1xHBR3 link configuration will average 1.98 pix/TU. [HOW] Calculate average pixel count per TU, and block 2 pixels per clock if endpoint is a DPIA tunnel and pixel clock is low enough that we will never require 2:1 ODM combine. Cc: [email protected] # 6.6+ Reviewed-by: Wenjing Liu <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Michael Strauss <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
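The 1.98 pix/TU figure quoted above can be reproduced with a short calculation. The sketch below rests on assumptions that are not stated in the message: a TU of 64 link symbols, a link symbol clock of 810 MHz for one HBR3 lane, and a 25.175 MHz pixel clock for VIC 1. The helper name `avg_pix_per_tu_x1000()` is hypothetical and is not the function the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed transfer unit size in link symbols (not stated in the message). */
#define TU_SIZE_SYMBOLS 64

/*
 * Hypothetical helper: average pixels per TU, scaled by 1000 to stay in
 * integer arithmetic, for a single-lane link.
 *   pixels per TU = pixel clock / link symbol clock * TU size
 */
static inline uint64_t avg_pix_per_tu_x1000(uint64_t pix_clk_khz,
					    uint64_t link_symbol_clk_khz)
{
	assert(link_symbol_clk_khz != 0);
	return pix_clk_khz * TU_SIZE_SYMBOLS * 1000 / link_symbol_clk_khz;
}
```

With the assumed inputs (25175 kHz pixel clock, 810000 kHz symbol clock) the result is just below 2000, i.e. fewer than 2 pixels per TU on average, which is the condition under which packing 2 pixels per clock can leave a TU empty.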
2024-06-19 | drm/amd/display: change dram_clock_latency to 34us for dcn35 | Paul Hsieh | 1 | -1/+1
[Why & How] The current DRAM setting causes underflow on a customer platform. Modify dram_clock_change_latency_us from 11.72 us to 34.0 us as per the recommendation from the HW team. Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Zaeem Mohamed <[email protected]> Signed-off-by: Paul Hsieh <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amd/display: Change dram_clock_latency to 34us for dcn351 | Daniel Miess | 1 | -1/+1
[Why] Intermittent underflow observed when using 4k144 display on dcn351 [How] Update dram_clock_change_latency_us from 11.72us to 34us Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Zaeem Mohamed <[email protected]> Signed-off-by: Daniel Miess <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2 | Christian König | 3 | -51/+0
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-19 | drm/amdgpu: Indicate CU harvest info to CP | Harish Kasiviswanathan | 1 | -2/+13
To achieve full occupancy, CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested. v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amd/display: prevent register access while in IPS | Hamza Mahfooz | 1 | -0/+10
We can't read/write to DCN registers while in IPS, since that can cause the system to hang. So, before proceeding with the access in that scenario, force the system out of IPS. Cc: [email protected] # 6.6+ Reviewed-by: Roman Li <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | drm/amdgpu: fix locking scope when flushing tlb | Yunxiang Li | 1 | -32/+34
Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip the flush altogether if the GPU will get reset. So put both paths under the reset_domain read lock. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-19 | drm/amd/display: Remove redundant idle optimization check | Roman Li | 1 | -3/+0
[Why] Disabling idle optimization for each atomic commit is unnecessary, and can lead to a potential race condition. [How] Remove the idle optimization check from amdgpu_dm_atomic_commit_tail(). Fixes: 196107eb1e15 ("drm/amd/display: Add IPS checks before dcn register access") Cc: [email protected] Reviewed-by: Hamza Mahfooz <[email protected]> Acked-by: Roman Li <[email protected]> Signed-off-by: Roman Li <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19 | x86/uaccess: Improve the 8-byte getuser() case | Linus Torvalds | 1 | -49/+20
Streamline the 8-byte case and drop the special handling. Use a macro which hides the exception handling. No functional changes. Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/CAHk-=whYb2L_atsRk9pBiFiVLGe5wNZLHhRinA69yu6FiKvDsw@mail.gmail.com