reference: https://github.com/koverstreet/bcachefs/issues/692
trans->ref is the reference used by the cycle detector, which iterates
over the btree_trans objects of other threads to walk the graph of held
locks and issue wakeups when an abort is required.
We have to wait for the ref to go to 1 before freeing trans->paths or
clearing trans->locking_wait.task.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This is long-running - help users see what's going on.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Missing enum conversion
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We only have 48 bits for the LRU time field, which is insufficient to
prevent wraparound.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need
overflow checks and guards.
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
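
A minimal sketch of the kind of guard this implies (the 48-bit width is
from the message; the helper name is hypothetical, not the actual
bcachefs code):

    #define LRU_TIME_BITS	48
    #define LRU_TIME_MAX	((1ULL << LRU_TIME_BITS) - 1)

    static inline u64 lru_time_clamp(u64 time)
    {
        /* keep the value representable in the 48-bit field */
        return min_t(u64, time, LRU_TIME_MAX);
    }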
|
|
We've been moving away from going RW lazily; if we want to go RW we do
that in set_may_go_rw(), and if we didn't go RW we don't need to delete
dead snapshots.
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We shouldn't be running the journal shutdown sequence if we never fully
initialized the journal.
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We can only handle btree IDs up to 62, since the btree id (plus the type
for interior btree nodes) has to fit into a 64 bit bitmask - check for
invalid ones to avoid invalid shifts later.
Signed-off-by: Kent Overstreet <[email protected]>
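
As a rough illustration (hypothetical helper, not the actual validation
code), the constraint is simply that the id plus the interior-node type
bit must stay below bit 64:

    static inline bool btree_id_fits_in_mask(unsigned btree_id)
    {
        /* interior nodes add 1 for the type, so btree_id must be <= 62 */
        return btree_id + 1 < 64;
    }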
|
|
These should be 64-bit bitmasks, not 32-bit.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We can't discard a bucket while it's still open; this needs the
bucket_is_open_safe() version, which takes the open_buckets lock.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We use 0 size arrays as markers, but ubsan doesn't know that - cast them
to a pointer to fix the splat.
Also, make sure this code gets tested a bit more.
Signed-off-by: Kent Overstreet <[email protected]>
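
Hypothetical illustration of the pattern (the struct is made up; only
the cast mirrors the fix described):

    struct entry {
        u16	len;
        u64	_data[0];	/* zero-size array used purely as a marker */
    };

    /* before: e->_data[i]            triggers a UBSAN array-index splat   */
    /* after:  ((u64 *) e->_data)[i]  goes through a plain pointer instead */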
|
|
btree_iter_init() needs to happen before key_cache_init(), to initialize
btree_trans_barrier
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add support to init TA firmware for PSP v14.
Signed-off-by: Likun Gao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.
Use a status query packet for the fence update instead; since those
should always succeed, we can use the fence of the original packet to
signal the state of the operation.
While at it cleanup the coding style.
Fixes: eef016ba8986 ("drm/amdgpu/mes11: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Adds bounds check for sumo_vid_mapping_entry.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3392
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Adds bounds check for sumo_vid_mapping_entry.
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
All fake flexible arrays should have been removed now, so remove the
special casing that was avoiding checking them. If a destination claims
to be 0 sized, believe it. This is especially important for cases where
__counted_by is in use and may have a 0 element count.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
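
Illustration of the case the last sentence refers to (hypothetical
struct; __counted_by is the real annotation):

    struct blob {
        int	len;
        u8	data[] __counted_by(len);
    };

    /* With blob->len == 0, a fortified memcpy(blob->data, src, n) must now
     * treat the destination as 0 bytes and trap for any n > 0, instead of
     * skipping the check as it used to for "fake" flexible arrays. */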
|
|
Since bprm_stack_limits() operates with very limited side-effects, add
it as the first exec.c KUnit test. Add to Kconfig and adjust MAINTAINERS
file to include it.
Tested on 64-bit UML:
$ tools/testing/kunit/kunit.py run exec
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Kees Cook <[email protected]>
|
|
enable_gcm_256 (which allows the server to require the strongest
encryption) is enabled by default, but the modinfo description
incorrectly showed it disabled by default. Fix the typo.
Cc: [email protected]
Fixes: fee742b50289 ("smb3.1.1: enable negotiating stronger encryption by default")
Signed-off-by: Steve French <[email protected]>
|
|
The p_align values in PT_LOAD were ignored for static PIE executables
(i.e. ET_DYN without PT_INTERP). This is because there is no way to
request a non-fixed mmap region with a specific alignment. ET_DYN with
PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf
performs the ASLR itself, which means it can also apply alignment. For
the mmap region, the address selection happens deep within the vm_mmap()
implementation (when the requested address is 0).
The earlier attempts to implement this:
commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
commit 925346c129da ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
did not take into account the different base address origins, and were
eventually reverted:
aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"")
In order to get the correct alignment from an mmap base, binfmt_elf must
perform a 0-address load first, then tear down the mapping and perform
alignment on the resulting address. Since this is slightly more overhead,
only do this when it is needed (i.e. the alignment is not the default
ELF alignment). This does, however, have the benefit of being able to
use MAP_FIXED_NOREPLACE, to avoid potential collisions.
With this fixed, enable the static PIE self tests again.
Reported-by: H.J. Lu <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
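
A rough sketch of the probe-then-remap sequence described above
(simplified, with illustrative variable handling; not the exact
binfmt_elf code):

    if (alignment > ELF_MIN_ALIGN) {
        /* 1. Let the kernel pick an address for the whole image. */
        unsigned long probe = vm_mmap(NULL, 0, total_size,
                                      PROT_NONE, MAP_PRIVATE, 0);
        /* 2. Tear that mapping down again... */
        vm_munmap(probe, total_size);
        /* 3. ...and redo the real load at the aligned address, refusing
         *    to clobber anything that appeared there in the meantime. */
        load_bias = ALIGN(probe, alignment);
        elf_flags |= MAP_FIXED_NOREPLACE;
    }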
|
|
In preparation to support PT_LOAD with large p_align values on
non-PT_INTERP ET_DYN executables (i.e. "static pie"), we'll need to use
the total_size details earlier. Move this separately now to make the
next patch more readable. As total_size and load_bias are currently
calculated separately, this has no behavioral impact.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
After commit 4d1cd3b2c5c1 ("tools/testing/selftests/exec: fix link
error"), the load address alignment tests tried to build statically.
This was silently ignored in some cases. However, after attempting to
further fix the build by switching to "-static-pie", the test started
failing. This appears to be due to non-PT_INTERP ET_DYN execs ("static
PIE") not doing alignment correctly, which remains unfixed[1]. See commit
aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for
static PIE"") for more details.
Provide rules to build both static and non-static PIE binaries, improve
debug reporting, and perform several test steps instead of a single
all-or-nothing test. However, do not actually enable static-pie tests;
alignment specification is only supported for ET_DYN with PT_INTERP
("regular PIE").
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215275 [1]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
Since FineIBT performs checking at the destination, it is weaker against
attacks that can construct arbitrary executable memory contents. As such,
some system builders want to run with FineIBT disabled by default. Allow
the "cfi=kcfi" boot param mode to be selectable through Kconfig via the
newly introduced CONFIG_CFI_AUTO_DEFAULT.
Reviewed-by: Sami Tolvanen <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
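
A hedged sketch of how such a default selection looks (constant and
enum names approximated from the message, not necessarily the exact
patch):

    #ifdef CONFIG_CFI_AUTO_DEFAULT
    #define __CFI_DEFAULT	CFI_AUTO	/* probe for FineIBT at boot */
    #else
    #define __CFI_DEFAULT	CFI_KCFI	/* as if cfi=kcfi was passed */
    #endif

    static enum cfi_mode cfi_mode __ro_after_init = __CFI_DEFAULT;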
|
|
Instead of loading the name again to detect '.' and '..', just use the
fact that we already had the masked last word available when we created
the name hash, which is exactly what we'd then test for.
Dealing with big-endian word ordering needs a bit of care, particularly
since we have the byte-at-a-time loop as a fallback that doesn't do BE
word loads. But not a big deal.
Signed-off-by: Linus Torvalds <[email protected]>
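
For illustration, on little-endian the comparison boils down to checking
small constants against the masked last word (constant names assumed;
big-endian needs the byte order handled as noted above):

    #define LAST_WORD_IS_DOT	0x2eULL		/* "."  */
    #define LAST_WORD_IS_DOTDOT	0x2e2eULL	/* ".." */

    /* With the masked last word already in hand from hashing:
     *   len == 1 && word == LAST_WORD_IS_DOT     -> component is "."
     *   len == 2 && word == LAST_WORD_IS_DOTDOT  -> component is ".."
     */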
|
|
Now that we clearly only care about the length of the name we just
parsed, we can simplify and clarify the interface to "name_hash()", and
move the actual nd->last field setting in there.
That makes everything simpler, and this way we don't mix the hash and
the length together only to then immediately unmix them again.
We still eventually want the combined mixed "hashlen" for when we look
things up in the dentry cache, but inside link_path_walk() it's simpler
and clearer to just deal with the path component length.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is one of those hot functions in path walking, and it's doing
things in just the wrong order that causes slightly unnecessary extra
work.
Move the name pointer update and the setting of 'nd->last' up a bit, so
that the (unlikely) filesystem-specific hashing can run on them in
place, instead of having to set up a copy on the stack and copy things
back and forth.
Because even when the hashing is not run, it causes the stack frame of
the function to be bigger to hold the unnecessary temporary copy.
This also means that we never then reference the full "hashlen" field
after calculating it, and can clarify the code with just using the
length part.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Do the same optimization as x86-64: do __ffs() on the intermediate value
that found whether there is a zero byte, before we've actually computed
the final byte mask.
The logic is:
has_zero():
Check if the word has a zero byte in it, which indicates the end
of the loop, and prepare a value to be used for the rest of the
sequence.
The standard LE implementation just creates a word that has the
high bit set in each byte of the word that was zero.
Example: 0xaa00bbccdd00eeff -> 0x0080000000800000
prep_zero_mask():
Possibly do more prep to then clean up the initial fast result
from has_zero, so that it can be combined with another zero mask
with a simple logical "or" to create a final mask.
This is only used on big-endian machines that use a different
algorithm, and is a no-op here.
create_zero_mask():
This is "step 1" of creating the count and the mask, and is
meant for any common operations between the two.
In the old implementation, this actually created the zero mask,
that was then used for masking and for counting the number of
bits in the mask.
In the new implementation, this is a no-op.
count_zero():
This takes the mask bits, and counts the number of bytes before
the first zero byte.
In the old implementation, it counted the number of bits in the
final byte mask (which was the same as the C standard "find last
set bit" that uses the silly "starts at one" counting) and shifted
the value down by three.
In the new implementation, we know the intermediate mask isn't
zero, and it just does "find first set" with the sane semantics
without any off-by-one issues, and again shifts by three (which
also masks off the bit offset in the zero byte itself).
Example: 0x0080000000800000 -> 2
zero_bytemask():
This takes the mask bits, and turns it into an actual byte mask
of the bytes preceding the first zero byte.
In the old implementation, this was a no-op, because the work
had already been done by create_zero_mask().
In the new implementation, this does what create_zero_mask()
used to do.
Example: 0x0080000000800000 -> 0x000000000000ffff
The difference between the old and the new implementation is that
"count_zero()" ends up scheduling better because it is being done on a
value that is available earlier (before the final mask).
But more importantly, it can be implemented without the insane semantics
of the standard bit finding helpers that have the off-by-one issue and
have to special-case the zero mask situation.
On arm64, the new "count_zero()" ends up just "rbit + clz" plus the
shift right that then ends up being subsumed by the "add to final
length".
Signed-off-by: Linus Torvalds <[email protected]>
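
A self-contained little-endian sketch of the helpers described above
(simplified signatures; an illustration only, not the kernel's exact
code):

    #include <stdint.h>

    #define ONEBYTES	0x0101010101010101ULL
    #define HIGHBITS	0x8080808080808080ULL

    /* has_zero(): high bit set in each byte of 'word' that was zero,
     * e.g. 0xaa00bbccdd00eeff -> 0x0080000000800000 */
    static inline uint64_t has_zero(uint64_t word)
    {
        return (word - ONEBYTES) & ~word & HIGHBITS;
    }

    /* count_zero(): bytes before the first zero byte; 'bits' is non-zero.
     * e.g. 0x0080000000800000 -> 2 ("rbit + clz" plus shift on arm64) */
    static inline unsigned count_zero(uint64_t bits)
    {
        return __builtin_ctzll(bits) >> 3;
    }

    /* zero_bytemask(): mask of the bytes preceding the first zero byte,
     * e.g. 0x0080000000800000 -> 0x000000000000ffff */
    static inline uint64_t zero_bytemask(uint64_t bits)
    {
        return ((bits - 1) & ~bits) >> 7;
    }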
|
|
This switches x86-64 over to using 'tzcount' instead of the integer
multiply trick to turn the bytemask information into actual byte counts.
We even had a comment saying that a fast bit count instruction is better
than a multiply, but x86 bit counting has traditionally been
"questionably fast", and so avoiding it was the right thing back in the
days.
Now, on any half-way modern core, using bit counting is cheaper and
smaller than the large constant multiply, so let's just switch over.
Note that as part of switching over to counting bits, we also do it at a
different point. We used to create the byte count from the final byte
mask, but once you use the 'tzcount' instruction (aka 'bsf' on older
CPUs), you can actually count the trailing zeroes using a value we have
available earlier.
In fact, we can just use the very first mask of bits that tells us
whether we have any zero bytes at all. The zero bytes in the word will
have the high bit set, so just doing 'tzcount' on that value and
dividing by 8 will give the number of bytes that precede the first NUL
character, which is exactly what we want.
Note also that the input value to the tzcount is by definition not zero,
since that is the condition that we already used to check the whole "do
we have any zero bytes at all". So we don't need to worry about the
legacy instruction behavior of pre-lzcount days when 'bsf' didn't have a
result for zero input.
The 32-bit code continues to use the simple bit op trick that is faster
even on newer cores, but particularly on the older 32-bit-only ones.
Signed-off-by: Linus Torvalds <[email protected]>
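
For comparison, the two counting approaches look roughly like this
(illustrative, not the exact kernel source):

    /* old: multiply trick on the final byte mask,
     * e.g. 0x000000000000ffff -> 2 */
    static inline unsigned long count_bytes_old(unsigned long bytemask)
    {
        return bytemask * 0x0001020304050608ul >> 56;
    }

    /* new: tzcnt/bsf on the early zero-byte indicator (known non-zero),
     * e.g. 0x0080000000800000 -> 23 trailing zero bits -> 2 bytes */
    static inline unsigned long count_bytes_new(unsigned long bits)
    {
        return __builtin_ctzl(bits) >> 3;
    }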
|
|
This implements the runtime constant infrastructure for x86, allowing
the dcache d_hash() function to be generated using a constant for the
hash table address, followed by a shift by a constant of the hash index.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This adds the initial dummy support for 'runtime constants' for when
an architecture doesn't actually support an implementation of fixing
up said runtime constants.
This ends up being the fallback to just using the variables as regular
__ro_after_init variables, and changes the dcache d_hash() function to
use this model.
Signed-off-by: Linus Torvalds <[email protected]>
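
The resulting d_hash() shape looks roughly like this (a sketch; helper
names as introduced by this series, simplified):

    static inline struct hlist_bl_head *d_hash(unsigned long hashlen)
    {
        return runtime_const_ptr(dentry_hashtable) +
            runtime_const_shift_right_32(hashlen, d_hash_shift);
    }

With the dummy fallback, both runtime_const helpers simply read the
underlying __ro_after_init variables; architectures that implement the
infrastructure patch the values into the instruction stream instead.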
|
|
Both __d_lookup_rcu() and __d_lookup_rcu_op_compare() have the full
'name_hash' value of the qstr that they want to look up, and mask it off
to just the low 32-bit hash before calling down to d_hash().
Other callers just load the 32-bit hash and pass it as the argument.
If we move the masking into d_hash() itself, it simplifies the two
callers that currently do the masking, and is a no-op for the other
cases. It doesn't actually change the generated code since the compiler
will inline d_hash() and see that the end result is the same.
[ Technically, since the parse tree changes, the code generation may not
be 100% the same, and for me on x86-64, this does result in gcc
switching the operands around for one 'cmpl' instruction. So not
necessarily the exact same code generation, but equivalent ]
However, this does encapsulate the 'd_hash()' operation more, and makes
the shift operation in particular be a "shift 32 bits right, return full
word". Which matches the instruction semantics on both x86-64 and arm64
better, since a 32-bit shift will clear the upper bits.
That makes the next step of introducing a "shift by runtime constant"
more obvious and generates the shift with no extraneous type masking.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This generates noticeably better code since we don't need to test the
error register etc, the exception just jumps to the error handling
directly.
Unlike get_user(), there's no need to worry about old compilers. All
supported compilers support the regular non-output 'asm goto', as
pointed out by Nathan Chancellor.
Signed-off-by: Linus Torvalds <[email protected]>
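
A typical caller that benefits looks like this (illustrative usage, not
from the patch): the faulting store jumps straight to the label, so
there is no error register to test.

    static int put_one(u64 __user *uptr, u64 val)
    {
        if (!user_write_access_begin(uptr, sizeof(*uptr)))
            return -EFAULT;
        unsafe_put_user(val, uptr, efault);
        user_write_access_end();
        return 0;
    efault:
        user_write_access_end();
        return -EFAULT;
    }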
|
|
This generates noticeably better code with compilers that support it,
since we don't need to test the error register etc, the exception just
jumps to the error handling directly.
Note that this also marks SW_TTBR0_PAN incompatible with KCSAN support,
since KCSAN wants to save and restore the user access state.
KCSAN and SW_TTBR0_PAN were probably always incompatible, but it became
obvious only when implementing the unsafe user access functions. At
that point the default empty user_access_save/restore() functions
weren't provided by the default fallback functions.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use GPIO0_A2 as the interrupt pin for the PMIC. GPIO2_A6 was used on the
pre-production board.
Fixes: b918e81f2145 ("arm64: dts: rockchip: rk3328: Add Radxa ROCK Pi E")
Signed-off-by: FUKAUMI Naoki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>
|
|
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hte/hte-tegra194-test.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <[email protected]>
Acked-by: Dipen Patel <[email protected]>
Signed-off-by: Dipen Patel <[email protected]>
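
The fix is a one-liner of this form (description text approximate):

    MODULE_DESCRIPTION("Tegra194 HTE test driver");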
|
|
Change hybrid scaling factor for Lunar Lake.
Scaling factor is 1.15 for P-cores compared to E-cores.
Signed-off-by: Srinivas Pandruvada <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Arrow Lake uses the same scaling factor as Meteor Lake, so reuse it.
Signed-off-by: Srinivas Pandruvada <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
On MMUv2 cores the linear window is only relevant when starting the FE,
before the MMU has been activated. Once the MMU is active, all accesses
are translated with no way to bypass the MMU via the linear window. Thus
TS ignoring the linear window offset is not an issue on cores with MMUv2
present and there is no need to disable TS when we need to move the
linear window.
Signed-off-by: Lucas Stach <[email protected]>
Tested-by: Joao Paulo Goncalves <[email protected]>
|
|
On some hardware (such as the GC7000 rev 6009), these registers need to be
read twice to return the correct value. Hide that in gpu_read().
Signed-off-by: Derek Foreman <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
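
A hedged sketch of the shape of the workaround (which registers need
the double read is hidden behind a hypothetical helper here, not the
actual driver logic):

    static inline u32 gpu_read(struct etnaviv_gpu *gpu, u32 reg)
    {
        /* some registers return stale data on the first access */
        if (gpu_reg_needs_double_read(gpu, reg))	/* hypothetical */
            (void)readl(gpu->mmio + reg);
        return readl(gpu->mmio + reg);
    }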
|
|
Commit 77acc6b55ae4 ("riscv: add support for kernel-mode FPU") and
commit a28e4b672f04 ("drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT")
enabled support for CONFIG_DRM_AMD_DC_FP with RISC-V. Unfortunately,
this exposed -Wframe-larger-than warnings (which become fatal with
CONFIG_WERROR=y) when building ARCH=riscv allmodconfig with clang:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:58:13: error: stack frame size (2448) exceeds limit (2048) in 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation' [-Werror,-Wframe-larger-than]
58 | static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation(
| ^
1 error generated.
Many functions in this file use a large number of parameters, which must
be passed on the stack at a certain point due to register exhaustion,
which can cause high stack usage when inlining and issues with stack
slot analysis get involved. While the compiler can and should do better
(as GCC uses less than half the amount of stack space for the same
function), it is not as simple a fix as adjusting the functions not
to take a large number of parameters.
Unfortunately, modifying these files to avoid the problem is a difficult
to justify approach because any revisions to the files in the kernel
tree never make it back to the original source (so copies of the code
for newer hardware revisions just reintroduce the issue) and the files
are hard to read/modify due to being "gcc-parsable HW gospel, coming
straight from HW engineers".
Avoid building the problematic code for RISC-V by modifying the existing
condition for arm64 that exists for the same reason. Factor out the
logical not to make the condition a little more readable naturally.
Fixes: a28e4b672f04 ("drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT")
Reported-by: Palmer Dabbelt <[email protected]>
Closes: https://lore.kernel.org/[email protected]/
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[WHY]
Empty SST TUs are illegal to transmit over a USB4 DP tunnel.
Current policy is to configure stream encoder to pack 2 pixels per pclk
even when ODM combine is not in use, allowing seamless dynamic ODM
reconfiguration. However, in extreme edge cases where average pixel
count per TU is less than 2, this can lead to unexpected empty TU
generation during compliance testing. For example, VIC 1 with a 1xHBR3
link configuration will average 1.98 pix/TU.
[HOW]
Calculate the average pixel count per TU, and block packing 2 pixels per
clock if the endpoint is a DPIA tunnel and the pixel clock is low enough
that we will never require 2:1 ODM combine.
Cc: [email protected] # 6.6+
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Michael Strauss <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
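
A worked illustration of the check (the formula is assumed from the
description: a TU spans 64 link symbol clocks):

    /* avg pix/TU = 64 * pixel_clock / link_symbol_clock
     * e.g. VIC 1 on a 1-lane HBR3 link:
     *   64 * 25175 kHz / 810000 kHz comes out just under 2 pix/TU
     * (the ~1.98 pix/TU cited above), so 2 pixels per pclk is blocked. */
    static bool stream_fits_two_pix_per_cycle(unsigned int pix_clk_khz,
                                              unsigned int link_symbol_clk_khz)
    {
        return (64ULL * pix_clk_khz) / link_symbol_clk_khz >= 2;
    }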
|
|
[Why & How]
The current DRAM setting would cause underflow on a customer platform.
Modify dram_clock_change_latency_us from 11.72 us to 34.0 us, per the
recommendation from the HW team.
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Zaeem Mohamed <[email protected]>
Signed-off-by: Paul Hsieh <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Intermittent underflow observed when using 4k144 display on
dcn351
[How]
Update dram_clock_change_latency_us from 11.72us to 34us
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Zaeem Mohamed <[email protected]>
Signed-off-by: Daniel Miess <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").
Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.
The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.
When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.
The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.
v2: improve the commit message
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
CC: [email protected]
|
|
To achieve full occupancy, the CP hardware needs to know whether the CUs
in an SE are symmetrically or asymmetrically harvested.
v2: Reset is_symmetric_cus for each loop
Signed-off-by: Harish Kasiviswanathan <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We can't read/write DCN registers while in IPS, since that can cause
the system to hang. So, before proceeding with the access in that
scenario, force the system out of IPS.
Cc: [email protected] # 6.6+
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Which method is used to flush the TLB does not depend on whether a reset
is in progress. We should skip the flush altogether if the GPU will get
reset, so put both paths under the reset_domain read lock.
Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
CC: [email protected]
|
|
[Why]
Disabling idle optimization for each atomic commit is unnecessary,
and can lead to a potential race condition.
[How]
Remove idle optimization check from amdgpu_dm_atomic_commit_tail()
Fixes: 196107eb1e15 ("drm/amd/display: Add IPS checks before dcn register access")
Cc: [email protected]
Reviewed-by: Hamza Mahfooz <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Roman Li <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Streamline the 8-byte case and drop the special handling. Use a macro
which hides the exception handling.
No functional changes.
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/CAHk-=whYb2L_atsRk9pBiFiVLGe5wNZLHhRinA69yu6FiKvDsw@mail.gmail.com
|