path: root/arch/x86/kernel/static_call.c
2024-01-10  x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK  (Breno Leitao, 1 file, -1/+1)

Step 10/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Added one more case. ]

Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-08-17  x86/static_call: Fix __static_call_fixup()  (Peter Zijlstra, 1 file, -0/+13)

Christian reported spurious module load crashes after some of Song's module memory layout patches.

Turns out that if the very last instruction on the very last page of the module is a 'JMP __x86_return_thunk' then __static_call_fixup() will trip a fault and die. And while the module rework made this slightly more likely to happen, it's always been possible.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Reported-by: Christian Bricart <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
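The crash mode here is a read past the end of mapped module text: the fixup inspects bytes around a 'JMP __x86_return_thunk' site, and when that JMP ends on the last byte of the last module page there is nothing mapped behind it. A minimal standalone sketch of the kind of bounds guard such a scan needs (the region layout and the helper name are illustrative, not the kernel's actual fix):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Only allow reading 'len' bytes at 'addr' if they lie entirely
     * inside the [start, end) text region; anything that would touch
     * the byte at 'end' or beyond may fault on an unmapped page.
     */
    static bool bytes_in_region(const uint8_t *start, const uint8_t *end,
                                const uint8_t *addr, size_t len)
    {
            return addr >= start && addr < end &&
                   len <= (size_t)(end - addr);
    }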
2023-01-31  x86/static_call: Add support for Jcc tail-calls  (Peter Zijlstra, 1 file, -3/+47)

Clang likes to create conditional tail calls like:

    0000000000000350 <amd_pmu_add_event>:
     350:  0f 1f 44 00 00            nopl   0x0(%rax,%rax,1)
                 351: R_X86_64_NONE       __fentry__-0x4
     355:  48 83 bf 20 01 00 00 00   cmpq   $0x0,0x120(%rdi)
     35d:  0f 85 00 00 00 00         jne    363 <amd_pmu_add_event+0x13>
                 35f: R_X86_64_PLT32      __SCT__amd_pmu_branch_add-0x4
     363:  e9 00 00 00 00            jmp    368 <amd_pmu_add_event+0x18>
                 364: R_X86_64_PLT32      __x86_return_thunk-0x4

Where 0x35d is a static call site that's turned into a conditional tail-call using the Jcc class of instructions.

Teach the in-line static call text patching about this.

Notably, since there is no conditional-ret, in that case patch the Jcc to point at an empty stub function that does the ret -- or the return thunk when needed.

Reported-by: "Erhard F." <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
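For reference, all 32-bit-displacement conditional jumps share the two-byte 0x0f 0x8x opcode, so a patcher can tell a Jcc tail-call site apart from the usual CALL (0xe8) or JMP.d32 (0xe9) from the first opcode bytes alone. A standalone decode sketch, illustrative rather than the kernel's code:

    #include <stdint.h>

    enum site_kind { SITE_CALL, SITE_JMP32, SITE_JCC32, SITE_OTHER };

    /*
     * Classify the instruction at a static call site by its opcode bytes
     * (caller guarantees at least two readable bytes):
     *   e8        -> call rel32
     *   e9        -> jmp  rel32 (tail-call)
     *   0f 80..8f -> Jcc  rel32 (conditional tail-call, as emitted by clang)
     */
    static enum site_kind classify_site(const uint8_t *insn)
    {
            if (insn[0] == 0xe8)
                    return SITE_CALL;
            if (insn[0] == 0xe9)
                    return SITE_JMP32;
            if (insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80)
                    return SITE_JCC32;
            return SITE_OTHER;
    }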
2022-10-17  static_call: Add call depth tracking support  (Peter Zijlstra, 1 file, -0/+1)

When call depth tracking is enabled and indirect calls are switched to direct calls, it has to be ensured that the call target is not the function itself but its call thunk.

But static calls are available before call thunks have been set up.

Ensure a second run through the static call patching code after call thunks have been created. When call thunks are not enabled this has no side effects.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2022-10-17  x86/returnthunk: Allow different return thunks  (Peter Zijlstra, 1 file, -1/+1)

In preparation for call depth tracking on Intel SKL CPUs, make it possible to patch in a SKL specific return thunk.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2022-07-12  x86/static_call: Serialize __static_call_fixup() properly  (Thomas Gleixner, 1 file, -5/+8)

__static_call_fixup() invokes __static_call_transform() without holding text_mutex, which causes lockdep to complain in text_poke_bp().

Adding the proper locking cures that, but as this is either used during early boot or during module finalizing, it's not required to use text_poke_bp(). Add an argument to __static_call_transform() which tells it to use text_poke_early() for it.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
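The shape of the change, as a standalone sketch: the transform helper grows a flag so early-boot and module-finalizing callers can take the direct-write path, while runtime updates keep going through the serialized breakpoint patcher. The function names below are stand-ins, not the kernel's:

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Stand-in for text_poke_early(): nothing else executes this text
     * yet, so a plain write is fine and no lock is needed. */
    static void poke_early(void *addr, const void *opcode, size_t len)
    {
            memcpy(addr, opcode, len);
    }

    /* Stand-in for text_poke_bp(): patching live text; the caller must
     * hold the text lock and the write goes through the breakpoint
     * machinery (elided here). */
    static void poke_live(void *addr, const void *opcode, size_t len)
    {
            memcpy(addr, opcode, len);
    }

    static void transform(void *insn, const void *code, size_t len, bool modinit)
    {
            if (modinit)
                    poke_early(insn, code, len);
            else
                    poke_live(insn, code, len);
    }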
2022-06-29  x86/retbleed: Add fine grained Kconfig knobs  (Peter Zijlstra, 1 file, -1/+1)

Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27  x86,static_call: Use alternative RET encoding  (Peter Zijlstra, 1 file, -2/+38)

In addition to teaching static_call about the new way to spell 'RET', there is an added complication in that static_call() is allowed to rewrite text before it is known which particular spelling is required.

In order to deal with this; have a static_call specific fixup in the apply_return() 'alternative' patching routine that will rewrite the static_call trampoline to match the definite sequence.

This in turn creates the problem of uniquely identifying static call trampolines. Currently trampolines are 8 bytes, the first 5 being the jmp.d32/ret sequence and the final 3 a byte sequence that spells out 'SCT'.

This sequence is used in __static_call_validate() to ensure it is patching a trampoline and not a random other jmp.d32. That is, false-positives should be rare, and aren't a big concern.

OTOH the new __static_call_fixup() must not have false-positives, and 'SCT' decodes to the somewhat weird but semi-plausible sequence:

    push %rbx
    rex.XB push %r12

Additionally, there are SLS concerns with immediate jumps. Combined it seems like a good moment to change the signature to a single 3 byte trap instruction that is unique to this usage and will not ever get generated by accident.

As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes to:

    ud1 %esp, %ecx

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
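Concretely, once apply_return() knows which spelling of RET is in effect, the fixup rewrites the trampoline's first five bytes to one of two forms. A standalone sketch of those encodings (the padding choice and the helper name are illustrative):

    #include <stdint.h>
    #include <string.h>

    /* 'bare' return: ret followed by int3 padding; the int3 bytes also
     * address the straight-line-speculation concern behind a ret. */
    static const uint8_t ret_int3[5] = { 0xc3, 0xcc, 0xcc, 0xcc, 0xcc };

    /* 'thunked' return: jmp rel32 to __x86_return_thunk, with the
     * displacement measured from the end of the 5-byte instruction. */
    static void emit_jmp_thunk(uint8_t insn[5], uintptr_t addr, uintptr_t thunk)
    {
            int32_t rel = (int32_t)(thunk - (addr + 5));

            insn[0] = 0xe9;                 /* jmp rel32 */
            memcpy(&insn[1], &rel, 4);
    }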
2022-04-05  x86,static_call: Fix __static_call_return0 for i386  (Peter Zijlstra, 1 file, -3/+2)

Paolo reported that the instruction sequence that is used to replace:

    call __static_call_return0

namely:

    66 66 48 31 c0      data16 data16 xor %rax,%rax

decodes to something else on i386, namely:

    66 66 48            data16 dec %ax
    31 c0               xor    %eax,%eax

Which is a nonsensical sequence that happens to have the same outcome. *However* an important distinction is that it consists of 2 instructions, which is a problem when the thing needs to be overwritten with a regular call instruction again.

As such, replace the instruction with something that decodes the same on both i386 and x86_64.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2021-12-09  x86: Add straight-line-speculation mitigation  (Peter Zijlstra, 1 file, -2/+3)

Make use of an upcoming GCC feature to mitigate straight-line-speculation for x86:

    https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
    https://bugs.llvm.org/show_bug.cgi?id=52323

It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

Maintenance overhead of this should be fairly low due to objtool validation.

Size overhead of all these additional int3 instructions comes to:

    text        data     bss      dec       hex      filename
    22267751    6933356  2011368  31212475  1dc43bb  defconfig-build/vmlinux
    22804126    6933356  1470696  31208178  1dc32f2  defconfig-build/vmlinux.sls

Or roughly 2.4% additional text.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-11-11  static_call,x86: Robustify trampoline patching  (Peter Zijlstra, 1 file, -4/+10)

Add a few signature bytes after the static call trampoline and verify those bytes match before patching the trampoline. This avoids patching random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

    d:  53          push   %rbx
    e:  43 54       rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
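A standalone sketch of that check: the trampoline is a 5-byte JMP.d32 followed by the three signature bytes, and the patcher refuses to touch anything that doesn't carry the signature. The layout follows the description above; the helper name is illustrative:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    static const uint8_t sct_sig[3] = { 0x53, 0x43, 0x54 };   /* "SCT" */

    /* Accept only "JMP.d32 <target>" followed by the signature bytes;
     * anything else (e.g. a CFI jump-table entry) is left alone.  The
     * real validation also has to accept a trampoline that has already
     * been patched to a RET; only the plain JMP case is shown here. */
    static bool is_static_call_tramp(const uint8_t tramp[8])
    {
            return tramp[0] == 0xe9 &&
                   memcmp(&tramp[5], sct_sig, sizeof(sct_sig)) == 0;
    }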
2021-03-15  x86: Remove dynamic NOP selection  (Peter Zijlstra, 1 file, -2/+2)

This ensures that a NOP is a NOP and not a random other instruction that is also a NOP. It allows simplification of dynamic code patching that wants to verify existing code before writing new instructions (ftrace, jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS). 32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL being required for x86_64, all x86_64 CPUs can use NOPL. So stop caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a single front-end port while others have severe decode penalties for excessive prefixes. All modern uarchs can handle both, except Atom, which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal workloads. ]

After this, FEATURE_NOPL is unused except for required-features for x86_64. FEATURE_K8 is only used for PTI.

[ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]> # bpf
Acked-by: Linus Torvalds <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2021-02-17  static_call/x86: Add __static_call_return0()  (Peter Zijlstra, 1 file, -2/+15)

Provide a stub function that returns 0 and wire up the static call site patching to replace the CALL with a single 5-byte instruction that clears %RAX, the return value register.

The function can be cast to any function pointer type that has a single %RAX return (including pointers). Also provide a version that returns an int for convenience. We are clearing the entire %RAX register in any case, whether the return value is 32 or 64 bits, since %RAX is always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
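A standalone sketch of the idea: the stub below plays the role of __static_call_return0, and the 5-byte sequence is what an x86_64 call site can be patched to instead of calling it. The encoding shown is the data16-prefixed xor discussed in the 2022-04-05 i386 fix above; the names here are illustrative:

    #include <stdint.h>

    /* Stub that returns 0 in %rax; callers may cast it to any function
     * pointer type whose return value fits in %RAX. */
    static long return0_stub(void)
    {
            return 0;
    }

    /* 5-byte replacement for "call return0_stub" on x86_64:
     *   66 66 48 31 c0   data16 data16 xor %rax,%rax
     * Same length as the CALL it replaces, so the site can later be
     * patched back to a real call. */
    static const uint8_t xor5rax[5] = { 0x66, 0x66, 0x48, 0x31, 0xc0 };

    typedef long (*retval_fn)(void);

    int main(void)
    {
            retval_fn fn = (retval_fn)return0_stub;

            (void)xor5rax;
            return (int)fn();       /* 0 */
    }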
2020-09-01  static_call: Allow early init  (Peter Zijlstra, 1 file, -1/+4)

In order to use static_call() to wire up x86_pmu, we need to initialize earlier, specifically before memory allocation works; copy some of the tricks from jump_label to enable this.

Primarily we overload key->next to store a sites pointer when there are no modules; this avoids having to use kmalloc() to initialize the sites and allows us to run much earlier.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
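The 'overload one pointer field' trick, sketched standalone: because both candidate structures are at least 4-byte aligned, the low bit of the stored value is free to record which kind of pointer currently lives in the field. The names and the bit assignment below are illustrative, not the kernel's exact layout:

    #include <stdbool.h>
    #include <stdint.h>

    struct site;        /* built-in call sites, available at early boot */
    struct mod_list;    /* kmalloc()'d per-module list, available later */

    struct key {
            void *func;
            /* Either a struct site * (bit 0 set) or a struct mod_list *
             * (bit 0 clear); both are >= 4-byte aligned so bit 0 is free. */
            uintptr_t sites_or_mods;
    };

    static bool key_has_sites(const struct key *k)
    {
            return k->sites_or_mods & 1;
    }

    static struct site *key_sites(const struct key *k)
    {
            return (struct site *)(k->sites_or_mods & ~(uintptr_t)1);
    }

    static void key_set_sites(struct key *k, struct site *s)
    {
            k->sites_or_mods = (uintptr_t)s | 1;
    }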
2020-09-01  static_call: Add some validation  (Peter Zijlstra, 1 file, -2/+26)

Verify the text we're about to change is as we expect it to be.

Requested-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-01  static_call: Handle tail-calls  (Peter Zijlstra, 1 file, -3/+18)

GCC can turn our static_call(name)(args...) into a tail call, in which case we get a JMP.d32 into the trampoline (which then does a further tail-call).

Teach objtool to recognise and mark these in .static_call_sites and adjust the code patching to deal with this.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
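Sketch of what 'adjust the code patching' amounts to at such a site: a tail-call site has to stay a JMP when it is retargeted, because the original instruction already ended the caller; only normal sites get a CALL. Standalone and illustrative, with the 'tail' flag standing in for the objtool annotation:

    #include <stdint.h>
    #include <string.h>

    /* Emit the 5-byte direct transfer for a static call site:
     * CALL rel32 for a normal site, JMP rel32 for a tail-call site.
     * rel32 is measured from the end of the 5-byte instruction. */
    static void patch_site(uint8_t insn[5], uintptr_t site, uintptr_t dest, int tail)
    {
            int32_t rel = (int32_t)(dest - (site + 5));

            insn[0] = tail ? 0xe9 : 0xe8;
            memcpy(&insn[1], &rel, 4);
    }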
2020-09-01  static_call: Add static_call_cond()  (Peter Zijlstra, 1 file, -10/+32)

Extend the static_call infrastructure to optimize the following common pattern:

    if (func_ptr)
            func_ptr(args...)

For the trampoline (which is in effect a tail-call), we patch the JMP.d32 into a RET, which then directly consumes the trampoline call.

For the in-line sites we replace the CALL with a NOP5.

NOTE: this is 'obviously' limited to functions with a 'void' return type.

NOTE: DEFINE_STATIC_COND_CALL() only requires a typename, as opposed to a full function.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
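The two 'call nothing' encodings described above, as a standalone sketch. The NOP5 is the standard x86_64 5-byte nop (the same nopl seen in the 2023-01-31 disassembly above); the padding after the RET and the helper name are illustrative choices:

    #include <stdint.h>
    #include <string.h>

    /* Trampoline with a NULL target: turn the JMP.d32 into a RET so a
     * CALL into the trampoline simply returns.  Only the first byte
     * matters; the padding shown here is arbitrary. */
    static const uint8_t tramp_ret[5] = { 0xc3, 0x90, 0x90, 0x90, 0x90 };

    /* In-line site with a NULL target: replace the 5-byte CALL with a
     * 5-byte NOP so the call disappears entirely. */
    static const uint8_t nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

    static void patch_null(uint8_t insn[5], int is_tramp)
    {
            memcpy(insn, is_tramp ? tramp_ret : nop5, 5);
    }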
2020-09-01  x86/static_call: Add inline static call implementation for x86-64  (Josh Poimboeuf, 1 file, -0/+3)

Add the inline static call implementation for x86-64. The generated code is identical to the out-of-line case, except we move the trampoline into its own section.

Objtool uses the trampoline naming convention to detect all the call sites. It then annotates those call sites in the .static_call_sites section.

During boot (and module init), the call sites are patched to call directly into the destination function. The temporary trampoline is then no longer used.

[peterz: merged trampolines, put trampoline in section]

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
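For orientation, a plausible shape for a .static_call_sites entry, in the self-relative style the kernel uses for similar annotation sections (jump_label being the model this work borrows from). Field names and layout here are illustrative, not copied from the kernel headers:

    #include <stdint.h>

    /* Each entry records a patchable call site and the key it belongs to
     * as 32-bit offsets from the field's own address, so the section
     * stays small and needs no runtime relocation. */
    struct site_entry {
            int32_t addr_off;   /* -> the CALL/JMP instruction */
            int32_t key_off;    /* -> the static_call_key */
    };

    static inline void *entry_addr(const struct site_entry *e)
    {
            return (char *)&e->addr_off + e->addr_off;
    }

    static inline void *entry_key(const struct site_entry *e)
    {
            return (char *)&e->key_off + e->key_off;
    }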
2020-09-01  x86/static_call: Add out-of-line static call implementation  (Josh Poimboeuf, 1 file, -0/+31)

Add the x86 out-of-line static call implementation. For each key, a permanent trampoline is created which is the destination for all static calls for the given key. The trampoline has a direct jump which gets patched by static_call_update() when the destination function changes.

[peterz: fixed trampoline, rewrote patching code]

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
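For context, a hedged sketch of how this mechanism is used from C, based on the generic static_call API that this x86 backend implements (kernel context assumed; foo, foo_a and foo_b are made-up names):

    #include <linux/static_call.h>

    static int foo_a(int x) { return x + 1; }
    static int foo_b(int x) { return x * 2; }

    /* Creates the key and the per-arch trampoline, initially aimed at foo_a. */
    DEFINE_STATIC_CALL(foo, foo_a);

    static int use_it(int x)
    {
            /* Compiles to a direct call into the 'foo' trampoline (or,
             * with the later inline implementation, a patched call site). */
            return static_call(foo)(x);
    }

    static void switch_it(void)
    {
            /* Re-patches the trampoline's direct jump (and any inline
             * sites) to point at foo_b. */
            static_call_update(foo, foo_b);
    }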