Age | Commit message (Collapse) | Author | Files | Lines |
|
A clang build reported an (obvious) double CLAC while a GCC build did not;
it turns out that objtool only re-visits instructions if the first visit
was with AC=0. If OTOH the first visit was with AC=1, it completely ignores
any subsequent visit, even when it has AC=0.
Fix this by using a visited mask instead of a boolean, and (explicitly)
mark the AC state.
$ ./objtool check -b --no-fp --retpoline --uaccess drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x22: redundant UACCESS disable
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xea: (alt)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0xffffffffffffffff: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xd9: (alt)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xb2: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x39: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x0: <=== (func)
Reported-by: Josh Poimboeuf <[email protected]>
Reported-by: Thomas Gleixner <[email protected]>
Reported-by: Sedat Dilek <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/617
Link: https://lkml.kernel.org/r/5359166aad2d53f3145cd442d83d0e5115e0cd17.1564007838.git.jpoimboe@redhat.com
|
|
A Clang-built kernel is showing the following warning:
arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction
That corresponds to this code:
7e: 0f 85 00 00 00 00 jne 84 <x86_early_init_platform_quirks+0x84>
80: R_X86_64_PC32 __x86_indirect_thunk_r11-0x4
84: c3 retq
This is a conditional retpoline sibling call, which is now possible
thanks to retpolines. Objtool hasn't seen that before. It's
incorrectly interpreting the conditional jump as an unconditional
dynamic jump.
Reported-by: Nick Desaulniers <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/30d4c758b267ef487fb97e6ecb2f148ad007b554.1563413318.git.jpoimboe@redhat.com
|
|
This makes it easier to add new instruction types. Also it's hopefully
more robust since the compiler should warn about out-of-range enums.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/0740e96af0d40e54cfd6a07bf09db0fbd10793cd.1563413318.git.jpoimboe@redhat.com
|
|
In one rare case, Clang generated the following code:
5ca: 83 e0 21 and $0x21,%eax
5cd: b9 04 00 00 00 mov $0x4,%ecx
5d2: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
5d5: R_X86_64_32S .rodata+0x38
which uses the corresponding jump table relocations:
000000000038 000200000001 R_X86_64_64 0000000000000000 .text + 834
000000000040 000200000001 R_X86_64_64 0000000000000000 .text + 5d9
000000000048 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000050 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000058 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000060 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000068 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000070 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000078 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000080 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000088 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000090 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000098 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000a0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000a8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000b0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000b8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000c0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000c8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000d0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000d8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000e0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000e8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000f0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000f8 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000100 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000108 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000110 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000118 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000120 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000128 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000130 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000138 000200000001 R_X86_64_64 0000000000000000 .text + 82f
000000000140 000200000001 R_X86_64_64 0000000000000000 .text + 828
Since %eax was masked with 0x21, only the first two and the last two
entries are possible.
Objtool doesn't actually emulate all the code, so it isn't smart enough
to know that all the middle entries aren't reachable. They point to the
NOP padding area after the end of the function, so objtool seg faulted
when it tried to dereference a NULL insn->func.
After this fix, objtool still gives an "unreachable" error because it
stops reading the jump table when it encounters the bad addresses:
/home/jpoimboe/objtool-tests/adm1275.o: warning: objtool: adm1275_probe()+0x828: unreachable instruction
While the above code is technically correct, it's very wasteful of
memory -- it uses 34 jump table entries when only 4 are needed. It's
also not possible for objtool to validate this type of switch table
because the unused entries point outside the function and objtool has no
way of determining if that's intentional. Hopefully the Clang folks can
fix it.
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/a9db88eec4f1ca089e040989846961748238b6d8.1563413318.git.jpoimboe@redhat.com
|
|
This fixes objtool for both a GCC issue and a Clang issue:
1) GCC issue:
kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x8d5: sibling call from callable instruction with modified stack frame
With CONFIG_RETPOLINE=n, GCC is doing the following optimization in
___bpf_prog_run().
Before:
select_insn:
jmp *jumptable(,%rax,8)
...
ALU64_ADD_X:
...
jmp select_insn
ALU_ADD_X:
...
jmp select_insn
After:
select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)
This confuses objtool. It has never seen multiple indirect jump
sites which use the same jump table.
For GCC switch tables, the only way of detecting the size of a table
is by continuing to scan for more tables. The size of the previous
table can only be determined after another switch table is found, or
when the scan reaches the end of the function.
That logic was reused for C jump tables, and was based on the
assumption that each jump table only has a single jump site. The
above optimization breaks that assumption.
2) Clang issue:
drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool: sisusb_write_mem_bulk()+0x588: can't find switch jump table
With clang 9, code can be generated where a function contains two
indirect jump instructions which use the same switch table.
The fix is the same for both issues: split the jump table parsing into
two passes.
In the first pass, locate the heads of all switch tables for the
function and mark their locations.
In the second pass, parse the switch tables and add them.
Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <[email protected]>
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/e995befaada9d4d8b2cf788ff3f566ba900d2b4d.1563413318.git.jpoimboe@redhat.com
Co-developed-by: Josh Poimboeuf <[email protected]>
|
|
Now that C jump tables are supported, call them "jump tables" instead of
"switch tables". Also rename some other variables, add comments, and
simplify the code flow a bit.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/cf951b0c0641628e0b9b81f7ceccd9bcabcb4bd8.1563413318.git.jpoimboe@redhat.com
|
|
Simplify the sibling call detection logic a bit.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/8357dbef9e7f5512e76bf83a76c81722fc09eb5e.1563413318.git.jpoimboe@redhat.com
|
|
Even calls to __noreturn functions need the frame pointer setup first.
Such functions often dump the stack.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/aed62fbd60e239280218be623f751a433658e896.1563413318.git.jpoimboe@redhat.com
|
|
dead_end_function() can no longer return an error. Simplify its
interface by making it return boolean.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/9e6679610768fb6e6c51dca23f7d4d0c03b0c910.1563413318.git.jpoimboe@redhat.com
|
|
All callable functions should have an ELF size.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/03d429c4fa87829c61c5dc0e89652f4d9efb62f1.1563413318.git.jpoimboe@redhat.com
|
|
- Add an alias check in validate_functions(). With this change, aliases
no longer need uaccess_safe set.
- Add an alias check in decode_instructions(). With this change, the
"if (!insn->func)" check is no longer needed.
- Don't create aliases for zero-length functions, as it can have
unexpected results. The next patch will spit out a warning for
zero-length functions anyway.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/26a99c31426540f19c9a58b9e10727c385a147bc.1563413318.git.jpoimboe@redhat.com
|
|
If 'insn->func' is NULL, objtool skips some important checks, including
sibling call validation. So if some .fixup code does an invalid sibling
call, objtool ignores it.
Treat all code branches (including alts) as part of the original
function by keeping track of the original func value from
validate_functions().
This improves the usefulness of some clang function fallthrough
warnings, and exposes some additional kernel bugs in the process.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/505df630f33c9717e1ccde6e4b64c5303135c25f.1563413318.git.jpoimboe@redhat.com
|
|
After an objtool improvement, it's reporting that __memcpy_mcsafe() is
calling mcsafe_handle_tail() with AC=1:
arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x13: call to mcsafe_handle_tail() with UACCESS enabled
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0x34: (alt)
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0xb: (branch)
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0x0: <=== (func)
mcsafe_handle_tail() is basically an extension of __memcpy_mcsafe(), so
AC=1 is supposed to be set. Add mcsafe_handle_tail() to the uaccess
safe list.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/035c38f7eac845281d3c3d36749144982e06e58c.1563413318.git.jpoimboe@redhat.com
|
|
On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:
select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)
to this:
select_insn:
mov jumptable, %r12
jmp *(%r12, %rax, 8)
...
ALU64_ADD_X:
...
jmp *(%r12, %rax, 8)
ALU_ADD_X:
...
jmp *(%r12, %rax, 8)
The jumptable address is placed in a register once, at the beginning of
the function. The function execution can then go through multiple
indirect jumps which rely on that same register value. This has a few
issues:
1) Objtool isn't smart enough to be able to track such a register value
across multiple recursive indirect jumps through the jump table.
2) With CONFIG_RETPOLINE enabled, this optimization actually results in
a small slowdown. I measured a ~4.7% slowdown in the test_bpf
"tcpdump port 22" selftest.
This slowdown is actually predicted by the GCC manual:
Note: When compiling a program using computed gotos, a GCC
extension, you may get better run-time performance if you
disable the global common subexpression elimination pass by
adding -fno-gcse to the command line.
So just disable the optimization for this function.
Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com
|
|
The same getuser/putuser error paths are used regardless of whether AC
is set. In non-exception failure cases, this results in an unnecessary
CLAC.
Fixes the following warnings:
arch/x86/lib/getuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
arch/x86/lib/putuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/bc14ded2755ae75bd9010c446079e113dbddb74b.1563413318.git.jpoimboe@redhat.com
|
|
After adding mcsafe_handle_tail() to the objtool uaccess safe list,
objtool reports:
arch/x86/lib/usercopy_64.o: warning: objtool: mcsafe_handle_tail()+0x0: call to __fentry__() with UACCESS enabled
With SMAP, this function is called with AC=1, so it needs to be careful
about which functions it calls. Disable the ftrace entry hook, which
can potentially pull in a lot of extra code.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/8e13d6f0da1c8a3f7603903da6cbf6d582bbfe10.1563413318.git.jpoimboe@redhat.com
|
|
After an objtool improvement, it's complaining about the CLAC in
copy_user_handle_tail():
arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable
arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x6: (alt)
arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x2: (alt)
arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x0: <=== (func)
copy_user_handle_tail() is incorrectly marked as a callable function, so
objtool is rightfully concerned about the CLAC with no corresponding
STAC.
Remove the ELF function annotation. The copy_user_handle_tail() code
path is already verified by objtool because it's jumped to by other
callable asm code (which does the corresponding STAC).
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com
|
|
After an objtool improvement, it complains about the fact that
start_cpu0() jumps to code which has an LRET instruction.
arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function
Technically, start_cpu0() is callable, but it acts nothing like a
callable function. Prevent objtool from treating it like one by
removing its ELF function annotation.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com
|
|
Fix the following warnings:
arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation
arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation
arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com
|
|
After making a change to improve objtool's sibling call detection, it
started showing the following warning:
arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame
The problem is the ____kvm_handle_fault_on_reboot() macro. It does a
fake call by pushing a fake RIP and doing a jump. That tricks the
unwinder into printing the function which triggered the exception,
rather than the .fixup code.
Instead of the hack to make it look like the original function made the
call, just change the macro so that the original function actually does
make the call. This allows removal of the hack, and also makes objtool
happy.
I triggered a vmx instruction exception and verified that the stack
trace is still sane:
kernel BUG at arch/x86/kvm/x86.c:358!
invalid opcode: 0000 [#1] SMP PTI
CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
RIP: 0010:kvm_spurious_fault+0x5/0x10
Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
FS: 00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
loaded_vmcs_init+0x4f/0xe0
alloc_loaded_vmcs+0x38/0xd0
vmx_create_vcpu+0xf7/0x600
kvm_vm_ioctl+0x5e9/0x980
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? free_one_page+0x13f/0x4e0
do_vfs_ioctl+0xa4/0x630
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x1c0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa349b1ee5b
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
|
|
Objtool reports the following:
arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_vmenter()+0x14: call without frame pointer save/setup
But frame pointers are necessarily broken anyway, because
__vmx_vcpu_run() clobbers RBP with the guest's value before calling
vmx_vmenter(). So calling without a frame pointer doesn't make things
any worse.
Make objtool happy by changing the call to a UD2.
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Link: https://lkml.kernel.org/r/9fc2216c9dc972f95bb65ce2966a682c6bda1cb0.1563413318.git.jpoimboe@redhat.com
|
|
Some of the fastop functions, e.g. em_setcc(), are actually just used as
global labels which point to blocks of functions. The global labels are
incorrectly annotated as functions. Also the functions themselves don't
have size annotations.
Fixes a bunch of warnings like the following:
arch/x86/kvm/emulate.o: warning: objtool: seto() is missing an ELF size annotation
arch/x86/kvm/emulate.o: warning: objtool: em_setcc() is missing an ELF size annotation
arch/x86/kvm/emulate.o: warning: objtool: setno() is missing an ELF size annotation
arch/x86/kvm/emulate.o: warning: objtool: setc() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/c8cc9be60ebbceb3092aa5dd91916039a1f88275.1563413318.git.jpoimboe@redhat.com
|
|
The __raw_callee_save_*() functions have an ELF symbol size of zero,
which confuses objtool and other tools.
Fixes a bunch of warnings like the following:
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
|
|
Pick up the two pending objtool patches as the next round of objtool fixes
depend on them.
|
|
When walking userspace stacks, USER_DS needs to be set, otherwise
access_ok() will not function as expected.
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vegard Nossum <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The elftoolchain version of libelf has a function named elf_open().
The function name isn't quite accurate anyway, since it also reads all
the ELF data. Rename it to elf_read(), which is more accurate.
[ jpoimboe: rename to elf_read(); write commit description ]
Signed-off-by: Michael Forney <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/7ce2d1b35665edf19fd0eb6fbc0b17b81a48e62f.1562793604.git.jpoimboe@redhat.com
|
|
The libelf implementation might use a different struct name, and the
Elf_Scn typedef is already used throughout the rest of objtool.
Signed-off-by: Michael Forney <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/d270e1be2835fc2a10acf67535ff2ebd2145bf43.1562793448.git.jpoimboe@redhat.com
|
|
Pull hwspinlock updates from Bjorn Andersson:
"This contains support for hardware spinlock TI K3 AM65x and J721E
family of SoCs, support for using hwspinlocks from atomic context and
better error reporting when dealing with hardware disabled in
DeviceTree"
* tag 'hwlock-v5.3' of git://github.com/andersson/remoteproc:
hwspinlock: add the 'in_atomic' API
hwspinlock: document the hwspinlock 'raw' API
hwspinlock: stm32: implement the relax() ops
hwspinlock: ignore disabled device
hwspinlock/omap: Add a trace during probe
hwspinlock/omap: Add support for TI K3 SoCs
dt-bindings: hwlock: Update OMAP binding for TI K3 SoCs
|
|
Pull remoteproc updates from Bjorn Andersson:
"This adds support for the STM32 remoteproc, additional i.MX platforms
with Cortex M4 remoteprocs and Qualcomm's QCS404 Compute DSP.
Also initial support for vendor specific resource table entries and
support for unprocessed Qualcomm firmware files"
* tag 'rproc-v5.3' of git://github.com/andersson/remoteproc:
remoteproc: stm32: fix building without ARM SMCC
remoteproc: qcom: q6v5-mss: Fix build error without QCOM_MDT_LOADER
remoteproc: copy parent dma_pfn_offset for vdev
remoteproc: qcom: q6v5-mss: Support loading non-split images
soc: qcom: mdt_loader: Support loading non-split images
remoteproc: stm32: add an ST stm32_rproc driver
dt-bindings: remoteproc: add bindings for stm32 remote processor driver
dt-bindings: stm32: add bindings for ML-AHB interconnect
remoteproc: Use struct_size() helper
remoteproc: add vendor resources handling
remoteproc: imx: Fix typo in "failed"
remoteproc: imx: Broaden the Kconfig selection logic
remoteproc,rpmsg: add missing MAINTAINERS file entries
remoteproc: qcom: qdsp6-adsp: Add support for QCS404 CDSP
dt-bindings: remoteproc: Rename and amend Hexagon v56 binding
|
|
Pull rpmsg updates from Bjorn Andersson:
"This contains a DT binding update and a change to make the remote
function of rpmsg_devices optional"
* tag 'rpmsg-v5.3' of git://github.com/andersson/remoteproc:
rpmsg: core: Make remove handler for rpmsg driver optional.
dt-bindings: soc: qcom: Add remote-pid binding for GLINK SMEM
|
|
Pull virtio, vhost updates from Michael Tsirkin:
"Fixes, features, performance:
- new iommu device
- vhost guest memory access using vmap (just meta-data for now)
- minor fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-mmio: add error check for platform_get_irq
scsi: virtio_scsi: Use struct_size() helper
iommu/virtio: Add event queue
iommu/virtio: Add probe request
iommu: Add virtio-iommu driver
PCI: OF: Initialize dev->fwnode appropriately
of: Allow the iommu-map property to omit untranslated devices
dt-bindings: virtio: Add virtio-pci-iommu node
dt-bindings: virtio-mmio: Add IOMMU description
vhost: fix clang build warning
vhost: access vq metadata through kernel virtual address
vhost: factor out setting vring addr and num
vhost: introduce helpers to get the size of metadata area
vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
vhost: fine grain userspace memory accessors
vhost: generalize adding used elem
|
|
Pull VFIO updates from Alex Williamson:
- Static symbol cleanup in mdev samples (Kefeng Wang)
- Use vma help in nvlink code (Peng Hao)
- Remove unused code in mbochs sample (YueHaibing)
- Send uevents around mdev registration (Alex Williamson)
* tag 'vfio-v5.3-rc1' of git://github.com/awilliam/linux-vfio:
mdev: Send uevents around parent device registration
sample/mdev/mbochs: remove set but not used variable 'mdev_state'
vfio: vfio_pci_nvlink2: use a vma helper function
vfio-mdev/samples: make some symbols static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"This round of clk driver and framework updates is heavy on the driver
update side. The two main highlights in the core framework are the
addition of an bulk clk_get API that handles optional clks and an
extra debugfs file that tells the developer about the current parent
of a clk.
The driver updates are dominated by i.MX in the diffstat, but that is
mostly because that SoC has started converting to the clk_hw style of
clk registration. The next big update is in the Amlogic meson clk
driver that gained some support for audio, cpu, and temperature clks
while fixing some PLL issues. Finally, the biggest thing that stands
out is the conversion of a large part of the Allwinner sunxi-ng driver
to the new clk parent scheme that uses less strings and more pointer
comparisons to match clk parents and children up.
In general, it looks like we have a lot of little fixes and tweaks
here and there to clk data along with the normal addition of a handful
of new drivers and a couple new core framework features.
Core:
- Add a 'clk_parent' file in clk debugfs
- Add a clk_bulk_get_optional() API (with devm too)
New Drivers:
- Support gated clk controller on MIPS based BCM63XX SoCs
- Support SiLabs Si5341 and Si5340 chips
- Support for CPU clks on Raspberry Pi devices
- Audsys clock driver for MediaTek MT8516 SoCs
Updates:
- Convert a large portion of the Allwinner sunxi-ng driver to new clk parent scheme
- Small frequency support for SiLabs Si544 chips
- Slow clk support for AT91 SAM9X60 SoCs
- Remove dead code in various clk drivers (-Wunused)
- Support for Marvell 98DX1135 SoCs
- Get duty cycle of generic pwm clks
- Improvement in mmc phase calculation and cleanup of some rate defintions
- Switch i.MX6 and i.MX7 clock drivers to clk_hw based APIs
- Add GPIO, SNVS and GIC clocks for i.MX8 drivers
- Mark imx6sx/ul/ull/sll MMDC_P1_IPG and imx8mm DRAM_APB as critical clock
- Correct imx7ulp nic1_bus_clk and imx8mm audio_pll2_clk clock setting
- Add clks for new Exynos5422 Dynamic Memory Controller driver
- Clock definition for Exynos4412 Mali
- Add CMM (Color Management Module) clocks on Renesas R-Car H3, M3-N, E3, and D3
- Add TPU (Timer Pulse Unit / PWM) clocks on Renesas RZ/G2M
- Support for 32 bit clock IDs in TI's sci-clks for J721e SoCs
- TI clock probing done from DT by default instead of firmware
- Fix Amlogic Meson mpll fractional part and spread sprectrum issues
- Add Amlogic meson8 audio clocks
- Add Amlogic g12a temperature sensors clocks
- Add Amlogic g12a and g12b cpu clocks
- Add TPU (Timer Pulse Unit / PWM) clocks on Renesas R-Car H3, M3-W, and M3-N
- Add CMM (Color Management Module) clocks on Renesas R-Car M3-W
- Add Clock Domain support on Renesas RZ/N1"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (190 commits)
clk: consoldiate the __clk_get_hw() declarations
clk: sprd: Add check for return value of sprd_clk_regmap_init()
clk: lochnagar: Update DT binding doc to include the primary SPDIF MCLK
clk: Add Si5341/Si5340 driver
dt-bindings: clock: Add silabs,si5341
clk: clk-si544: Implement small frequency change support
clk: add BCM63XX gated clock controller driver
devicetree: document the BCM63XX gated clock bindings
clk: at91: sckc: use dedicated functions to unregister clock
clk: at91: sckc: improve error path for sama5d4 sck registration
clk: at91: sckc: remove unnecessary line
clk: at91: sckc: improve error path for sam9x5 sck register
clk: at91: sckc: add support to free slow clock osclillator
clk: at91: sckc: add support to free slow rc oscillator
clk: at91: sckc: add support to free slow oscillator
clk: rockchip: export HDMIPHY clock on rk3228
clk: rockchip: add watchdog pclk on rk3328
clk: rockchip: add clock id for hdmi_phy special clock on rk3228
clk: rockchip: add clock id for watchdog pclk on rk3328
clk: at91: sckc: add support for SAM9X60
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"A quiet cycle this time.
- ds1307: properly handle oscillator failure flags
- imx-sc: alarm support
- pcf2123: alarm support, correct offset handling
- sun6i: add R40 support
- simplify getting the adapter of an i2c client"
* tag 'rtc-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (37 commits)
rtc: wm831x: Add IRQF_ONESHOT flag
rtc: stm32: remove one condition check in stm32_rtc_set_alarm()
rtc: pcf2123: Fix build error
rtc: interface: Change type of 'count' from int to u64
rtc: pcf8563: Clear event flags and disable interrupts before requesting irq
rtc: pcf8563: Fix interrupt trigger method
rtc: pcf2123: fix negative offset rounding
rtc: pcf2123: add alarm support
rtc: pcf2123: use %ptR
rtc: pcf2123: port to regmap
rtc: pcf2123: remove sysfs register view
rtc: rx8025: simplify getting the adapter of a client
rtc: rx8010: simplify getting the adapter of a client
rtc: rv8803: simplify getting the adapter of a client
rtc: m41t80: simplify getting the adapter of a client
rtc: fm3130: simplify getting the adapter of a client
rtc: tegra: Drop MODULE_ALIAS
rtc: sun6i: Add R40 compatible
dt-bindings: rtc: sun6i: Add the R40 RTC compatible
dt-bindings: rtc: Convert Allwinner A31 RTC to a schema
...
|
|
Pull dmaengine updates from Vinod Koul:
- Add support in dmaengine core to do device node checks for DT devices
and update bunch of drivers to use that and remove open coding from
drivers
- New driver/driver support for new hardware, namely:
- MediaTek UART APDMA
- Freescale i.mx7ulp edma2
- Synopsys eDMA IP core version 0
- Allwinner H6 DMA
- Updates to axi-dma and support for interleaved cyclic transfers
- Greg's debugfs return value check removals on drivers
- Updates to stm32-dma, hsu, dw, pl330, tegra drivers
* tag 'dmaengine-5.3-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (68 commits)
dmaengine: Revert "dmaengine: fsl-edma: add i.mx7ulp edma2 version support"
dmaengine: at_xdmac: check for non-empty xfers_list before invoking callback
Documentation: dmaengine: clean up description of dmatest usage
dmaengine: tegra210-adma: remove PM_CLK dependency
dmaengine: fsl-edma: add i.mx7ulp edma2 version support
dt-bindings: dma: fsl-edma: add new i.mx7ulp-edma
dmaengine: fsl-edma-common: version check for v2 instead
dmaengine: fsl-edma-common: move dmamux register to another single function
dmaengine: fsl-edma: add drvdata for fsl-edma
dmaengine: Revert "dmaengine: fsl-edma: support little endian for edma driver"
dmaengine: rcar-dmac: Reject zero-length slave DMA requests
dmaengine: dw: Enable iDMA 32-bit on Intel Elkhart Lake
dmaengine: dw-edma: fix semicolon.cocci warnings
dmaengine: sh: usb-dmac: Use [] to denote a flexible array member
dmaengine: dmatest: timeout value of -1 should specify infinite wait
dmaengine: dw: Distinguish ->remove() between DW and iDMA 32-bit
dmaengine: fsl-edma: support little endian for edma driver
dmaengine: hsu: Revert "set HSU_CH_MTSR to memory width"
dmagengine: pl330: add code to get reset property
dt-bindings: pl330: document the optional resets property
...
|
|
Pull MIPS updates from Paul Burton:
"A light batch this time around but significant improvements for
certain systems:
- Removal of readq & writeq for MIPS32 kernels where they would
simply BUG() anyway, allowing drivers or other code that #ifdefs on
their presence to work properly.
- Improvements for Ingenic JZ4740 systems, including support for the
external memory controller & pinmuxing fixes for qi_lb60/NanoNote
systems.
- Improvements for Lantiq systems, in particular around SMP & IPIs.
- DT updates for ralink/MediaTek MT7628a systems to probe & configure
a bunch more devices.
- Miscellaneous cleanups & build fixes"
* tag 'mips_5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (30 commits)
MIPS: fix some more fall through errors in arch/mips
MIPS: perf events: handle switch statement falling through warnings
mips/kprobes: Export kprobe_fault_handler()
MAINTAINERS: Add myself as Ingenic SoCs maintainer
MIPS: ralink: mt7628a.dtsi: Add watchdog controller DT node
MIPS: ralink: mt7628a.dtsi: Add SPI controller DT node
MIPS: ralink: mt7628a.dtsi: Add GPIO controller DT node
MIPS: ralink: mt7628a.dtsi: Add pinctrl DT properties to the UART nodes
MIPS: ralink: mt7628a.dtsi: Add pinmux DT node
MIPS: ralink: mt7628a.dtsi: Add SPDX GPL-2.0 license identifier
MIPS: lantiq: Add SMP support for lantiq interrupt controller
MIPS: lantiq: Shorten register names, remove unused macros
MIPS: lantiq: Fix bitfield masking
MIPS: lantiq: Remove unused macros
MIPS: lantiq: Fix attributes of of_device_id structure
MIPS: lantiq: Change variables to the same type as the source
MIPS: lantiq: Move macro directly to iomem function
mips: Remove q-accessors from non-64bit platforms
FDDI: defza: Include linux/io-64-nonatomic-lo-hi.h
MIPS: configs: Remove useless UEVENT_HELPER_PATH
...
|
|
git://git.sourceforge.jp/gitroot/uclinux-h8/linux
Pull h8300 update from Yoshinori Sato:
"Remove unused barrier defines"
* tag 'h8300-for-linus-20190617' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux:
H8300: remove unused barrier defines
|
|
git://git.sourceforge.jp/gitroot/uclinux-h8/linux
Pull SH updates from Yoshinori Sato.
kprobe fix, defconfig updates and a SH Kconfig fix.
* tag 'for-linus-20190617' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux:
arch/sh: Check for kprobe trap number before trying to handle a kprobe trap
sh: configs: Remove useless UEVENT_HELPER_PATH
Fix allyesconfig output.
|
|
Merge more updates from Andrew Morton:
"VM:
- z3fold fixes and enhancements by Henry Burns and Vitaly Wool
- more accurate reclaimed slab caches calculations by Yafang Shao
- fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
Christoph Hellwig
- !CONFIG_MMU fixes by Christoph Hellwig
- new novmcoredd parameter to omit device dumps from vmcore, by
Kairui Song
- new test_meminit module for testing heap and pagealloc
initialization, by Alexander Potapenko
- ioremap improvements for huge mappings, by Anshuman Khandual
- generalize kprobe page fault handling, by Anshuman Khandual
- device-dax hotplug fixes and improvements, by Pavel Tatashin
- enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V
- add pte_devmap() support for arm64, by Robin Murphy
- unify locked_vm accounting with a helper, by Daniel Jordan
- several misc fixes
core/lib:
- new typeof_member() macro including some users, by Alexey Dobriyan
- make BIT() and GENMASK() available in asm, by Masahiro Yamada
- changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
code generation, by Alexey Dobriyan
- rbtree code size optimizations, by Michel Lespinasse
- convert struct pid count to refcount_t, by Joel Fernandes
get_maintainer.pl:
- add --no-moderated switch to skip moderated ML's, by Joe Perches
misc:
- ptrace PTRACE_GET_SYSCALL_INFO interface
- coda updates
- gdb scripts, various"
[ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]
* emailed patches from Andrew Morton <[email protected]>: (100 commits)
fs/select.c: use struct_size() in kmalloc()
mm: add account_locked_vm utility function
arm64: mm: implement pte_devmap support
mm: introduce ARCH_HAS_PTE_DEVMAP
mm: clean up is_device_*_page() definitions
mm/mmap: move common defines to mman-common.h
mm: move MAP_SYNC to asm-generic/mman-common.h
device-dax: "Hotremove" persistent memory that is used like normal RAM
mm/hotplug: make remove_memory() interface usable
device-dax: fix memory and resource leak if hotplug fails
include/linux/lz4.h: fix spelling and copy-paste errors in documentation
ipc/mqueue.c: only perform resource calculation if user valid
include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
scripts/gdb: add helpers to find and list devices
scripts/gdb: add lx-genpd-summary command
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
kernel/pid.c: convert struct pid count to refcount_t
drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
...
|
|
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kmalloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Link: http://lkml.kernel.org/r/20190604164226.GA13823@embeddedor
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
locked_vm accounting is done roughly the same way in five places, so
unify them in a helper.
Include the helper's caller in the debug print to distinguish between
callsites.
Error codes stay the same, so user-visible behavior does too. The one
exception is that the -EPERM case in tce_account_locked_vm is removed
because Alexey has never seen it triggered.
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix mm/util.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Cc: Alan Tull <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Moritz Fischer <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Steve Sistare <[email protected]>
Cc: Wu Hao <[email protected]>
Cc: Ira Weiny <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In order for things like get_user_pages() to work on ZONE_DEVICE memory,
we need a software PTE bit to identify device-backed PFNs. Hook this up
along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.
[[email protected]: build fixes]
Link: http://lkml.kernel.org/r/13026c4e64abc17133bbfa07d7731ec6691c0bcd.1559050949.git.robin.murphy@arm.com
Link: http://lkml.kernel.org/r/817d92886fc3b33bcbf6e105ee83a74babb3a5aa.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oliver O'Halloran <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression than an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.
In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.
Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <[email protected]>
Acked-by: Dan Williams <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Acked-by: Oliver O'Halloran <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Refactor is_device_{public,private}_page() with is_pci_p2pdma_page() to
make them all consistent in depending on their respective config options
even when CONFIG_DEV_PAGEMAP_OPS is enabled for other reasons. This
allows a little more compile-time optimisation as well as the conceptual
and cosmetic cleanup.
Link: http://lkml.kernel.org/r/187c2ab27dea70635d375a61b2f2076d26c032b0.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <[email protected]>
Suggested-by: Jerome Glisse <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oliver O'Halloran <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Two architecture that use arch specific MMAP flags are powerpc and
sparc. We still have few flag values common across them and other
architectures. Consolidate this in mman-common.h.
Also update the comment to indicate where to find HugeTLB specific
reserved values
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This enables support for synchronous DAX fault on powerpc
The generic changes are added as part of b6fb293f2497 ("mm: Define
MAP_SYNC and VM_SYNC flags")
Without this, mmap returns EOPNOTSUPP for MAP_SYNC with
MAP_SHARED_VALIDATE
Instead of adding MAP_SYNC with same value to
arch/powerpc/include/uapi/asm/mman.h, I am moving the #define to
asm-generic/mman-common.h. Two architectures using mman-common.h
directly are sparc and powerpc. We should be able to consloidate more
#defines to mman-common.h. That can be done as a separate patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is now allowed to use persistent memory like a regular RAM, but
currently there is no way to remove this memory until machine is
rebooted.
This work expands the functionality to also allows hotremoving
previously hotplugged persistent memory, and recover the device for use
for other purposes.
To hotremove persistent memory, the management software must first
offline all memory blocks of dax region, and than unbind it from
device-dax/kmem driver. So, operations should look like this:
echo offline > /sys/devices/system/memory/memoryN/state
...
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: James Morris <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Presently the remove_memory() interface is inherently broken. It tries
to remove memory but panics if some memory is not offline. The problem
is that it is impossible to ensure that all memory blocks are offline as
this function also takes lock_device_hotplug that is required to change
memory state via sysfs.
So, between calling this function and offlining all memory blocks there
is always a window when lock_device_hotplug is released, and therefore,
there is always a chance for a panic during this window.
Make this interface to return an error if memory removal fails. This
way it is safe to call this function without panicking machine, and also
makes it symmetric to add_memory() which already returns an error.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Yaowei Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series ""Hotremove" persistent memory", v6.
Recently, adding a persistent memory to be used like a regular RAM was
added to Linux. This work extends this functionality to also allow hot
removing persistent memory.
We (Microsoft) have an important use case for this functionality.
The requirement is for physical machines with small amount of RAM (~8G)
to be able to reboot in a very short period of time (<1s). Yet, there
is a userland state that is expensive to recreate (~2G).
The solution is to boot machines with 2G preserved for persistent
memory.
Copy the state, and hotadd the persistent memory so machine still has
all 8G available for runtime. Before reboot, offline and hotremove
device-dax 2G, copy the memory that is needed to be preserved to pmem0
device, and reboot.
The series of operations look like this:
1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
and free ramdisk.
2. Convert raw pmem0 to devdax
ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
3. Hotadd to System RAM
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
echo online_movable > /sys/devices/system/memoryXXX/state
4. Before reboot hotremove device-dax memory from System RAM
echo offline > /sys/devices/system/memoryXXX/state
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
5. Create raw pmem0 device
ndctl create-namespace --mode raw -e namespace0.0 -f
6. Copy the state that was stored by apps to ramdisk to pmem device
7. Do kexec reboot or reboot through firmware if firmware does not
zero memory in pmem0 region (These machines have only regular
volatile memory). So to have pmem0 device either memmap kernel
parameter is used, or devices nodes in dtb are specified.
This patch (of 3):
When add_memory() fails, the resource and the memory should be freed.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Yaowei Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix a few spelling and grammar errors, and two places where fast/safe in
the documentation did not match the function.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tom Levy <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Jiri Kosina <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|