aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2021-01-26objtool: Add xen_start_kernel() to noreturn listJosh Poimboeuf1-0/+1
xen_start_kernel() doesn't return. Annotate it as such so objtool can follow the code flow. Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/930deafa89256c60b180442df59a1bbae48f30ab.1611263462.git.jpoimboe@redhat.com
2021-01-26objtool: Combine UNWIND_HINT_RET_OFFSET and UNWIND_HINT_FUNCJosh Poimboeuf4-26/+21
The ORC metadata generated for UNWIND_HINT_FUNC isn't actually very func-like. With certain usages it can cause stack state mismatches because it doesn't set the return address (CFI_RA). Also, users of UNWIND_HINT_RET_OFFSET no longer need to set a custom return stack offset. Instead they just need to specify a func-like situation, so the current ret_offset code is hacky for no good reason. Solve both problems by simplifying the RET_OFFSET handling and converting it into a more useful UNWIND_HINT_FUNC. If we end up needing the old 'ret_offset' functionality again in the future, we should be able to support it pretty easily with the addition of a custom 'sp_offset' in UNWIND_HINT_FUNC. Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/db9d1f5d79dddfbb3725ef6d8ec3477ad199948d.1611263462.git.jpoimboe@redhat.com
2021-01-26objtool: Add asm version of STACK_FRAME_NON_STANDARDJosh Poimboeuf1-0/+8
To be used for adding asm functions to the ignore list. The "aw" is needed to help the ELF section metadata match GCC-created sections. Otherwise the linker creates duplicate sections instead of combining them. Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/8faa476f9a5ac89af27944ec184c89f95f3c6c49.1611263462.git.jpoimboe@redhat.com
2021-01-26objtool: Assume only ELF functions do sibling callsJosh Poimboeuf1-14/+22
There's an inconsistency in how sibling calls are detected in non-function asm code, depending on the scope of the object. If the target code is external to the object, objtool considers it a sibling call. If the target code is internal but not a function, objtool *doesn't* consider it a sibling call. This can cause some inconsistencies between per-object and vmlinux.o validation. Instead, assume only ELF functions can do sibling calls. This generally matches existing reality, and makes sibling call validation consistent between vmlinux.o and per-object. Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/0e9ab6f3628cc7bf3bde7aa6762d54d7df19ad78.1611263461.git.jpoimboe@redhat.com
2021-01-26objtool: Support retpoline jump detection for vmlinux.oJosh Poimboeuf1-4/+4
Objtool converts direct retpoline jumps to type INSN_JUMP_DYNAMIC, since that's what they are semantically. That conversion doesn't work in vmlinux.o validation because the indirect thunk function is present in the object, so the intra-object jump check succeeds before the retpoline jump check gets a chance. Rearrange the checks: check for a retpoline jump before checking for an intra-object jump. Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/4302893513770dde68ddc22a9d6a2a04aca491dd.1611263461.git.jpoimboe@redhat.com
2021-01-26objtool: Fix ".cold" section suffix check for newer versions of GCCJosh Poimboeuf1-2/+2
With my version of GCC 9.3.1 the ".cold" subfunctions no longer have a numbered suffix, so the trailing period is no longer there. Presumably this doesn't yet trigger a user-visible bug since most of the subfunction detection logic is duplicated. I only found it when testing vmlinux.o validation. Fixes: 54262aa28301 ("objtool: Fix sibling call detection") Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/ca0b5a57f08a2fbb48538dd915cc253b5edabb40.1611263461.git.jpoimboe@redhat.com
2021-01-26objtool: Fix retpoline detection in asm codeJosh Poimboeuf3-2/+14
The JMP_NOSPEC macro branches to __x86_retpoline_*() rather than the __x86_indirect_thunk_*() wrappers used by C code. Detect jumps to __x86_retpoline_*() as retpoline dynamic jumps. Presumably this doesn't trigger a user-visible bug. I only found it when testing vmlinux.o validation. Fixes: 39b735332cb8 ("objtool: Detect jumps to retpoline thunks") Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/31f5833e2e4f01e3d755889ac77e3661e906c09f.1611263461.git.jpoimboe@redhat.com
2021-01-26objtool: Fix error handling for STD/CLD warningsJosh Poimboeuf1-2/+6
Actually return an error (and display a backtrace, if requested) for directional bit warnings. Fixes: 2f0f9e9ad7b3 ("objtool: Add Direction Flag validation") Signed-off-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/dc70f2adbc72f09526f7cab5b6feb8bf7f6c5ad4.1611263461.git.jpoimboe@redhat.com
2021-01-26cpupower: Add cpuid cap flag for MSR_AMD_HWCR supportNathan Fontenot3-7/+7
Remove the family check for accessing the MSR_AMD_HWCR MSR and replace it with a cpupower cap flag. This update also allows for the removal of the local cpupower_cpu_info variable in cpufreq_has_boost_support() since we no longer need it to check the family. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Remove family arg to decode_pstates()Nathan Fontenot3-17/+14
The decode_pstates() routine no longer uses the CPU family and the caleed routines (get_cof() and get_did()) can grab the family from the global cpupower_cpu_info struct. These update removes passing the family arg to all these routines. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Condense pstate enabled bit checks in decode_pstates()Nathan Fontenot1-3/+3
The enabled bit (bit 63) is common for all families so we can remove the multiple enabled checks based on family and have a common check for HW pstate enabled. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Update family checks when decoding HW pstatesNathan Fontenot3-5/+10
The family checks in get_cof() and get_did() need to use the correct MSR format depending on the family. Add a cpupower capability for using the pstatedef (family 17h and newer) to control this instead of direct family checks. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Remove unused pscur variable.Nathan Fontenot1-8/+1
The pscur variable is set but not uused, just remove it. This may have previsously been set to validate the MSR_AMD_PSTATE_STATUS MSR. With the addition of the CPUPOWER_CAP_AMD_HW_PSTATE cap flag this is no longer needed since the cpuid bit to enable this cap flag also validates that the MSR_AMD_PSTATE_STATUS MSR is present. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Add CPUPOWER_CAP_AMD_HW_PSTATE cpuid caps flagNathan Fontenot3-8/+14
Add a check in get_cpu_info() for the ability to read frequencies from hardware and set the CPUPOWER_CAP_AMD_HW_PSTATE cpuid flag. The cpuid flag is set when CPUID_80000007_EDX[7] is set, which is all families >= 10h. The check excludes family 14h because HW pstate reporting was not implemented on family 14h. This is intended to reduce family checks in the main code paths. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Reviewed-by: [email protected] Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Correct macro name for CPB caps flagRobert Richter3-3/+3
The name is Core Performance Boost (CPB) for the cpuid flag. Correct cpuid caps flag to use this name (instead of CBP). Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Nathan Fontenot <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-01-26cpupower: Update msr_pstate union struct namingNathan Fontenot1-12/+14
The msr_pstate union struct named fam17h_bits is misleading since this is the struct to use for all families >= 0x17, not just for family 0x17. Rename the bits structs to be 'pstate' (for pre family 17h CPUs) and 'pstatedef' (for CPUs since fam 17h) to align closer with PPR/BDKG (1) naming. There are no functional changes as part of this update. 1: AMD Processor Programming Reference (PPR) and BIOS and Kernel Developer's Guide (BKDG) available at: http://developer.amd.com/resources/developer-guides-manuals Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robert Richter <[email protected]> Reviewed-by: [email protected] Signed-off-by: Shuah Khan <[email protected]>
2021-01-26selftests/bpf: Don't exit on failed bpf_testmod unloadAndrii Nakryiko1-1/+1
Fix bug in handling bpf_testmod unloading that will cause test_progs exiting prematurely if bpf_testmod unloading failed. This is especially problematic when running a subset of test_progs that doesn't require root permissions and doesn't rely on bpf_testmod, yet will fail immediately due to exit(1) in unload_bpf_testmod(). Fixes: 9f7fa225894c ("selftests/bpf: Add bpf_testmod kernel module for testing") Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26samples/bpf: Set flag __SANE_USERSPACE_TYPES__ for MIPS to fix build warningsTiezhu Yang1-0/+3
There exists many build warnings when make M=samples/bpf on the Loongson platform, this issue is MIPS related, x86 compiles just fine. Here are some warnings: CC samples/bpf/ibumad_user.o samples/bpf/ibumad_user.c: In function ‘dump_counts’: samples/bpf/ibumad_user.c:46:24: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("0x%02x : %llu\n", key, value); ~~~^ ~~~~~ %lu CC samples/bpf/offwaketime_user.o samples/bpf/offwaketime_user.c: In function ‘print_ksym’: samples/bpf/offwaketime_user.c:34:17: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("%s/%llx;", sym->name, addr); ~~~^ ~~~~ %lx samples/bpf/offwaketime_user.c: In function ‘print_stack’: samples/bpf/offwaketime_user.c:68:17: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf(";%s %lld\n", key->waker, count); ~~~^ ~~~~~ %ld MIPS needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select 'int-ll64.h' in arch/mips/include/uapi/asm/types.h, then it can avoid build warnings when printing __u64 with %llu, %llx or %lld. The header tools/include/linux/types.h defines __SANE_USERSPACE_TYPES__, it seems that we can include <linux/types.h> in the source files which have build warnings, but it has no effect due to actually it includes usr/include/linux/types.h instead of tools/include/linux/types.h, the problem is that "usr/include" is preferred first than "tools/include" in samples/bpf/Makefile, that sounds like a ugly hack to -Itools/include before -Iusr/include. So define __SANE_USERSPACE_TYPES__ for MIPS in samples/bpf/Makefile is proper, if add "TPROGS_CFLAGS += -D__SANE_USERSPACE_TYPES__" in samples/bpf/Makefile, it appears the following error: Auto-detecting system features: ... libelf: [ on ] ... zlib: [ on ] ... bpf: [ OFF ] BPF API too old make[3]: *** [Makefile:293: bpfdep] Error 1 make[2]: *** [Makefile:156: all] Error 2 With #ifndef __SANE_USERSPACE_TYPES__ in tools/include/linux/types.h, the above error has gone and this ifndef change does not hurt other compilations. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26tools, headers: Sync struct bpf_perf_event_dataFlorian Lehner1-0/+1
Update struct bpf_perf_event_data with the addr field to match the tools headers with the kernel headers. Signed-off-by: Florian Lehner <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Avoid useless void *-castsBjörn Töpel1-5/+5
There is no need to cast to void * when the argument is void *. Avoid cluttering of code. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Consistent malloc/calloc usageBjörn Töpel1-7/+7
Use calloc instead of malloc where it makes sense, and avoid C++-style void *-cast. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Avoid heap allocationBjörn Töpel1-5/+4
The data variable is only used locally. Instead of using the heap, stick to using the stack. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Define local variables at the beginning of a blockBjörn Töpel1-5/+7
Use C89 rules for variable definition. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Change type from void * to struct generic_data *Björn Töpel1-4/+4
Instead of casting from void *, let us use the actual type in gen_udp_hdr(). Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Change type from void * to struct ifaceconfigobj *Björn Töpel1-14/+14
Instead of casting from void *, let us use the actual type in init_iface_config(). Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Remove casting by introduce local variableBjörn Töpel1-13/+11
Let us use a local variable in nsswitchthread(), so we can remove a lot of casting for better readability. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Improve readability of xdpxceiver/worker_pkt_validate()Björn Töpel1-17/+12
Introduce a local variable to get rid of lot of casting. Move common code outside the if/else-clause. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Remove memory leakBjörn Töpel1-1/+0
The allocated entry is immediately overwritten by an assignment. Fix that. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Fix style warningsBjörn Töpel1-7/+7
Silence three checkpatch style warnings. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Remove unused enumsBjörn Töpel1-2/+0
The enums undef and bidi are not used. Remove them. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-26selftests/bpf: Remove a lot of ifobject castingBjörn Töpel1-45/+43
Instead of passing void * all over the place, let us pass the actual type (ifobject) and remove the void-ptr-to-type-ptr casting. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-25libbpf, xsk: Select AF_XDP BPF program based on kernel versionBjörn Töpel1-3/+78
Add detection for kernel version, and adapt the BPF program based on kernel support. This way, users will get the best possible performance from the BPF program. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Marek Majtyka <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-24Merge tag 'objtool_urgent_for_v5.11_rc5' of ↵Linus Torvalds2-11/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Adjust objtool to handle a recent binutils change to not generate unused symbols anymore. - Revert the fail-the-build-on-fatal-errors objtool strategy for now due to the ever-increasing matrix of supported toolchains/plugins and them causing too many such fatal errors currently. - Do not add empty symbols to objdump's rbtree to accommodate clang removing section symbols. * tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Don't fail on missing symbol table objtool: Don't fail the kernel build on fatal errors objtool: Don't add empty symbols to the rbtree
2021-01-24Merge tag 'powerpc-5.11-5' of ↵Linus Torvalds3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bad interaction between the scv handling and the fallback L1D flush, which could lead to user register corruption. Only affects people using scv (~no one) on machines with old firmware that are missing the L1D flush. - Two small selftest fixes. Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das, and Tulio Magno Quites Machado Filho. * tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: fix scv entry fallback flush vs interrupt selftests/powerpc: Only test lwm/stmw on big endian selftests/powerpc: Fix exit status of pkey tests
2021-01-24tests: add mount_setattr() selftestsChristian Brauner5-0/+1434
Add a range of selftests for the new mount_setattr() syscall to verify that it works as expected. This tests that: - no invalid flags can be specified - changing properties of a single mount works and leaves other mounts in the mount tree unchanged - changing a mount tre to read-only when one of the mounts has writers fails and leaves the whole mount tree unchanged - changing mount properties from multiple threads works - changing atime settings works - changing mount propagation works - changing the mount options of a mount tree where the individual mounts in the tree have different mount options only changes the flags that were requested to change - changing mount options from another mount namespace fails - changing mount options from another user namespace fails - idmapped mounts Note, the main test-suite for idmapped mounts is part of xfstests and is pretty huge. These tests here just make sure that the syscalls bits work correctly. TAP version 13 1..20 # Starting 20 tests from 3 test cases. # RUN mount_setattr.invalid_attributes ... # OK mount_setattr.invalid_attributes ok 1 mount_setattr.invalid_attributes # RUN mount_setattr.extensibility ... # OK mount_setattr.extensibility ok 2 mount_setattr.extensibility # RUN mount_setattr.basic ... # OK mount_setattr.basic ok 3 mount_setattr.basic # RUN mount_setattr.basic_recursive ... # OK mount_setattr.basic_recursive ok 4 mount_setattr.basic_recursive # RUN mount_setattr.mount_has_writers ... # OK mount_setattr.mount_has_writers ok 5 mount_setattr.mount_has_writers # RUN mount_setattr.mixed_mount_options ... # OK mount_setattr.mixed_mount_options ok 6 mount_setattr.mixed_mount_options # RUN mount_setattr.time_changes ... # OK mount_setattr.time_changes ok 7 mount_setattr.time_changes # RUN mount_setattr.multi_threaded ... # OK mount_setattr.multi_threaded ok 8 mount_setattr.multi_threaded # RUN mount_setattr.wrong_user_namespace ... # OK mount_setattr.wrong_user_namespace ok 9 mount_setattr.wrong_user_namespace # RUN mount_setattr.wrong_mount_namespace ... # OK mount_setattr.wrong_mount_namespace ok 10 mount_setattr.wrong_mount_namespace # RUN mount_setattr_idmapped.invalid_fd_negative ... # OK mount_setattr_idmapped.invalid_fd_negative ok 11 mount_setattr_idmapped.invalid_fd_negative # RUN mount_setattr_idmapped.invalid_fd_large ... # OK mount_setattr_idmapped.invalid_fd_large ok 12 mount_setattr_idmapped.invalid_fd_large # RUN mount_setattr_idmapped.invalid_fd_closed ... # OK mount_setattr_idmapped.invalid_fd_closed ok 13 mount_setattr_idmapped.invalid_fd_closed # RUN mount_setattr_idmapped.invalid_fd_initial_userns ... # OK mount_setattr_idmapped.invalid_fd_initial_userns ok 14 mount_setattr_idmapped.invalid_fd_initial_userns # RUN mount_setattr_idmapped.attached_mount_inside_current_mount_namespace ... # OK mount_setattr_idmapped.attached_mount_inside_current_mount_namespace ok 15 mount_setattr_idmapped.attached_mount_inside_current_mount_namespace # RUN mount_setattr_idmapped.attached_mount_outside_current_mount_namespace ... # OK mount_setattr_idmapped.attached_mount_outside_current_mount_namespace ok 16 mount_setattr_idmapped.attached_mount_outside_current_mount_namespace # RUN mount_setattr_idmapped.detached_mount_inside_current_mount_namespace ... # OK mount_setattr_idmapped.detached_mount_inside_current_mount_namespace ok 17 mount_setattr_idmapped.detached_mount_inside_current_mount_namespace # RUN mount_setattr_idmapped.detached_mount_outside_current_mount_namespace ... # OK mount_setattr_idmapped.detached_mount_outside_current_mount_namespace ok 18 mount_setattr_idmapped.detached_mount_outside_current_mount_namespace # RUN mount_setattr_idmapped.change_idmapping ... # OK mount_setattr_idmapped.change_idmapping ok 19 mount_setattr_idmapped.change_idmapping # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... # OK mount_setattr_idmapped.idmap_mount_tree_invalid ok 20 mount_setattr_idmapped.idmap_mount_tree_invalid # PASSED: 20 / 20 tests passed. # Totals: pass:20 fail:0 xfail:0 xpass:0 skip:0 error:0 Link: https://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2021-01-24fs: add mount_setattr()Christian Brauner1-1/+3
This implements the missing mount_setattr() syscall. While the new mount api allows to change the properties of a superblock there is currently no way to change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. In addition the old mount api has the restriction that mount options cannot be applied recursively. This hasn't changed since changing mount options on a per-mount basis was implemented in [1] and has been a frequent request not just for convenience but also for security reasons. The legacy mount syscall is unable to accommodate this behavior without introducing a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND | MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost mount. Changing MS_REC to apply to the whole mount tree would mean introducing a significant uapi change and would likely cause significant regressions. The new mount_setattr() syscall allows to recursively clear and set mount options in one shot. Multiple calls to change mount options requesting the same changes are idempotent: int mount_setattr(int dfd, const char *path, unsigned flags, struct mount_attr *uattr, size_t usize); Flags to modify path resolution behavior are specified in the @flags argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to restrict path resolution as introduced with openat2() might be supported in the future. The mount_setattr() syscall can be expected to grow over time and is designed with extensibility in mind. It follows the extensible syscall pattern we have used with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and others. The set of mount options is passed in the uapi struct mount_attr which currently has the following layout: struct mount_attr { __u64 attr_set; __u64 attr_clr; __u64 propagation; __u64 userns_fd; }; The @attr_set and @attr_clr members are used to clear and set mount options. This way a user can e.g. request that a set of flags is to be raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the same time requesting that another set of flags is to be lowered such as removing noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr. Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a bitmap, users wanting to transition to a different atime setting cannot simply specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in @attr_clr. The @propagation field lets callers specify the propagation type of a mount tree. Propagation is a single property that has four different settings and as such is not really a flag argument but an enum. Specifically, it would be unclear what setting and clearing propagation settings in combination would amount to. The legacy mount() syscall thus forbids the combination of multiple propagation settings too. The goal is to keep the semantics of mount propagation somewhat simple as they are overly complex as it is. The @userns_fd field lets user specify a user namespace whose idmapping becomes the idmapping of the mount. This is implemented and explained in detail in the next patch. [1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount") Link: https://lore.kernel.org/r/[email protected] Cc: David Howells <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2021-01-23Merge tag 'linux-kselftest-kunit-fixes-5.11-rc5' of ↵Linus Torvalds5-94/+84
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fixes from Shuah : "Five fixes to the kunit tool and documentation from Daniel Latypov and David Gow" * tag 'linux-kselftest-kunit-fixes-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: tool: move kunitconfig parsing into __init__, make it optional kunit: tool: fix minor typing issue with None status kunit: tool: surface and address more typing issues Documentation: kunit: include example of a parameterized test kunit: tool: Fix spelling of "diagnostic" in kunit_parser
2021-01-22selftests: mlxsw: Add a scale test for physical portsDanielle Ratson5-2/+98
Query the maximum number of supported physical ports using devlink-resource and test that this number can be reached by splitting each of the splittable ports to its width. Test that an error is returned in case the maximum number is exceeded. Signed-off-by: Danielle Ratson <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-22sch_htb: Hierarchical QoS hardware offloadMaxim Mikityanskiy1-0/+1
HTB doesn't scale well because of contention on a single lock, and it also consumes CPU. This patch adds support for offloading HTB to hardware that supports hierarchical rate limiting. In the offload mode, HTB passes control commands to the driver using ndo_setup_tc. The driver has to replicate the whole hierarchy of classes and their settings (rate, ceil) in the NIC. Every modification of the HTB tree caused by the admin results in ndo_setup_tc being called. After this setup, the HTB algorithm is done completely in the NIC. An SQ (send queue) is created for every leaf class and attached to the hierarchy, so that the NIC can calculate and obey aggregated rate limits, too. In the future, it can be changed, so that multiple SQs will back a single leaf class. ndo_select_queue is responsible for selecting the right queue that serves the traffic class of each packet. The data path works as follows: a packet is classified by clsact, the driver selects a hardware queue according to its class, and the packet is enqueued into this queue's qdisc. This solution addresses two main problems of scaling HTB: 1. Contention by flow classification. Currently the filters are attached to the HTB instance as follows: # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 classid 1:10 It's possible to move classification to clsact egress hook, which is thread-safe and lock-free: # tc filter add dev eth0 egress protocol ip flower dst_port 80 action skbedit priority 1:10 This way classification still happens in software, but the lock contention is eliminated, and it happens before selecting the TX queue, allowing the driver to translate the class to the corresponding hardware queue in ndo_select_queue. Note that this is already compatible with non-offloaded HTB and doesn't require changes to the kernel nor iproute2. 2. Contention by handling packets. HTB is not multi-queue, it attaches to a whole net device, and handling of all packets takes the same lock. When HTB is offloaded, it registers itself as a multi-queue qdisc, similarly to mq: HTB is attached to the netdev, and each queue has its own qdisc. Some features of HTB may be not supported by some particular hardware, for example, the maximum number of classes may be limited, the granularity of rate and ceil parameters may be different, etc. - so, the offload is not enabled by default, a new parameter is used to enable it: # tc qdisc replace dev eth0 root handle 1: htb offload Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-22Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b', ↵Paul E. McKenney13-41/+642
'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD doc.2021.01.06a: Documentation updates. fixes.2021.01.04b: Miscellaneous fixes. kfree_rcu.2021.01.04a: kfree_rcu() updates. mmdumpobj.2021.01.22a: Dump allocation point for memory blocks. nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths. rt.2021.01.04a: Real-time updates. stall.2021.01.06a: RCU CPU stall warning updates. torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API. tortureall.2021.01.06a: Torture-test script updates.
2021-01-22Merge tag 'perf-tools-fixes-v5.11-2-2021-01-22' of ↵Linus Torvalds3-19/+32
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools fixes from Arnaldo Carvalho de Melo: - Fix id index used in Intel PT for heterogeneous systems - Fix overrun issue in 'perf script' for dynamically-allocated PMU type number - Fix 'perf stat' metrics containing the 'duration_time' synthetic event - Fix system PMU 'perf stat' metrics * tag 'perf-tools-fixes-v5.11-2-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf script: Fix overrun issue for dynamically-allocated PMU type number perf metricgroup: Fix system PMU metrics perf metricgroup: Fix for metrics containing duration_time perf evlist: Fix id index for heterogeneous systems
2021-01-22Merge tag 'platform-drivers-x86-v5.11-2' of ↵Linus Torvalds1-0/+32
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "A small collection of bug-fixes and model-specific quirks" * tag 'platform-drivers-x86-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: thinkpad_acpi: Add P53/73 firmware to fan_quirk_table for dual fan control platform/x86: hp-wmi: Don't log a warning on HPWMI_RET_UNKNOWN_COMMAND errors platform/x86: intel-vbtn: Drop HP Stream x360 Convertible PC 11 from allow-list platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634 platform/x86: amd-pmc: Fix CONFIG_DEBUG_FS check platform/x86: thinkpad_acpi: correct palmsensor error checking platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352 platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet platform/x86: i2c-multi-instantiate: Don't create platform device for INT3515 ACPI nodes platform/surface: SURFACE_PLATFORMS should depend on ACPI platform/surface: surface_gpe: Fix non-PM_SLEEP build warnings tools/power/x86/intel-speed-select: Set higher of cpuinfo_max_freq or base_frequency tools/power/x86/intel-speed-select: Set scaling_max_freq to base_frequency
2021-01-22ACPICA: Updated all copyrights to 2021Bob Moore10-10/+10
This affects all ACPICA source code modules. ACPICA commit c570953c914437e621dd5f160f26ddf352e0d2f4 Link: https://github.com/acpica/acpica/commit/c570953c Signed-off-by: Bob Moore <[email protected]> Signed-off-by: Erik Kaneda <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-01-21selftest/bpf: Fix typoJunlin Yang1-2/+2
Change 'exeeds' to 'exceeds'. Signed-off-by: Junlin Yang <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-21libbpf: Use string table index from index table if neededJiri Olsa1-2/+10
For very large ELF objects (with many sections), we could get special value SHN_XINDEX (65535) for elf object's string table index - e_shstrndx. Call elf_getshdrstrndx to get the proper string table index, instead of reading it directly from ELF header. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-21objtool: Don't fail on missing symbol tableJosh Poimboeuf1-2/+5
Thanks to a recent binutils change which doesn't generate unused symbols, it's now possible for thunk_64.o be completely empty without CONFIG_PREEMPTION: no text, no data, no symbols. We could edit the Makefile to only build that file when CONFIG_PREEMPTION is enabled, but that will likely create confusion if/when the thunks end up getting used by some other code again. Just ignore it and move on. Reported-by: Nathan Chancellor <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://github.com/ClangBuiltLinux/linux/issues/1254 Signed-off-by: Josh Poimboeuf <[email protected]>
2021-01-21objtool: Don't fail the kernel build on fatal errorsJosh Poimboeuf1-9/+5
This is basically a revert of commit 644592d32837 ("objtool: Fail the kernel build on fatal errors"). That change turned out to be more trouble than it's worth. Failing the build is an extreme measure which sometimes gets too much attention and blocks CI build testing. These fatal-type warnings aren't yet as rare as we'd hope, due to the ever-increasing matrix of supported toolchains/plugins and their fast-changing nature as of late. Also, there are more people (and bots) looking for objtool warnings than ever before, so even non-fatal warnings aren't likely to be ignored for long. Suggested-by: Nick Desaulniers <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]>
2021-01-21selftests: kselftest_harness.h: partially fix kernel-doc markupsMauro Carvalho Chehab1-11/+15
The kernel-doc markups on this file are weird: they don't follow what's specified at: Documentation/doc-guide/kernel-doc.rst In particular, markups should use this format: identifier - description and not this: identifier(args) The way the definitions are inside this file cause the parser to completely miss the identifier name of each function. This prevents improving the script to do some needed validation tests. Address this part. Yet, furter changes are needed in order for it to fully follow the specs. Acked-by: Kees Cook <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Link: https://lore.kernel.org/r/8383758160fdb4fcbb2ac56beeb874ca6dffc6b9.1610610937.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <[email protected]>
2021-01-21perf script: Fix overrun issue for dynamically-allocated PMU type numberJin Yao1-1/+17
When unpacking the event which is from dynamic PMU, the array output[OUTPUT_TYPE_MAX] may be overrun. For example, type number of SKL uncore_imc is 10, but OUTPUT_TYPE_MAX is 7 now (OUTPUT_TYPE_MAX = PERF_TYPE_MAX + 1). /* In builtin-script.c */ process_event() { unsigned int type = output_type(attr->type); if (output[type].fields == 0) return; } output[10] is overrun. Create a type OUTPUT_TYPE_OTHER for dynamic PMU events, then output_type(attr->type) will return OUTPUT_TYPE_OTHER here. Note that if PERF_TYPE_MAX ever changed, then there would be a conflict between old perf.data files that had a dynamicaliy allocated PMU number that would then be the same as a fixed PERF_TYPE. Example: # perf record --switch-events -C 0 -e "{cpu-clock,uncore_imc/data_reads/,uncore_imc/data_writes/}:SD" -a -- sleep 1 # perf script Before: swapper 0 [000] 1479253.987551: 277766 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.987797: 246709 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988127: 329883 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988273: 146393 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988523: 249977 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988877: 354090 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989023: 145940 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989383: 359856 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989523: 140082 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) After: swapper 0 [000] 1397040.402011: 272384 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402011: 5396 uncore_imc/data_reads/: swapper 0 [000] 1397040.402011: 967 uncore_imc/data_writes/: swapper 0 [000] 1397040.402259: 249153 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402259: 7231 uncore_imc/data_reads/: swapper 0 [000] 1397040.402259: 1297 uncore_imc/data_writes/: swapper 0 [000] 1397040.402508: 249108 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402508: 5333 uncore_imc/data_reads/: swapper 0 [000] 1397040.402508: 1008 uncore_imc/data_writes/: Signed-off-by: Jin Yao <[email protected]> Acked-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kan Liang <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-01-21perf metricgroup: Fix system PMU metricsJohn Garry1-3/+2
Joakim reports that getting "perf stat" for multiple system PMU metrics segfaults: $ perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all Segmentation fault $ While the same works without issue for a single metric. The logic in metricgroup__add_metric_sys_event_iter() is broken, in that add_metric() @m argument should be NULL for each new metric. Fix by not passing a holder for that, and rather make local in metricgroup__add_metric_sys_event_iter(). Fixes: be335ec28efa ("perf metricgroup: Support adding metrics for system PMUs") Reported-by: Joakim Zhang <[email protected]> Signed-off-by: John Garry <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>