aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-01-28bpf: btf: Add BTF_FMODEL_SIGNED_ARG flagIlya Leoshkevich1-1/+15
s390x eBPF JIT needs to know whether a function return value is signed and which function arguments are signed, in order to generate code compliant with the s390x ABI. Signed-off-by: Ilya Leoshkevich <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-28bpf: iterators: Split iterators.lskel.h into little- and big- endian versionsIlya Leoshkevich5-7/+435
iterators.lskel.h is little-endian, therefore bpf iterator is currently broken on big-endian systems. Introduce a big-endian version and add instructions regarding its generation. Unfortunately bpftool's cross-endianness capabilities are limited to BTF right now, so the procedure requires access to a big-endian machine or a configured emulator. Signed-off-by: Ilya Leoshkevich <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-28bpf: Build-time assert that cpumask offset is zeroDavid Vernet1-0/+3
The first element of a struct bpf_cpumask is a cpumask_t. This is done to allow struct bpf_cpumask to be cast to a struct cpumask. If this element were ever moved to another field, any BPF program passing a struct bpf_cpumask * to a kfunc expecting a const struct cpumask * would immediately fail to load. Add a build-time assertion so this is assumption is captured and verified. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-28Merge tag 'for-netdev' of ↵Jakub Kicinski11-304/+1460
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-28 We've added 124 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 6386 insertions(+), 1827 deletions(-). The main changes are: 1) Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs, from Stanislav Fomichev and Toke Høiland-Jørgensen. Measurements on overhead: https://lore.kernel.org/bpf/[email protected] 2) Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case, from Andrii Nakryiko. 3) Significantly reduce the search time for module symbols by livepatch and BPF, from Jiri Olsa and Zhen Lei. 4) Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals, from David Vernet. 5) Fix several issues in the dynptr processing such as stack slot liveness propagation, missing checks for PTR_TO_STACK variable offset, etc, from Kumar Kartikeya Dwivedi. 6) Various performance improvements, fixes, and introduction of more than just one XDP program to XSK selftests, from Magnus Karlsson. 7) Big batch to BPF samples to reduce deprecated functionality, from Daniel T. Lee. 8) Enable struct_ops programs to be sleepable in verifier, from David Vernet. 9) Reduce pr_warn() noise on BTF mismatches when they are expected under the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien. 10) Describe modulo and division by zero behavior of the BPF runtime in BPF's instruction specification document, from Dave Thaler. 11) Several improvements to libbpf API documentation in libbpf.h, from Grant Seltzer. 12) Improve resolve_btfids header dependencies related to subcmd and add proper support for HOSTCC, from Ian Rogers. 13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room() helper along with BPF selftests, from Ziyang Xuan. 14) Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler, from Pu Lehui. 15) Get BTF working for kernels with CONFIG_RUST enabled by excluding Rust compilation units with pahole, from Martin Rodriguez Reboredo. 16) Get bpf_setsockopt() working for kTLS on top of TCP sockets, from Kui-Feng Lee. 17) Disable stack protection for BPF objects in bpftool given BPF backends don't support it, from Holger Hoffstätte. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits) selftest/bpf: Make crashes more debuggable in test_progs libbpf: Add documentation to map pinning API functions libbpf: Fix malformed documentation formatting selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket. bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). bpf/selftests: Verify struct_ops prog sleepable behavior bpf: Pass const struct bpf_prog * to .check_member libbpf: Support sleepable struct_ops.s section bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable selftests/bpf: Fix vmtest static compilation error tools/resolve_btfids: Alter how HOSTCC is forced tools/resolve_btfids: Install subcmd headers bpf/docs: Document the nocast aliasing behavior of ___init bpf/docs: Document how nested trusted fields may be defined bpf/docs: Document cpumask kfuncs in a new file selftests/bpf: Add selftest suite for cpumask kfuncs selftests/bpf: Add nested trust selftests suite bpf: Enable cpumasks to be queried and used as kptrs bpf: Disallow NULLable pointers for trusted kfuncs ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-28/+60
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/[email protected]/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-25bpf: Pass const struct bpf_prog * to .check_memberDavid Vernet1-1/+1
The .check_member field of struct bpf_struct_ops is currently passed the member's btf_type via const struct btf_type *t, and a const struct btf_member *member. This allows the struct_ops implementation to check whether e.g. an ops is supported, but it would be useful to also enforce that the struct_ops prog being loaded for that member has other qualities, like being sleepable (or not). This patch therefore updates the .check_member() callback to also take a const struct bpf_prog *prog argument. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-25bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepableDavid Vernet1-2/+3
BPF struct_ops programs currently cannot be marked as sleepable. This need not be the case -- struct_ops programs can be sleepable, and e.g. invoke kfuncs that export the KF_SLEEPABLE flag. So as to allow future struct_ops programs to invoke such kfuncs, this patch updates the verifier to allow struct_ops programs to be sleepable. A follow-on patch will add support to libbpf for specifying struct_ops.s as a sleepable struct_ops program, and then another patch will add testcases to the dummy_st_ops selftest suite which test sleepable struct_ops behavior. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-25bpf/docs: Document cpumask kfuncs in a new fileDavid Vernet1-0/+208
Now that we've added a series of new cpumask kfuncs, we should document them so users can easily use them. This patch adds a new cpumasks.rst file to document them. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-25bpf: Enable cpumasks to be queried and used as kptrsDavid Vernet2-0/+269
Certain programs may wish to be able to query cpumasks. For example, if a program that is tracing percpu operations wishes to track which tasks end up running on which CPUs, it could be useful to associate that with the tasks' cpumasks. Similarly, programs tracking NUMA allocations, CPU scheduling domains, etc, could potentially benefit from being able to see which CPUs a task could be migrated to. This patch enables these types of use cases by introducing a series of bpf_cpumask_* kfuncs. Amongst these kfuncs, there are two separate "classes" of operations: 1. kfuncs which allow the caller to allocate and mutate their own cpumask kptrs in the form of a struct bpf_cpumask * object. Such kfuncs include e.g. bpf_cpumask_create() to allocate the cpumask, and bpf_cpumask_or() to mutate it. "Regular" cpumasks such as p->cpus_ptr may not be passed to these kfuncs, and the verifier will ensure this is the case by comparing BTF IDs. 2. Read-only operations which operate on const struct cpumask * arguments. For example, bpf_cpumask_test_cpu(), which tests whether a CPU is set in the cpumask. Any trusted struct cpumask * or struct bpf_cpumask * may be passed to these kfuncs. The verifier allows struct bpf_cpumask * even though the kfunc is defined with struct cpumask * because the first element of a struct bpf_cpumask is a cpumask_t, so it is safe to cast. A follow-on patch will add selftests which validate these kfuncs, and another will document them. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-25bpf: Disallow NULLable pointers for trusted kfuncsDavid Vernet1-0/+6
KF_TRUSTED_ARGS kfuncs currently have a subtle and insidious bug in validating pointers to scalars. Say that you have a kfunc like the following, which takes an array as the first argument: bool bpf_cpumask_empty(const struct cpumask *cpumask) { return cpumask_empty(cpumask); } ... BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS) ... If a BPF program were to invoke the kfunc with a NULL argument, it would crash the kernel. The reason is that struct cpumask is defined as a bitmap, which is itself defined as an array, and is accessed as a memory address by bitmap operations. So when the verifier analyzes the register, it interprets it as a pointer to a scalar struct, which is an array of size 8. check_mem_reg() then sees that the register is NULL and returns 0, and the kfunc crashes when it passes it down to the cpumask wrappers. To fix this, this patch adds a check for KF_ARG_PTR_TO_MEM which verifies that the register doesn't contain a possibly-NULL pointer if the kfunc is KF_TRUSTED_ARGS. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-24bpf: Allow trusted args to walk struct when checking BTF IDsDavid Vernet2-1/+90
When validating BTF types for KF_TRUSTED_ARGS kfuncs, the verifier currently enforces that the top-level type must match when calling the kfunc. In other words, the verifier does not allow the BPF program to pass a bitwise equivalent struct, despite it being allowed according to the C standard. For example, if you have the following type: struct nf_conn___init { struct nf_conn ct; }; The C standard stipulates that it would be safe to pass a struct nf_conn___init to a kfunc expecting a struct nf_conn. The verifier currently disallows this, however, as semantically kfuncs may want to enforce that structs that have equivalent types according to the C standard, but have different BTF IDs, are not able to be passed to kfuncs expecting one or the other. For example, struct nf_conn___init may not be queried / looked up, as it is allocated but may not yet be fully initialized. On the other hand, being able to pass types that are equivalent according to the C standard will be useful for other types of kfunc / kptrs enabled by BPF. For example, in a follow-on patch, a series of kfuncs will be added which allow programs to do bitwise queries on cpumasks that are either allocated by the program (in which case they'll be a 'struct bpf_cpumask' type that wraps a cpumask_t as its first element), or a cpumask that was allocated by the main kernel (in which case it will just be a straight cpumask_t, as in task->cpus_ptr). Having the two types of cpumasks allows us to distinguish between the two for when a cpumask is read-only vs. mutatable. A struct bpf_cpumask can be mutated by e.g. bpf_cpumask_clear(), whereas a regular cpumask_t cannot be. On the other hand, a struct bpf_cpumask can of course be queried in the exact same manner as a cpumask_t, with e.g. bpf_cpumask_test_cpu(). If we were to enforce that top level types match, then a user that's passing a struct bpf_cpumask to a read-only cpumask_t argument would have to cast with something like bpf_cast_to_kern_ctx() (which itself would need to be updated to expect the alias, and currently it only accommodates a single alias per prog type). Additionally, not specifying KF_TRUSTED_ARGS is not an option, as some kfuncs take one argument as a struct bpf_cpumask *, and another as a struct cpumask * (i.e. cpumask_t). In order to enable this, this patch relaxes the constraint that a KF_TRUSTED_ARGS kfunc must have strict type matching, and instead only enforces strict type matching if a type is observed to be a "no-cast alias" (i.e., that the type names are equivalent, but one is suffixed with ___init). Additionally, in order to try and be conservative and match existing behavior / expectations, this patch also enforces strict type checking for acquire kfuncs. We were already enforcing it for release kfuncs, so this should also improve the consistency of the semantics for kfuncs. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-24bpf: Enable annotating trusted nested pointersDavid Vernet2-3/+90
In kfuncs, a "trusted" pointer is a pointer that the kfunc can assume is safe, and which the verifier will allow to be passed to a KF_TRUSTED_ARGS kfunc. Currently, a KF_TRUSTED_ARGS kfunc disallows any pointer to be passed at a nonzero offset, but sometimes this is in fact safe if the "nested" pointer's lifetime is inherited from its parent. For example, the const cpumask_t *cpus_ptr field in a struct task_struct will remain valid until the task itself is destroyed, and thus would also be safe to pass to a KF_TRUSTED_ARGS kfunc. While it would be conceptually simple to enable this by using BTF tags, gcc unfortunately does not yet support this. In the interim, this patch enables support for this by using a type-naming convention. A new BTF_TYPE_SAFE_NESTED macro is defined in verifier.c which allows a developer to specify the nested fields of a type which are considered trusted if its parent is also trusted. The verifier is also updated to account for this. A patch with selftests will be added in a follow-on change, along with documentation for this feature. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-24module: Don't wait for GOING modulesPetr Pavlu1-5/+21
During a system boot, it can happen that the kernel receives a burst of requests to insert the same module but loading it eventually fails during its init call. For instance, udev can make a request to insert a frequency module for each individual CPU when another frequency module is already loaded which causes the init function of the new module to return an error. Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading"), the kernel waits for modules in MODULE_STATE_GOING state to finish unloading before making another attempt to load the same module. This creates unnecessary work in the described scenario and delays the boot. In the worst case, it can prevent udev from loading drivers for other devices and might cause timeouts of services waiting on them and subsequently a failed boot. This patch attempts a different solution for the problem 6e6de3dee51a was trying to solve. Rather than waiting for the unloading to complete, it returns a different error code (-EBUSY) for modules in the GOING state. This should avoid the error situation that was described in 6e6de3dee51a (user space attempting to load a dependent module because the -EEXIST error code would suggest to user space that the first module had been loaded successfully), while avoiding the delay situation too. This has been tested on linux-next since December 2022 and passes all kmod selftests except test 0009 with module compression enabled but it has been confirmed that this issue has existed and has gone unnoticed since prior to this commit and can also be reproduced without module compression with a simple usleep(5000000) on tools/modprobe.c [0]. These failures are caused by hitting the kernel mod_concurrent_max and can happen either due to a self inflicted kernel module auto-loead DoS somehow or on a system with large CPU count and each CPU count incorrectly triggering many module auto-loads. Both of those issues need to be fixed in-kernel. [0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%[email protected]/ Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading") Co-developed-by: Martin Wilck <[email protected]> Signed-off-by: Martin Wilck <[email protected]> Signed-off-by: Petr Pavlu <[email protected]> Cc: [email protected] Reviewed-by: Petr Mladek <[email protected]> [mcgrof: enhance commit log with testing and kmod test result interpretation ] Signed-off-by: Luis Chamberlain <[email protected]>
2023-01-23bpf: Support consuming XDP HW metadata from fext programsToke Høiland-Jørgensen3-32/+92
Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP programs that consume HW metadata, implement support for propagating the offload information. The extension program doesn't need to set a flag or ifindex, these will just be propagated from the target by the verifier. We need to create a separate offload object for the extension program, though, since it can be reattached to a different program later (which means we can't just inherit the offload information from the target). An additional check is added on attach that the new target is compatible with the offload information in the extension prog. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-23bpf: XDP metadata RX kfuncsStanislav Fomichev3-1/+76
Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not supported by the target device, the default implementation is called instead. The verifier, at load time, replaces a call to the generic kfunc with a call to the per-device one. Per-device kfunc pointers are stored in separate struct xdp_metadata_ops. Cc: John Fastabend <[email protected]> Cc: David Ahern <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Anatoly Burakov <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Magnus Karlsson <[email protected]> Cc: Maryam Tahhan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-23bpf: Introduce device-bound XDP programsStanislav Fomichev3-30/+78
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way to associate a netdev with a BPF program at load time. netdevsim checks are dropped in favor of generic check in dev_xdp_attach. Cc: John Fastabend <[email protected]> Cc: David Ahern <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Anatoly Burakov <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Magnus Karlsson <[email protected]> Cc: Maryam Tahhan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-23bpf: Reshuffle some parts of bpf/offload.cStanislav Fomichev1-105/+117
To avoid adding forward declarations in the main patch, shuffle some code around. No functional changes. Cc: John Fastabend <[email protected]> Cc: David Ahern <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Anatoly Burakov <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Magnus Karlsson <[email protected]> Cc: Maryam Tahhan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-23bpf: Move offload initialization into late_initcallStanislav Fomichev1-15/+7
So we don't have to initialize it manually from several paths. Cc: John Fastabend <[email protected]> Cc: David Ahern <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Anatoly Burakov <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Magnus Karlsson <[email protected]> Cc: Maryam Tahhan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-23bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloadedStanislav Fomichev4-24/+24
BPF offloading infra will be reused to implement bound-but-not-offloaded bpf programs. Rename existing helpers for clarity. No functional changes. Cc: John Fastabend <[email protected]> Cc: David Ahern <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Anatoly Burakov <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Magnus Karlsson <[email protected]> Cc: Maryam Tahhan <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-22Merge tag 'sched_urgent_for_v6.2_rc6' of ↵Linus Torvalds2-23/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Make sure the scheduler doesn't use stale frequency scaling values when latter get disabled due to a value error - Fix a NULL pointer access on UP configs - Use the proper locking when updating CPU capacity * tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs sched/fair: Fixes for capacity inversion detection sched/uclamp: Fix a uninitialized variable warnings
2023-01-21Merge tag 'driver-core-6.2-rc5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are three small driver and kernel core fixes for 6.2-rc5. They include: - potential gadget fixup in do_prlimit - device property refcount leak fix - test_async_probe bugfix for reported problem" * tag 'driver-core-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: prlimit: do_prlimit needs to have a speculation check driver core: Fix test_async_probe_init saves device in wrong array device property: fix of node refcount leak in fwnode_graph_get_next_endpoint()
2023-01-21Merge tag 'kbuild-fixes-v6.2-3' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Hide LDFLAGS_vmlinux from decompressor Makefiles to fix error messages when GNU Make 4.4 is used. - Fix 'make modules' build error when CONFIG_DEBUG_INFO_BTF_MODULES=y. - Fix warnings emitted by GNU Make 4.4 in scripts/kconfig/Makefile. - Support GNU Make 4.4 for scripts/jobserver-exec. - Show clearer error message when kernel/gen_kheaders.sh fails due to missing cpio. * tag 'kbuild-fixes-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kheaders: explicitly validate existence of cpio command scripts: support GNU make 4.4 in jobserver-exec kconfig: Update all declared targets scripts: rpm: make clear that mkspec script contains 4.13 feature init/Kconfig: fix LOCALVERSION_AUTO help text kbuild: fix 'make modules' error when CONFIG_DEBUG_INFO_BTF_MODULES=y kbuild: export top-level LDFLAGS_vmlinux only to scripts/Makefile.vmlinux init/version-timestamp.c: remove unneeded #include <linux/version.h> docs: kbuild: remove mention to dropped $(objtree) feature
2023-01-21prlimit: do_prlimit needs to have a speculation checkGreg Kroah-Hartman1-0/+2
do_prlimit() adds the user-controlled resource value to a pointer that will subsequently be dereferenced. In order to help prevent this codepath from being used as a spectre "gadget" a barrier needs to be added after checking the range. Reported-by: Jordy Zomer <[email protected]> Tested-by: Jordy Zomer <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-01-20bpf: Avoid recomputing spi in process_dynptr_funcKumar Kartikeya Dwivedi1-13/+11
Currently, process_dynptr_func first calls dynptr_get_spi and then is_dynptr_reg_valid_init and is_dynptr_reg_valid_uninit have to call it again to obtain the spi value. Instead of doing this twice, reuse the already obtained value (which is by default 0, and is only set for PTR_TO_STACK, and only used in that case in aforementioned functions). The input value for these two functions will either be -ERANGE or >= 1, and can either be permitted or rejected based on the respective check. Suggested-by: Joanne Koong <[email protected]> Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20bpf: Combine dynptr_get_spi and is_spi_bounds_validKumar Kartikeya Dwivedi1-42/+33
Currently, a check on spi resides in dynptr_get_spi, while others checking its validity for being within the allocated stack slots happens in is_spi_bounds_valid. Almost always barring a couple of cases (where being beyond allocated stack slots is not an error as stack slots need to be populated), both are used together to make checks. Hence, subsume the is_spi_bounds_valid check in dynptr_get_spi, and return -ERANGE to specially distinguish the case where spi is valid but not within allocated slots in the stack state. The is_spi_bounds_valid function is still kept around as it is a generic helper that will be useful for other objects on stack similar to dynptr in the future. Suggested-by: Joanne Koong <[email protected]> Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20bpf: Allow reinitializing unreferenced dynptr stack slotsKumar Kartikeya Dwivedi1-8/+26
Consider a program like below: void prog(void) { { struct bpf_dynptr ptr; bpf_dynptr_from_mem(...); } ... { struct bpf_dynptr ptr; bpf_dynptr_from_mem(...); } } Here, the C compiler based on lifetime rules in the C standard would be well within in its rights to share stack storage for dynptr 'ptr' as their lifetimes do not overlap in the two distinct scopes. Currently, such an example would be rejected by the verifier, but this is too strict. Instead, we should allow reinitializing over dynptr stack slots and forget information about the old dynptr object. The destroy_if_dynptr_stack_slot function already makes necessary checks to avoid overwriting referenced dynptr slots. This is done to present a better error message instead of forgetting dynptr information on stack and preserving reference state, leading to an inevitable but undecipherable error at the end about an unreleased reference which has to be associated back to its allocating call instruction to make any sense to the user. Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20bpf: Invalidate slices on destruction of dynptrs on stackKumar Kartikeya Dwivedi1-12/+62
The previous commit implemented destroy_if_dynptr_stack_slot. It destroys the dynptr which given spi belongs to, but still doesn't invalidate the slices that belong to such a dynptr. While for the case of referenced dynptr, we don't allow their overwrite and return an error early, we still allow it and destroy the dynptr for unreferenced dynptr. To be able to enable precise and scoped invalidation of dynptr slices in this case, we must be able to associate the source dynptr of slices that have been obtained using bpf_dynptr_data. When doing destruction, only slices belonging to the dynptr being destructed should be invalidated, and nothing else. Currently, dynptr slices belonging to different dynptrs are indistinguishible. Hence, allocate a unique id to each dynptr (CONST_PTR_TO_DYNPTR and those on stack). This will be stored as part of reg->id. Whenever using bpf_dynptr_data, transfer this unique dynptr id to the returned PTR_TO_MEM_OR_NULL slice pointer, and store it in a new per-PTR_TO_MEM dynptr_id register state member. Finally, after establishing such a relationship between dynptrs and their slices, implement precise invalidation logic that only invalidates slices belong to the destroyed dynptr in destroy_if_dynptr_stack_slot. Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20bpf: Fix partial dynptr stack slot reads/writesKumar Kartikeya Dwivedi1-0/+88
Currently, while reads are disallowed for dynptr stack slots, writes are not. Reads don't work from both direct access and helpers, while writes do work in both cases, but have the effect of overwriting the slot_type. While this is fine, handling for a few edge cases is missing. Firstly, a user can overwrite the stack slots of dynptr partially. Consider the following layout: spi: [d][d][?] 2 1 0 First slot is at spi 2, second at spi 1. Now, do a write of 1 to 8 bytes for spi 1. This will essentially either write STACK_MISC for all slot_types or STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write of zeroes). The end result is that slot is scrubbed. Now, the layout is: spi: [d][m][?] 2 1 0 Suppose if user initializes spi = 1 as dynptr. We get: spi: [d][d][d] 2 1 0 But this time, both spi 2 and spi 1 have first_slot = true. Now, when passing spi 2 to dynptr helper, it will consider it as initialized as it does not check whether second slot has first_slot == false. And spi 1 should already work as normal. This effectively replaced size + offset of first dynptr, hence allowing invalid OOB reads and writes. Make a few changes to protect against this: When writing to PTR_TO_STACK using BPF insns, when we touch spi of a STACK_DYNPTR type, mark both first and second slot (regardless of which slot we touch) as STACK_INVALID. Reads are already prevented. Second, prevent writing to stack memory from helpers if the range may contain any STACK_DYNPTR slots. Reads are already prevented. For helpers, we cannot allow it to destroy dynptrs from the writes as depending on arguments, helper may take uninit_mem and dynptr both at the same time. This would mean that helper may write to uninit_mem before it reads the dynptr, which would be bad. PTR_TO_MEM: [?????dd] Depending on the code inside the helper, it may end up overwriting the dynptr contents first and then read those as the dynptr argument. Verifier would only simulate destruction when it does byte by byte access simulation in check_helper_call for meta.access_size, and fail to catch this case, as it happens after argument checks. The same would need to be done for any other non-trivial objects created on the stack in the future, such as bpf_list_head on stack, or bpf_rb_root on stack. A common misunderstanding in the current code is that MEM_UNINIT means writes, but note that writes may also be performed even without MEM_UNINIT in case of helpers, in that case the code after handling meta && meta->raw_mode will complain when it sees STACK_DYNPTR. So that invalid read case also covers writes to potential STACK_DYNPTR slots. The only loophole was in case of meta->raw_mode which simulated writes through instructions which could overwrite them. A future series sequenced after this will focus on the clean up of helper access checks and bugs around that. Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20bpf: Fix missing var_off check for ARG_PTR_TO_DYNPTRKumar Kartikeya Dwivedi1-18/+66
Currently, the dynptr function is not checking the variable offset part of PTR_TO_STACK that it needs to check. The fixed offset is considered when computing the stack pointer index, but if the variable offset was not a constant (such that it could not be accumulated in reg->off), we will end up a discrepency where runtime pointer does not point to the actual stack slot we mark as STACK_DYNPTR. It is impossible to precisely track dynptr state when variable offset is not constant, hence, just like bpf_timer, kptr, bpf_spin_lock, etc. simply reject the case where reg->var_off is not constant. Then, consider both reg->off and reg->var_off.value when computing the stack pointer index. A new helper dynptr_get_spi is introduced to hide over these details since the dynptr needs to be located in multiple places outside the process_dynptr_func checks, hence once we know it's a PTR_TO_STACK, we need to enforce these checks in all places. Note that it is disallowed for unprivileged users to have a non-constant var_off, so this problem should only be possible to trigger from programs having CAP_PERFMON. However, its effects can vary. Without the fix, it is possible to replace the contents of the dynptr arbitrarily by making verifier mark different stack slots than actual location and then doing writes to the actual stack address of dynptr at runtime. Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20bpf: Fix state pruning for STACK_DYNPTR stack slotsKumar Kartikeya Dwivedi1-4/+84
The root of the problem is missing liveness marking for STACK_DYNPTR slots. This leads to all kinds of problems inside stacksafe. The verifier by default inside stacksafe ignores spilled_ptr in stack slots which do not have REG_LIVE_READ marks. Since this is being checked in the 'old' explored state, it must have already done clean_live_states for this old bpf_func_state. Hence, it won't be receiving any more liveness marks from to be explored insns (it has received REG_LIVE_DONE marking from liveness point of view). What this means is that verifier considers that it's safe to not compare the stack slot if was never read by children states. While liveness marks are usually propagated correctly following the parentage chain for spilled registers (SCALAR_VALUE and PTR_* types), the same is not the case for STACK_DYNPTR. clean_live_states hence simply rewrites these stack slots to the type STACK_INVALID since it sees no REG_LIVE_READ marks. The end result is that we will never see STACK_DYNPTR slots in explored state. Even if verifier was conservatively matching !REG_LIVE_READ slots, very next check continuing the stacksafe loop on seeing STACK_INVALID would again prevent further checks. Now as long as verifier stores an explored state which we can compare to when reaching a pruning point, we can abuse this bug to make verifier prune search for obviously unsafe paths using STACK_DYNPTR slots thinking they are never used hence safe. Doing this in unprivileged mode is a bit challenging. add_new_state is only set when seeing BPF_F_TEST_STATE_FREQ (which requires privileges) or when jmps_processed difference is >= 2 and insn_processed difference is >= 8. So coming up with the unprivileged case requires a little more work, but it is still totally possible. The test case being discussed below triggers the heuristic even in unprivileged mode. However, it no longer works since commit 8addbfc7b308 ("bpf: Gate dynptr API behind CAP_BPF"). Let's try to study the test step by step. Consider the following program (C style BPF ASM): 0 r0 = 0; 1 r6 = &ringbuf_map; 3 r1 = r6; 4 r2 = 8; 5 r3 = 0; 6 r4 = r10; 7 r4 -= -16; 8 call bpf_ringbuf_reserve_dynptr; 9 if r0 == 0 goto pc+1; 10 goto pc+1; 11 *(r10 - 16) = 0xeB9F; 12 r1 = r10; 13 r1 -= -16; 14 r2 = 0; 15 call bpf_ringbuf_discard_dynptr; 16 r0 = 0; 17 exit; We know that insn 12 will be a pruning point, hence if we force add_new_state for it, it will first verify the following path as safe in straight line exploration: 0 1 3 4 5 6 7 8 9 -> 10 -> (12) 13 14 15 16 17 Then, when we arrive at insn 12 from the following path: 0 1 3 4 5 6 7 8 9 -> 11 (12) We will find a state that has been verified as safe already at insn 12. Since register state is same at this point, regsafe will pass. Next, in stacksafe, for spi = 0 and spi = 1 (location of our dynptr) is skipped seeing !REG_LIVE_READ. The rest matches, so stacksafe returns true. Next, refsafe is also true as reference state is unchanged in both states. The states are considered equivalent and search is pruned. Hence, we are able to construct a dynptr with arbitrary contents and use the dynptr API to operate on this arbitrary pointer and arbitrary size + offset. To fix this, first define a mark_dynptr_read function that propagates liveness marks whenever a valid initialized dynptr is accessed by dynptr helpers. REG_LIVE_WRITTEN is marked whenever we initialize an uninitialized dynptr. This is done in mark_stack_slots_dynptr. It allows screening off mark_reg_read and not propagating marks upwards from that point. This ensures that we either set REG_LIVE_READ64 on both dynptr slots, or none, so clean_live_states either sets both slots to STACK_INVALID or none of them. This is the invariant the checks inside stacksafe rely on. Next, do a complete comparison of both stack slots whenever they have STACK_DYNPTR. Compare the dynptr type stored in the spilled_ptr, and also whether both form the same first_slot. Only then is the later path safe. Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-38/+29
drivers/net/ipa/ipa_interrupt.c drivers/net/ipa/ipa_interrupt.h 9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend") 8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()") d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-20Merge tag 'net-6.2-rc5-2' of ↵Linus Torvalds5-23/+21
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless, bluetooth, bpf and netfilter. Current release - regressions: - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf", fix nsna_ping mode of team - wifi: mt76: fix bugs in Rx queue handling and DMA mapping - eth: mlx5: - add missing mutex_unlock in error reporter - protect global IPsec ASO with a lock Current release - new code bugs: - rxrpc: fix wrong error return in rxrpc_connect_call() Previous releases - regressions: - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2 - wifi: - mac80211: fix crashes on Rx due to incorrect initialization of rx->link and rx->link_sta - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect aggregation handling, crashes - brcmfmac: fix regression for Broadcom PCIe wifi devices - rndis_wlan: prevent buffer overflow in rndis_query_oid - netfilter: conntrack: handle tcp challenge acks during connection reuse - sched: avoid grafting on htb_destroy_class_offload when destroying - virtio-net: correctly enable callback during start_xmit, fix stalls - tcp: avoid the lookup process failing to get sk in ehash table - ipa: disable ipa interrupt during suspend - eth: stmmac: enable all safety features by default Previous releases - always broken: - bpf: - fix pointer-leak due to insufficient speculative store bypass mitigation (Spectre v4) - skip task with pid=1 in send_signal_common() to avoid a splat - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as PERF_BPF_EVENT_PROG_UNLOAD events - fix potential deadlock in htab_lock_bucket from same bucket index but different map_locked index - bluetooth: - fix a buffer overflow in mgmt_mesh_add() - hci_qca: fix driver shutdown on closed serdev - ISO: fix possible circular locking dependency - CIS: hci_event: fix invalid wait context - wifi: brcmfmac: fixes for survey dump handling - mptcp: explicitly specify sock family at subflow creation time - netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bits - tcp: fix rate_app_limited to default to 1 - l2tp: close all race conditions in l2tp_tunnel_register() - eth: mlx5: fixes for QoS config and eswitch configuration - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp() - eth: stmmac: fix invalid call to mdiobus_get_phy() Misc: - ethtool: add netlink attr in rss get reply only if the value is not empty" * tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) Revert "Merge branch 'octeontx2-af-CPT'" tcp: fix rate_app_limited to default to 1 bnxt: Do not read past the end of test names net: stmmac: enable all safety features by default octeontx2-af: add mbox to return CPT_AF_FLT_INT info octeontx2-af: update cpt lf alloc mailbox octeontx2-af: restore rxc conf after teardown sequence octeontx2-af: optimize cpt pf identification octeontx2-af: modify FLR sequence for CPT octeontx2-af: add mbox for CPT LF reset octeontx2-af: recover CPT engine when it gets fault net: dsa: microchip: ksz9477: port map correction in ALU table entry register selftests/net: toeplitz: fix race on tpacket_v3 block close net/ulp: use consistent error code when blocking ULP octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt tcp: avoid the lookup process failing to get sk in ehash table Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf" MAINTAINERS: add networking entries for Willem net: sched: gred: prevent races when adding offloads to stats l2tp: prevent lockdep issue in l2tp_tunnel_register() ...
2023-01-19bpf: Change modules resolving for kprobe multi linkJiri Olsa1-46/+47
We currently use module_kallsyms_on_each_symbol that iterates all modules/symbols and we try to lookup each such address in user provided symbols/addresses to get list of used modules. This fix instead only iterates provided kprobe addresses and calls __module_address on each to get list of used modules. This turned out to be simpler and also bit faster. On my setup with workload (executed 10 times): # test_progs -t kprobe_multi_bench_attach/modules Current code: Performance counter stats for './test.sh' (5 runs): 76,081,161,596 cycles:k ( +- 0.47% ) 18.3867 +- 0.0992 seconds time elapsed ( +- 0.54% ) With the fix: Performance counter stats for './test.sh' (5 runs): 74,079,889,063 cycles:k ( +- 0.04% ) 17.8514 +- 0.0218 seconds time elapsed ( +- 0.12% ) Signed-off-by: Jiri Olsa <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-19livepatch: Improve the search performance of module_kallsyms_on_each_symbol()Zhen Lei4-12/+15
Currently we traverse all symbols of all modules to find the specified function for the specified module. But in reality, we just need to find the given module and then traverse all the symbols in it. Let's add a new parameter 'const char *modname' to function module_kallsyms_on_each_symbol(), then we can compare the module names directly in this function and call hook 'fn' after matching. If 'modname' is NULL, the symbols of all modules are still traversed for compatibility with other usage cases. Phase1: mod1-->mod2..(subsequent modules do not need to be compared) | Phase2: -->f1-->f2-->f3 Assuming that there are m modules, each module has n symbols on average, then the time complexity is reduced from O(m * n) to O(m) + O(n). Reviewed-by: Petr Mladek <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Zhen Lei <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Miroslav Benes <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-19Merge tag 'printk-for-6.2-rc5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fixes from Petr Mladek: - Prevent a potential deadlock when configuring kgdb console - Fix a kernel doc warning * tag 'printk-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: kernel/printk/printk.c: Fix W=1 kernel-doc warning tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()
2023-01-19Merge branch 'rework/console-list-lock' into for-linusPetr Mladek1-0/+2
2023-01-18kheaders: explicitly validate existence of cpio commandThomas Weißschuh1-0/+2
If the cpio command is not available the error emitted by gen_kheaders.so is not clear as all output of the call to cpio is discarded: GNU make 4.4: GEN kernel/kheaders_data.tar.xz find: 'standard output': Broken pipe find: write error make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127 make[1]: *** [scripts/Makefile.build:504: kernel] Error 2 GNU make < 4.4: GEN kernel/kheaders_data.tar.xz make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [scripts/Makefile.build:504: kernel] Error 2 Add an explicit check that will trigger a clear message about the issue: CHK kernel/kheaders_data.tar.xz ./kernel/gen_kheaders.sh: line 17: type: cpio: not found The other commands executed by gen_kheaders.sh are part of a standard installation, so they are not checked. Reported-by: Amy Parker <[email protected]> Link: https://lore.kernel.org/lkml/CAPOgqxFva=tOuh1UitCSN38+28q3BNXKq19rEsVNPRzRqKqZ+g@mail.gmail.com/ Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: Nicolas Schier <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2023-01-17Merge tag 'for-netdev' of ↵Jakub Kicinski5-23/+21
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2023-01-16 We've added 6 non-merge commits during the last 8 day(s) which contain a total of 6 files changed, 22 insertions(+), 24 deletions(-). The main changes are: 1) Mitigate a Spectre v4 leak in unprivileged BPF from speculative pointer-as-scalar type confusion, from Luis Gerhorst. 2) Fix a splat when pid 1 attaches a BPF program that attempts to send killing signal to itself, from Hao Sun. 3) Fix BPF program ID information in BPF_AUDIT_UNLOAD as well as PERF_BPF_EVENT_PROG_UNLOAD events, from Paul Moore. 4) Fix BPF verifier warning triggered from invalid kfunc call in backtrack_insn, also from Hao Sun. 5) Fix potential deadlock in htab_lock_bucket from same bucket index but different map_locked index, from Tonghao Zhang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation bpf: hash map, avoid deadlock with suitable hash mask bpf: remove the do_idr_lock parameter from bpf_prog_free_id() bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD bpf: Skip task with pid=1 in send_signal_common() bpf: Skip invalid kfunc call in backtrack_insn ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-17bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP programJiri Olsa1-3/+19
Currently we allow to load any tracing program as sleepable, but BPF_TRACE_RAW_TP can't sleep. Making the check explicit for tracing programs attach types, so sleepable BPF_TRACE_RAW_TP will fail to load. Updating the verifier error to mention iter programs as well. Acked-by: Song Liu <[email protected]> Acked-by: Yonghong Song <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-01-16kernel/printk/printk.c: Fix W=1 kernel-doc warningAnuradha Weeraman1-0/+1
Fix W=1 kernel-doc warning: kernel/printk/printk.c: - Include function parameter in console_lock_spinning_disable_and_check() Signed-off-by: Anuradha Weeraman <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-16tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()John Ogness1-0/+1
Several mutexes are taken while setting up console serial ports. In particular, the tty_port->mutex and @console_mutex are taken: serial_pnp_probe serial8250_register_8250_port uart_add_one_port (locks tty_port->mutex) uart_configure_port register_console (locks @console_mutex) In order to synchronize kgdb's tty_find_polling_driver() with register_console(), commit 6193bc90849a ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()") takes the @console_mutex. However, this leads to the following call chain (with locking): platform_probe kgdboc_probe configure_kgdboc (locks @console_mutex) tty_find_polling_driver uart_poll_init (locks tty_port->mutex) uart_set_options This is clearly deadlock potential due to the reverse lock ordering. Since uart_set_options() requires holding @console_mutex in order to serialize early initialization of the serial-console lock, take the @console_mutex in uart_poll_init() instead of configure_kgdboc(). Since configure_kgdboc() was using @console_mutex for safe traversal of the console list, change it to use the SRCU iterator instead. Add comments to uart_set_options() kerneldoc mentioning that it requires holding @console_mutex (aka the console_list_lock). Fixes: 6193bc90849a ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()") Signed-off-by: John Ogness <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Petr Mladek <[email protected]> [[email protected]: Export console_srcu_read_lock_is_held() to fix build kgdboc as a module.] Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-16sched/core: Fix NULL pointer access fault in sched_setaffinity() with ↵Waiman Long1-2/+8
non-SMP configs The kernel commit 9a5418bc48ba ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()") introduces a bug for kernels built with non-SMP configs. Calling sched_setaffinity() on such a uniprocessor kernel will cause cpumask_copy() to be called with a NULL pointer leading to general protection fault. This is not really a problem in real use cases as there aren't that many uniprocessor kernel configs in use and calling sched_setaffinity() on such a uniprocessor system doesn't make sense. Fix this problem by making sure cpumask_copy() will not be called in such a case. Fixes: 9a5418bc48ba ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()") Reported-by: kernel test robot <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-14Merge tag 'modules-6.2-rc4' of ↵Linus Torvalds1-15/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module fix from Luis Chamberlain: "Just one fix for modules by Nick" * tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: Fix scheduling with interrupts disabled in self-test
2023-01-13kallsyms: Fix scheduling with interrupts disabled in self-testNicholas Piggin1-15/+6
kallsyms_on_each* may schedule so must not be called with interrupts disabled. The iteration function could disable interrupts, but this also changes lookup_symbol() to match the change to the other timing code. Reported-by: Erhard F. <[email protected]> Link: https://lore.kernel.org/all/[email protected]%2F/ Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected] Fixes: 30f3bb09778d ("kallsyms: Add self-test facility") Tested-by: "Erhard F." <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-01-13bpf: Fix pointer-leak due to insufficient speculative store bypass mitigationLuis Gerhorst1-1/+3
To mitigate Spectre v4, 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") inserts lfence instructions after 1) initializing a stack slot and 2) spilling a pointer to the stack. However, this does not cover cases where a stack slot is first initialized with a pointer (subject to sanitization) but then overwritten with a scalar (not subject to sanitization because the slot was already initialized). In this case, the second write may be subject to speculative store bypass (SSB) creating a speculative pointer-as-scalar type confusion. This allows the program to subsequently leak the numerical pointer value using, for example, a branch-based cache side channel. To fix this, also sanitize scalars if they write a stack slot that previously contained a pointer. Assuming that pointer-spills are only generated by LLVM on register-pressure, the performance impact on most real-world BPF programs should be small. The following unprivileged BPF bytecode drafts a minimal exploit and the mitigation: [...] // r6 = 0 or 1 (skalar, unknown user input) // r7 = accessible ptr for side channel // r10 = frame pointer (fp), to be leaked // r9 = r10 # fp alias to encourage ssb *(u64 *)(r9 - 8) = r10 // fp[-8] = ptr, to be leaked // lfence added here because of pointer spill to stack. // // Ommitted: Dummy bpf_ringbuf_output() here to train alias predictor // for no r9-r10 dependency. // *(u64 *)(r10 - 8) = r6 // fp[-8] = scalar, overwrites ptr // 2039f26f3aca: no lfence added because stack slot was not STACK_INVALID, // store may be subject to SSB // // fix: also add an lfence when the slot contained a ptr // r8 = *(u64 *)(r9 - 8) // r8 = architecturally a scalar, speculatively a ptr // // leak ptr using branch-based cache side channel: r8 &= 1 // choose bit to leak if r8 == 0 goto SLOW // no mispredict // architecturally dead code if input r6 is 0, // only executes speculatively iff ptr bit is 1 r8 = *(u64 *)(r7 + 0) # encode bit in cache (0: slow, 1: fast) SLOW: [...] After running this, the program can time the access to *(r7 + 0) to determine whether the chosen pointer bit was 0 or 1. Repeat this 64 times to recover the whole address on amd64. In summary, sanitization can only be skipped if one scalar is overwritten with another scalar. Scalar-confusion due to speculative store bypass can not lead to invalid accesses because the pointer bounds deducted during verification are enforced using branchless logic. See 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") for details. Do not make the mitigation depend on !env->allow_{uninit_stack,ptr_leaks} because speculative leaks are likely unexpected if these were enabled. For example, leaking the address to a protected log file may be acceptable while disabling the mitigation might unintentionally leak the address into the cached-state of a map that is accessible to unprivileged processes. Fixes: 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") Signed-off-by: Luis Gerhorst <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Henriette Hofmeier <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2023-01-13sched/fair: Fixes for capacity inversion detectionQais Yousef1-2/+11
Traversing the Perf Domains requires rcu_read_lock() to be held and is conditional on sched_energy_enabled(). Ensure right protections applied. Also skip capacity inversion detection for our own pd; which was an error. Fixes: 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Reported-by: Dietmar Eggemann <[email protected]> Signed-off-by: Qais Yousef (Google) <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-13sched/uclamp: Fix a uninitialized variable warningsQais Yousef1-19/+16
Addresses the following warnings: > config: riscv-randconfig-m031-20221111 > compiler: riscv64-linux-gcc (GCC) 12.1.0 > > smatch warnings: > kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_min'. > kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_max'. Fixes: 244226035a1f ("sched/uclamp: Fix fits_capacity() check in feec()") Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Qais Yousef (Google) <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-20/+76
drivers/net/usb/r8152.c be53771c87f4 ("r8152: add vendor/device ID pair for Microsoft Devkit") ec51fbd1b8a2 ("r8152: add USB device driver for config selection") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-12bpf: hash map, avoid deadlock with suitable hash maskTonghao Zhang1-2/+2
The deadlock still may occur while accessed in NMI and non-NMI context. Because in NMI, we still may access the same bucket but with different map_locked index. For example, on the same CPU, .max_entries = 2, we update the hash map, with key = 4, while running bpf prog in NMI nmi_handle(), to update hash map with key = 20, so it will have the same bucket index but have different map_locked index. To fix this issue, using min mask to hash again. Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked") Signed-off-by: Tonghao Zhang <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Song Liu <[email protected]> Cc: Yonghong Song <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Stanislav Fomichev <[email protected]> Cc: Hao Luo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Hou Tao <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-01-12Merge tag 'timers-urgent-2023-01-12' of ↵Linus Torvalds3-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer doc fixes from Ingo Molnar: - Fix various DocBook formatting errors in kernel/time/ that generated (justified) warnings during a kernel-doc build. * tag 'timers-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Fix various kernel-doc problems