aboutsummaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2023-12-10lib/stackdepot: check disabled flag when fetchingAndrey Konovalov1-1/+1
Do not try fetching a stack trace from the stack depot if the stack_depot_disabled flag is enabled. Link: https://lkml.kernel.org/r/c3bfa3b7ab00b2e48ab75a3fbb9c67555777cb08.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10lib/stackdepot: print disabled message only if truly disabledAndrey Konovalov1-9/+15
Patch series "stackdepot: allow evicting stack traces", v4. Currently, the stack depot grows indefinitely until it reaches its capacity. Once that happens, the stack depot stops saving new stack traces. This creates a problem for using the stack depot for in-field testing and in production. For such uses, an ideal stack trace storage should: 1. Allow saving fresh stack traces on systems with a large uptime while limiting the amount of memory used to store the traces; 2. Have a low performance impact. Implementing #1 in the stack depot is impossible with the current keep-forever approach. This series targets to address that. Issue #2 is left to be addressed in a future series. This series changes the stack depot implementation to allow evicting unneeded stack traces from the stack depot. The users of the stack depot can do that via new stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and stack_depot_put APIs. Internal changes to the stack depot code include: 1. Storing stack traces in fixed-frame-sized slots (vs precisely-sized slots in the current implementation); the slot size is controlled via CONFIG_STACKDEPOT_MAX_FRAMES (default: 64 frames); 2. Keeping available slots in a freelist (vs keeping an offset to the next free slot); 3. Using a read/write lock for synchronization (vs a lock-free approach combined with a spinlock). This series also integrates the eviction functionality into KASAN: the tag-based modes evict stack traces when the corresponding entry leaves the stack ring, and Generic KASAN evicts stack traces for objects once those leave the quarantine. With KASAN, despite wasting some space on rounding up the size of each stack record, the total memory consumed by stack depot gets saturated due to the eviction of irrelevant stack traces from the stack depot. With the tag-based KASAN modes, the average total amount of memory used for stack traces becomes ~0.5 MB (with the current default stack ring size of 32k entries and the default CONFIG_STACKDEPOT_MAX_FRAMES of 64). With Generic KASAN, the stack traces take up ~1 MB per 1 GB of RAM (as the quarantine's size depends on the amount of RAM). However, with KMSAN, the stack depot ends up using ~4x more memory per a stack trace than before. Thus, for KMSAN, the stack depot capacity is increased accordingly. KMSAN uses a lot of RAM for shadow memory anyway, so the increased stack depot memory usage will not make a significant difference. Other users of the stack depot do not save stack traces as often as KASAN and KMSAN. Thus, the increased memory usage is taken as an acceptable trade-off. In the future, these other users can take advantage of the eviction API to limit the memory waste. There is no measurable boot time performance impact of these changes for KASAN on x86-64. I haven't done any tests for arm64 modes (the stack depot without performance optimizations is not suitable for intended use of those anyway), but I expect a similar result. Obtaining and copying stack trace frames when saving them into stack depot is what takes the most time. This series does not yet provide a way to configure the maximum size of the stack depot externally (e.g. via a command-line parameter). This will be added in a separate series, possibly together with the performance improvement changes. This patch (of 22): Currently, if stack_depot_disable=off is passed to the kernel command-line after stack_depot_disable=on, stack depot prints a message that it is disabled, while it is actually enabled. Fix this by moving printing the disabled message to stack_depot_early_init. Place it before the __stack_depot_early_init_requested check, so that the message is printed even if early stack depot init has not been requested. Also drop the stack_table = NULL assignment from disable_stack_depot, as stack_table is NULL by default. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/73a25c5fff29f3357cd7a9330e85e09bc8da2cbe.1700502145.git.andreyknvl@google.com Fixes: e1fdc403349c ("lib: stackdepot: add support to disable stack depot") Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10kasan: default to inline instrumentationPaul Heidekrüger1-1/+1
KASan inline instrumentation can yield up to a 2x performance gain at the cost of a larger binary. Make inline instrumentation the default, as suggested in the bug report below. When an architecture does not support inline instrumentation, it should set ARCH_DISABLE_KASAN_INLINE, as done by PowerPC, for instance. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Paul Heidekrüger <[email protected]> Reported-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=203495 Acked-by: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Vincenzo Frascino <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10maple_tree: preserve the tree attributes when destroying maple treePeng Zhang1-1/+1
When destroying maple tree, preserve its attributes and then turn it into an empty tree. This allows it to be reused without needing to be reinitialized. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Zhang <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Christie <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10maple_tree: update check_forking() and bench_forking()Peng Zhang1-59/+58
Updated check_forking() and bench_forking() to use __mt_dup() to duplicate maple tree. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Zhang <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Christie <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10maple_tree: skip other tests when BENCH is enabledPeng Zhang1-4/+4
Skip other tests when BENCH is enabled so that performance can be measured in user space. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Zhang <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Christie <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10maple_tree: introduce interfaces __mt_dup() and mtree_dup()Peng Zhang1-0/+274
Introduce interfaces __mt_dup() and mtree_dup(), which are used to duplicate a maple tree. They duplicate a maple tree in Depth-First Search (DFS) pre-order traversal. It uses memcopy() to copy nodes in the source tree and allocate new child nodes in non-leaf nodes. The new node is exactly the same as the source node except for all the addresses stored in it. It will be faster than traversing all elements in the source tree and inserting them one by one into the new tree. The time complexity of these two functions is O(n). The difference between __mt_dup() and mtree_dup() is that mtree_dup() handles locks internally. Analysis of the average time complexity of this algorithm: For simplicity, let's assume that the maximum branching factor of all non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a full tree. Under the given conditions, if there is a maple tree with n elements, the number of its leaves is n/16. From bottom to top, the number of nodes in each level is 1/16 of the number of nodes in the level below. So the total number of nodes in the entire tree is given by the sum of n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has log(n) terms with base 16. According to the formula for the sum of a geometric series, the sum of this series can be calculated as (n-1)/15. Each node has only one parent node pointer, which can be considered as an edge. In total, there are (n-1)/15-1 edges. This algorithm consists of two operations: 1. Traversing all nodes in DFS order. 2. For each node, making a copy and performing necessary modifications to create a new node. For the first part, DFS traversal will visit each edge twice. Let T(ascend) represent the cost of taking one step downwards, and T(descend) represent the cost of taking one step upwards. And both of them are constants (although mas_ascend() may not be, as it contains a loop, but here we ignore it and treat it as a constant). So the time spent on the first part can be represented as ((n-1)/15-1) * (T(ascend) + T(descend)). For the second part, each node will be copied, and the cost of copying a node is denoted as T(copy_node). For each non-leaf node, it is necessary to reallocate all child nodes, and the cost of this operation is denoted as T(dup_alloc). The behavior behind memory allocation is complex and not specific to the maple tree operation. Here, we assume that the time required for a single allocation is constant. Since the size of a node is fixed, both of these symbols are also constants. We can calculate that the time spent on the second part is ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc). Adding both parts together, the total time spent by the algorithm can be represented as: ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) - n/16 * T(dup_alloc) - (T(ascend) + T(descend)) Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc) Let C2 = T(dup_alloc) Let C3 = T(ascend) + T(descend) Finally, the expression can be simplified as: ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3). This is a linear function, so the average time complexity is O(n). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Zhang <[email protected]> Suggested-by: Liam R. Howlett <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Christie <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-10maple_tree: add mt_free_one() and mt_attr() helpersPeng Zhang1-1/+11
Patch series "Introduce __mt_dup() to improve the performance of fork()", v7. This series introduces __mt_dup() to improve the performance of fork(). During the duplication process of mmap, all VMAs are traversed and inserted one by one into the new maple tree, causing the maple tree to be rebalanced multiple times. Balancing the maple tree is a costly operation. To duplicate VMAs more efficiently, mtree_dup() and __mt_dup() are introduced for the maple tree. They can efficiently duplicate a maple tree. Here are some algorithmic details about {mtree,__mt}_dup(). We perform a DFS pre-order traversal of all nodes in the source maple tree. During this process, we fully copy the nodes from the source tree to the new tree. This involves memory allocation, and when encountering a new node, if it is a non-leaf node, all its child nodes are allocated at once. This idea was originally from Liam R. Howlett's Maple Tree Work email, and I added some of my own ideas to implement it. Some previous discussions can be found in [1]. For a more detailed analysis of the algorithm, please refer to the logs for patch [3/10] and patch [10/10]. There is a "spawn" in byte-unixbench[2], which can be used to test the performance of fork(). I modified it slightly to make it work with different number of VMAs. Below are the test results. The first row shows the number of VMAs. The second and third rows show the number of fork() calls per ten seconds, corresponding to next-20231006 and the this patchset, respectively. The test results were obtained with CPU binding to avoid scheduler load balancing that could cause unstable results. There are still some fluctuations in the test results, but at least they are better than the original performance. 21 121 221 421 821 1621 3221 6421 12821 25621 51221 112100 76261 54227 34035 20195 11112 6017 3161 1606 802 393 114558 83067 65008 45824 28751 16072 8922 4747 2436 1233 599 2.19% 8.92% 19.88% 34.64% 42.37% 44.64% 48.28% 50.17% 51.68% 53.74% 52.42% Thanks to Liam and Matthew for the review. This patch (of 10): Add two helpers: 1. mt_free_one(), used to free a maple node. 2. mt_attr(), used to obtain the attributes of maple tree. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Zhang <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Christie <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-09test_bpf: Rename second ALU64_SMOD_X to ALU64_SMOD_KTiezhu Yang1-1/+1
Currently, there are two test cases with same name "ALU64_SMOD_X: -7 % 2 = -1", the first one is right, the second one should be ALU64_SMOD_K because its code is BPF_ALU64 | BPF_MOD | BPF_K. Before: test_bpf: #170 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS test_bpf: #171 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS After: test_bpf: #170 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS test_bpf: #171 ALU64_SMOD_K: -7 % 2 = -1 jited:1 4 PASS Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-12-08Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of ↵Linus Torvalds1-6/+16
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "31 hotfixes. Ten of these address pre-6.6 issues and are marked cc:stable. The remainder address post-6.6 issues or aren't considered serious enough to justify backporting" * tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range() nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI scripts/gdb: fix lx-device-list-bus and lx-device-list-class MAINTAINERS: drop Antti Palosaari highmem: fix a memory copy problem in memcpy_from_folio nilfs2: fix missing error check for sb_set_blocksize call kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP units: add missing header drivers/base/cpu: crash data showing should depends on KEXEC_CORE mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions scripts/gdb/tasks: fix lx-ps command error mm/Kconfig: make userfaultfd a menuconfig selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS mm/damon/core: copy nr_accesses when splitting region lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly checkstack: fix printed address mm/memory_hotplug: fix error handling in add_memory_resource() mm/memory_hotplug: add missing mem_hotplug_lock .mailmap: add a new address mapping for Chester Lin ...
2023-12-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-6/+62
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/stmicro/stmmac/dwmac5.c drivers/net/ethernet/stmicro/stmmac/dwmac5.h drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c drivers/net/ethernet/stmicro/stmmac/hwif.h 37e4b8df27bc ("net: stmmac: fix FPE events losing") c3f3b97238f6 ("net: stmmac: Refactor EST implementation") https://lore.kernel.org/all/[email protected]/ Adjacent changes: net/ipv4/tcp_ao.c 9396c4ee93f9 ("net/tcp: Don't store TCP-AO maclen on reqsk") 7b0f570f879a ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().") Signed-off-by: Jakub Kicinski <[email protected]>
2023-12-06Merge branch 'master' into mm-hotfixes-stableAndrew Morton6-12/+62
2023-12-06lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenlyMing Lei1-6/+16
group_cpus_evenly() could be part of storage driver's error handler, such as nvme driver, when may happen during CPU hotplug, in which storage queue has to drain its pending IOs because all CPUs associated with the queue are offline and the queue is becoming inactive. And handling IO needs error handler to provide forward progress. Then deadlock is caused: 1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's handler is waiting for inflight IO 2) error handler is waiting for CPU hotplug lock 3) inflight IO can't be completed in blk-mq's CPU hotplug handler because error handling can't provide forward progress. Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(), in which two stage spreads are taken: 1) the 1st stage is over all present CPUs; 2) the end stage is over all other CPUs. Turns out the two stage spread just needs consistent 'cpu_present_mask', and remove the CPU hotplug lock by storing it into one local cache. This way doesn't change correctness, because all CPUs are still covered. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ming Lei <[email protected]> Reported-by: Yi Zhang <[email protected]> Reported-by: Guangwu Zhang <[email protected]> Tested-by: Guangwu Zhang <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Cc: Keith Busch <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-06ACPI: tables: Correct and clean up the logic of acpi_parse_entries_array()Yuntao Wang1-21/+9
The original intention of acpi_parse_entries_array() is to return the number of all matching entries on success. This number may be greater than the value of the max_entries parameter. When this happens, the function will output a warning message, indicating that `count - max_entries` matching entries remain unprocessed and have been ignored. However, commit 4ceacd02f5a1 ("ACPI / table: Always count matched and successfully parsed entries") changed this logic to return the number of entries successfully processed by the handler. In this case, when the max_entries parameter is not zero, the number of entries successfully processed can never be greater than the value of max_entries. In other words, the expression `count > max_entries` will always evaluate to false. This means that the logic in the final if statement will never be executed. Commit 99b0efd7c886 ("ACPI / tables: do not report the number of entries ignored by acpi_parse_entries()") mentioned this issue, but it tried to fix it by removing part of the warning message. This is meaningless because the pr_warn statement will never be executed in the first place. Commit 8726d4f44150 ("ACPI / tables: fix acpi_parse_entries_array() so it traverses all subtables") introduced an errs variable, which is intended to make acpi_parse_entries_array() always traverse all of the subtables, calling as many of the callbacks as possible. However, it seems that the commit does not achieve this goal. For example, when a handler returns an error, none of the handlers will be called again in the subsequent iterations. This result appears to be no different from before the change. This patch corrects and cleans up the logic of acpi_parse_entries_array(), making it return the number of all matching entries, rather than the number of entries successfully processed by handlers. Additionally, if an error occurs when executing a handler, the function will return -EINVAL immediately. This patch should not affect existing users of acpi_parse_entries_array(). Signed-off-by: Yuntao Wang <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-06lib/vsprintf: Fix %pfwf when current node refcount == 0Herve Codina1-3/+8
A refcount issue can appeared in __fwnode_link_del() due to the pr_debug() call: WARNING: CPU: 0 PID: 901 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110 Call Trace: <TASK> ... of_node_get+0x1e/0x30 of_fwnode_get+0x28/0x40 fwnode_full_name_string+0x34/0x90 fwnode_string+0xdb/0x140 ... vsnprintf+0x17b/0x630 ... __fwnode_link_del+0x25/0xa0 fwnode_links_purge+0x39/0xb0 of_node_release+0xd9/0x180 ... Indeed, an fwnode (of_node) is being destroyed and so, of_node_release() is called because the of_node refcount reached 0. From of_node_release() several function calls are done and lead to a pr_debug() calls with %pfwf to print the fwnode full name. The issue is not present if we change %pfwf to %pfwP. To print the full name, %pfwf iterates over the current node and its parents and obtain/drop a reference to all nodes involved. In order to allow to print the full name (%pfwf) of a node while it is being destroyed, do not obtain/drop a reference to this current node. Fixes: a92eb7621b9f ("lib/vsprintf: Make use of fwnode API to obtain node names and separators") Cc: [email protected] Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Sakari Ailus <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-05iov_iter: replace import_single_range() with import_ubuf()Jens Axboe1-13/+0
With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2023-12-05iov_iter: remove unused 'iov' argument from import_single_range()Jens Axboe1-1/+1
It is entirely unused, just get rid of it. Signed-off-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2023-12-05mm/slab: remove CONFIG_SLAB from all Kconfig and MakefileVlastimil Babka4-11/+5
Remove CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_SLAB_DEPRECATED and everything in Kconfig files and mm/Makefile that depends on those. Since SLUB is the only remaining allocator, remove the allocator choice, make CONFIG_SLUB a "def_bool y" for now and remove all explicit dependencies on SLUB or SLAB as it's now always enabled. Make every option's verbose name and description refer to "the slab allocator" without refering to the specific implementation. Do not rename the CONFIG_ option names yet. Everything under #ifdef CONFIG_SLAB, and mm/slab.c is now dead code, all code under #ifdef CONFIG_SLUB is now always compiled. Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Tested-by: David Rientjes <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2023-12-03Merge tag 'probes-fixes-v6.7-rc3' of ↵Linus Torvalds1-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - objpool: Fix objpool overrun case on memory/cache access delay especially on the big.LITTLE SoC. The objpool uses a copy of object slot index internal loop, but the slot index can be changed on another processor in parallel. In that case, the difference of 'head' local copy and the 'slot->last' index will be bigger than local slot size. In that case, we need to re-read the slot::head to update it. - kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since kretprobe_holder::rp is RCU managed, it should use rcu_assign_pointer() and rcu_dereference_check() correctly. Also adding __rcu tag for finding wrong usage by sparse. - rethook: Fix to use appropriate rcu API for rethook::handler. The same as kretprobe, rethook::handler is RCU managed and it should use rcu_assign_pointer() and rcu_dereference_check(). This also adds __rcu tag for finding wrong usage by sparse. * tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rethook: Use __rcu pointer for rethook::handler kprobes: consistent rcu api usage for kretprobe holder lib: objpool: fix head overrun on RK3588 SBC
2023-12-02Merge tag 'acpi-6.7-rc4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "This fixes a recently introduced build issue on ARM32 and a NULL pointer dereference in the ACPI backlight driver due to a design issue exposed by a recent change in the ACPI bus type code. Specifics: - Fix a recently introduced build issue on ARM32 platforms caused by an inadvertent header file breakage (Dave Jiang) - Eliminate questionable usage of acpi_driver_data() in the ACPI backlight cooling device code that leads to NULL pointer dereferences after recent ACPI core changes (Hans de Goede)" * tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: video: Use acpi_video_device for cooling-dev driver data ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
2023-12-02Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefsLinus Torvalds1-2/+3
Pull more bcachefs bugfixes from Kent Overstreet: - bcache & bcachefs were broken with CFI enabled; patch for closures to fix type punning - mark erasure coding as extra-experimental; there are incompatible disk space accounting changes coming for erasure coding, and I'm still seeing checksum errors in some tests - several fixes for durability-related issues (durability is a device specific setting where we can tell bcachefs that data on a given device should be counted as replicated x times) - a fix for a rare livelock when a btree node merge then updates a parent node that is almost full - fix a race in the device removal path, where dropping a pointer in a btree node to a device would be clobbered by an in flight btree write updating the btree node key on completion - fix one SRCU lock hold time warning in the btree gc code - ther's still a bunch more of these to fix - fix a rare race where we'd start copygc before initializing the "are we rw" percpu refcount; copygc would think we were already ro and die immediately * tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits) bcachefs: Extra kthread_should_stop() calls for copygc bcachefs: Convert gc_alloc_start() to for_each_btree_key2() bcachefs: Fix race between btree writes and metadata drop bcachefs: move journal seq assertion bcachefs: -EROFS doesn't count as move_extent_start_fail bcachefs: trace_move_extent_start_fail() now includes errcode bcachefs: Fix split_race livelock bcachefs: Fix bucket data type for stripe buckets bcachefs: Add missing validation for jset_entry_data_usage bcachefs: Fix zstd compress workspace size bcachefs: bpos is misaligned on big endian bcachefs: Fix ec + durability calculation bcachefs: Data update path won't accidentaly grow replicas bcachefs: deallocate_extra_replicas() bcachefs: Proper refcounting for journal_keys bcachefs: preserve device path as device name bcachefs: Fix an endianness conversion bcachefs: Start gc, copygc, rebalance threads after initing writes ref bcachefs: Don't stop copygc thread on device resize bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops ...
2023-12-01Merge branch 'acpi-tables'Rafael J. Wysocki1-1/+1
Merge a fix for a recently introduced build issue on ARM32 platforms caused by an inadvertent header file breakage (Dave Jiang). * acpi-tables: ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
2023-12-01lib: objpool: fix head overrun on RK3588 SBCwuqiang.matt1-0/+17
objpool overrun stress with test_objpool on OrangePi5+ SBC triggered the following kernel warnings: WARNING: CPU: 6 PID: 3115 at lib/objpool.c:168 objpool_push+0xc0/0x100 This message is from objpool.c:168: WARN_ON_ONCE(tail - head > pool->nr_objs); The overrun test case is to validate the case that pre-allocated objects are insufficient: 8 objects are pre-allocated for each node and consumer thread per node tries to grab 16 objects in a row. The testing system is OrangePI 5+, with RK3588, a big.LITTLE SOC with 4x A76 and 4x A55. When disabling either all 4 big or 4 little cores, the overrun tests run well, and once with big and little cores mixed together, the overrun test would always cause an overrun loop. It's likely the memory timing differences of big and little cores cause this trouble. Here are the debugging data of objpool_try_get_slot after try_cmpxchg_release: objpool_pop: cpu: 4/0 0:0 head: 278/279 tail:278 last:276/278 The local copies of 'head' and 'last' were 278 and 276, and reloading of 'slot->head' and 'slot->last' got 279 and 278. After try_cmpxchg_release 'slot->head' became 'head + 1', which is correct. But what's wrong here is the stale value of 'last', and that stale value of 'last' finally led the overrun of 'head'. Memory updating of 'last' and 'head' are performed in push() and pop() independently, which could be the culprit leading this out of order visibility of 'last' and 'head'. So for objpool_try_get_slot(), it's not enough only checking the condition of 'head != slot', the implicit condition 'last - head <= nr_objs' must also be explicitly asserted to guarantee 'last' is always behind 'head' before the object retrieving. This patch will check and try reloading of 'head' and 'last' to ensure 'last' is behind 'head' at the time of object retrieving. Performance testings show the average impact is about 0.1% for X86_64 and 1.12% for ARM64. Here are the results: OS: Debian 10 X86_64, Linux 6.6rc HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s 1T 2T 4T 8T 16T native: 49543304 99277826 199017659 399070324 795185848 objpool: 29909085 59865637 119692073 239750369 478005250 objpool+: 29879313 59230743 119609856 239067773 478509029 32T 48T 64T 96T 128T native: 1596927073 2390099988 2929397330 3183875848 3257546602 objpool: 957553042 1435814086 1680872925 2043126796 2165424198 objpool+: 956476281 1434491297 1666055740 2041556569 2157415622 OS: Debian 11 AARCH64, Linux 6.6rc HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s 1T 2T 4T 8T 16T native: 30890508 60399915 123111980 242257008 494002946 objpool: 14742531 28883047 57739948 115886644 232455421 objpool+: 14107220 29032998 57286084 113730493 232232850 24T 32T 48T 64T 96T native: 746406039 1000174750 1493236240 1998318364 2942911180 objpool: 349164852 467284332 702296756 934459713 1387898285 objpool+: 348388180 462750976 696606096 927865887 1368402195 Link: https://lore.kernel.org/all/[email protected]/ Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC") Signed-off-by: wuqiang.matt <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-12-01Merge tag 'linux_kselftest-kunit-fixes-6.7-rc4' of ↵Linus Torvalds2-3/+41
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "Three fixes to warnings and run-time test behavior. With these fixes, test suite counter will be reset correctly before running tests, kunit will warn if tests are too slow, and eliminate warning when kfree() as an action" * tag 'linux_kselftest-kunit-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: test: Avoid cast warning when adding kfree() as an action kunit: Reset suite counter right before running tests kunit: Warn if tests are slow
2023-11-30Merge tag 'for-netdev' of ↵Jakub Kicinski1-2/+0
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-11-30 We've added 30 non-merge commits during the last 7 day(s) which contain a total of 58 files changed, 1598 insertions(+), 154 deletions(-). The main changes are: 1) Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload, from Stanislav Fomichev with stmmac implementation from Song Yoong Siang. 2) Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques, from Andrii Nakryiko. 3) Add BPF link_info support for uprobe multi link along with bpftool integration for the latter, from Jiri Olsa. 4) Use pkg-config in BPF selftests to determine ld flags which is in particular needed for linking statically, from Akihiko Odaki. 5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits) bpf/tests: Remove duplicate JSGT tests selftests/bpf: Add TX side to xdp_hw_metadata selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP selftests/bpf: Add TX side to xdp_metadata selftests/bpf: Add csum helpers selftests/xsk: Support tx_metadata_len xsk: Add option to calculate TX checksum in SW xsk: Validate xsk_tx_metadata flags xsk: Document tx_metadata_len layout net: stmmac: Add Tx HWTS support to XDP ZC net/mlx5e: Implement AF_XDP TX timestamp and checksum offload tools: ynl: Print xsk-features from the sample xsk: Add TX timestamp and TX checksum offload support xsk: Support tx_metadata_len selftests/bpf: Use pkg-config for libelf selftests/bpf: Override PKG_CONFIG for static builds selftests/bpf: Choose pkg-config for the target bpftool: Add support to display uprobe_multi links selftests/bpf: Add link_info test for uprobe_multi link selftests/bpf: Use bpf_link__destroy in fill_link_info tests ... ==================== Conflicts: Documentation/netlink/specs/netdev.yaml: 839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools") 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") https://lore.kernel.org/all/[email protected]/ While at it also regen, tree is dirty after: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") looks like code wasn't re-rendered after "render-max" was removed. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-7/+1
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2023-11-30bpf/tests: Remove duplicate JSGT testsYujie Liu1-2/+0
It seems unnecessary that JSGT is tested twice (one before JSGE and one after JSGE) since others are tested only once. Remove the duplicate JSGT tests. Fixes: 0bbaa02b4816 ("bpf/tests: Add tests to check source register zero-extension") Signed-off-by: Yujie Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Johan Almbladh <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-11-26Merge tag 'parisc-for-6.7-rc3' of ↵Linus Torvalds1-6/+0
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: "This patchset fixes and enforces correct section alignments for the ex_table, altinstructions, parisc_unwind, jump_table and bug_table which are created by inline assembly. Due to not being correctly aligned at link & load time they can trigger unnecessarily the kernel unaligned exception handler at runtime. While at it, I switched the bug table to use relative addresses which reduces the size of the table by half on 64-bit. We still had the ENOSYM and EREMOTERELEASE errno symbols as left-overs from HP-UX, which now trigger build-issues with glibc. We can simply remove them. Most of the patches are tagged for stable kernel series. Summary: - Drop HP-UX ENOSYM and EREMOTERELEASE return codes to avoid glibc build issues - Fix section alignments for ex_table, altinstructions, parisc unwind table, jump_table and bug_table - Reduce size of bug_table on 64-bit kernel by using relative pointers" * tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Reduce size of the bug_table on 64-bit kernel by half parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes parisc: Use natural CPU alignment for bug_table parisc: Ensure 32-bit alignment on parisc unwind section parisc: Mark lock_aligned variables 16-byte aligned on SMP parisc: Mark jump_table naturally aligned parisc: Mark altinstructions read-only and 32-bit aligned parisc: Mark ex_table entries 32-bit aligned in uaccess.h parisc: Mark ex_table entries 32-bit aligned in assembly.h
2023-11-25parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codesHelge Deller1-6/+0
Those return codes are only defined for the parisc architecture and are leftovers from when we wanted to be HP-UX compatible. They are not returned by any Linux kernel syscall but do trigger problems with the glibc strerrorname_np() and strerror() functions as reported in glibc issue #31080. There is no need to keep them, so simply remove them. Signed-off-by: Helge Deller <[email protected]> Reported-by: Bruno Haible <[email protected]> Closes: https://sourceware.org/bugzilla/show_bug.cgi?id=31080 Cc: [email protected]
2023-11-24Merge branch 'firmware_loader'Jakub Kicinski1-0/+1
Kory says: ==================== This patch was initially submitted as part of a net patch series. Conor expressed interest in using it in a different subsystem. https://lore.kernel.org/netdev/[email protected]/ Consequently, I extracted it from the series and submitted it separately. I first tried to send it to driver-core but it seems also not the best choice: https://lore.kernel.org/lkml/2023111720-slicer-exes-7d9f@gregkh/ ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2023-11-24firmware_loader: Expand Firmware upload error codes with firmware invalid errorKory Maincent1-0/+1
No error code are available to signal an invalid firmware content. Drivers that can check the firmware content validity can not return this specific failure to the user-space Expand the firmware error code with an additional code: - "firmware invalid" code which can be used when the provided firmware is invalid Sync lib/test_firmware.c file accordingly. Acked-by: Luis Chamberlain <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kory Maincent <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/20231122-feature_firmware_error_code-v3-1-04ec753afb71@bootlin.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-11-24Merge tag 'vfs-6.7-rc3.fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Avoid calling back into LSMs from vfs_getattr_nosec() calls. IMA used to query inode properties accessing raw inode fields without dedicated helpers. That was finally fixed a few releases ago by forcing IMA to use vfs_getattr_nosec() helpers. The goal of the vfs_getattr_nosec() helper is to query for attributes without calling into the LSM layer which would be quite problematic because incredibly IMA is called from __fput()... __fput() -> ima_file_free() What it does is to call back into the filesystem to update the file's IMA xattr. Querying the inode without using vfs_getattr_nosec() meant that IMA didn't handle stacking filesystems such as overlayfs correctly. So the switch to vfs_getattr_nosec() is quite correct. But the switch to vfs_getattr_nosec() revealed another bug when used on stacking filesystems: __fput() -> ima_file_free() -> vfs_getattr_nosec() -> i_op->getattr::ovl_getattr() -> vfs_getattr() -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr() -> security_inode_getattr() # calls back into LSMs Now, if that __fput() happens from task_work_run() of an exiting task current->fs and various other pointer could already be NULL. So anything in the LSM layer relying on that not being NULL would be quite surprised. Fix that by passing the information that this is a security request through to the stacking filesystem by adding a new internal ATT_GETATTR_NOSEC flag. Now the callchain becomes: __fput() -> ima_file_free() -> vfs_getattr_nosec() -> i_op->getattr::ovl_getattr() -> if (AT_GETATTR_NOSEC) vfs_getattr_nosec() else vfs_getattr() -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr() - Fix a bug introduced with the iov_iter rework from last cycle. This broke /proc/kcore by copying too much and without the correct offset. - Add a missing NULL check when allocating the root inode in autofs_fill_super(). - Fix stable writes for multi-device filesystems (xfs, btrfs etc) and the block device pseudo filesystem. Stable writes used to be a superblock flag only, making it a per filesystem property. Add an additional AS_STABLE_WRITES mapping flag to allow for fine-grained control. - Ensure that offset_iterate_dir() returns 0 after reaching the end of a directory so it adheres to getdents() convention. * tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: libfs: getdents() should return 0 after reaching EOD xfs: respect the stable writes flag on the RT device xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags block: update the stable_writes flag in bdev_add filemap: add a per-mapping stable writes flag autofs: add: new_inode check in autofs_fill_super() iov_iter: fix copy_page_to_iter_nofault() fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
2023-11-24closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet1-2/+3
Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <[email protected]> Cc: Coly Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2023-11-22ACPI: Fix ARM32 platforms compile issue introduced by fw_table changesDave Jiang1-1/+1
Linus reported that: After commit a103f46633fd the kernel stopped compiling for several ARM32 platforms that I am building with a bare metal compiler. Bare metal compilers (arm-none-eabi-) don't define __linux__. This is because the header <acpi/platform/acenv.h> is now in the include path for <linux/irq.h>: CC arch/arm/kernel/irq.o CC kernel/sysctl.o CC crypto/api.o In file included from ../include/acpi/acpi.h:22, from ../include/linux/fw_table.h:29, from ../include/linux/acpi.h:18, from ../include/linux/irqchip.h:14, from ../arch/arm/kernel/irq.c:25: ../include/acpi/platform/acenv.h:218:2: error: #error Unknown target environment 218 | #error Unknown target environment | ^~~~~ The issue is caused by the introducing of splitting out the ACPI code to support the new generic fw_table code. Rafael suggested [1] moving the fw_table.h include in linux/acpi.h to below the linux/mutex.h. Remove the two includes in fw_table.h. Replace linux/fw_table.h include in fw_table.c with linux/acpi.h. Link: https://lore.kernel.org/linux-acpi/CAJZ5v0idWdJq3JSqQWLG5q+b+b=zkEdWR55rGYEoxh7R6N8kFQ@mail.gmail.com/ Fixes: a103f46633fd ("acpi: Move common tables helper functions to common lib") Closes: https://lore.kernel.org/linux-acpi/[email protected]/ Reported-by: Linus Walleij <[email protected]> Suggested-by: Rafael J. Wysocki <[email protected]> Tested-by: Linus Walleij <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-11-22debugobjects: Stop accessing objects after releasing hash bucket lockAndrzej Hajda1-122/+78
After release of the hashbucket lock the tracking object can be modified or freed by a concurrent thread. Using it in such a case is error prone, even for printing the object state: 1. T1 tries to deactivate destroyed object, debugobjects detects it, hash bucket lock is released. 2. T2 preempts T1 and frees the tracking object. 3. The freed tracking object is allocated and initialized for a different to be tracked kernel object. 4. T1 resumes and reports error for wrong kernel object. Create a local copy of the tracking object before releasing the hash bucket lock and use the local copy for reporting and fixups to prevent this. Signed-off-by: Andrzej Hajda <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-21Merge tag 'for-netdev' of ↵Jakub Kicinski1-16/+0
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-11-21 We've added 85 non-merge commits during the last 12 day(s) which contain a total of 63 files changed, 4464 insertions(+), 1484 deletions(-). The main changes are: 1) Huge batch of verifier changes to improve BPF register bounds logic and range support along with a large test suite, and verifier log improvements, all from Andrii Nakryiko. 2) Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id, from Yafang Shao. 3) Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext, from Dave Marchevsky. 4) Fix bpf_get_task_stack() helper to add the correct crosstask check for the get_perf_callchain(), from Jordan Rome. 5) Fix BPF task_iter internals where lockless usage of next_thread() was wrong. The rework also simplifies the code, from Oleg Nesterov. 6) Fix uninitialized tail padding via LIBBPF_OPTS_RESET, and another fix for certain BPF UAPI structs to fix verifier failures seen in bpf_dynptr usage, from Yonghong Song. 7) Add BPF selftest fixes for map_percpu_stats flakes due to per-CPU BPF memory allocator not being able to allocate per-CPU pointer successfully, from Hou Tao. 8) Add prep work around dynptr and string handling for kfuncs which is later going to be used by file verification via BPF LSM and fsverity, from Song Liu. 9) Improve BPF selftests to update multiple prog_tests to use ASSERT_* macros, from Yuran Pereira. 10) Optimize LPM trie lookup to check prefixlen before walking the trie, from Florian Lehner. 11) Consolidate virtio/9p configs from BPF selftests in config.vm file given they are needed consistently across archs, from Manu Bretelle. 12) Small BPF verifier refactor to remove register_is_const(), from Shung-Hsi Yu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_perm selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca selftests/bpf: reduce verboseness of reg_bounds selftest logs bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos) bpf: bpf_iter_task_next: use __next_thread() rather than next_thread() bpf: task_group_seq_get_next: use __next_thread() rather than next_thread() bpf: emit frameno for PTR_TO_STACK regs if it differs from current one bpf: smarter verifier log number printing logic bpf: omit default off=0 and imm=0 in register state log bpf: emit map name in register state if applicable and available bpf: print spilled register state in stack slot bpf: extract register state printing bpf: move verifier state printing code to kernel/bpf/log.c bpf: move verbose_linfo() into kernel/bpf/log.c bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS bpf: Remove test for MOVSX32 with offset=32 selftests/bpf: add iter test requiring range x range logic veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-11-18iov_iter: fix copy_page_to_iter_nofault()Omar Sandoval1-1/+1
The recent conversion to inline functions made two mistakes: 1. It tries to copy the full amount requested (bytes), not just what's available in the kmap'd page (n). 2. It's not applying the offset in the first page. Note that copy_page_to_iter_nofault() is only used by /proc/kcore. This was detected by drgn's test suite. Fixes: f1982740f5e7 ("iov_iter: Convert iterate*() to inline funcs") Signed-off-by: Omar Sandoval <[email protected]> Link: https://lore.kernel.org/r/c1616e06b5248013cbbb1881bb4fef85a7a69ccb.1700257019.git.osandov@fb.com Acked-by: David Howells <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-11-17crypto: lib/aesgcm - Add kernel docs for aesgcm_macSagar Vashnav1-0/+13
Add kernel documentation for the aesgcm_mac. This function generates the authentication tag using the AES-GCM algorithm. Signed-off-by: Sagar Vashnav <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-11-15bpf: Remove test for MOVSX32 with offset=32Puranjay Mohan1-16/+0
MOVSX32 only supports sign extending 8-bit and 16-bit operands into 32 bit operands. The "ALU_MOVSX | BPF_W" test tries to sign extend a 32 bit operand into a 32 bit operand which is equivalent to a normal BPF_MOV. Remove this test as it tries to run an invalid instruction. Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Puranjay Mohan <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Acked-by: Stanislav Fomichev <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-11-14Merge tag 'zstd-linus-v6.7-rc2' of https://github.com/terrelln/linuxLinus Torvalds1-1/+1
Pull Zstd fix from Nick Terrell: "Only a single line change to fix a benign UBSAN warning" * tag 'zstd-linus-v6.7-rc2' of https://github.com/terrelln/linux: zstd: Fix array-index-out-of-bounds UBSAN warning
2023-11-14zstd: Fix array-index-out-of-bounds UBSAN warningNick Terrell1-1/+1
Zstd used an array of length 1 to mean a flexible array for C89 compatibility. Switch to a C99 flexible array to fix the UBSAN warning. Tested locally by booting the kernel and writing to and reading from a BtrFS filesystem with zstd compression enabled. I was unable to reproduce the issue before the fix, however it is a trivial change. Link: https://lkml.kernel.org/r/[email protected] Reported-by: [email protected] Reported-by: Eric Biggers <[email protected]> Reported-by: Kees Cook <[email protected]> Signed-off-by: Nick Terrell <[email protected]> Reviewed-by: Kees Cook <[email protected]>
2023-11-14kunit: test: Avoid cast warning when adding kfree() as an actionRichard Fitzgerald1-1/+1
In kunit_log_test() pass the kfree_wrapper() function to kunit_add_action() instead of directly passing kfree(). This prevents a cast warning: lib/kunit/kunit-test.c:565:25: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] 564 full_log = string_stream_get_string(test->log); > 565 kunit_add_action(test, (kunit_action_t *)kfree, full_log); Signed-off-by: Richard Fitzgerald <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: 05e2006ce493 ("kunit: Use string_stream for test log") Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-11-14kunit: Reset suite counter right before running testsMichal Wajdeczko1-2/+2
Today we reset the suite counter as part of the suite cleanup, called from the module exit callback, but it might not work that well as one can try to collect results without unloading a previous test (either unintentionally or due to dependencies). For easy reproduction try to load the kunit-test.ko and then collect and parse results from the kunit-example-test.ko load. Parser will complain about mismatch of expected test number: [ ] KTAP version 1 [ ] 1..1 [ ] # example: initializing suite [ ] KTAP version 1 [ ] # Subtest: example .. [ ] # example: pass:5 fail:0 skip:4 total:9 [ ] # Totals: pass:6 fail:0 skip:6 total:12 [ ] ok 7 example [ ] [ERROR] Test: example: Expected test number 1 but found 7 [ ] ===================== [PASSED] example ===================== [ ] ============================================================ [ ] Testing complete. Ran 12 tests: passed: 6, skipped: 6, errors: 1 Since we are now printing suite test plan on every module load, right before running suite tests, we should make sure that suite counter will also start from 1. Easiest solution seems to be move counter reset to the __kunit_test_suites_init() function. Signed-off-by: Michal Wajdeczko <[email protected]> Cc: David Gow <[email protected]> Cc: Rae Moar <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-11-14kunit: Warn if tests are slowMaxime Ripard1-0/+38
Kunit recently gained support to setup attributes, the first one being the speed of a given test, then allowing to filter out slow tests. A slow test is defined in the documentation as taking more than one second. There's an another speed attribute called "super slow" but whose definition is less clear. Add support to the test runner to check the test execution time, and report tests that should be marked as slow but aren't. Signed-off-by: Maxime Ripard <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-11-10lib: test_objpool: make global variables staticwuqiang.matt1-3/+3
Kernel test robot reported build warnings that structures g_ot_sync_ops, g_ot_async_ops and g_testcases should be static. These definitions are only used in test_objpool.c, so make them static Link: https://lore.kernel.org/all/[email protected]/ Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: wuqiang.matt <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-11-07Merge tag 'bcachefs-2023-11-5' of https://evilpiepirate.org/git/bcachefsLinus Torvalds1-2/+7
Pull more bcachefs updates from Kent Overstreet: "Here's the second big bcachefs pull request. This brings your tree up to date with my master branch, which is what existing bcachefs users are currently running. New features: - rebalance_work btree (and metadata version 1.3): the rebalance thread no longer has to scan to find extents that need processing - big scalability improvement. - sb_errors superblock section: this adds counters for each fsck error type, since filesystem creation, along with the date of the most recent error. It'll get us better bug reports (since users do not typically report errors that fsck was able to fix), and I might add telemetry for this in the future. Fixes include: - multiple snapshot deletion fixes - members_v2 fixups - deleted_inodes btree fixes - copygc thread no longer spins when a device is full but has no fragmented buckets (i.e. rebalance needs to move data around instead) - a fix for a memory reclaim issue with the btree key cache: we're now careful not to hold the srcu read lock that blocks key cache reclaim for too long - an early allocator locking fix, from Brian - endianness fixes, from Brian - CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y, a big performance improvement on multithreaded workloads" * tag 'bcachefs-2023-11-5' of https://evilpiepirate.org/git/bcachefs: (70 commits) bcachefs: Improve stripe checksum error message bcachefs: Simplify, fix bch2_backpointer_get_key() bcachefs: kill thing_it_points_to arg to backpointer_not_found() bcachefs: bch2_ec_read_extent() now takes btree_trans bcachefs: bch2_stripe_to_text() now prints ptr gens bcachefs: Don't iterate over journal entries just for btree roots bcachefs: Break up bch2_journal_write() bcachefs: Replace ERANGE with private error codes bcachefs: bkey_copy() is no longer a macro bcachefs: x-macro-ify inode flags enum bcachefs: Convert bch2_fs_open() to darray bcachefs: Move __bch2_members_v2_get_mut to sb-members.h bcachefs: bch2_prt_datetime() bcachefs: CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage bcachefs: rebalance_work btree is not a snapshots btree bcachefs: Add missing printk newlines bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entry bcachefs: .get_parent() should return an error pointer bcachefs: Fix bch2_delete_dead_inodes() ...
2023-11-04Merge tag 'cxl-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds3-0/+193
Pull CXL (Compute Express Link) updates from Dan Williams: "The main new functionality this time is work to allow Linux to natively handle CXL link protocol errors signalled via PCIe AER for current generation CXL platforms. This required some enlightenment of the PCIe AER core to workaround the fact that current generation RCH (Restricted CXL Host) platforms physically hide topology details and registers via a mechanism called RCRB (Root Complex Register Block). The next major highlight is reworks to address bugs in parsing region configurations for next generation VH (Virtual Host) topologies. The old broken algorithm is replaced with a simpler one that significantly increases the number of region configurations supported by Linux. This is again relevant for error handling so that forward and reverse address translation of memory errors can be carried out by Linux for memory regions instantiated by platform firmware. As for other cross-tree work, the ACPI table parsing code has been refactored for reuse parsing the "CDAT" structure which is an ACPI-like data structure that is reported by CXL devices. That work is in preparation for v6.8 support for CXL QoS. Think of this as dynamic generation of NUMA node topology information generated by Linux rather than platform firmware. Lastly, a number of internal object lifetime issues have been resolved along with misc. fixes and feature updates (decoders_committed sysfs ABI). Summary: - Add support for RCH (Restricted CXL Host) Error recovery - Fix several region assembly bugs - Fix mem-device lifetime issues relative to the sanitize command and RCH topology. - Refactor ACPI table parsing for CDAT parsing re-use in preparation for CXL QOS support" * tag 'cxl-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (50 commits) lib/fw_table: Remove acpi_parse_entries_array() export cxl/pci: Change CXL AER support check to use native AER cxl/hdm: Remove broken error path cxl/hdm: Fix && vs || bug acpi: Move common tables helper functions to common lib cxl: Add support for reading CXL switch CDAT table cxl: Add checksum verification to CDAT from CXL cxl: Export QTG ids from CFMWS to sysfs as qos_class attribute cxl: Add decoders_committed sysfs attribute to cxl_port cxl: Add cxl_decoders_committed() helper cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for devm cxl/core/regs: Rename phys_addr in cxl_map_component_regs() PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler cxl/pci: Disable root port interrupts in RCH mode cxl/pci: Add RCH downstream port error logging cxl/pci: Map RCH downstream AER registers for logging protocol errors cxl/pci: Update CXL error logging to use RAS register address PCI/AER: Refactor cper_print_aer() for use by CXL driver module cxl/pci: Add RCH downstream port AER register discovery ...
2023-11-03Merge tag 'powerpc-6.7-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for KVM running as a nested hypervisor under development versions of PowerVM, using the new PAPR nested virtualisation API - Add support for the BPF prog pack allocator - A rework of the non-server MMU handling to support execute-only on all platforms - Some optimisations & cleanups for the powerpc qspinlock code - Various other small features and fixes Thanks to Aboorva Devarajan, Aditya Gupta, Amit Machhiwal, Benjamin Gray, Christophe Leroy, Dr. David Alan Gilbert, Gaurav Batra, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joel Stanley, Jordan Niethe, Julia Lawall, Kautuk Consul, Kuan-Wei Chiu, Michael Neuling, Minjie Du, Muhammad Muzammil, Naveen N Rao, Nicholas Piggin, Nick Child, Nysal Jan K.A, Peter Lafreniere, Rob Herring, Sachin Sant, Sebastian Andrzej Siewior, Shrikanth Hegde, Srikar Dronamraju, Stanislav Kinsburskii, Vaibhav Jain, Wang Yufen, Yang Yingliang, and Yuan Tan. * tag 'powerpc-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (100 commits) powerpc/vmcore: Add MMU information to vmcoreinfo Revert "powerpc: add `cur_cpu_spec` symbol to vmcoreinfo" powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack powerpc/bpf: implement bpf_arch_text_copy powerpc/code-patching: introduce patch_instructions() powerpc/32s: Implement local_flush_tlb_page_psize() powerpc/pseries: use kfree_sensitive() in plpks_gen_password() powerpc/code-patching: Perform hwsync in __patch_instruction() in case of failure powerpc/fsl_msi: Use device_get_match_data() powerpc: Remove cpm_dp...() macros powerpc/qspinlock: Rename yield_propagate_owner tunable powerpc/qspinlock: Propagate sleepy if previous waiter is preempted powerpc/qspinlock: don't propagate the not-sleepy state powerpc/qspinlock: propagate owner preemptedness rather than CPU number powerpc/qspinlock: stop queued waiters trying to set lock sleepy powerpc/perf: Fix disabling BHRB and instruction sampling powerpc/trace: Add support for HAVE_FUNCTION_ARG_ACCESS_API powerpc/tools: Pass -mabi=elfv2 to gcc-check-mprofile-kernel.sh ...
2023-11-03Merge tag 'trace-v6.7' of ↵Linus Torvalds1-15/+13
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Remove eventfs_file descriptor This is the biggest change, and the second part of making eventfs create its files dynamically. In 6.6 the first part was added, and that maintained a one to one mapping between eventfs meta descriptors and the directories and file inodes and dentries that were dynamically created. The directories were represented by a eventfs_inode and the files were represented by a eventfs_file. In v6.7 the eventfs_file is removed. As all events have the same directory make up (sched_switch has an "enable", "id", "format", etc files), the handing of what files are underneath each leaf eventfs directory is moved back to the tracing subsystem via a callback. When an event is added to the eventfs, it registers an array of evenfs_entry's. These hold the names of the files and the callbacks to call when the file is referenced. The callback gets the name so that the same callback may be used by multiple files. The callback then supplies the filesystem_operations structure needed to create this file. This has brought the memory footprint of creating multiple eventfs instances down by 2 megs each! - User events now has persistent events that are not associated to a single processes. These are privileged events that hang around even if no process is attached to them - Clean up of seq_buf There's talk about using seq_buf more to replace strscpy() and friends. But this also requires some minor modifications of seq_buf to be able to do this - Expand instance ring buffers individually Currently if boot up creates an instance, and a trace event is enabled on that instance, the ring buffer for that instance and the top level ring buffer are expanded (1.4 MB per CPU). This wastes memory as this happens when nothing is using the top level instance - Other minor clean ups and fixes * tag 'trace-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits) seq_buf: Export seq_buf_puts() seq_buf: Export seq_buf_putc() eventfs: Use simple_recursive_removal() to clean up dentries eventfs: Remove special processing of dput() of events directory eventfs: Delete eventfs_inode when the last dentry is freed eventfs: Hold eventfs_mutex when calling callback functions eventfs: Save ownership and mode eventfs: Test for ei->is_freed when accessing ei->dentry eventfs: Have a free_ei() that just frees the eventfs_inode eventfs: Remove "is_freed" union with rcu head eventfs: Fix kerneldoc of eventfs_remove_rec() tracing: Have the user copy of synthetic event address use correct context eventfs: Remove extra dget() in eventfs_create_events_dir() tracing: Have trace_event_file have ref counters seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str() eventfs: Fix typo in eventfs_inode union comment eventfs: Fix WARN_ON() in create_file_dentry() powerpc: Remove initialisation of readpos tracing/histograms: Simplify last_cmd_set() seq_buf: fix a misleading comment ...
2023-11-03Merge tag 'printk-for-6.7' of ↵Linus Torvalds1-13/+12
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Another preparation step for introducing printk kthreads. The main piece is a per-console lock with several features: - Support three priorities: normal, emergency, and panic. They will be defined by a context where the lock is taken. A context with a higher priority is allowed to take over the lock from a context with a lower one. The plan is to use the emergency context for Oops and WARN() messages, and also by watchdogs. The panic() context will be used on panic CPU. - The owner might enter/exit regions where it is not safe to take over the lock. It allows the take over the lock a safe way in the middle of a message. For example, serial drivers emit characters one by one. And the serial port is in a safe state in between. Only the final console_flush_in_panic() will be allowed to take over the lock even in the unsafe state (last chance, pray, and hope). - A higher priority context might busy wait with a timeout. The current owner is informed about the waiter and releases the lock on exit from the unsafe state. - The new lock is safe even in atomic contexts, including NMI. Another change is a safe manipulation of per-console sequence number counter under the new lock. - simple_strntoull() micro-optimization - Reduce pr_flush() pooling time. - Calm down false warning about possible buffer invalid access to console buffers when CONFIG_PRINTK is disabled. [ .. and Thomas Gleixner wants to point out that while several of the commits are attributed to him, he only authored the early versions of said commits, and that John Ogness and Petr Mladek have been the ones who sorted out the details and really should be those who get the credit - Linus ] * tag 'printk-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: vsprintf: uninline simple_strntoull(), reorder arguments printk: printk: Remove unnecessary statements'len = 0;' printk: Reduce pr_flush() pooling time printk: fix illegal pbufs access for !CONFIG_PRINTK printk: nbcon: Allow drivers to mark unsafe regions and check state printk: nbcon: Add emit function and callback function for atomic printing printk: nbcon: Add sequence handling printk: nbcon: Add ownership state functions printk: nbcon: Add buffer management printk: Make static printk buffers available to nbcon printk: nbcon: Add acquire/release logic printk: Add non-BKL (nbcon) console basic infrastructure