aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2022-08-30USB: core: Prevent nested device-reset callsAlan Stern1-0/+2
Automatic kernel fuzzing revealed a recursive locking violation in usb-storage: ============================================ WARNING: possible recursive locking detected 5.18.0 #3 Not tainted -------------------------------------------- kworker/1:3/1205 is trying to acquire lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 but task is already holding lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 ... stack backtrace: CPU: 1 PID: 1205 Comm: kworker/1:3 Not tainted 5.18.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2988 [inline] check_deadlock kernel/locking/lockdep.c:3031 [inline] validate_chain kernel/locking/lockdep.c:3816 [inline] __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053 lock_acquire kernel/locking/lockdep.c:5665 [inline] lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5630 __mutex_lock_common kernel/locking/mutex.c:603 [inline] __mutex_lock+0x14f/0x1610 kernel/locking/mutex.c:747 usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 usb_reset_device+0x37d/0x9a0 drivers/usb/core/hub.c:6109 r871xu_dev_remove+0x21a/0x270 drivers/staging/rtl8712/usb_intf.c:622 usb_unbind_interface+0x1bd/0x890 drivers/usb/core/driver.c:458 device_remove drivers/base/dd.c:545 [inline] device_remove+0x11f/0x170 drivers/base/dd.c:537 __device_release_driver drivers/base/dd.c:1222 [inline] device_release_driver_internal+0x1a7/0x2f0 drivers/base/dd.c:1248 usb_driver_release_interface+0x102/0x180 drivers/usb/core/driver.c:627 usb_forced_unbind_intf+0x4d/0xa0 drivers/usb/core/driver.c:1118 usb_reset_device+0x39b/0x9a0 drivers/usb/core/hub.c:6114 This turned out not to be an error in usb-storage but rather a nested device reset attempt. That is, as the rtl8712 driver was being unbound from a composite device in preparation for an unrelated USB reset (that driver does not have pre_reset or post_reset callbacks), its ->remove routine called usb_reset_device() -- thus nesting one reset call within another. Performing a reset as part of disconnect processing is a questionable practice at best. However, the bug report points out that the USB core does not have any protection against nested resets. Adding a reset_in_progress flag and testing it will prevent such errors in the future. Link: https://lore.kernel.org/all/CAB7eexKUpvX-JNiLzhXBDWgfg2T9e9_0Tw4HQ6keN==voRbP0g@mail.gmail.com/ Cc: [email protected] Reported-and-tested-by: Rondreis <[email protected]> Signed-off-by: Alan Stern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30tty: Make ->set_termios() old ktermios constIlpo Järvinen1-2/+2
There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30usb: serial: Make ->set_termios() old ktermios constIlpo Järvinen1-2/+2
There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30serial: Make ->set_termios() old ktermios constIlpo Järvinen2-5/+5
There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30tty: Make ldisc ->set_termios() old ktermios constIlpo Järvinen1-2/+2
There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30tty: Make tty_termios_copy_hw() old ktermios constIlpo Järvinen1-1/+1
There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30tty: Remove baudrate dead code & make ktermios params constIlpo Järvinen1-1/+1
With the architectures currently in-tree, either: 1) CBAUDEX is zero 2) The earlier BOTHER if check covers cbaud < 1 case 3) All CBAUD bits are covered by the baud_table Thus, the check for cbaud being out-of-range for CBAUDEX case cannot ever be true. The ktermios parameters can now be made const. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-08-30netlink: add helpers for extack attr presence checkingJakub Kicinski1-0/+11
Being able to check attribute presence and set extack if not on one line is handy, add helpers. Reviewed-by: Johannes Berg <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-08-30netlink: add support for ext_ack missing attributesJakub Kicinski1-0/+13
There is currently no way to report via extack in a structured way that an attribute is missing. This leads to families resorting to string messages. Add a pair of attributes - @offset and @type for machine-readable way of reporting missing attributes. The @offset points to the nest which should have contained the attribute, @type is the expected nla_type. The offset will be skipped if the attribute is missing at the message level rather than inside a nest. User space should be able to figure out which attribute enum (AKA attribute space AKA attribute set) the nest pointed to by @offset is using. Reviewed-by: Johannes Berg <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-08-30ARM: 9229/1: amba: Fix use-after-free in amba_read_periphid()Isaac Manjarres1-0/+1
After commit f2d3b9a46e0e ("ARM: 9220/1: amba: Remove deferred device addition"), it became possible for amba_read_periphid() to be invoked concurrently from two threads for a particular AMBA device. Consider the case where a thread (T0) is registering an AMBA driver, and searching for all of the devices it can match with on the AMBA bus. Suppose that another thread (T1) is executing the deferred probe work, and is searching through all of the AMBA drivers on the bus for a driver that matches a particular AMBA device. Assume that both threads begin operating on the same AMBA device and the device's peripheral ID is still unknown. In this scenario, the amba_match() function will be invoked for the same AMBA device by both threads, which means amba_read_periphid() can also be invoked by both threads, and both threads will be able to manipulate the AMBA device's pclk pointer without any synchronization. It's possible that one thread will initialize the pclk pointer, then the other thread will re-initialize it, overwriting the previous value, and both will race to free the same pclk, resulting in a use-after-free for whichever thread frees the pclk last. Add a lock per AMBA device to synchronize the handling with detecting the peripheral ID to avoid the use-after-free scenario. The following KFENCE bug report helped detect this problem: ================================================================== BUG: KFENCE: use-after-free read in clk_disable+0x14/0x34 Use-after-free read at 0x(ptrval) (in kfence-#19): clk_disable+0x14/0x34 amba_read_periphid+0xdc/0x134 amba_match+0x3c/0x84 __driver_attach+0x20/0x158 bus_for_each_dev+0x74/0xc0 bus_add_driver+0x154/0x1e8 driver_register+0x88/0x11c do_one_initcall+0x8c/0x2fc kernel_init_freeable+0x190/0x220 kernel_init+0x10/0x108 ret_from_fork+0x14/0x3c 0x0 kfence-#19: 0x(ptrval)-0x(ptrval), size=36, cache=kmalloc-64 allocated by task 8 on cpu 0 at 11.629931s: clk_hw_create_clk+0x38/0x134 amba_get_enable_pclk+0x10/0x68 amba_read_periphid+0x28/0x134 amba_match+0x3c/0x84 __device_attach_driver+0x2c/0xc4 bus_for_each_drv+0x80/0xd0 __device_attach+0xb0/0x1f0 bus_probe_device+0x88/0x90 deferred_probe_work_func+0x8c/0xc0 process_one_work+0x23c/0x690 worker_thread+0x34/0x488 kthread+0xd4/0xfc ret_from_fork+0x14/0x3c 0x0 freed by task 8 on cpu 0 at 11.630095s: amba_read_periphid+0xec/0x134 amba_match+0x3c/0x84 __device_attach_driver+0x2c/0xc4 bus_for_each_drv+0x80/0xd0 __device_attach+0xb0/0x1f0 bus_probe_device+0x88/0x90 deferred_probe_work_func+0x8c/0xc0 process_one_work+0x23c/0x690 worker_thread+0x34/0x488 kthread+0xd4/0xfc ret_from_fork+0x14/0x3c 0x0 Cc: Saravana Kannan <[email protected]> Cc: [email protected] Fixes: f2d3b9a46e0e ("ARM: 9220/1: amba: Remove deferred device addition") Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Isaac J. Manjarres <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2022-08-30locking/percpu-rwsem: Add percpu_is_write_locked() and percpu_is_read_locked()Marco Elver1-0/+6
Implement simple accessors to probe percpu-rwsem's locked state: percpu_is_write_locked(), percpu_is_read_locked(). Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-30perf/hw_breakpoint: Make hw_breakpoint_weight() inlinableMarco Elver1-1/+0
Due to being a __weak function, hw_breakpoint_weight() will cause the compiler to always emit a call to it. This generates unnecessarily bad code (register spills etc.) for no good reason; in fact it appears in profiles of `perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512`: ... 0.70% [kernel] [k] hw_breakpoint_weight ... While a small percentage, no architecture defines its own hw_breakpoint_weight() nor are there users outside hw_breakpoint.c, which makes the fact it is currently __weak a poor choice. Change hw_breakpoint_weight()'s definition to follow a similar protocol to hw_breakpoint_slots(), such that if <asm/hw_breakpoint.h> defines hw_breakpoint_weight(), we'll use it instead. The result is that it is inlined and no longer shows up in profiles. Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-30perf/hw_breakpoint: Optimize list of per-task breakpointsMarco Elver1-1/+2
On a machine with 256 CPUs, running the recently added perf breakpoint benchmark results in: | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64 | # Running 'breakpoint/thread' benchmark: | # Created/joined 30 threads with 4 breakpoints and 64 parallelism | Total time: 236.418 [sec] | | 123134.794271 usecs/op | 7880626.833333 usecs/op/cpu The benchmark tests inherited breakpoint perf events across many threads. Looking at a perf profile, we can see that the majority of the time is spent in various hw_breakpoint.c functions, which execute within the 'nr_bp_mutex' critical sections which then results in contention on that mutex as well: 37.27% [kernel] [k] osq_lock 34.92% [kernel] [k] mutex_spin_on_owner 12.15% [kernel] [k] toggle_bp_slot 11.90% [kernel] [k] __reserve_bp_slot The culprit here is task_bp_pinned(), which has a runtime complexity of O(#tasks) due to storing all task breakpoints in the same list and iterating through that list looking for a matching task. Clearly, this does not scale to thousands of tasks. Instead, make use of the "rhashtable" variant "rhltable" which stores multiple items with the same key in a list. This results in average runtime complexity of O(1) for task_bp_pinned(). With the optimization, the benchmark shows: | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64 | # Running 'breakpoint/thread' benchmark: | # Created/joined 30 threads with 4 breakpoints and 64 parallelism | Total time: 0.208 [sec] | | 108.422396 usecs/op | 6939.033333 usecs/op/cpu On this particular setup that's a speedup of ~1135x. While one option would be to make task_struct a breakpoint list node, this would only further bloat task_struct for infrequently used data. Furthermore, after all optimizations in this series, there's no evidence it would result in better performance: later optimizations make the time spent looking up entries in the hash table negligible (we'll reach the theoretical ideal performance i.e. no constraints). Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-30perf/hw_breakpoint: Provide hw_breakpoint_is_used() and use in testMarco Elver1-0/+3
Provide hw_breakpoint_is_used() to check if breakpoints are in use on the system. Use it in the KUnit test to verify the global state before and after a test case. Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-30soundwire: bus: allow device number to be unique at system levelPierre-Louis Bossart1-0/+4
The SoundWire specification allows the device number to be allocated at will. When a system includes multiple SoundWire links, the device number scope is limited to the link to which the device is attached. However, for integration/debug it can be convenient to have a unique device number across the system. This patch adds a 'dev_num_ida_min' field at the bus level, which when set will be used to allocate an IDA. The allocation happens when a hardware device reports as ATTACHED. If any error happens during the enumeration, the allocated IDA is not freed - the device number will be reused if/when the device re-joins the bus. The IDA is only freed when the Linux device is unregistered. Signed-off-by: Pierre-Louis Bossart <[email protected]> Reviewed-by: Ranjani Sridharan <[email protected]> Reviewed-by: Rander Wang <[email protected]> Signed-off-by: Bard Liao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2022-08-29Merge tag '[email protected]' into ↵Bjorn Andersson1-0/+30
drivers-for-6.1 v6.0-rc1 + [email protected] + [email protected] Signed-off-by: Bjorn Andersson <[email protected]>
2022-08-29soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driverManivannan Sadhasivam1-0/+30
The LLCC EDAC register offsets varies between each SoCs. Until now, the EDAC driver used the hardcoded register offsets. But this caused crash on SM8450 SoC where the register offsets has been changed. So to avoid this crash and also to make it easy to accommodate changes for new SoCs, let's pass the LLCC version specific register offsets to the EDAC driver. Currently, two set of offsets are used. One is starting from LLCC version v1.0.0 used by all SoCs other than SM8450. For SM8450, LLCC version starting from v2.1.0 is used. Signed-off-by: Manivannan Sadhasivam <[email protected]> Reviewed-by: Sai Prakash Ranjan <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-29soc: qcom: qmi: use const for struct qmi_elem_infoJeff Johnson1-10/+10
Currently all usage of struct qmi_elem_info, which is used to define the QMI message encoding/decoding rules, does not use const. This prevents clients from registering const arrays. Since these arrays are always pre-defined, they should be const, so add the const qualifier to all places in the QMI interface where struct qmi_elem_info is used. Once this patch is in place, clients can independently update their pre-defined arrays to be const, as demonstrated in the QMI sample code. Signed-off-by: Jeff Johnson <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-29tracing: Define the is_signed_type() macro onceBart Van Assche3-3/+6
There are two definitions of the is_signed_type() macro: one in <linux/overflow.h> and a second definition in <linux/trace_events.h>. As suggested by Linus, move the definition of the is_signed_type() macro into the <linux/compiler.h> header file. Change the definition of the is_signed_type() macro to make sure that it does not trigger any sparse warnings with future versions of sparse for bitwise types. Link: https://lore.kernel.org/all/CAHk-=whjH6p+qzwUdx5SOVVHjS3WvzJQr6mDUwhEyTf6pJWzaQ@mail.gmail.com/ Link: https://lore.kernel.org/all/CAHk-=wjQGnVfb4jehFR0XyZikdQvCZouE96xR_nnf5kqaM5qqQ@mail.gmail.com/ Cc: Rasmus Villemoes <[email protected]> Cc: Steven Rostedt <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-29ethernet: Add helpers to recognize addresses mapped to IP multicastCasper Andersson1-0/+22
IP multicast must sometimes be discriminated from non-IP multicast, e.g. when determining the forwarding behavior of a given group in the presence of multicast router ports on an offloaded bridge. Therefore, provide helpers to identify these groups. Signed-off-by: Casper Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-29genetlink: start to validate reserved header bytesJakub Kicinski1-0/+1
We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Paul Moore <[email protected]> (NetLabel) Signed-off-by: David S. Miller <[email protected]>
2022-08-29x86/earlyprintk: Clean up pciserialPeter Zijlstra1-0/+3
While working on a GRUB patch to support PCI-serial, a number of cleanups were suggested that apply to the code I took inspiration from. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> # pci_ids.h Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-08-28Merge tag 'mm-hotfixes-stable-2022-08-28' of ↵Linus Torvalds2-5/+23
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more hotfixes from Andrew Morton: "Seventeen hotfixes. Mostly memory management things. Ten patches are cc:stable, addressing pre-6.0 issues" * tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: .mailmap: update Luca Ceresoli's e-mail address mm/mprotect: only reference swap pfn page if type match squashfs: don't call kmalloc in decompressors mm/damon/dbgfs: avoid duplicate context directory creation mailmap: update email address for Colin King asm-generic: sections: refactor memory_intersects bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown Revert "memcg: cleanup racy sum avoidance code" mm/zsmalloc: do not attempt to free IS_ERR handle binder_alloc: add missing mmap_lock calls when using the VMA mm: re-allow pinning of zero pfns (again) vmcoreinfo: add kallsyms_num_syms symbol mailmap: update Guilherme G. Piccoli's email addresses writeback: avoid use-after-free after removing device shmem: update folio if shmem_replace_page() updates the page mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
2022-08-28Revert "memcg: cleanup racy sum avoidance code"Shakeel Butt1-2/+13
This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f. Recently we started running the kernel with rstat infrastructure on production traffic and begin to see negative memcg stats values. Particularly the 'sock' stat is the one which we observed having negative value. $ grep "sock " /mnt/memory/job/memory.stat sock 253952 total_sock 18446744073708724224 Re-run after couple of seconds $ grep "sock " /mnt/memory/job/memory.stat sock 253952 total_sock 53248 For now we are only seeing this issue on large machines (256 CPUs) and only with 'sock' stat. I think the networking stack increase the stat on one cpu and decrease it on another cpu much more often. So, this negative sock is due to rstat flusher flushing the stats on the CPU that has seen the decrement of sock but missed the CPU that has increments. A typical race condition. For easy stable backport, revert is the most simple solution. For long term solution, I am thinking of two directions. First is just reduce the race window by optimizing the rstat flusher. Second is if the reader sees a negative stat value, force flush and restart the stat collection. Basically retry but limited. Link: https://lkml.kernel.org/r/[email protected] Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code") Signed-off-by: Shakeel Butt <[email protected]> Cc: "Michal Koutný" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Muchun Song <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: Greg Thelen <[email protected]> Cc: <[email protected]> [5.15] Signed-off-by: Andrew Morton <[email protected]>
2022-08-28mm: re-allow pinning of zero pfns (again)Alex Williamson1-3/+10
The below referenced commit makes the same error as 1c563432588d ("mm: fix is_pinnable_page against a cma page"), re-interpreting the logic to exclude pinning of the zero page, which breaks device assignment with vfio. To avoid further subtle mistakes, split the logic into discrete tests. [[email protected]: simplify comment, per John] Link: https://lkml.kernel.org/r/166015037385.760108.16881097713975517242.stgit@omen Link: https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen Fixes: f25cbb7a95a2 ("mm: add zone device coherent type memory support") Signed-off-by: Alex Williamson <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Suggested-by: Felix Kuehling <[email protected]> Tested-by: Slawomir Laba <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Alex Sierra <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alistair Popple <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-08-28lib/string_helpers: Add str_read_write() helperAndy Shevchenko1-0/+5
Add str_read_write() helper to return 'read' or 'write' string literal. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Dmitry Rokosov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Cameron <[email protected]>
2022-08-28units: complement the set of Hz unitsDmitry Rokosov1-0/+3
Currently, Hz units do not have milli, micro and nano Hz coefficients. Some drivers (IIO especially) use their analogues to calculate appropriate Hz values. This patch includes them to units.h definitions, so they can be used from different kernel places. Signed-off-by: Dmitry Rokosov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Cameron <[email protected]>
2022-08-27perf/core: Add speculation info to branch entriesSandipan Das1-0/+1
Add a new "spec" bitfield to branch entries for providing speculation information. This will be populated using hints provided by branch sampling features on supported hardware. The following cases are covered: * No branch speculation information is available * Branch is speculative but taken on the wrong path * Branch is non-speculative but taken on the correct path * Branch is speculative and taken on the correct path Suggested-by: Stephane Eranian <[email protected]> Signed-off-by: Sandipan Das <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com
2022-08-26Merge branch 'for-6.0-fixes' into for-6.1Tejun Heo2-6/+1
Pulling to receive 43626dade36f ("group: Add missing cpus_read_lock() to cgroup_attach_task_all()") for a follow-up patch.
2022-08-26cgroup: Homogenize cgroup_get_from_id() return valueMichal Koutný1-5/+0
Cgroup id is user provided datum hence extend its return domain to include possible error reason (similar to cgroup_get_from_fd()). This change also fixes commit d4ccaf58a847 ("bpf: Introduce cgroup iter") that would use NULL instead of proper error handling in d4ccaf58a847 ("bpf: Introduce cgroup iter"). Additionally, neither of: fc_appid_store, bpf_iter_attach_cgroup, mem_cgroup_get_from_ino (callers of cgroup_get_from_fd) is built without CONFIG_CGROUPS (depends via CONFIG_BLK_CGROUP, direct, transitive CONFIG_MEMCG respectively) transitive, so drop the singular definition not needed with !CONFIG_CGROUPS. Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter") Signed-off-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2022-08-26wait_on_bit: add an acquire memory barrierMikulas Patocka3-5/+6
There are several places in the kernel where wait_on_bit is not followed by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read). On architectures with weak memory ordering, it may happen that memory accesses that follow wait_on_bit are reordered before wait_on_bit and they may return invalid data. Fix this class of bugs by introducing a new function "test_bit_acquire" that works like test_bit, but has acquire memory ordering semantics. Signed-off-by: Mikulas Patocka <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2022-08-26lsm,io_uring: add LSM hooks for the new uring_cmd file opLuis Chamberlain3-0/+9
io-uring cmd support was added through ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd"), this extended the struct file_operations to allow a new command which each subsystem can use to enable command passthrough. Add an LSM specific for the command passthrough which enables LSMs to inspect the command details. This was discussed long ago without no clear pointer for something conclusive, so this enables LSMs to at least reject this new file operation. [0] https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Luis Chamberlain <[email protected]> Acked-by: Jens Axboe <[email protected]> Signed-off-by: Paul Moore <[email protected]>
2022-08-26Merge tag 'wireless-next-2022-08-26-v2' of ↵David S. Miller1-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes berg says: ==================== Various updates: * rtw88: operation, locking, warning, and code style fixes * rtw89: small updates * cfg80211/mac80211: more EHT/MLO (802.11be, WiFi 7) work * brcmfmac: a couple of fixes * misc cleanups etc. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-08-26platform/x86: asus-wmi: Implement TUF laptop keyboard power statesLuke D. Jones1-0/+3
Adds support for setting various power states of TUF keyboards. These states are combinations of: - boot, set if a boot animation is shown on keyboard - awake, set if the keyboard LEDs are visible while laptop is on - sleep, set if an animation is displayed while the laptop is suspended - keyboard (unknown effect) Adds two sysfs attributes to asus::kbd_backlight: - kbd_rgb_state - kbd_rgb_state_index Signed-off-by: Luke D. Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-26platform/x86: asus-wmi: Implement TUF laptop keyboard LED modesLuke D. Jones1-0/+3
Adds support for changing the laptop keyboard LED mode and colour. The modes are visible effects such as static, rainbow, pulsing, colour cycles. These sysfs attributes are added to asus::kbd_backlight: - kbd_rgb_mode - kbd_rgb_mode_index Signed-off-by: Luke D. Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-26platform/x86: asus-wmi: Support the GPU fan on TUF laptopsLuke D. Jones1-0/+1
Add support for TUF laptops which have the ability to control the GPU fan. This will show as a second fan in hwmon, and has the ability to run as boost (fullspeed), or auto. Signed-off-by: Luke D. Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski14-65/+96
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-26ata: libata: Rename ATA_DFLAG_NCQ_PRIO_ENABLEDamien Le Moal1-1/+1
Rename ATA_DFLAG_NCQ_PRIO_ENABLE to ATA_DFLAG_NCQ_PRIO_ENABLED to match the fact that this flags indicates if NCQ priority use is enabled by the user. Signed-off-by: Damien Le Moal <[email protected]>
2022-08-25netdev: Use try_cmpxchg in napi_if_scheduled_mark_missedUros Bizjak1-2/+2
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in napi_if_scheduled_mark_missed. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. Cc: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-25Merge tag 'net-6.0-rc3' of ↵Linus Torvalds3-7/+18
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec and netfilter (with one broken Fixes tag). Current release - new code bugs: - dsa: don't dereference NULL extack in dsa_slave_changeupper() - dpaa: fix <1G ethernet on LS1046ARDB - neigh: don't call kfree_skb() under spin_lock_irqsave() Previous releases - regressions: - r8152: fix the RX FIFO settings when suspending - dsa: microchip: keep compatibility with device tree blobs with no phy-mode - Revert "net: macsec: update SCI upon MAC address change." - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367 Previous releases - always broken: - netfilter: conntrack: work around exceeded TCP receive window - ipsec: fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid - moxa: get rid of asymmetry in DMA mapping/unmapping - dsa: microchip: make learning configurable and keep it off while standalone - ice: xsk: prohibit usage of non-balanced queue id - rxrpc: fix locking in rxrpc's sendmsg Misc: - another chunk of sysctl data race silencing" * tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) net: lantiq_xrx200: restore buffer if memory allocation failed net: lantiq_xrx200: fix lock under memory pressure net: lantiq_xrx200: confirm skb is allocated before using net: stmmac: work around sporadic tx issue on link-up ionic: VF initial random MAC address if no assigned mac ionic: fix up issues with handling EAGAIN on FW cmds ionic: clear broken state on generation change rxrpc: Fix locking in rxrpc's sendmsg net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2 MAINTAINERS: rectify file entry in BONDING DRIVER i40e: Fix incorrect address type for IPv6 flow rules ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter net: Fix a data-race around sysctl_somaxconn. net: Fix a data-race around netdev_unregister_timeout_secs. net: Fix a data-race around gro_normal_batch. net: Fix data-races around sysctl_devconf_inherit_init_net. net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. net: Fix a data-race around netdev_budget_usecs. net: Fix data-races around sysctl_max_skb_frags. net: Fix a data-race around netdev_budget. ...
2022-08-25of/device: Fix up of_dma_configure_id() stubThierry Reding1-2/+3
Since the stub version of of_dma_configure_id() was added in commit a081bd4af4ce ("of/device: Add input id to of_dma_configure()"), it has not matched the signature of the full function, leading to build failure reports when code using this function is built on !OF configurations. Fixes: a081bd4af4ce ("of/device: Add input id to of_dma_configure()") Cc: [email protected] Signed-off-by: Thierry Reding <[email protected]> Reviewed-by: Frank Rowand <[email protected]> Acked-by: Lorenzo Pieralisi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2022-08-25bpf: Introduce cgroup iterHao Luo1-0/+8
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Hao Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-08-25platform/x86: asus-wmi: Add support for ROG X13 tablet modeLuke D. Jones1-0/+1
Add quirk for ASUS ROG X13 Flow 2-in-1 to enable tablet mode with lid flip (all screen rotations). Signed-off-by: Luke D. Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-25platform/x86: asus-wmi: Support the hardware GPU MUX on some laptopsLuke D. Jones1-0/+3
Support the hardware GPU MUX switch available on some models. This switch can toggle the MUX between: - 0, Dedicated mode - 1, Optimus mode Optimus mode is the regular iGPU + dGPU available, while dedicated mode switches the system to have only the dGPU available. Signed-off-by: Luke D. Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-25platform/x86: pmc_atom: Amend comment style and grammarAndy Shevchenko1-2/+2
The style of the comments is not uniform, make it so and fix a few grammar issues. While at it, update Copyright years. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-25platform/x86: pmc_atom: Fix SLP_TYPx bitfield maskAndy Shevchenko1-2/+4
On Intel hardware the SLP_TYPx bitfield occupies bits 10-12 as per ACPI specification (see Table 4.13 "PM1 Control Registers Fixed Hardware Feature Control Bits" for the details). Fix the mask and other related definitions accordingly. Fixes: 93e5eadd1f6e ("x86/platform: New Intel Atom SOC power management controller driver") Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-08-25wifi: cfg80211/mac80211: check EHT capability size correctlyJohannes Berg1-4/+10
For AP/non-AP the EHT MCS/NSS subfield size differs, the 4-octet subfield is only used for 20 MHz-only non-AP STA. Pass an argument around everywhere to be able to parse it properly. Signed-off-by: Johannes Berg <[email protected]>
2022-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski1-4/+0
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix crash with malformed ebtables blob which do not provide all entry points, from Florian Westphal. 2) Fix possible TCP connection clogging up with default 5-days timeout in conntrack, from Florian. 3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian. 4) Do not allow to update implicit chains. 5) Make table handle allocation per-netns to fix data race. 6) Do not truncated payload length and offset, and checksum offset. Instead report EINVAl. 7) Enable chain stats update via static key iff no error occurs. 8) Restrict osf expression to ip, ip6 and inet families. 9) Restrict tunnel expression to netdev family. 10) Fix crash when trying to bind again an already bound chain. 11) Flowtable garbage collector might leave behind pending work to delete entries. This patch comes with a previous preparation patch as dependency. 12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases netfilter: flowtable: fix stuck flows on cleanup due to pending work netfilter: flowtable: add function to invoke garbage collection immediately netfilter: nf_tables: disallow binding to already bound chain netfilter: nft_tunnel: restrict it to netdev family netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families netfilter: nf_tables: do not leave chain stats enabled on error netfilter: nft_payload: do not truncate csum_offset and csum_type netfilter: nft_payload: report ERANGE for too long offset and length netfilter: nf_tables: make table handle allocation per-netns friendly netfilter: nf_tables: disallow updates of implicit chain netfilter: nft_tproxy: restrict to prerouting hook netfilter: conntrack: work around exceeded receive window netfilter: ebtables: reject blobs that don't provide all entry points ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-24bpf: Fix reference state management for synchronous callbacksKumar Kartikeya Dwivedi1-0/+11
Currently, verifier verifies callback functions (sync and async) as if they will be executed once, (i.e. it explores execution state as if the function was being called once). The next insn to explore is set to start of subprog and the exit from nested frame is handled using curframe > 0 and prepare_func_exit. In case of async callback it uses a customized variant of push_stack simulating a kind of branch to set up custom state and execution context for the async callback. While this approach is simple and works when callback really will be executed only once, it is unsafe for all of our current helpers which are for_each style, i.e. they execute the callback multiple times. A callback releasing acquired references of the caller may do so multiple times, but currently verifier sees it as one call inside the frame, which then returns to caller. Hence, it thinks it released some reference that the cb e.g. got access through callback_ctx (register filled inside cb from spilled typed register on stack). Similarly, it may see that an acquire call is unpaired inside the callback, so the caller will copy the reference state of callback and then will have to release the register with new ref_obj_ids. But again, the callback may execute multiple times, but the verifier will only account for acquired references for a single symbolic execution of the callback, which will cause leaks. Note that for async callback case, things are different. While currently we have bpf_timer_set_callback which only executes it once, even for multiple executions it would be safe, as reference state is NULL and check_reference_leak would force program to release state before BPF_EXIT. The state is also unaffected by analysis for the caller frame. Hence async callback is safe. Since we want the reference state to be accessible, e.g. for pointers loaded from stack through callback_ctx's PTR_TO_STACK, we still have to copy caller's reference_state to callback's bpf_func_state, but we enforce that whatever references it adds to that reference_state has been released before it hits BPF_EXIT. This requires introducing a new callback_ref member in the reference state to distinguish between caller vs callee references. Hence, check_reference_leak now errors out if it sees we are in callback_fn and we have not released callback_ref refs. Since there can be multiple nested callbacks, like frame 0 -> cb1 -> cb2 etc. we need to also distinguish between whether this particular ref belongs to this callback frame or parent, and only error for our own, so we store state->frameno (which is always non-zero for callbacks). In short, callbacks can read parent reference_state, but cannot mutate it, to be able to use pointers acquired by the caller. They must only undo their changes (by releasing their own acquired_refs before BPF_EXIT) on top of caller reference_state before returning (at which point the caller and callback state will match anyway, so no need to copy it back to caller). Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-08-24mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.Yosry Ahmed1-0/+1
We keep track of several kernel memory stats (total kernel memory, page tables, stack, vmalloc, etc) on multiple levels (global, per-node, per-memcg, etc). These stats give insights to users to how much memory is used by the kernel and for what purposes. Currently, memory used by KVM mmu is not accounted in any of those kernel memory stats. This patch series accounts the memory pages used by KVM for page tables in those stats in a new NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account for other types of secondary pages tables (e.g. iommu page tables). KVM has a decent number of large allocations that aren't for page tables, but for most of them, the number/size of those allocations scales linearly with either the number of vCPUs or the amount of memory assigned to the VM. KVM's secondary page table allocations do not scale linearly, especially when nested virtualization is in use. From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to allocate a large number of page tables even though the guest isn't accessing that much memory). However, someone would need to either understand how KVM works to make that connection, or know (or be told) to go look at KVM's stats if they're running VMs to better decipher the stats. Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE is informative. For example, when backing a VM with THP vs. HugeTLB, NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order of magnitude higher with THP. So having this stat will at the very least prove to be useful for understanding tradeoffs between VM backing types, and likely even steer folks towards potential optimizations. The original discussion with more details about the rationale: https://lore.kernel.org/all/[email protected] This stat will be used by subsequent patches to count KVM mmu memory usage. Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>