path: root/include
2023-03-28  mm,jfs: move write_one_page/folio_write_one to jfs  (Christoph Hellwig, 1 file, -6/+0)
The last remaining user of folio_write_one through the write_one_page wrapper is jfs, so move the functionality there and hard code the call to metapage_writepage. Note that the use of the pagecache by the JFS 'metapage' buffer cache is a bit odd, and we could probably do without VM-level dirty tracking at all, but that's a change for another time. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Dave Kleikamp <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Evgeniy Dushistov <[email protected]> Cc: Gang He <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jan Kara via Ocfs2-devel <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  mm, memcg: Prevent memory.swappiness load/store tearing  (Yue Zhao, 1 file, -4/+4)
The cgroup v1 memory controller knob memory.swappiness is not protected by any locking, so it can be modified while it is used. This is not an actual problem because races are unlikely, but it is better to use [READ|WRITE]_ONCE to prevent the compiler from doing anything funky. The access of memcg->swappiness and vm_swappiness is lockless, so both of them can be concurrently set at the same time as we are trying to read them. All occurrences of memcg->swappiness and vm_swappiness are updated with [READ|WRITE]_ONCE. [[email protected]: v3] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yue Zhao <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Tang Yizhou <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
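A rough sketch of the tearing-safe access pattern this entry describes (not the actual memcontrol.c hunk; the struct and function names are illustrative):

  /* Sketch: lockless knob access hardened against load/store tearing. */
  struct memcg_example {
          int swappiness;
  };

  static void example_set_swappiness(struct memcg_example *memcg, int val)
  {
          /* A plain "memcg->swappiness = val" may be torn by the compiler. */
          WRITE_ONCE(memcg->swappiness, val);
  }

  static int example_get_swappiness(struct memcg_example *memcg)
  {
          /* Force a single load; no tearing, no refetching. */
          return READ_ONCE(memcg->swappiness);
  }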
2023-03-28  mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()  (Gerald Schaefer, 1 file, -1/+1)
s390 can do more fine-grained handling of spurious TLB protection faults when the PTE pointer is also available. Therefore, pass the PTE pointer to flush_tlb_fix_spurious_fault() as an additional parameter. This adds no functional change for other architectures, but those with private flush_tlb_fix_spurious_fault() implementations need to be made aware of the new parameter. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gerald Schaefer <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: David Hildenbrand <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  mm: swap: remove unneeded cgroup_throttle_swaprate()  (Kefeng Wang, 1 file, -8/+4)
All the callers of cgroup_throttle_swaprate() are converted to folio_throttle_swaprate(), so make __cgroup_throttle_swaprate() take a folio and rename it to __folio_throttle_swaprate(); also rename gfp_mask to gfp and drop the redundant extern keyword. Finally, drop the unused cgroup_throttle_swaprate(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  kasan: call clear_page with a match-all tag instead of changing page tag  (Peter Collingbourne, 1 file, -5/+3)
Instead of changing the page's tag solely in order to obtain a pointer with a match-all tag and then changing it back again, just convert the pointer that we get from kmap_atomic() into one with a match-all tag before passing it to clear_page(). On a certain microarchitecture, this has been observed to cause a measurable improvement in microbenchmark performance, presumably as a result of being able to avoid the atomic operations on the page tag. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Collingbourne <[email protected]> Link: https://linux-review.googlesource.com/id/I0249822cc29097ca7a04ad48e8eb14871f80e711 Reviewed-by: Andrey Konovalov <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  mm, printk: introduce new format %pGt for page_type  (Hyeonggon Yoo, 2 files, -1/+14)
The %pGp format is used to display the 'flags' field of a struct page. However, some page flags (i.e. PG_buddy, see page-flags.h for more details) are stored in the page_type field. To display human-readable output of page_type, introduce the %pGt format. It is important to note that the meaning of the bits is different in page_type: if page_type is 0xffffffff, no flags are set; setting the PG_buddy (0x00000080) flag results in a page_type of 0xffffff7f. Clearing a bit actually means setting a flag. Bits in page_type are inverted when displaying type names. Only values for which page_type_has_type() returns true are considered as page_type, to avoid confusion with mapcount values; if it returns false, only raw values are displayed and not page type names. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Petr Mladek <[email protected]> [vsprintf part] Cc: Andy Shevchenko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Joe Perches <[email protected]> Cc: John Ogness <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
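To make the inverted encoding concrete, here is a small standalone sketch (plain C, not the kernel's page-flags code; the constants mirror the values quoted above):

  #include <stdio.h>

  #define PAGE_TYPE_BASE  0xffffffffu  /* no flags set */
  #define PG_BUDDY_BIT    0x00000080u  /* example page_type flag */

  int main(void)
  {
          unsigned int page_type = PAGE_TYPE_BASE;

          page_type &= ~PG_BUDDY_BIT;  /* "set" PG_buddy by clearing its bit */
          printf("page_type: %#x\n", page_type);  /* prints 0xffffff7f */

          /* A flag is considered set when its bit is clear: */
          if (!(page_type & PG_BUDDY_BIT))
                  printf("PG_buddy is set\n");
          return 0;
  }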
2023-03-28  mmflags.h: use less error prone method to define pageflag_names  (Hyeonggon Yoo, 1 file, -41/+44)
Patch series "mm, printk: introduce new format for page_type", v4. This series moves the PG_slab page flag to page_type, freeing one bit in page->flags, and introduces the %pGt format that prints human-readable page_type, like %pGp for printing page flags. See the changelog of patch 2 for more implementation details. Thanks to everyone who gave valuable comments. This patch (of 3): Use a helper macro to decrease the chance of typos when defining pageflag_names. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hyeonggon Yoo <[email protected]> Suggested-by: Andy Shevchenko <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Joe Perches <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: John Ogness <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  mm: add tracepoints to ksm  (Stefan Roesch, 1 file, -0/+251)
This adds the following tracepoints to ksm:

  - start / stop scan
  - ksm enter / exit
  - merge a page
  - merge a page with ksm
  - remove a page
  - remove a rmap item

This patch has been split off from the RFC patch series "mm: process/cgroup ksm support". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stefan Roesch <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  lazy tlb: allow lazy tlb mm refcounting to be configurable  (Nicholas Piggin, 1 file, -3/+15)
Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm when it is context switched. This can be disabled by architectures that don't require this refcounting if they clean up lazy tlb mms when the last refcount is dropped. Currently this is always enabled, so the patch introduces no functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  lazy tlb: introduce lazy tlb mm refcount helper functions  (Nicholas Piggin, 1 file, -0/+16)
Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. This makes the lazy tlb mm references more obvious, and allows the refcounting scheme to be modified in later changes. There is no functional change with this patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
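A minimal sketch of what such helpers plausibly look like, assuming they simply wrap the existing mmgrab()/mmdrop() refcounting (which matches the "no functional change" note above):

  /* Sketch: explicit lazy-tlb refcount helpers. */
  static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
  {
          mmgrab(mm);   /* take a reference on behalf of the lazy tlb mm */
  }

  static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
  {
          mmdrop(mm);   /* drop the lazy tlb reference */
  }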
2023-03-28  mm: multi-gen LRU: clean up sysfs code  (T.J. Alumbaugh, 1 file, -1/+1)
This patch cleans up the sysfs code. Specifically, 1. use sysfs_emit(), 2. use __ATTR_RW(), and 3. constify multi-gen LRU struct attribute_group. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: T.J. Alumbaugh <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  x86/mm/pat: clear VM_PAT if copy_p4d_range failed  (Ma Wupeng, 1 file, -3/+4)
Syzbot reports a warning in untrack_pfn(). Digging into the root cause, we found that this is due to a memory allocation failure in pmd_alloc_one, produced by failslab. In copy_page_range(), memory allocation for the pmd failed. During the error handling in copy_page_range(), mmput() is called to remove all vmas. When untrack_pfn() is called on this empty pfn mapping, the warning triggers. Here's a simplified flow:

  dup_mm
    dup_mmap
      copy_page_range
        copy_p4d_range
          copy_pud_range
            copy_pmd_range
              pmd_alloc
                __pmd_alloc
                  pmd_alloc_one
                    page = alloc_pages(gfp, 0);
                    if (!page)
                            return NULL;
  mmput
    exit_mmap
      unmap_vmas
        unmap_single_vma
          untrack_pfn
            follow_phys
              WARN_ON_ONCE(1);

Since this vma was not generated successfully, we can clear the VM_PAT flag. In this case, untrack_pfn() will not be called while cleaning up this vma. Function untrack_pfn_moved() has also been renamed to fit the new logic. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ma Wupeng <[email protected]> Reported-by: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28  fprobe: Skip exit_handler if entry_handler returns !0  (Masami Hiramatsu (Google), 1 file, -2/+2)
Skip hooking function return and calling exit_handler if the entry_handler() returns !0. Link: https://lkml.kernel.org/r/167526699798.433354.10998365726830117303.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28  fprobe: Add nr_maxactive to specify rethook_node pool size  (Masami Hiramatsu (Google), 1 file, -0/+2)
Add nr_maxactive to specify the rethook_node pool size. This is the maximum number of target functions that can be actively running concurrently while probed by the exit_handler. Note that if the running function is preempted or sleeps, it is still counted as 'active'. Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28  fprobe: Pass entry_data to handlers  (Masami Hiramatsu (Google), 1 file, -2/+6)
Pass the private entry_data to the entry and exit handlers so that they can share context data, such as saved function arguments. The user must specify the private entry_data size via the @entry_data_size field before registering the fprobe. Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
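A hedged usage sketch tying together this and the two fprobe entries above (handler signatures and field names are assumed from this series; "vfs_read" and the context struct are illustrative):

  #include <linux/fprobe.h>
  #include <linux/jiffies.h>
  #include <linux/printk.h>

  struct my_ctx {
          unsigned long ts;
  };

  /* Returning nonzero here skips the return hook (see the entry above). */
  static int my_entry(struct fprobe *fp, unsigned long ip,
                      struct pt_regs *regs, void *entry_data)
  {
          struct my_ctx *ctx = entry_data;

          ctx->ts = jiffies;      /* stash context for the exit handler */
          return 0;
  }

  static void my_exit(struct fprobe *fp, unsigned long ip,
                      struct pt_regs *regs, void *entry_data)
  {
          struct my_ctx *ctx = entry_data;

          pr_info("returned after %lu jiffies\n", jiffies - ctx->ts);
  }

  static struct fprobe fp = {
          .entry_handler   = my_entry,
          .exit_handler    = my_exit,
          .entry_data_size = sizeof(struct my_ctx),   /* this entry */
          .nr_maxactive    = 64,  /* rethook_node pool size, see above */
  };

  /* register_fprobe(&fp, "vfs_read", NULL); */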
2023-03-28  drm/msm: Add wait-boost support  (Rob Clark, 1 file, -2/+12)
Add a way for various userspace waits to signal urgency. Signed-off-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/525817/ Link: https://lore.kernel.org/r/[email protected]
2023-03-28  Merge tag 'dma-fence-deadline' into HEAD  (Rob Clark, 5 files, -22/+57)
This series adds a deadline hint to fences, so realtime deadlines such as vblank can be communicated to the fence signaller for power/frequency management decisions. This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons:

  1) To continue to be able to use the atomic helpers
  2) To support cases where display and gpu are different drivers

See https://patchwork.freedesktop.org/series/93035/ This does not yet add any UAPI, although this will be needed in a number of cases:

  1) Workloads "ping-ponging" between CPU and GPU, where we don't want the GPU freq governor to interpret time stalled waiting for GPU as "idle" time
  2) Cases where the compositor is waiting for fences to be signaled before issuing the atomic ioctl, for example to maintain 60fps cursor updates even when the GPU is not able to maintain that framerate.

Signed-off-by: Rob Clark <[email protected]>
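A hedged sketch of the waiter side (dma_fence_set_deadline() is the hook this series adds; the wrapper around it is illustrative):

  #include <linux/dma-fence.h>
  #include <linux/ktime.h>

  /* Sketch: tell the signaller this fence is needed by the next vblank. */
  static long wait_with_deadline(struct dma_fence *fence, ktime_t next_vblank)
  {
          /* Hint for power/frequency management on the signalling side. */
          dma_fence_set_deadline(fence, next_vblank);
          return dma_fence_wait(fence, false);
  }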
2023-03-28  drm/vblank: Add helper to get next vblank time  (Rob Clark, 1 file, -0/+1)
Will be used in the next commit to set a deadline on fences that an atomic update is waiting on. v2: Calculate time at *start* of vblank period, not end v3: Fix kbuild complaints Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Mario Kleiner <[email protected]>
2023-03-28  drm/scheduler: Add fence deadline support  (Rob Clark, 1 file, -0/+17)
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence. v2: Split into drm_sched_fence_set_parent() (ckoenig) v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees fence->parent set before drm_sched_fence_set_parent() does this test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT). Signed-off-by: Rob Clark <[email protected]> Acked-by: Luben Tuikov <[email protected]>
2023-03-28  dma-buf/sync_file: Surface sync-file uABI  (Rob Clark, 1 file, -22/+15)
We had all of the internal driver APIs, but not the all important userspace uABI, in the dma-buf doc. Fix that. And re-arrange the comments slightly as otherwise the comments for the ioctl nr defines would not show up. v2: Fix docs build warning coming from newly including the uabi header in the docs build Signed-off-by: Rob Clark <[email protected]> Acked-by: Pekka Paalanen <[email protected]>
2023-03-28  dma-buf/dma-resv: Add a way to set fence deadline  (Rob Clark, 1 file, -0/+2)
Add a way to set a deadline on remaining resv fences according to the requested usage. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Christian König <[email protected]>
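A short sketch of the resv-level variant (the signature is assumed from this entry; the usage class chosen is illustrative):

  #include <linux/dma-resv.h>

  /* Sketch: set a deadline on the remaining fences of a scanout buffer. */
  static void boost_scanout(struct dma_resv *resv, ktime_t deadline)
  {
          dma_resv_set_deadline(resv, DMA_RESV_USAGE_WRITE, deadline);
  }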
2023-03-28  Merge tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu  (Linus Torvalds, 1 file, -1/+1)
Pull RCU fix from Paul McKenney: "This brings the rcu_torture_read event trace into line with the new trace tools by replacing this event trace's __field() with the corresponding __array(). Without this, the new trace tools will fail when presented with an rcu_torture_read event trace, which is a regression from the viewpoint of trace tools users" Link: https://lore.kernel.org/all/[email protected]/

* tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Fix rcu_torture_read ftrace event
2023-03-28  firmware: arm_sdei: Fix sleep from invalid context BUG  (Pierre Gondois, 1 file, -1/+0)
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0
  preempt_count: 0, expected: 0
  RCU nest depth: 0, expected: 0
  3 locks held by cpuhp/0/24:
   #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
   #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
   #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130
  irq event stamp: 36
  hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0
  hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248
  softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...]
  Hardware name: WIWYNN Mt.Jade Server [...]
  Call trace:
   dump_backtrace+0x114/0x120
   show_stack+0x20/0x70
   dump_stack_lvl+0x9c/0xd8
   dump_stack+0x18/0x34
   __might_resched+0x188/0x228
   rt_spin_lock+0x70/0x120
   sdei_cpuhp_up+0x3c/0x130
   cpuhp_invoke_callback+0x250/0xf08
   cpuhp_thread_fun+0x120/0x248
   smpboot_thread_fn+0x280/0x320
   kthread+0x130/0x140
   ret_from_fork+0x10/0x20

sdei_cpuhp_up() is called in the STARTING hotplug section, which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry instead to execute the cpuhp cb later, with preemption enabled. SDEI originally got its own cpuhp slot to allow interacting with perf. It got superseded by pNMI and this early slot is not relevant anymore. [1] Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the calling CPU. It is checked that preemption is disabled for them. _ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'. Preemption is enabled in those threads, but their cpumask is limited to 1 CPU. Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb don't trigger them. Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call which acts on the calling CPU. [1]: https://lore.kernel.org/all/[email protected]/ Suggested-by: James Morse <[email protected]> Signed-off-by: Pierre Gondois <[email protected]> Reviewed-by: James Morse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2023-03-28  atomics: Provide rcuref - scalable reference counting  (Thomas Gleixner, 2 files, -0/+161)
atomic_t based reference counting, including refcount_t, uses atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is implemented with an atomic_try_cmpxchg() loop. High contention of the reference count leads to retry loops and scales badly. There is nothing to improve on this implementation as the semantics have to be preserved. Provide rcuref as a scalable alternative solution which is suitable for RCU managed objects. Similar to refcount_t it comes with overflow and underflow detection and mitigation. rcuref treats the underlying atomic_t as an unsigned integer and partitions this space into zones:

  0x00000000 - 0x7FFFFFFF  valid zone (1 .. (INT_MAX + 1) references)
  0x80000000 - 0xBFFFFFFF  saturation zone
  0xC0000000 - 0xFFFFFFFE  dead zone
  0xFFFFFFFF               no reference

rcuref_get() unconditionally increments the reference count with atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the reference count with atomic_add_negative_release(). This unconditional increment avoids the inc_not_zero() problem, but requires a more complex implementation on the put() side when the count drops from 0 to -1. When this transition is detected, an attempt is made to mark the reference count dead, by setting it to the midpoint of the dead zone with a single atomic_cmpxchg_release() operation. This operation can fail due to a concurrent rcuref_get() elevating the reference count from -1 to 0 again. If the unconditional increment in rcuref_get() hits a reference count which is marked dead (or saturated) it will detect it after the fact and bring back the reference count to the midpoint of the respective zone. The zones provide enough tolerance which makes it practically impossible to escape from a zone.

The racy implementation of rcuref_put() requires protecting rcuref_put() against a grace period ending, in order to prevent a subtle use after free. As RCU is the only mechanism which allows protecting against that, it is not possible to fully replace the atomic_inc_not_zero() based implementation of refcount_t with this scheme. The final drop is slightly more expensive than the atomic_dec_return() counterpart, but that's not the case which this is optimized for. The optimization is for the highly frequent get()/put() pairs and their scalability. The performance of an uncontended rcuref_get()/put() pair where the put() is not dropping the last reference is still on par with the plain atomic operations, while at the same time providing overflow and underflow detection and mitigation.

The performance of rcuref compared to plain atomic_inc_not_zero() and atomic_dec_return() based reference counting under contention:

  - Micro benchmark: All CPUs running an increment/decrement loop on an elevated reference count, which means the 0 to -1 transition never happens. The performance gain depends on microarchitecture and the number of CPUs and has been observed in the range of 1.3X to 4.7X.

  - Conversion of dst_entry::__refcnt to rcuref and testing with the localhost memtier/memcached benchmark. That benchmark shows the reference count contention prominently. The performance gain depends on microarchitecture and the number of CPUs and has been observed in the range of 1.1X to 2.6X over the previous fix for the false sharing issue vs. struct dst_entry::__refcnt. When memtier is run over a real 1Gb network connection, there is a small gain on top of the false sharing fix. The two changes combined result in a 2%-5% total gain for that networked test.
Reported-by: Wangyang Guo <[email protected]> Reported-by: Arjan Van De Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
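A hedged usage sketch of the scheme described above; the helper names follow this entry, and per the grace-period constraint the put() side is shown under rcu_read_lock() (the free path is a placeholder):

  #include <linux/rcuref.h>
  #include <linux/rcupdate.h>

  struct obj {
          rcuref_t ref;
          /* ... payload ... */
  };

  static bool obj_get(struct obj *o)
  {
          /* Unconditional increment; fails only once the count is dead. */
          return rcuref_get(&o->ref);
  }

  static void obj_put(struct obj *o)
  {
          rcu_read_lock();        /* protect the racy 0 -> -1 transition */
          if (rcuref_put(&o->ref)) {
                  /* Last reference dropped: free after a grace period. */
                  /* kfree_rcu(o, rcu); */
          }
          rcu_read_unlock();
  }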
2023-03-28  atomics: Provide atomic_add_negative() variants  (Thomas Gleixner, 3 files, -11/+303)
atomic_add_negative() does not provide the relaxed/acquire/release variants. Provide them in preparation for a new scalable reference count algorithm. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
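A hedged sketch of the new variants in use; the names follow the standard atomic ordering suffixes this patch extends atomic_add_negative() with:

  #include <linux/atomic.h>
  #include <linux/printk.h>

  static atomic_t cnt = ATOMIC_INIT(0);

  static void ordering_examples(void)
  {
          bool neg;

          /* Fully ordered, as before: */
          neg = atomic_add_negative(-1, &cnt);

          /* New variants for callers needing weaker ordering: */
          neg |= atomic_add_negative_relaxed(1, &cnt);   /* no ordering */
          neg |= atomic_add_negative_acquire(1, &cnt);   /* acquire */
          neg |= atomic_add_negative_release(-1, &cnt);  /* release */

          if (neg)
                  pr_info("count went negative at some point\n");
  }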
2023-03-28  driver core: bus: constify class_unregister/destroy()  (Greg Kroah-Hartman, 1 file, -2/+2)
The class_unregister() and class_destroy() functions should take a const * to struct class, not just a *, so fix that up. Cc: "Rafael J. Wysocki" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-27  fscrypt: new helper function - fscrypt_prepare_lookup_partial()  (Luís Henriques, 1 file, -0/+7)
This patch introduces a new helper function which can be used both in lookups and in atomic_open operations by filesystems that want to handle filename encryption and no-key dentries themselves. The reason for this function to be used in atomic open is that this operation can act as a lookup if handed a dentry that is negative. And in this case we may need to set DCACHE_NOKEY_NAME. Signed-off-by: Luís Henriques <[email protected]> Tested-by: Xiubo Li <[email protected]> Reviewed-by: Xiubo Li <[email protected]> [ebiggers: improved the function comment, and moved the function to just below __fscrypt_prepare_lookup()] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
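A hedged sketch of a lookup using the helper (the signature, directory inode plus dentry, is assumed from this entry; the filesystem body is elided):

  /* Sketch: ->lookup() in a filesystem doing its own name handling. */
  static struct dentry *myfs_lookup(struct inode *dir, struct dentry *dentry,
                                    unsigned int flags)
  {
          int err = fscrypt_prepare_lookup_partial(dir, dentry);

          if (err)
                  return ERR_PTR(err);

          /* ... filesystem-specific encrypted-name / no-key handling ... */
          return NULL;
  }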
2023-03-27  ethtool: Add support for configuring tx_push_buf_len  (Shay Agroskin, 2 files, -4/+12)
This attribute, which is part of ethtool's ring param configuration, allows the user to specify the maximum number of bytes of a packet's payload that can be written directly to the device. Example usage:

  # ethtool -G [interface] tx-push-buf-len [number of bytes]

Co-developed-by: Jakub Kicinski <[email protected]> Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27  netlink: Add a macro to set policy message with format string  (Shay Agroskin, 1 file, -0/+22)
Similar to NL_SET_ERR_MSG_FMT, add a macro which sets netlink policy error message with a format string. Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27  net: introduce a config option to tweak MAX_SKB_FRAGS  (Eric Dumazet, 1 file, -11/+5)
Currently, MAX_SKB_FRAGS value is 17. For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag. But with zero copy, we use order-0 pages. For BIG TCP to show its full potential, we add a config option to be able to fit up to 45 segments per skb. This is also needed for BIG TCP rx zerocopy, as zerocopy currently does not support skbs with frag list. We have used MAX_SKB_FRAGS=45 value for years at Google before we deployed 4K MTU, with no adverse effect, other than a recent issue in mlx4, fixed in commit 26782aad00cc ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS") Back then, goal was to be able to receive full size (64KB) GRO packets without the frag_list overhead. Note that /proc/sys/net/core/max_skb_frags can also be used to limit the number of fragments TCP can use in tx packets. By default we keep the old/legacy value of 17 until we get more coverage for the updated values.

Sizes of struct skb_shared_info on 64bit arches:

  MAX_SKB_FRAGS | sizeof(struct skb_shared_info)
  ==============================================
             17 | 320
             21 | 320 + 64  = 384
             25 | 320 + 128 = 448
             29 | 320 + 192 = 512
             33 | 320 + 256 = 576
             37 | 320 + 320 = 640
             41 | 320 + 384 = 704
             45 | 320 + 448 = 768

This inflation might cause problems for drivers assuming they could pack both the incoming packet (for MTU=1500) and skb_shared_info in half a page, using build_skb(). v3: fix build error when CONFIG_NET=n v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long" Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Jason Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
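The table's deltas imply 16 bytes per additional fragment (64 bytes per step of 4), i.e. one skb_frag_t on 64-bit. A standalone sketch reproducing the arithmetic (the 320-byte base and 16-byte frag size are taken from the table, not computed from kernel headers):

  #include <stdio.h>

  int main(void)
  {
          const unsigned int base = 320;       /* size with 17 frags */
          const unsigned int frag_size = 16;   /* per-frag cost, 64-bit */

          for (unsigned int frags = 17; frags <= 45; frags += 4)
                  printf("%2u frags -> %u bytes\n",
                         frags, base + (frags - 17) * frag_size);
          return 0;
  }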
2023-03-28  ASoC: soc-core.c: add snd_soc_add_pcm_runtimes()  (Kuninori Morimoto, 1 file, -2/+3)
Current ASoC supports snd_soc_add_pcm_runtime(), but the user needs to call it once per dai_link when a card has multiple dai_links. This patch adds snd_soc_add_pcm_runtimes(), which supports multiple dai_links. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
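A hedged sketch of the plural helper in use; the signature is assumed to mirror snd_soc_add_pcm_runtime() plus an array and count:

  /* Sketch: register a card's dai_links in one call instead of a loop. */
  static int register_links_example(struct snd_soc_card *card)
  {
          /* Before: snd_soc_add_pcm_runtime(card, &card->dai_link[i])
           * called once per link. After: one call for the whole array. */
          return snd_soc_add_pcm_runtimes(card, card->dai_link,
                                          card->num_links);
  }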
2023-03-28  drm: bridge: samsung-dsim: Add i.MX8M Plus support  (Marek Vasut, 1 file, -0/+1)
Add extras to support i.MX8M Plus. The main change is the removal of HS/VS/DE signal inversion in the LCDIFv3-DSIM glue logic, otherwise the implementation of this IP in i.MX8M Plus is very much compatible with the i.MX8M Mini/Nano one. Reviewed-by: Marek Vasut <[email protected]> Reviewed-by: Frieder Schrempf <[email protected]> Acked-by: Robert Foss <[email protected]> Signed-off-by: Marek Vasut <[email protected]> Signed-off-by: Jagan Teki <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Signed-off-by: Inki Dae <[email protected]>
2023-03-28  drm: bridge: Generalize Exynos-DSI driver into a Samsung DSIM bridge  (Jagan Teki, 1 file, -0/+114)
Samsung MIPI DSIM controller is common DSI IP that can be used in various SoCs like Exynos, i.MX8M Mini/Nano. In order to access this DSI controller between various platform SoCs, the ideal way to incorporate this in the drm stack is via the drm bridge driver. We already have a consolidated code for supporting component and bridge based DRM drivers, so keep the exynos component based code in existing exynos_drm_dsi.c and move generic bridge code as part of samsung-dsim.c Tested-by: Marek Szyprowski <[email protected]> Reviewed-by: Marek Vasut <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Jagan Teki <[email protected]> Signed-off-by: Inki Dae <[email protected]>
2023-03-27  dt-bindings: reset: add BCM63268 timer reset definitions  (Álvaro Fernández Rojas, 1 file, -0/+4)
Add missing timer reset definitions for BCM63268. Signed-off-by: Álvaro Fernández Rojas <[email protected]> Acked-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]>
2023-03-27  dt-bindings: clk: add BCM63268 timer clock definitions  (Álvaro Fernández Rojas, 1 file, -0/+13)
Add missing timer clock definitions for BCM63268. Signed-off-by: Álvaro Fernández Rojas <[email protected]> Acked-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]>
2023-03-27  dt-bindings: clock: add loongson-2 boot clock index  (Yinbo Zhu, 1 file, -1/+2)
The Loongson-2 boot clock is used by the SPI and LIO peripherals; this patch adds the boot clock index number. Signed-off-by: Yinbo Zhu <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]>
2023-03-27  locking/lockdep: Improve the deadlock scenario print for sync and read lock  (Boqun Feng, 1 file, -1/+2)
Lock scenario print is always a weak spot of lockdep splats. Improvement can be made if we rework the dependency search and the error printing. However, without touching the graph search, we can improve a little for the circular deadlock case, since we have the to-be-added lock dependency, and know whether these two locks are read/write/sync. In order to know whether a held_lock is sync or not, a bit was "stolen" from ->references, which reduces our limit for the same lock class nesting from 2^12 to 2^11, and it should still be good enough. Besides, since we now have a bit in held_lock for sync, we don't need the "hardirqoffs being 1" trick, and we can also avoid the __lock_release() if we jump out of __lock_acquire() before the held_lock is stored. With these changes, a deadlock case involving read lock and sync gets a better print-out, from:

  [...] Possible unsafe locking scenario:
  [...]
  [...]       CPU0                    CPU1
  [...]       ----                    ----
  [...]  lock(srcuA);
  [...]                          lock(srcuB);
  [...]                          lock(srcuA);
  [...]  lock(srcuB);

to:

  [...] Possible unsafe locking scenario:
  [...]
  [...]       CPU0                    CPU1
  [...]       ----                    ----
  [...]  rlock(srcuA);
  [...]                          lock(srcuB);
  [...]                          lock(srcuA);
  [...]  sync(srcuB);

Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
2023-03-27  rcu: Annotate SRCU's update-side lockdep dependencies  (Boqun Feng, 1 file, -2/+32)
Although all flavors of RCU readers are annotated correctly with lockdep as recursive read locks, they do not set the lock_acquire 'check' parameter. This means that RCU read locks are not added to the lockdep dependency graph, which in turn means that lockdep cannot detect RCU-based deadlocks. This is not a problem for RCU flavors having atomic read-side critical sections because context-based annotations can catch these deadlocks, see for example the RCU_LOCKDEP_WARN() statement in synchronize_rcu(). But context-based annotations are not helpful for sleepable RCU, especially given that it is perfectly legal to do synchronize_srcu(&srcu1) within an srcu_read_lock(&srcu2). However, we can detect SRCU-based deadlocks by: (1) making srcu_read_lock() a 'check'ed recursive read lock and (2) making synchronize_srcu() an empty write lock critical section. Even better, with the newly introduced lock_sync(), we can avoid false positives about irq-unsafe/safe. This commit therefore makes it so. Note that NMI-safe SRCU read side critical sections are currently not annotated, but might be annotated in the future. Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> [ boqun: Add comments for annotation per Waiman's suggestion ] [ boqun: Fix comment warning reported by Stephen Rothwell ] Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
2023-03-27  locking/lockdep: Introduce lock_sync()  (Boqun Feng, 1 file, -0/+5)
Currently, functions like synchronize_srcu() do not have lockdep annotations resembling those of other write-side locking primitives. Such annotations might look as follows:

  lock_acquire();
  lock_release();

Such annotations would tell lockdep that synchronize_srcu() acts like an empty critical section that waits for other (read-side) critical sections to finish. This would definitely catch some deadlocks, but as pointed out by Paul McKenney [1], this could also introduce false positives because of irq-safe/unsafe detection. Of course, there are tricks that could help with this:

  might_sleep(); // Existing statement in __synchronize_srcu().
  if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
          local_irq_disable();
          lock_acquire();
          lock_release();
          local_irq_enable();
  }

But it would be better for lockdep to provide a separate annotation for functions like synchronize_srcu(), so that people won't need to repeat the ugly tricks above. Therefore introduce lock_sync(), which is simply a lock+unlock pair with no irq safe/unsafe deadlock check. This works because the to-be-annotated functions do not create real critical sections, and there is therefore no way that irq can create extra dependencies. [1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/ Signed-off-by: Boqun Feng <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> [ boqun: Fix typos reported by Davidlohr Bueso and Paul E. Mckenney ] Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
2023-03-27  driver core: move sysfs_dev_char_kobj out of class.h  (Greg Kroah-Hartman, 1 file, -1/+0)
The structure sysfs_dev_char_kobj is local only to the driver core code, so move it out of the global class.h file and into the internal base.h file as no one else should be touching this symbol. Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-27  device property: Remove unused struct net_device forward declaration  (Andy Shevchenko, 1 file, -1/+0)
There are no users of struct net_device in property.h. Remove the forward declaration for good. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-27  KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking  (Dmytro Maluka, 1 file, -0/+10)
KVM irqfd based emulation of level-triggered interrupts doesn't work quite correctly in some cases, particularly in the case of interrupts that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device in its threaded irq handler, i.e. later than it is acked to the interrupt controller (EOI at the end of hardirq), not earlier. Linux keeps such an interrupt masked until its threaded handler finishes, to prevent the EOI from re-asserting an unacknowledged interrupt. However, with KVM + vfio (or whatever is listening on the resamplefd) we always notify resamplefd at the EOI, so vfio prematurely unmasks the host physical IRQ, thus a new physical interrupt is fired in the host. This extra interrupt in the host is not a problem per se. The problem is that it is unconditionally queued for injection into the guest, so the guest sees an extra bogus interrupt. [*]

At least two user-visible issues caused by those extra erroneous interrupts for a oneshot irq in the guest have been observed:

  1. System suspend aborted due to a pending wakeup interrupt from ChromeOS EC (drivers/platform/chrome/cros_ec.c).
  2. Annoying "invalid report id data" errors from ELAN0000 touchpad (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg every time the touchpad is touched.

The core issue here is that by the time the guest unmasks the IRQ, the physical IRQ line is no longer asserted (since the guest has acked the interrupt to the device in the meantime), yet we unconditionally inject the interrupt queued into the guest by the previous resampling. So to fix the issue, we need a way to detect that the IRQ is no longer pending, and cancel the queued interrupt in this case. With IOAPIC we are not able to probe the physical IRQ line state directly (at least not if the underlying physical interrupt controller is an IOAPIC too), so in this patch we use the irqfd resampler for that. Namely, instead of injecting the queued interrupt, we just notify the resampler that this interrupt is done. If the IRQ line is actually already deasserted, we are done. If it is still asserted, a new interrupt will be shortly triggered through irqfd and injected into the guest. In the case that there is no irqfd resampler registered for this IRQ, we cannot fix the issue, so we keep the existing behavior: immediately and unconditionally inject the queued interrupt. This patch fixes the issue for x86 IOAPIC only. In the long run, we can fix it for other irqchips and other architectures too, possibly taking advantage of reading the physical state of the IRQ line, which is possible with some other irqchips (e.g. with arm64 GIC, maybe even with the legacy x86 PIC). [*] In this description we assume that the interrupt is a physical host interrupt forwarded to the guest e.g. by vfio. Potentially the same issue may occur also with a purely virtual interrupt from an emulated device, e.g. if the guest handles this interrupt, again, as a oneshot interrupt. Signed-off-by: Dmytro Maluka <[email protected]> Link: https://lore.kernel.org/kvm/[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-03-27  KVM: irqfd: Make resampler_list an RCU list  (Dmytro Maluka, 2 files, -1/+2)
It is useful to be able to do read-only traversal of the list of all the registered irqfd resamplers without locking the resampler_lock mutex. In particular, we are going to traverse it to search for a resampler registered for the given irq of an irqchip, and that will be done with an irqchip spinlock (ioapic->lock) held, so it is undesirable to lock a mutex in this context. So turn this list into an RCU list. For protecting the read side, reuse kvm->irq_srcu which is already used for protecting a number of irq related things (kvm->irq_routing, irqfd->resampler->list, kvm->irq_ack_notifier_list, kvm->arch.mask_notifier_list). Signed-off-by: Dmytro Maluka <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-03-27  arm64: perf: Abstract system register accesses away  (Marc Zyngier, 1 file, -0/+45)
As we want to enable 32bit support, we need to decouple the PMUv3 driver from the AArch64 system register names. This patch moves all system register accesses to an architecture-specific include file, allowing the 32bit counterpart to be slotted in at a later time. Signed-off-by: Marc Zyngier <[email protected]> Co-developed-by: Zaid Al-Bassam <[email protected]> Signed-off-by: Zaid Al-Bassam <[email protected]> Tested-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2023-03-27  arm64: perf: Move PMUv3 driver to drivers/perf  (Marc Zyngier, 2 files, -1/+259)
Having the ARM PMUv3 driver sitting in arch/arm64/kernel is getting in the way of being able to use perf on ARMv8 cores running a 32bit kernel, such as 32bit KVM guests. This patch moves it into drivers/perf/arm_pmuv3.c, with an include file in include/linux/perf/arm_pmuv3.h. The only thing left in arch/arm64 is some mundane perf stuff. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Zaid Al-Bassam <[email protected]> Tested-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2023-03-27  Merge back thermal control material for 6.4-rc1.  (Rafael J. Wysocki, 1 file, -0/+19)
2023-03-27  x86/hyperv: Change vTOM handling to use standard coco mechanisms  (Michael Kelley, 1 file, -0/+2)
Hyper-V guests on AMD SEV-SNP hardware have the option of using the "virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP architecture. With vTOM, shared vs. private memory accesses are controlled by splitting the guest physical address space into two halves. vTOM is the dividing line where the uppermost bit of the physical address space is set; e.g., with 47 bits of guest physical address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is accessible at two parallel physical addresses -- one below vTOM and one above vTOM. Accesses below vTOM are private (encrypted) while accesses above vTOM are shared (decrypted). In this sense, vTOM is like the GPA.SHARED bit in Intel TDX. Support for Hyper-V guests using vTOM was added to the Linux kernel in two patch sets[1][2]. This support treats the vTOM bit as part of the physical address. For accessing shared (decrypted) memory, these patch sets create a second kernel virtual mapping that maps to physical addresses above vTOM. A better approach is to treat the vTOM bit as a protection flag, not as part of the physical address. This new approach is like the approach for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel virtual mapping, the existing mapping is updated using recently added coco mechanisms. When memory is changed between private and shared using set_memory_decrypted() and set_memory_encrypted(), the PTEs for the existing kernel mapping are changed to add or remove the vTOM bit in the guest physical address, just as with TDX. The hypercalls to change the memory status on the host side are made using the existing callback mechanism. Everything just works, with a minor tweak to map the IO-APIC to use private accesses. To accomplish the switch in approach, the following must be done:

  * Update Hyper-V initialization to set the cc_mask based on vTOM and do other coco initialization.
  * Update physical_mask so the vTOM bit is no longer treated as part of the physical address.
  * Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear the vTOM bit as a protection flag.
  * Code already exists to make hypercalls to inform Hyper-V about pages changing between shared and private. Update this code to run as a callback from __set_memory_enc_pgtable().
  * Remove the Hyper-V special case from __set_memory_enc_dec().
  * Remove the Hyper-V specific call to swiotlb_update_mem_attributes() since mem_encrypt_init() will now do it.
  * Add a Hyper-V specific implementation of the is_private_mmio() callback that returns true for the IO-APIC and vTPM MMIO addresses.

[1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/[email protected]/ [ bp: Touchups. ] Signed-off-by: Michael Kelley <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-27  Merge 6.3-rc4 into usb-next  (Greg Kroah-Hartman, 42 files, -60/+253)
We need the USB fixes here, and the USB gadget update for future development patches to be based on. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-27  net: phy: constify fwnode_get_phy_node() fwnode argument  (Russell King (Oracle), 1 file, -1/+1)
fwnode_get_phy_node() does not modify the fwnode structure, so make the argument const. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-27  net: sfp: make sfp_bus_find_fwnode() take a const fwnode  (Russell King (Oracle), 1 file, -2/+3)
sfp_bus_find_fwnode() does not write to the fwnode, so let's make it const. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>