aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-03-29vsock: support sockmapBobby Eshleman2-0/+18
This patch adds sockmap support for vsock sockets. It is intended to be usable by all transports, but only the virtio and loopback transports are implemented. SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported. Signed-off-by: Bobby Eshleman <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-29Merge branch 'mlx5-next' into wip/leon-for-nextLeon Romanovsky1-3/+10
* mlx5-next: net/mlx5: Introduce other vport query for Q-counters
2023-03-29net/mlx5: Introduce other vport query for Q-countersPatrisious Haddad1-3/+10
These new fields in QUERY_Q_COUNTER command allow us to access another vport counters during the query command, which is specially useful to query representor vports. In addition also add the required caps to check if this capability is actually supported. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Link: https://lore.kernel.org/r/75c73a4a0e60f18c37b35a4a11ca2e2415e4a6f3.1679566038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-03-29RDMA/bnxt_re: Add resize_cq supportSelvin Xavier1-0/+4
Add resize_cq verb support for user space CQs. Resize operation for kernel CQs are not supported now. Driver should free the current CQ only after user library polls for all the completions and switch to new CQ. So after the resize_cq is returned from the driver, user library polls for existing completions and store it as temporary data. Once library reaps all completions in the current CQ, it invokes the ibv_cmd_poll_cq to inform the driver about the resize_cq completion. Adding a check for user CQs in driver's poll_cq and complete the resize operation for user CQs. Updating uverbs_cmd_mask with poll_cq to support this. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-03-29xhci: use pm_ptr() instead of #ifdef for CONFIG_PM conditionalsArnd Bergmann2-4/+1
A recent patch caused an unused-function warning in builds with CONFIG_PM disabled, after the function became marked 'static': drivers/usb/host/xhci-pci.c:91:13: error: 'xhci_msix_sync_irqs' defined but not used [-Werror=unused-function] 91 | static void xhci_msix_sync_irqs(struct xhci_hcd *xhci) | ^~~~~~~~~~~~~~~~~~~ This could be solved by adding another #ifdef, but as there is a trend towards removing CONFIG_PM checks in favor of helper macros, do the same conversion here and use pm_ptr() to get either a function pointer or NULL but avoid the warning. As the hidden functions reference some other symbols, make sure those are visible at compile time, at the minimal cost of a few extra bytes for 'struct usb_device'. Fixes: 9abe15d55dcc ("xhci: Move xhci MSI sync function to to xhci-pci") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-28Merge tag 'mlx5-updates-2023-03-20' of ↵Jakub Kicinski2-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-03-20 mlx5 dynamic msix This patch series adds support for dynamic msix vectors allocation in mlx5. Eli Cohen Says: ================ The following series of patches modifies mlx5_core to work with the dynamic MSIX API. Currently, mlx5_core allocates all the interrupt vectors it needs and distributes them amongst the consumers. With the introduction of dynamic MSIX support, which allows for allocation of interrupts more than once, we now allocate vectors as we need them. This allows other drivers running on top of mlx5_core to allocate interrupt vectors for their own use. An example for this is mlx5_vdpa, which uses these vectors to propagate interrupts directly from the hardware to the vCPU [1]. As a preparation for using this series, a use after free issue is fixed in lib/cpu_rmap.c and the allocator for rmap entries has been modified. A complementary API for irq_cpu_rmap_add() has also been introduced. [1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e ================ * tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Provide external API for allocating vectors net/mlx5: Use one completion vector if eth is disabled net/mlx5: Refactor calculation of required completion vectors net/mlx5: Move devlink registration before mlx5_load net/mlx5: Use dynamic msix vectors allocation net/mlx5: Refactor completion irq request/release code net/mlx5: Improve naming of pci function vectors net/mlx5: Use newer affinity descriptor net/mlx5: Modify struct mlx5_irq to use struct msi_map net/mlx5: Fix wrong comment net/mlx5e: Coding style fix, add empty line lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add lib: cpu_rmap: Use allocator for rmap entries lib: cpu_rmap: Avoid use after free on rmap->obj array entries ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-29driver core: class: mark the struct class for sysfs callbacks as constantGreg Kroah-Hartman1-4/+5
struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory. While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks. Cc: "David S. Miller" <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Bartosz Golaszewski <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Namjae Jeon <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Russ Weight <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steve French <[email protected]> Cc: Vignesh Raghavendra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-28net: dst: Switch to rcuref_t reference countingThomas Gleixner2-10/+11
Under high contention dst_entry::__refcnt becomes a significant bottleneck. atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into high retry rates on contention. Switch the reference count to rcuref_t which results in a significant performance gain. Rename the reference count member to __rcuref to reflect the change. The gain depends on the micro-architecture and the number of concurrent operations and has been measured in the range of +25% to +130% with a localhost memtier/memcached benchmark which amplifies the problem massively. Running the memtier/memcached benchmark over a real (1Gb) network connection the conversion on top of the false sharing fix for struct dst_entry::__refcnt results in a total gain in the 2%-5% range over the upstream baseline. Reported-by: Wangyang Guo <[email protected]> Reported-by: Arjan Van De Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28net: dst: Prevent false sharing vs. dst_entry:: __refcntWangyang Guo4-8/+15
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_t, so the reference count operations have to take the cache-line exclusive. Aside of the unavoidable reference count contention there is another significant problem which is caused by that: False sharing. perf top identified two affected read accesses. dst_entry::lwtstate and rtable::rt_genid. dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into a seperate cacheline vs. the read mostly members located at the beginning of the struct. That prevents false sharing vs. the struct members in the first 64 bytes of the structure, but there is also dst_entry::lwtstate which is located after the reference count and in the same cache line. This member is read after a reference count has been acquired. struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a size of 112 bytes, which means that the struct members of rtable which follow the dst member share the same cache line as dst_entry::__refcnt. Especially rtable::rt_genid is also read by the contexts which have a reference count acquired already. When dst_entry:__refcnt is incremented or decremented via an atomic operation these read accesses stall. This was found when analysing the memtier benchmark in 1:100 mode, which amplifies the problem extremly. Move the rt[6i]_uncached[_list] members out of struct rtable and struct rt6_info into struct dst_entry to provide padding and move the lwtstate member after that so it ends up in the same cache line. The resulting improvement depends on the micro-architecture and the number of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached benchmark. [ tglx: Rearrange struct ] Signed-off-by: Wangyang Guo <[email protected]> Signed-off-by: Arjan van de Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28Merge branch 'locking/rcuref' of ↵Jakub Kicinski5-11/+464
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pulling rcurefs from Peter for tglx's work. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28mm/thp: rename TRANSPARENT_HUGEPAGE_NEVER_DAX to _UNSUPPORTEDPeter Xu1-1/+1
TRANSPARENT_HUGEPAGE_NEVER_DAX has nothing to do with DAX. It's set when has_transparent_hugepage() returns false, checked in hugepage_vma_check() and will disable THP completely if false. Rename it to TRANSPARENT_HUGEPAGE_UNSUPPORTED to reflect its real purpose. [[email protected]: fix comment, per David] Link: https://lkml.kernel.org/r/ZBMzQW674oHQJV7F@x1n Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm: vmscan: add a map_nr_max field to shrinker_infoQi Zheng1-0/+1
Patch series "make slab shrink lockless", v5. This patch series aims to make slab shrink lockless. 1. Background ============= On our servers, we often find the following system cpu hotspots: 52.22% [kernel] [k] down_read_trylock 19.60% [kernel] [k] up_read 8.86% [kernel] [k] shrink_slab 2.44% [kernel] [k] idr_find 1.25% [kernel] [k] count_shadow_nodes 1.18% [kernel] [k] shrink lruvec 0.71% [kernel] [k] mem_cgroup_iter 0.71% [kernel] [k] shrink_node 0.55% [kernel] [k] find_next_bit And we used bpftrace to capture its calltrace as follows: @[ down_read_trylock+1 shrink_slab+128 shrink_node+371 do_try_to_free_pages+232 try_to_free_pages+243 _alloc_pages_slowpath+771 _alloc_pages_nodemask+702 pagecache_get_page+255 filemap_fault+1361 ext4_filemap_fault+44 __do_fault+76 handle_mm_fault+3543 do_user_addr_fault+442 do_page_fault+48 page_fault+62 ]: 1161690 @[ down_read_trylock+1 shrink_slab+128 shrink_node+371 balance_pgdat+690 kswapd+389 kthread+246 ret_from_fork+31 ]: 8424884 @[ down_read_trylock+1 shrink_slab+128 shrink_node+371 do_try_to_free_pages+232 try_to_free_pages+243 __alloc_pages_slowpath+771 __alloc_pages_nodemask+702 __do_page_cache_readahead+244 filemap_fault+1674 ext4_filemap_fault+44 __do_fault+76 handle_mm_fault+3543 do_user_addr_fault+442 do_page_fault+48 page_fault+62 ]: 20917631 We can see that down_read_trylock() of shrinker_rwsem is being called with high frequency at that time. Because of the poor multicore scalability of atomic operations, this can lead to a significant drop in IPC (instructions per cycle). And more, the shrinker_rwsem is a global read-write lock in shrinkers subsystem, which protects most operations such as slab shrink, registration and unregistration of shrinkers, etc. This can easily cause problems in the following cases. 1) When the memory pressure is high and there are many filesystems mounted or unmounted at the same time, slab shrink will be affected (down_read_trylock() failed). Such as the real workload mentioned by Kirill Tkhai: ``` One of the real workloads from my experience is start of an overcommitted node containing many starting containers after node crash (or many resuming containers after reboot for kernel update). In these cases memory pressure is huge, and the node goes round in long reclaim. ``` 2) If a shrinker is blocked (such as the case mentioned in [1]) and a writer comes in (such as mount a fs), then this writer will be blocked and cause all subsequent shrinker-related operations to be blocked. [1]. https://lore.kernel.org/lkml/[email protected]/ All the above cases can be solved by replacing the shrinker_rwsem trylocks with SRCU. 2. Survey ========= Before doing the code implementation, I found that there were many similar submissions in the community: a. Davidlohr Bueso submitted a patch in 2015. Subject: [PATCH -next v2] mm: srcu-ify shrinkers Link: https://lore.kernel.org/all/[email protected]/ Result: It was finally merged into the linux-next branch, but failed on arm allnoconfig (without CONFIG_SRCU) b. Tetsuo Handa submitted a patchset in 2017. Subject: [PATCH 1/2] mm,vmscan: Kill global shrinker lock. Link: https://lore.kernel.org/lkml/1510609063-3327-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/ Result: Finally chose to use the current simple way (break when rwsem_is_contended()). And Christoph Hellwig suggested to using SRCU, but SRCU was not unconditionally enabled at the time. c. Kirill Tkhai submitted a patchset in 2018. Subject: [PATCH RFC 00/10] Introduce lockless shrink_slab() Link: https://lore.kernel.org/lkml/153365347929.19074.12509495712735843805.stgit@localhost.localdomain/ Result: At that time, SRCU was not unconditionally enabled, and there were some objections to enabling SRCU. Later, because Kirill's focus was moved to other things, this patchset was not continued to be updated. d. Sultan Alsawaf submitted a patch in 2021. Subject: [PATCH] mm: vmscan: Replace shrinker_rwsem trylocks with SRCU protection Link: https://lore.kernel.org/lkml/[email protected]/ Result: Rejected because SRCU was not unconditionally enabled. We can find that almost all these historical commits were abandoned because SRCU was not unconditionally enabled. But now SRCU has been unconditionally enable by Paul E. McKenney in 2023 [2], so it's time to replace shrinker_rwsem trylocks with SRCU. [2] https://lore.kernel.org/lkml/20230105003759.GA1769545@paulmck-ThinkPad-P17-Gen-1/ 3. Reproduction and testing =========================== We can reproduce the down_read_trylock() hotspot through the following script: ``` #!/bin/bash DIR="/root/shrinker/memcg/mnt" do_create() { mkdir -p /sys/fs/cgroup/memory/test mkdir -p /sys/fs/cgroup/perf_event/test echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes for i in `seq 0 $1`; do mkdir -p /sys/fs/cgroup/memory/test/$i; echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs; mkdir -p $DIR/$i; done } do_mount() { for i in `seq $1 $2`; do mount -t tmpfs $i $DIR/$i; done } do_touch() { for i in `seq $1 $2`; do echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs; dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 & done } case "$1" in touch) do_touch $2 $3 ;; test) do_create 4000 do_mount 0 4000 do_touch 0 3000 ;; *) exit 1 ;; esac ``` Save the above script, then run test and touch commands. Then we can use the following perf command to view hotspots: perf top -U -F 999 1) Before applying this patchset: 32.31% [kernel] [k] down_read_trylock 19.40% [kernel] [k] pv_native_safe_halt 16.24% [kernel] [k] up_read 15.70% [kernel] [k] shrink_slab 4.69% [kernel] [k] _find_next_bit 2.62% [kernel] [k] shrink_node 1.78% [kernel] [k] shrink_lruvec 0.76% [kernel] [k] do_shrink_slab 2) After applying this patchset: 27.83% [kernel] [k] _find_next_bit 16.97% [kernel] [k] shrink_slab 15.82% [kernel] [k] pv_native_safe_halt 9.58% [kernel] [k] shrink_node 8.31% [kernel] [k] shrink_lruvec 5.64% [kernel] [k] do_shrink_slab 3.88% [kernel] [k] mem_cgroup_iter At the same time, we use the following perf command to capture IPC information: perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10 1) Before applying this patchset: Performance counter stats for 'system wide' (5 runs): 454187219766 cycles test ( +- 1.84% ) 78896433101 instructions test # 0.17 insn per cycle ( +- 0.44% ) 10.0020430 +- 0.0000366 seconds time elapsed ( +- 0.00% ) 2) After applying this patchset: Performance counter stats for 'system wide' (5 runs): 841954709443 cycles test ( +- 15.80% ) (98.69%) 527258677936 instructions test # 0.63 insn per cycle ( +- 15.11% ) (98.68%) 10.01064 +- 0.00831 seconds time elapsed ( +- 0.08% ) We can see that IPC drops very seriously when calling down_read_trylock() at high frequency. After using SRCU, the IPC is at a normal level. This patch (of 8): To prepare for the subsequent lockless memcg slab shrink, add a map_nr_max field to struct shrinker_info to records its own real shrinker_nr_max. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Suggested-by: Kirill Tkhai <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Kirill Tkhai <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christian König <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Qi Zheng <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sultan Alsawaf <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm: prefer xxx_page() alloc/free functions for order-0 pagesLorenzo Stoakes1-2/+2
Update instances of alloc_pages(..., 0), __get_free_pages(..., 0) and __free_pages(..., 0) to use alloc_page(), __get_free_page() and __free_page() respectively in core code. Link: https://lkml.kernel.org/r/50c48ca4789f1da2a65795f2346f5ae3eff7d665.1678710232.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28kasan: remove PG_skip_kasan_poison flagPeter Collingbourne3-37/+15
Code inspection reveals that PG_skip_kasan_poison is redundant with kasantag, because the former is intended to be set iff the latter is the match-all tag. It can also be observed that it's basically pointless to poison pages which have kasantag=0, because any pages with this tag would have been pointed to by pointers with match-all tags, so poisoning the pages would have little to no effect in terms of bug detection. Therefore, change the condition in should_skip_kasan_poison() to check kasantag instead, and remove PG_skip_kasan_poison and associated flags. Link: https://lkml.kernel.org/r/[email protected] Link: https://linux-review.googlesource.com/id/I57f825f2eaeaf7e8389d6cf4597c8a5821359838 Signed-off-by: Peter Collingbourne <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28io-mapping: don't disable preempt on RT in io_mapping_map_atomic_wc().Sebastian Andrzej Siewior1-4/+16
io_mapping_map_atomic_wc() disables preemption and pagefaults for historical reasons. The conversion to io_mapping_map_local_wc(), which only disables migration, cannot be done wholesale because quite some call sites need to be updated to accommodate with the changed semantics. On PREEMPT_RT enabled kernels the io_mapping_map_atomic_wc() semantics are problematic due to the implicit disabling of preemption which makes it impossible to acquire 'sleeping' spinlocks within the mapped atomic sections. PREEMPT_RT replaces the preempt_disable() with a migrate_disable() for more than a decade. It could be argued that this is a justification to do this unconditionally, but PREEMPT_RT covers only a limited number of architectures and it disables some functionality which limits the coverage further. Limit the replacement to PREEMPT_RT for now. This is also done kmap_atomic(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reported-by: Richard Weinberger <[email protected]> Link: https://lore.kernel.org/CAFLxGvw0WMxaMqYqJ5WgvVSbKHq2D2xcXTOgMCpgq9nDC-MWTQ@mail.gmail.com Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28shmem: add support to ignore swapLuis Chamberlain1-0/+1
In doing experimentations with shmem having the option to avoid swap becomes a useful mechanism. One of the *raves* about brd over shmem is you can avoid swap, but that's not really a good reason to use brd if we can instead use shmem. Using brd has its own good reasons to exist, but just because "tmpfs" doesn't let you do that is not a great reason to avoid it if we can easily add support for it. I don't add support for reconfiguring incompatible options, but if we really wanted to we can add support for that. To avoid swap we use mapping_set_unevictable() upon inode creation, and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Acked-by: Christian Brauner <[email protected]> Tested-by: Xin Hao <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Adam Manzanares <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pankaj Raghav <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm,jfs: move write_one_page/folio_write_one to jfsChristoph Hellwig1-6/+0
The last remaining user of folio_write_one through the write_one_page wrapper is jfs, so move the functionality there and hard code the call to metapage_writepage. Note that the use of the pagecache by the JFS 'metapage' buffer cache is a bit odd, and we could probably do without VM-level dirty tracking at all, but that's a change for another time. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Dave Kleikamp <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Evgeniy Dushistov <[email protected]> Cc: Gang He <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jan Kara via Ocfs2-devel <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm, memcg: Prevent memory.swappiness load/store tearingYue Zhao1-4/+4
The knob for cgroup v1 memory controller: memory.swappiness is not protected by any locking so it can be modified while it is used. This is not an actual problem because races are unlikely. But it is better to use [READ|WRITE]_ONCE to prevent compiler from doing anything funky. The access of memcg->swappiness and vm_swappiness is lockless, so both of them can be concurrently set at the same time as we are trying to read them. All occurrences of memcg->swappiness and vm_swappiness are updated with [READ|WRITE]_ONCE. [[email protected]: v3] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yue Zhao <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Tang Yizhou <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()Gerald Schaefer1-1/+1
s390 can do more fine-grained handling of spurious TLB protection faults, when there also is the PTE pointer available. Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an additional parameter. This will add no functional change to other architectures, but those with private flush_tlb_fix_spurious_fault() implementations need to be made aware of the new parameter. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gerald Schaefer <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: David Hildenbrand <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm: swap: remove unneeded cgroup_throttle_swaprate()Kefeng Wang1-8/+4
All the callers of cgroup_throttle_swaprate() are converted to folio_throttle_swaprate(), so make __cgroup_throttle_swaprate() to take a folio, and rename it to __folio_throttle_swaprate(), also rename gfp_mask to gfp and drop redundant extern keyword. finally, drop unused cgroup_throttle_swaprate(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28kasan: call clear_page with a match-all tag instead of changing page tagPeter Collingbourne1-5/+3
Instead of changing the page's tag solely in order to obtain a pointer with a match-all tag and then changing it back again, just convert the pointer that we get from kmap_atomic() into one with a match-all tag before passing it to clear_page(). On a certain microarchitecture, this has been observed to cause a measurable improvement in microbenchmark performance, presumably as a result of being able to avoid the atomic operations on the page tag. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Collingbourne <[email protected]> Link: https://linux-review.googlesource.com/id/I0249822cc29097ca7a04ad48e8eb14871f80e711 Reviewed-by: Andrey Konovalov <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm, printk: introduce new format %pGt for page_typeHyeonggon Yoo2-1/+14
%pGp format is used to display 'flags' field of a struct page. However, some page flags (i.e. PG_buddy, see page-flags.h for more details) are stored in page_type field. To display human-readable output of page_type, introduce %pGt format. It is important to note the meaning of bits are different in page_type. if page_type is 0xffffffff, no flags are set. Setting PG_buddy (0x00000080) flag results in a page_type of 0xffffff7f. Clearing a bit actually means setting a flag. Bits in page_type are inverted when displaying type names. Only values for which page_type_has_type() returns true are considered as page_type, to avoid confusion with mapcount values. if it returns false, only raw values are displayed and not page type names. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Petr Mladek <[email protected]> [vsprintf part] Cc: Andy Shevchenko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Joe Perches <[email protected]> Cc: John Ogness <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mmflags.h: use less error prone method to define pageflag_namesHyeonggon Yoo1-41/+44
Patch series "mm, printk: introduce new format for page_type", v4. This series moves PG_slab page flag to page_type, freeing one bit in page->flags and introduces %pGt format that prints human-readable page_type like %pGp for printing page flags. See changelog of patch 2 for more implementation details. Thanks everyone that gave valuable comments. This patch (of 3): Use helper macro to decrease chances of typo when defining pageflag_names. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hyeonggon Yoo <[email protected]> Suggested-by: Andy Shevchenko <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Joe Perches <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: John Ogness <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm: add tracepoints to ksmStefan Roesch1-0/+251
This adds the following tracepoints to ksm: - start / stop scan - ksm enter / exit - merge a page - merge a page with ksm - remove a page - remove a rmap item This patch has been split off from the RFC patch series "mm: process/cgroup ksm support". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stefan Roesch <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28lazy tlb: allow lazy tlb mm refcounting to be configurableNicholas Piggin1-3/+15
Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm when it is context switched. This can be disabled by architectures that don't require this refcounting if they clean up lazy tlb mms when the last refcount is dropped. Currently this is always enabled, so the patch introduces no functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28lazy tlb: introduce lazy tlb mm refcount helper functionsNicholas Piggin1-0/+16
Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. This makes the lazy tlb mm references more obvious, and allows the refcounting scheme to be modified in later changes. There is no functional change with this patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28mm: multi-gen LRU: clean up sysfs codeT.J. Alumbaugh1-1/+1
This patch cleans up the sysfs code. Specifically, 1. use sysfs_emit(), 2. use __ATTR_RW(), and 3. constify multi-gen LRU struct attribute_group. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: T.J. Alumbaugh <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28x86/mm/pat: clear VM_PAT if copy_p4d_range failedMa Wupeng1-3/+4
Syzbot reports a warning in untrack_pfn(). Digging into the root we found that this is due to memory allocation failure in pmd_alloc_one. And this failure is produced due to failslab. In copy_page_range(), memory alloaction for pmd failed. During the error handling process in copy_page_range(), mmput() is called to remove all vmas. While untrack_pfn this empty pfn, warning happens. Here's a simplified flow: dup_mm dup_mmap copy_page_range copy_p4d_range copy_pud_range copy_pmd_range pmd_alloc __pmd_alloc pmd_alloc_one page = alloc_pages(gfp, 0); if (!page) return NULL; mmput exit_mmap unmap_vmas unmap_single_vma untrack_pfn follow_phys WARN_ON_ONCE(1); Since this vma is not generate successfully, we can clear flag VM_PAT. In this case, untrack_pfn() will not be called while cleaning this vma. Function untrack_pfn_moved() has also been renamed to fit the new logic. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ma Wupeng <[email protected]> Reported-by: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28fprobe: Skip exit_handler if entry_handler returns !0Masami Hiramatsu (Google)1-2/+2
Skip hooking function return and calling exit_handler if the entry_handler() returns !0. Link: https://lkml.kernel.org/r/167526699798.433354.10998365726830117303.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28fprobe: Add nr_maxactive to specify rethook_node pool sizeMasami Hiramatsu (Google)1-0/+2
Add nr_maxactive to specify rethook_node pool size. This means the maximum number of actively running target functions concurrently for probing by exit_handler. Note that if the running function is preempted or sleep, it is still counted as 'active'. Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28fprobe: Pass entry_data to handlersMasami Hiramatsu (Google)1-2/+6
Pass the private entry_data to the entry and exit handlers so that they can share the context data, something like saved function arguments etc. User must specify the private entry_data size by @entry_data_size field before registering the fprobe. Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28drm/msm: Add wait-boost supportRob Clark1-2/+12
Add a way for various userspace waits to signal urgency. Signed-off-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/525817/ Link: https://lore.kernel.org/r/[email protected]
2023-03-28Merge tag 'dma-fence-deadline' into HEADRob Clark5-22/+57
This series adds a deadline hint to fences, so realtime deadlines such as vblank can be communicated to the fence signaller for power/ frequency management decisions. This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons: 1) To continue to be able to use the atomic helpers 2) To support cases where display and gpu are different drivers See https://patchwork.freedesktop.org/series/93035/ This does not yet add any UAPI, although this will be needed in a number of cases: 1) Workloads "ping-ponging" between CPU and GPU, where we don't want the GPU freq governor to interpret time stalled waiting for GPU as "idle" time 2) Cases where the compositor is waiting for fences to be signaled before issuing the atomic ioctl, for example to maintain 60fps cursor updates even when the GPU is not able to maintain that framerate. Signed-off-by: Rob Clark <[email protected]>
2023-03-28drm/vblank: Add helper to get next vblank timeRob Clark1-0/+1
Will be used in the next commit to set a deadline on fences that an atomic update is waiting on. v2: Calculate time at *start* of vblank period, not end v3: Fix kbuild complaints Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Mario Kleiner <[email protected]>
2023-03-28drm/scheduler: Add fence deadline supportRob Clark1-0/+17
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from from the finished fence to the actual hw fence. v2: Split into drm_sched_fence_set_parent() (ckoenig) v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees fence->parent set before drm_sched_fence_set_parent() does this test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT). Signed-off-by: Rob Clark <[email protected]> Acked-by: Luben Tuikov <[email protected]>
2023-03-28dma-buf/sync_file: Surface sync-file uABIRob Clark1-22/+15
We had all of the internal driver APIs, but not the all important userspace uABI, in the dma-buf doc. Fix that. And re-arrange the comments slightly as otherwise the comments for the ioctl nr defines would not show up. v2: Fix docs build warning coming from newly including the uabi header in the docs build Signed-off-by: Rob Clark <[email protected]> Acked-by: Pekka Paalanen <[email protected]>
2023-03-28dma-buf/dma-resv: Add a way to set fence deadlineRob Clark1-0/+2
Add a way to set a deadline on remaining resv fences according to the requested usage. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Christian König <[email protected]>
2023-03-28Merge tag 'urgent-rcu.2023.03.28a' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This brings the rcu_torture_read event trace into line with the new trace tools by replacing this event trace's __field() with the corresponding __array(). Without this, the new trace tools will fail when presented wtih an rcu_torture_read event trace, which is a regression from the viewpoint of trace tools users" Link: https://lore.kernel.org/all/[email protected]/ * tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix rcu_torture_read ftrace event
2023-03-28firmware: arm_sdei: Fix sleep from invalid context BUGPierre Gondois1-1/+0
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0 preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by cpuhp/0/24: #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130 irq event stamp: 36 hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0 hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248 softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...] Hardware name: WIWYNN Mt.Jade Server [...] Call trace: dump_backtrace+0x114/0x120 show_stack+0x20/0x70 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x228 rt_spin_lock+0x70/0x120 sdei_cpuhp_up+0x3c/0x130 cpuhp_invoke_callback+0x250/0xf08 cpuhp_thread_fun+0x120/0x248 smpboot_thread_fn+0x280/0x320 kthread+0x130/0x140 ret_from_fork+0x10/0x20 sdei_cpuhp_up() is called in the STARTING hotplug section, which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry instead to execute the cpuhp cb later, with preemption enabled. SDEI originally got its own cpuhp slot to allow interacting with perf. It got superseded by pNMI and this early slot is not relevant anymore. [1] Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the calling CPU. It is checked that preemption is disabled for them. _ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'. Preemption is enabled in those threads, but their cpumask is limited to 1 CPU. Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb don't trigger them. Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call which acts on the calling CPU. [1]: https://lore.kernel.org/all/[email protected]/ Suggested-by: James Morse <[email protected]> Signed-off-by: Pierre Gondois <[email protected]> Reviewed-by: James Morse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2023-03-28atomics: Provide rcuref - scalable reference countingThomas Gleixner2-0/+161
atomic_t based reference counting, including refcount_t, uses atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is implemented with a atomic_try_cmpxchg() loop. High contention of the reference count leads to retry loops and scales badly. There is nothing to improve on this implementation as the semantics have to be preserved. Provide rcuref as a scalable alternative solution which is suitable for RCU managed objects. Similar to refcount_t it comes with overflow and underflow detection and mitigation. rcuref treats the underlying atomic_t as an unsigned integer and partitions this space into zones: 0x00000000 - 0x7FFFFFFF valid zone (1 .. (INT_MAX + 1) references) 0x80000000 - 0xBFFFFFFF saturation zone 0xC0000000 - 0xFFFFFFFE dead zone 0xFFFFFFFF no reference rcuref_get() unconditionally increments the reference count with atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the reference count with atomic_add_negative_release(). This unconditional increment avoids the inc_not_zero() problem, but requires a more complex implementation on the put() side when the count drops from 0 to -1. When this transition is detected then it is attempted to mark the reference count dead, by setting it to the midpoint of the dead zone with a single atomic_cmpxchg_release() operation. This operation can fail due to a concurrent rcuref_get() elevating the reference count from -1 to 0 again. If the unconditional increment in rcuref_get() hits a reference count which is marked dead (or saturated) it will detect it after the fact and bring back the reference count to the midpoint of the respective zone. The zones provide enough tolerance which makes it practically impossible to escape from a zone. The racy implementation of rcuref_put() requires to protect rcuref_put() against a grace period ending in order to prevent a subtle use after free. As RCU is the only mechanism which allows to protect against that, it is not possible to fully replace the atomic_inc_not_zero() based implementation of refcount_t with this scheme. The final drop is slightly more expensive than the atomic_dec_return() counterpart, but that's not the case which this is optimized for. The optimization is on the high frequeunt get()/put() pairs and their scalability. The performance of an uncontended rcuref_get()/put() pair where the put() is not dropping the last reference is still on par with the plain atomic operations, while at the same time providing overflow and underflow detection and mitigation. The performance of rcuref compared to plain atomic_inc_not_zero() and atomic_dec_return() based reference counting under contention: - Micro benchmark: All CPUs running a increment/decrement loop on an elevated reference count, which means the 0 to -1 transition never happens. The performance gain depends on microarchitecture and the number of CPUs and has been observed in the range of 1.3X to 4.7X - Conversion of dst_entry::__refcnt to rcuref and testing with the localhost memtier/memcached benchmark. That benchmark shows the reference count contention prominently. The performance gain depends on microarchitecture and the number of CPUs and has been observed in the range of 1.1X to 2.6X over the previous fix for the false sharing issue vs. struct dst_entry::__refcnt. When memtier is run over a real 1Gb network connection, there is a small gain on top of the false sharing fix. The two changes combined result in a 2%-5% total gain for that networked test. Reported-by: Wangyang Guo <[email protected]> Reported-by: Arjan Van De Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-28atomics: Provide atomic_add_negative() variantsThomas Gleixner3-11/+303
atomic_add_negative() does not provide the relaxed/acquire/release variants. Provide them in preparation for a new scalable reference count algorithm. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-28driver core: bus: constify class_unregister/destroy()Greg Kroah-Hartman1-2/+2
The class_unregister() and class_destroy() function should be taking a const * to struct class, not just a *, so fix that up. Cc: "Rafael J. Wysocki" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-27fscrypt: new helper function - fscrypt_prepare_lookup_partial()Luís Henriques1-0/+7
This patch introduces a new helper function which can be used both in lookups and in atomic_open operations by filesystems that want to handle filename encryption and no-key dentries themselves. The reason for this function to be used in atomic open is that this operation can act as a lookup if handed a dentry that is negative. And in this case we may need to set DCACHE_NOKEY_NAME. Signed-off-by: Luís Henriques <[email protected]> Tested-by: Xiubo Li <[email protected]> Reviewed-by: Xiubo Li <[email protected]> [ebiggers: improved the function comment, and moved the function to just below __fscrypt_prepare_lookup()] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2023-03-27ethtool: Add support for configuring tx_push_buf_lenShay Agroskin2-4/+12
This attribute, which is part of ethtool's ring param configuration allows the user to specify the maximum number of the packet's payload that can be written directly to the device. Example usage: # ethtool -G [interface] tx-push-buf-len [number of bytes] Co-developed-by: Jakub Kicinski <[email protected]> Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27netlink: Add a macro to set policy message with format stringShay Agroskin1-0/+22
Similar to NL_SET_ERR_MSG_FMT, add a macro which sets netlink policy error message with a format string. Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27net: introduce a config option to tweak MAX_SKB_FRAGSEric Dumazet1-11/+5
Currently, MAX_SKB_FRAGS value is 17. For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag. But with zero copy, we use order-0 pages. For BIG TCP to show its full potential, we add a config option to be able to fit up to 45 segments per skb. This is also needed for BIG TCP rx zerocopy, as zerocopy currently does not support skbs with frag list. We have used MAX_SKB_FRAGS=45 value for years at Google before we deployed 4K MTU, with no adverse effect, other than a recent issue in mlx4, fixed in commit 26782aad00cc ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS") Back then, goal was to be able to receive full size (64KB) GRO packets without the frag_list overhead. Note that /proc/sys/net/core/max_skb_frags can also be used to limit the number of fragments TCP can use in tx packets. By default we keep the old/legacy value of 17 until we get more coverage for the updated values. Sizes of struct skb_shared_info on 64bit arches MAX_SKB_FRAGS | sizeof(struct skb_shared_info): ============================================== 17 320 21 320+64 = 384 25 320+128 = 448 29 320+192 = 512 33 320+256 = 576 37 320+320 = 640 41 320+384 = 704 45 320+448 = 768 This inflation might cause problems for drivers assuming they could pack both the incoming packet (for MTU=1500) and skb_shared_info in half a page, using build_skb(). v3: fix build error when CONFIG_NET=n v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long" Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Jason Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28ASoC: soc-core.c: add snd_soc_add_pcm_runtimes()Kuninori Morimoto1-2/+3
Current ASoC supports snd_soc_add_pcm_runtime(), but user need to call it one-by-one if it has multi dai_links. This patch adds snd_soc_add_pcm_runtimes() which supports multi dai_links. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-03-28drm: bridge: samsung-dsim: Add i.MX8M Plus supportMarek Vasut1-0/+1
Add extras to support i.MX8M Plus. The main change is the removal of HS/VS/DE signal inversion in the LCDIFv3-DSIM glue logic, otherwise the implementation of this IP in i.MX8M Plus is very much compatible with the i.MX8M Mini/Nano one. Reviewed-by: Marek Vasut <[email protected]> Reviewed-by: Frieder Schrempf <[email protected]> Acked-by: Robert Foss <[email protected]> Signed-off-by: Marek Vasut <[email protected]> Signed-off-by: Jagan Teki <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Signed-off-by: Inki Dae <[email protected]>
2023-03-28drm: bridge: Generalize Exynos-DSI driver into a Samsung DSIM bridgeJagan Teki1-0/+114
Samsung MIPI DSIM controller is common DSI IP that can be used in various SoCs like Exynos, i.MX8M Mini/Nano. In order to access this DSI controller between various platform SoCs, the ideal way to incorporate this in the drm stack is via the drm bridge driver. We already have a consolidated code for supporting component and bridge based DRM drivers, so keep the exynos component based code in existing exynos_drm_dsi.c and move generic bridge code as part of samsung-dsim.c Tested-by: Marek Szyprowski <[email protected]> Reviewed-by: Marek Vasut <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Jagan Teki <[email protected]> Signed-off-by: Inki Dae <[email protected]>
2023-03-27dt-bindings: reset: add BCM63268 timer reset definitionsÁlvaro Fernández Rojas1-0/+4
Add missing timer reset definitions for BCM63268. Signed-off-by: Álvaro Fernández Rojas <[email protected]> Acked-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]>