aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-07-19gpio: ucb1400: Remove platform setup and teardown supportUwe Kleine-König1-2/+0
There is no user of these callbacks. The motivation for this change is to stop returning an error code from the remove callback. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19gpio: twl4030: Drop platform teardown callbackUwe Kleine-König1-2/+0
There is no machine providing a teardown callback, so drop the unused code. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19gpiolib: devres: Get rid of unused devm_gpio_free()Andy Shevchenko1-6/+0
The last user, which in fact was a dead code, has gone a year ago, previous one 3 years ago. On top of that we want to drop away the legacy GPIO APIs in the kernel, so take a chance to get rid of unused devm_gpio_free() and accompanying stuff. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19dma-iommu: add iommu_dma_opt_mapping_size()John Garry1-0/+2
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may be quite slow. Signed-off-by: John Garry <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Acked-by: Robin Murphy <[email protected]> Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-19dma-mapping: add dma_opt_mapping_size()John Garry2-0/+6
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for device drivers to know this "optimal" limit, such that they may try to produce mapping which don't exceed it. Signed-off-by: John Garry <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-18net: devlink: remove unused locked functionsJiri Pirko1-20/+0
Remove locked versions of functions that are no longer used by anyone. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18netdevsim: convert driver to use unlocked devlink API during init/finiJiri Pirko1-0/+1
Prepare for devlink reload being called with devlink->lock held and convert the netdevsim driver to use unlocked devlink API during init and fini flows. Take devl_lock() in reload_down() and reload_up() ops in the meantime before reload cmd is converted to take the lock itself. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18net: devlink: add unlocked variants of devlink_region_create/destroy() functionsJiri Pirko1-0/+5
Add unlocked variants of devlink_region_create/destroy() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18net: devlink: add unlocked variants of devlink_dpipe*() functionsJiri Pirko1-0/+12
Add unlocked variants of devlink_dpipe*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18net: devlink: add unlocked variants of devlink_sb*() functionsJiri Pirko1-0/+5
Add unlocked variants of devlink_sb*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18net: devlink: add unlocked variants of devlink_resource*() functionsJiri Pirko1-0/+17
Add unlocked variants of devlink_resource*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18net: devlink: add unlocked variants of devling_trap*() functionsJiri Pirko1-0/+20
Add unlocked variants of devl_trap*() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18skbuff: add SKBFL_DONT_ORPHAN flagPavel Begunkov1-3/+5
We don't want to list every single ubuf_info callback in skb_orphan_frags(), add a flag controlling the behaviour. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-18dt-bindings: power: qcom-rpmpd: Add MSM8909 power domainsStephan Gerhold1-0/+7
MSM8909 has the same power domains as MSM8916 so just define them as aliases for the existing definitions. Signed-off-by: Stephan Gerhold <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-18mm: fix missing wake-up event for FSDAX pagesMuchun Song1-5/+9
FSDAX page refcounts are 1-based, rather than 0-based: if refcount is 1, then the page is freed. The FSDAX pages can be pinned through GUP, then they will be unpinned via unpin_user_page() using a folio variant to put the page, however, folio variants did not consider this special case, the result will be to miss a wakeup event (like the user of __fuse_dax_break_layouts()). This results in a task being permanently stuck in TASK_INTERRUPTIBLE state. Since FSDAX pages are only possibly obtained by GUP users, so fix GUP instead of folio_put() to lower overhead. Link: https://lkml.kernel.org/r/[email protected] Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()") Signed-off-by: Muchun Song <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: William Kucharski <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-18drm/amdgpu: add comment to HW_IP_VCN_ENC typeRuijing Dong1-0/+4
From VCN4, AMDGPU_HW_IP_VCN_ENC is re-used to support both encoding and decoding jobs. Link: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/245/commits Reviewed-by: Christian König <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-07-18Merge 5.19-rc7 into usb-nextGreg Kroah-Hartman47-78/+163
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-07-18dma-buf/dma_resv_usage: update explicit sync documentationChristian König1-3/+13
Make it clear that DMA_RESV_USAGE_BOOKKEEP can be used for explicit synced user space submissions as well and document the rules around adding the same fence with different usages. Signed-off-by: Christian König <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-07-18ublk: remove UBLK_IO_F_INTEGRITYChristoph Hellwig1-1/+0
The ublk protocol has no mechanism to actually transfer the integrity metadata, so don't define this flag, which requires that an integrity payload is attached to a bio. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-07-18iio: cros: Register FIFO callback after sensor is registeredGwendal Grignou1-2/+5
Instead of registering callback to process sensor events right at initialization time, wait for the sensor to be register in the iio subsystem. Events can come at probe time (in case the kernel rebooted abruptly without switching the sensor off for instance), and be sent to IIO core before the sensor is fully registered. Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO") Reported-by: Douglas Anderson <[email protected]> Signed-off-by: Gwendal Grignou <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Cameron <[email protected]>
2022-07-18random: remove CONFIG_ARCH_RANDOMJason A. Donenfeld3-8/+27
When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Borislav Petkov <[email protected]> Acked-by: Heiko Carstens <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2022-07-18tcp: Fix a data-race around sysctl_tcp_notsent_lowat.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_notsent_lowat, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18tcp: Fix data-races around some timeout sysctl knobs.Kuniyuki Iwashima1-1/+2
While reading these sysctl knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - tcp_retries1 - tcp_retries2 - tcp_orphan_retries - tcp_fin_timeout Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18tcp: Fix data-races around keepalive sysctl knobs.Kuniyuki Iwashima1-3/+6
While reading sysctl_tcp_keepalive_(time|probes|intvl), they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18tls: rx: async: hold onto the input skbJakub Kicinski1-0/+1
Async crypto currently benefits from the fact that we decrypt in place. When we allow input and output to be different skbs we will have to hang onto the input while we move to the next record. Clone the inputs and keep them on a list. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18tls: rx: remove the message decrypted trackingJakub Kicinski1-1/+0
We no longer allow a decrypted skb to remain linked to ctx->recv_pkt. Anything on the list is decrypted, anything on ctx->recv_pkt needs to be decrypted. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18tls: rx: allow only one reader at a timeJakub Kicinski1-0/+3
recvmsg() in TLS gets data from the skb list (rx_list) or fresh skbs we read from TCP via strparser. The former holds skbs which were already decrypted for peek or decrypted and partially consumed. tls_wait_data() only notices appearance of fresh skbs coming out of TCP (or psock). It is possible, if there is a concurrent call to peek() and recv() that the peek() will move the data from input to rx_list without recv() noticing. recv() will then read data out of order or never wake up. This is not a practical use case/concern, but it makes the self tests less reliable. This patch solves the problem by allowing only one reader in. Because having multiple processes calling read()/peek() is not normal avoid adding a lock and try to fast-path the single reader case. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18net/smc: Extend SMC-R link group netlink attributeWen Gu1-0/+1
Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR. Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of SMC-R link group. Signed-off-by: Wen Gu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18net/smc: Introduce a sysctl for setting SMC-R buffer typeWen Gu1-0/+1
This patch introduces the sysctl smcr_buf_type for setting the type of SMC-R sndbufs and RMBs. Valid values includes: - SMCR_PHYS_CONT_BUFS, which means use physically contiguous buffers for better performance and is the default value. - SMCR_VIRT_CONT_BUFS, which means use virtually contiguous buffers in case of physically contiguous memory is scarce. - SMCR_MIXED_BUFS, which means first try to use physically contiguous buffers. If not available, then use virtually contiguous buffers. Signed-off-by: Wen Gu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18net: stmmac: switch to use interrupt for hw crosstimestampingWong Vee Khee1-0/+1
Using current implementation of polling mode, there is high chances we will hit into timeout error when running phc2sys. Hence, update the implementation of hardware crosstimestamping to use the MAC interrupt service routine instead of polling for TSIS bit in the MAC Timestamp Interrupt Status register to be set. Cc: Richard Cochran <[email protected]> Signed-off-by: Wong Vee Khee <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18drm/fourcc: Add formats for packed YUV 4:4:4 AVUY and XVUY permutationsLaurent Pinchart1-0/+2
Add FourCCs for two missing permutations of the packed YUV 4:4:4 color components, namely AVUY and XVUY. These formats are needed by the NXP i.MX8 ISI. While the ISI is supported by a V4L2 device (corresponding formats have been submitted to V4L2), it is handled in userspace by libcamera, which uses DRM FourCCs for pixel formats. Signed-off-by: Laurent Pinchart <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-07-18swiotlb: move struct io_tlb_slot to swiotlb.cChristoph Hellwig1-5/+1
No need to expose this structure definition in the header. Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-18swiotlb: remove unused fields in io_tlb_memChao Gao1-5/+0
Commit 20347fca71a3 ("swiotlb: split up the global swiotlb lock") splits io_tlb_mem into multiple areas. Each area has its own lock and index. The global ones are not used so remove them. Signed-off-by: Chao Gao <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-17cpumask: update cpumask_next_wrap() signatureSander Vanheule1-1/+1
The extern specifier is not needed for this declaration, so drop it. The function also depends only on the input parameters, and has no side effects, so it can be marked __pure like other functions in cpumask.h. Link: https://lkml.kernel.org/r/72ab755695b74bb5fbaa756ae4c0edd708d172f1.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17cpumask: Fix invalid uniprocessor mask assumptionSander Vanheule1-80/+19
On uniprocessor builds, any CPU mask is assumed to contain exactly one CPU (cpu0). This assumption ignores the existence of empty masks, resulting in incorrect behaviour. cpumask_first_zero(), cpumask_next_zero(), and for_each_cpu_not() don't provide behaviour matching the assumption that a UP mask is always "1", and instead provide behaviour matching the empty mask. Drop the incorrectly optimised code and use the generic implementations in all cases. Link: https://lkml.kernel.org/r/86bf3f005abba2d92120ddd0809235cab4f759a6.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Suggested-by: Yury Norov <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17cpumask: add UP optimised for_each_*_cpu versionsSander Vanheule1-0/+7
On uniprocessor builds, the following loops will always run over a mask that contains one enabled CPU (cpu0): - for_each_possible_cpu - for_each_online_cpu - for_each_present_cpu Provide uniprocessor-specific macros for these loops, that always run exactly once. Link: https://lkml.kernel.org/r/3a92869b902a075b97be5d1452c9c6badbbff0df.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Acked-by: Yury Norov <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17kfifo: fix kfifo_to_user() return typeDan Carpenter1-1/+1
The kfifo_to_user() macro is supposed to return zero for success or negative error codes. Unfortunately, there is a signedness bug so it returns unsigned int. This only affects callers which try to save the result in ssize_t and as far as I can see the only place which does that is line6_hwdep_read(). TL;DR: s/_uint/_int/. Link: https://lkml.kernel.org/r/YrVL3OJVLlNhIMFs@kili Fixes: 144ecf310eb5 ("kfifo: fix kfifo_alloc() to return a signed int value") Signed-off-by: Dan Carpenter <[email protected]> Cc: Stefani Seibold <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17compiler-gcc.h: remove ancient workaround for gcc PR 58670Uros Bizjak1-11/+0
The workaround for 'asm goto' miscompilation introduces a compiler barrier quirk that inhibits many useful compiler optimizations. For example, __try_cmpxchg_user compiles to: 11375: 41 8b 4d 00 mov 0x0(%r13),%ecx 11379: 41 8b 02 mov (%r10),%eax 1137c: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) 11380: 0f 94 c2 sete %dl 11383: 84 d2 test %dl,%dl 11385: 75 c4 jne 1134b <...> 11387: 41 89 02 mov %eax,(%r10) where the barrier inhibits flags propagation from asm when compiled with gcc-12. When the mentioned quirk is removed, the following code is generated: 11553: 41 8b 4d 00 mov 0x0(%r13),%ecx 11557: 41 8b 02 mov (%r10),%eax 1155a: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) 1155e: 74 c9 je 11529 <...> 11560: 41 89 02 mov %eax,(%r10) The refered compiler bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 was fixed for gcc-4.8.2. Current minimum required version of GCC is version 5.1 which has the above 'asm goto' miscompilation fixed, so remove the workaround. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uros Bizjak <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17net, lib/once: remove {net_}get_random_once_wait macrowuchi2-4/+0
DO_ONCE(func, ...) will call func with spinlock which acquired by spin_lock_irqsave in __do_once_start. But the get_random_once_wait will sleep in get_random_bytes_wait -> wait_for_random_bytes. Fortunately, there is no place to use {net_}get_random_once_wait, so we could remove them simply. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: wuchi <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Cc: David S. Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17writeback: cleanup bdi_sched_wait()Xiu Jianfeng1-6/+0
bdi_sched_wait() is no longer used since commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm, hugetlb: skip irrelevant nodes in show_free_areas()Gang Li1-2/+2
show_free_areas() allows to filter out node specific data which is irrelevant to the allocation request. But hugetlb_show_meminfo() still shows hugetlb on all nodes, which is redundant and unnecessary. Use show_mem_node_skip() to skip irrelevant nodes. And replace hugetlb_show_meminfo() with hugetlb_show_meminfo_node(nid). before-and-after sample output of OOM: before: ``` [ 214.362453] Node 1 active_anon:148kB inactive_anon:4050920kB active_file:112kB inactive_file:100kB [ 214.375429] Node 1 Normal free:45100kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig [ 214.388334] lowmem_reserve[]: 0 0 0 0 0 [ 214.390251] Node 1 Normal: 423*4kB (UE) 320*8kB (UME) 187*16kB (UE) 117*32kB (UE) 57*64kB (UME) 20 [ 214.397626] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 214.401518] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB ``` after: ``` [ 145.069705] Node 1 active_anon:128kB inactive_anon:4049412kB active_file:56kB inactive_file:84kB u [ 145.110319] Node 1 Normal free:45424kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig [ 145.152315] lowmem_reserve[]: 0 0 0 0 0 [ 145.155244] Node 1 Normal: 470*4kB (UME) 373*8kB (UME) 247*16kB (UME) 168*32kB (UE) 86*64kB (UME) [ 145.164119] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB ``` Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gang Li <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: fix comment of page_deferred_listMiaohe Lin1-2/+2
The current comment is confusing because if global or memcg deferred list in the second tail page is occupied by compound_head, why we still use page[2].deferred_list here? I think it wants to say that Global or memcg deferred list in the first tail page is occupied by compound_mapcount and compound_pincount so we use the second tail page's deferred_list instead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: check pmd_present first in is_huge_zero_pmdMiaohe Lin1-1/+1
When pmd is non-present, pmd_pfn returns an insane value. So we should check pmd_present first to avoid acquiring such insane value and also avoid touching possible cold huge_zero_pfn cache line when pmd isn't present. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/mmap: drop ARCH_HAS_VM_GET_PAGE_PROTAnshuman Khandual1-3/+0
Now all the platforms enable ARCH_HAS_GET_PAGE_PROT. They define and export own vm_get_page_prot() whether custom or standard DECLARE_VM_GET_PAGE_PROT. Hence there is no need for default generic fallback for vm_get_page_prot(). Just drop this fallback and also ARCH_HAS_GET_PAGE_PROT mechanism. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/mmap: build protect protection_map[] with ARCH_HAS_VM_GET_PAGE_PROTAnshuman Khandual1-1/+1
Now that protection_map[] has been moved inside those platforms that enable ARCH_HAS_VM_GET_PAGE_PROT. Hence generic protection_map[] array now can be protected with CONFIG_ARCH_HAS_VM_GET_PAGE_PROT intead of __P000. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/mmap: define DECLARE_VM_GET_PAGE_PROTAnshuman Khandual1-0/+28
This just converts the generic vm_get_page_prot() implementation into a new macro i.e DECLARE_VM_GET_PAGE_PROT which later can be used across platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT. This does not create any functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/mmap: build protect protection_map[] with __P000Anshuman Khandual1-0/+2
Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms", v7. __SXXX/__PXXX macros are unnecessary abstraction layer in creating the generic protection_map[] array which is used for vm_get_page_prot(). This abstraction layer can be avoided, if the platforms just define the array protection_map[] for all possible vm_flags access permission combinations and also export vm_get_page_prot() implementation. This series drops __SXXX/__PXXX macros from across platforms in the tree. First it build protects generic protection_map[] array with '#ifdef __P000' and moves it inside platforms which enable ARCH_HAS_VM_GET_PAGE_PROT. Later this build protects same array with '#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves inside remaining platforms while enabling ARCH_HAS_VM_GET_PAGE_PROT. This adds a new macro DECLARE_VM_GET_PAGE_PROT defining the current generic vm_get_page_prot(), in order for it to be reused on platforms that do not require custom implementation. Finally, ARCH_HAS_VM_GET_PAGE_PROT can just be dropped, as all platforms now define and export vm_get_page_prot(), via looking up a private and static protection_map[] array. protection_map[] data type has been changed as 'static const' on all platforms that do not change it during boot. This patch (of 26): Build protect generic protection_map[] array with __P000, so that it can be moved inside all the platforms one after the other. Otherwise there will be build failures during this process. CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose as only certain platforms enable this config now. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Suggested-by: Christophe Leroy <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/page_alloc: protect PCP lists with a spinlockMel Gorman1-0/+1
Currently the PCP lists are protected by using local_lock_irqsave to prevent migration and IRQ reentrancy but this is inconvenient. Remote draining of the lists is impossible and a workqueue is required and every task allocation/free must disable then enable interrupts which is expensive. As preparation for dealing with both of those problems, protect the lists with a spinlock. The IRQ-unsafe version of the lock is used because IRQs are already disabled by local_lock_irqsave. spin_trylock is used in combination with local_lock_irqsave() but later will be replaced with a spin_trylock_irqsave when the local_lock is removed. The per_cpu_pages still fits within the same number of cache lines after this patch relative to before the series. struct per_cpu_pages { spinlock_t lock; /* 0 4 */ int count; /* 4 4 */ int high; /* 8 4 */ int batch; /* 12 4 */ short int free_factor; /* 16 2 */ short int expire; /* 18 2 */ /* XXX 4 bytes hole, try to pack */ struct list_head lists[13]; /* 24 208 */ /* size: 256, cachelines: 4, members: 7 */ /* sum members: 228, holes: 1, sum holes: 4 */ /* padding: 24 */ } __attribute__((__aligned__(64))); There is overhead in the fast path due to acquiring the spinlock even though the spinlock is per-cpu and uncontended in the common case. Page Fault Test (PFT) running on a 1-socket reported the following results on a 1 socket machine. 5.19.0-rc3 5.19.0-rc3 vanilla mm-pcpspinirq-v5r16 Hmean faults/sec-1 869275.7381 ( 0.00%) 874597.5167 * 0.61%* Hmean faults/sec-3 2370266.6681 ( 0.00%) 2379802.0362 * 0.40%* Hmean faults/sec-5 2701099.7019 ( 0.00%) 2664889.7003 * -1.34%* Hmean faults/sec-7 3517170.9157 ( 0.00%) 3491122.8242 * -0.74%* Hmean faults/sec-8 3965729.6187 ( 0.00%) 3939727.0243 * -0.66%* There is a small hit in the number of faults per second but given that the results are more stable, it's borderline noise. [[email protected]: add missing local_unlock_irqrestore() on contention path] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Yu Zhao <[email protected]> Reviewed-by: Nicolas Saenz Julienne <[email protected]> Tested-by: Nicolas Saenz Julienne <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/page_alloc: use only one PCP list for THP-sized allocationsMel Gorman1-4/+7
The per_cpu_pages is cache-aligned on a standard x86-64 distribution configuration but a later patch will add a new field which would push the structure into the next cache line. Use only one list to store THP-sized pages on the per-cpu list. This assumes that the vast majority of THP-sized allocations are GFP_MOVABLE but even if it was another type, it would not contribute to serious fragmentation that potentially causes a later THP allocation failure. Align per_cpu_pages on the cacheline boundary to ensure there is no false cache sharing. After this patch, the structure sizing is; struct per_cpu_pages { int count; /* 0 4 */ int high; /* 4 4 */ int batch; /* 8 4 */ short int free_factor; /* 12 2 */ short int expire; /* 14 2 */ struct list_head lists[13]; /* 16 208 */ /* size: 256, cachelines: 4, members: 6 */ /* padding: 32 */ } __attribute__((__aligned__(64))); Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Minchan Kim <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/page_alloc: add page->buddy_list and page->pcp_listMel Gorman1-0/+5
Patch series "Drain remote per-cpu directly", v5. Some setups, notably NOHZ_FULL CPUs, may be running realtime or latency-sensitive applications that cannot tolerate interference due to per-cpu drain work queued by __drain_all_pages(). Introduce a new mechanism to remotely drain the per-cpu lists. It is made possible by remotely locking 'struct per_cpu_pages' new per-cpu spinlocks. This has two advantages, the time to drain is more predictable and other unrelated tasks are not interrupted. This series has the same intent as Nicolas' series "mm/page_alloc: Remote per-cpu lists drain support" -- avoid interference of a high priority task due to a workqueue item draining per-cpu page lists. While many workloads can tolerate a brief interruption, it may cause a real-time task running on a NOHZ_FULL CPU to miss a deadline and at minimum, the draining is non-deterministic. Currently an IRQ-safe local_lock protects the page allocator per-cpu lists. The local_lock on its own prevents migration and the IRQ disabling protects from corruption due to an interrupt arriving while a page allocation is in progress. This series adjusts the locking. A spinlock is added to struct per_cpu_pages to protect the list contents while local_lock_irq is ultimately replaced by just the spinlock in the final patch. This allows a remote CPU to safely. Follow-on work should allow the spin_lock_irqsave to be converted to spin_lock to avoid IRQs being disabled/enabled in most cases. The follow-on patch will be one kernel release later as it is relatively high risk and it'll make bisections more clear if there are any problems. Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages and when it is storing per-cpu pages. Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking this is not necessary but it avoids per_cpu_pages consuming another cache line. Patch 3 is a preparation patch to avoid code duplication. Patch 4 is a minor correction. Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still relying on local_lock to prevent migration, stabilise the pcp lookup and prevent IRQ reentrancy. Patch 6 remote drains per-cpu pages directly instead of using a workqueue. Patch 7 uses a normal spinlock instead of local_lock for remote draining This patch (of 7): The page allocator uses page->lru for storing pages on either buddy or PCP lists. Create page->buddy_list and page->pcp_list as a union with page->lru. This is simply to clarify what type of list a page is on in the page allocator. No functional change intended. [[email protected]: fix page lru fields in macros] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Minchan Kim <[email protected]> Acked-by: Minchan Kim <[email protected]> Reviewed-by: Nicolas Saenz Julienne <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>