path: root/include/linux
Age | Commit message | Author | Files | Lines
2022-07-19 | mfd: twl: Remove platform data support | Uwe Kleine-König | 1 | -55/+0
There is no in-tree machine that provides a struct twl4030_platform_data since commit e92fc4f04a34 ("ARM: OMAP2+: Drop legacy board file for LDP"). So assume dev_get_platdata() returns NULL in twl_probe() and simplify accordingly. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | mfd: cros_ec: Add SCP Core-1 as a new CrOS EC MCU | Tinghan Shen | 2 | -0/+3
The MT8195 System Companion Processor (SCP) is a dual-core RISC-V MCU. Add a new CrOS feature ID to represent the SCP's 2nd core. The 1st core is referred to as 'core 0', and the 2nd core is referred to as 'core 1'. Signed-off-by: Tinghan Shen <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | mfd: mt6358-irq: Add MT6357 PMIC support | Fabien Parent | 1 | -0/+1
Add MT6357 PMIC IRQ support. Signed-off-by: Fabien Parent <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | mfd: mt6397-core: Add MT6357 PMIC support | Fabien Parent | 2 | -0/+1693
Add support for the PMIC keys, regulator, and RTC of the MT6357 PMIC. Signed-off-by: Fabien Parent <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | mfd: tc6387xb: Drop disable callback that is never called | Uwe Kleine-König | 1 | -1/+0
The driver never calls the disable callback, so drop the member from the platform struct and the callbacks from the actual platform data. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | mfd: t7l66xb: Drop platform disable callback | Uwe Kleine-König | 1 | -1/+0
None of the in-tree instantiations of struct t7l66xb_platform_data provides a disable callback, so don't dereference this function pointer unconditionally. As there is no user, drop it completely instead of calling it conditionally. This is a preparation for making platform remove callbacks return void. Fixes: 1f192015ca5b ("mfd: driver for the T7L66XB TMIO SoC") Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | mfd: max77714: Update Luca Ceresoli's e-mail address | Luca Ceresoli | 1 | -1/+1
My Bootlin address is preferred from now on. Signed-off-by: Luca Ceresoli <[email protected]> Signed-off-by: Luca Ceresoli <[email protected]> Signed-off-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-19 | Merge branches 'ib-mfd-acpi-for-rafael-5.20', 'ib-mfd-edac-i2c-leds-pinctrl-platform-watchdog-5.20' and 'ib-mfd-soc-bcm-5.20' into ibs-for-mfd-merged | Lee Jones | 3 | -2/+29
2022-07-19 | gpiolib: of: support bias pull disable | Nuno Sá | 1 | -0/+1
On top of looking at the PULL_UP and PULL_DOWN flags, also look at PULL_DISABLE and set the appropriate GPIO flag. The GPIO core will then pass this down to controllers that support it. Signed-off-by: Nuno Sá <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19 | gpiolib: add support for bias pull disable | Nuno Sá | 1 | -0/+1
This change prepares the gpio core to look at firmware flags and set 'FLAG_BIAS_DISABLE' if necessary. It works in a similar way to 'GPIO_PULL_DOWN' and 'GPIO_PULL_UP'. Signed-off-by: Nuno Sá <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
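A minimal sketch of the core-side handling these two patches describe; the surrounding function is hypothetical, and only the flag names GPIO_PULL_DISABLE and FLAG_BIAS_DISABLE are taken from the commit messages:

```c
/*
 * Hypothetical fragment, modelled on the existing pull-up/pull-down
 * handling in gpiolib: translate lookup flags into descriptor flags.
 */
static void foo_apply_bias_flags(struct gpio_desc *desc,
				 unsigned long lflags)
{
	if (lflags & GPIO_PULL_UP)
		set_bit(FLAG_PULL_UP, &desc->flags);
	else if (lflags & GPIO_PULL_DOWN)
		set_bit(FLAG_PULL_DOWN, &desc->flags);
	else if (lflags & GPIO_PULL_DISABLE)
		set_bit(FLAG_BIAS_DISABLE, &desc->flags);
}
```

Controllers that support it then see the disabled bias through the usual pin configuration path.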
2022-07-19 | gpio: ucb1400: Remove platform setup and teardown support | Uwe Kleine-König | 1 | -2/+0
There is no user of these callbacks. The motivation for this change is to stop returning an error code from the remove callback. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19 | gpio: twl4030: Drop platform teardown callback | Uwe Kleine-König | 1 | -2/+0
There is no machine providing a teardown callback, so drop the unused code. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19 | gpiolib: devres: Get rid of unused devm_gpio_free() | Andy Shevchenko | 1 | -6/+0
The last user, which was in fact dead code, went away a year ago; the previous one, three years ago. On top of that we want to drop the legacy GPIO APIs from the kernel, so take the chance to get rid of the unused devm_gpio_free() and accompanying stuff. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-07-19 | dma-iommu: add iommu_dma_opt_mapping_size() | John Garry | 1 | -0/+2
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may be quite slow. Signed-off-by: John Garry <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Acked-by: Robin Murphy <[email protected]> Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-19 | dma-mapping: add dma_opt_mapping_size() | John Garry | 2 | -0/+6
Streaming DMA mappings involving an IOMMU may be much slower for larger total mapping sizes. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for device drivers to know this "optimal" limit, such that they may try to produce mappings which don't exceed it. Signed-off-by: John Garry <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
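A hedged sketch of how a driver might consume the new API; foo_cap_request_size() is hypothetical, only dma_opt_mapping_size() itself comes from this patch:

```c
#include <linux/dma-mapping.h>

/*
 * Hypothetical helper: cap a driver's preferred request size at the
 * optimal DMA mapping size, so streaming mappings stay within the
 * cached IOVA range.
 */
static unsigned int foo_cap_request_size(struct device *dev,
					 unsigned int preferred)
{
	size_t opt = dma_opt_mapping_size(dev);

	/* When no IOMMU-specific limit applies, opt does not constrain us */
	if ((size_t)preferred <= opt)
		return preferred;
	return (unsigned int)opt;
}
```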
2022-07-18 | skbuff: add SKBFL_DONT_ORPHAN flag | Pavel Begunkov | 1 | -3/+5
We don't want to list every single ubuf_info callback in skb_orphan_frags(); add a flag controlling the behaviour instead. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
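The resulting check plausibly takes this shape (a sketch; the pre-existing zerocopy guards may differ in detail):

```c
/*
 * Sketch of skb_orphan_frags() after the change: instead of matching
 * individual ubuf_info callbacks, honour one flag marking frags that
 * must not be orphaned.
 */
static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
{
	if (likely(!skb_zcopy(skb)))
		return 0;
	if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
		return 0;
	return skb_copy_ubufs(skb, gfp_mask);
}
```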
2022-07-18 | mm: fix missing wake-up event for FSDAX pages | Muchun Song | 1 | -5/+9
FSDAX page refcounts are 1-based, rather than 0-based: if the refcount is 1, then the page is freed. FSDAX pages can be pinned through GUP and are then unpinned via unpin_user_page(), which uses a folio variant to put the page. However, the folio variants did not consider this special case, so the result is a missed wakeup event (e.g. for the user of __fuse_dax_break_layouts()). This results in a task being permanently stuck in TASK_INTERRUPTIBLE state. Since FSDAX pages can only be obtained by GUP users, fix GUP instead of folio_put() to lower overhead. Link: https://lkml.kernel.org/r/[email protected] Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()") Signed-off-by: Muchun Song <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: William Kucharski <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-18 | Merge 5.19-rc7 into usb-next | Greg Kroah-Hartman | 28 | -38/+97
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-07-18 | dma-buf/dma_resv_usage: update explicit sync documentation | Christian König | 1 | -3/+13
Make it clear that DMA_RESV_USAGE_BOOKKEEP can be used for explicit synced user space submissions as well and document the rules around adding the same fence with different usages. Signed-off-by: Christian König <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-07-18 | iio: cros: Register FIFO callback after sensor is registered | Gwendal Grignou | 1 | -2/+5
Instead of registering the callback to process sensor events right at initialization time, wait for the sensor to be registered in the IIO subsystem. Events can arrive at probe time (for instance if the kernel rebooted abruptly without switching the sensor off) and be sent to the IIO core before the sensor is fully registered. Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO") Reported-by: Douglas Anderson <[email protected]> Signed-off-by: Gwendal Grignou <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Cameron <[email protected]>
2022-07-18 | random: remove CONFIG_ARCH_RANDOM | Jason A. Donenfeld | 1 | -8/+1
When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up: CONFIG_ARCH_RANDOM, a compile-time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options was introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. This leaves "nordrand" in place for now, as its removal will take a different route. Acked-by: Michael Ellerman <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Borislav Petkov <[email protected]> Acked-by: Heiko Carstens <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
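The "ordinary asm-generic fallback pattern" amounts to stubs that simply report failure, roughly as below (a sketch, not the verbatim header; the bool-returning signatures are assumed from the API of that era):

```c
/*
 * asm-generic style fallback: architectures without the CPU RNG
 * instructions get stubs that always fail, and callers fall back
 * to other entropy sources.
 */
static inline bool __must_check arch_get_random_long(unsigned long *v)
{
	return false;
}

static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
{
	return false;
}
```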
2022-07-18 | net: stmmac: switch to use interrupt for hw crosstimestamping | Wong Vee Khee | 1 | -0/+1
With the current polling-mode implementation, there is a high chance of hitting a timeout error when running phc2sys. Hence, update the hardware crosstimestamping implementation to use the MAC interrupt service routine instead of polling for the TSIS bit in the MAC Timestamp Interrupt Status register to be set. Cc: Richard Cochran <[email protected]> Signed-off-by: Wong Vee Khee <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-18 | swiotlb: move struct io_tlb_slot to swiotlb.c | Christoph Hellwig | 1 | -5/+1
No need to expose this structure definition in the header. Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-18 | swiotlb: remove unused fields in io_tlb_mem | Chao Gao | 1 | -5/+0
Commit 20347fca71a3 ("swiotlb: split up the global swiotlb lock") splits io_tlb_mem into multiple areas. Each area has its own lock and index. The global ones are not used so remove them. Signed-off-by: Chao Gao <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-17 | cpumask: update cpumask_next_wrap() signature | Sander Vanheule | 1 | -1/+1
The extern specifier is not needed for this declaration, so drop it. The function also depends only on the input parameters, and has no side effects, so it can be marked __pure like other functions in cpumask.h. Link: https://lkml.kernel.org/r/72ab755695b74bb5fbaa756ae4c0edd708d172f1.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
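The declaration change amounts to roughly:

```c
/* Before: redundant extern, no purity annotation */
extern int cpumask_next_wrap(int n, const struct cpumask *mask,
			     int start, bool wrap);

/*
 * After: __pure tells the compiler the result depends only on the
 * arguments (and pointed-to data) and has no side effects.
 */
int __pure cpumask_next_wrap(int n, const struct cpumask *mask,
			     int start, bool wrap);
```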
2022-07-17 | cpumask: Fix invalid uniprocessor mask assumption | Sander Vanheule | 1 | -80/+19
On uniprocessor builds, any CPU mask is assumed to contain exactly one CPU (cpu0). This assumption ignores the existence of empty masks, resulting in incorrect behaviour. cpumask_first_zero(), cpumask_next_zero(), and for_each_cpu_not() don't provide behaviour matching the assumption that a UP mask is always "1", and instead provide behaviour matching the empty mask. Drop the incorrectly optimised code and use the generic implementations in all cases. Link: https://lkml.kernel.org/r/86bf3f005abba2d92120ddd0809235cab4f759a6.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Suggested-by: Yury Norov <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | cpumask: add UP optimised for_each_*_cpu versions | Sander Vanheule | 1 | -0/+7
On uniprocessor builds, the following loops will always run over a mask that contains one enabled CPU (cpu0):

- for_each_possible_cpu
- for_each_online_cpu
- for_each_present_cpu

Provide uniprocessor-specific macros for these loops that always run exactly once, as sketched below. Link: https://lkml.kernel.org/r/3a92869b902a075b97be5d1452c9c6badbbff0df.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <[email protected]> Acked-by: Yury Norov <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
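The UP-optimised macros can be as simple as the following sketch, iterating exactly once over cpu0 without consulting any mask:

```c
/* Uniprocessor versions: run the loop body exactly once, for cpu 0. */
#define for_each_possible_cpu(cpu)	for ((cpu) = 0; (cpu) < 1; (cpu)++)
#define for_each_online_cpu(cpu)	for ((cpu) = 0; (cpu) < 1; (cpu)++)
#define for_each_present_cpu(cpu)	for ((cpu) = 0; (cpu) < 1; (cpu)++)
```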
2022-07-17 | kfifo: fix kfifo_to_user() return type | Dan Carpenter | 1 | -1/+1
The kfifo_to_user() macro is supposed to return zero for success or negative error codes. Unfortunately, there is a signedness bug so it returns unsigned int. This only affects callers which try to save the result in ssize_t and as far as I can see the only place which does that is line6_hwdep_read(). TL;DR: s/_uint/_int/. Link: https://lkml.kernel.org/r/YrVL3OJVLlNhIMFs@kili Fixes: 144ecf310eb5 ("kfifo: fix kfifo_alloc() to return a signed int value") Signed-off-by: Dan Carpenter <[email protected]> Cc: Stefani Seibold <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
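For context, kfifo.h funnels macro results through a pair of must-check helper wrappers; the shapes below are sketched from the description, modulo exact annotations:

```c
/* The two result-wrapper shapes in kfifo.h: */
static inline unsigned int __kfifo_uint_must_check_helper(unsigned int val)
{
	return val;
}

static inline int __kfifo_int_must_check_helper(int val)
{
	return val;
}

/*
 * kfifo_to_user() wrapped its result with the _uint helper, so a
 * negative error code came back as a large unsigned value; the fix
 * is simply to wrap with the _int helper instead.
 */
```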
2022-07-17 | compiler-gcc.h: remove ancient workaround for gcc PR 58670 | Uros Bizjak | 1 | -11/+0
The workaround for 'asm goto' miscompilation introduces a compiler barrier quirk that inhibits many useful compiler optimizations. For example, __try_cmpxchg_user compiles to:

    11375:  41 8b 4d 00    mov    0x0(%r13),%ecx
    11379:  41 8b 02       mov    (%r10),%eax
    1137c:  f0 0f b1 0a    lock cmpxchg %ecx,(%rdx)
    11380:  0f 94 c2       sete   %dl
    11383:  84 d2          test   %dl,%dl
    11385:  75 c4          jne    1134b <...>
    11387:  41 89 02       mov    %eax,(%r10)

where the barrier inhibits flags propagation from asm when compiled with gcc-12. When the mentioned quirk is removed, the following code is generated:

    11553:  41 8b 4d 00    mov    0x0(%r13),%ecx
    11557:  41 8b 02       mov    (%r10),%eax
    1155a:  f0 0f b1 0a    lock cmpxchg %ecx,(%rdx)
    1155e:  74 c9          je     11529 <...>
    11560:  41 89 02       mov    %eax,(%r10)

The referred compiler bug, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670, was fixed in gcc-4.8.2. The current minimum required version of GCC is 5.1, which has the above 'asm goto' miscompilation fixed, so remove the workaround. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uros Bizjak <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | net, lib/once: remove {net_}get_random_once_wait macro | wuchi | 2 | -4/+0
DO_ONCE(func, ...) calls func while holding a spinlock acquired by spin_lock_irqsave() in __do_once_start(). But get_random_once_wait() may sleep in get_random_bytes_wait() -> wait_for_random_bytes(). Fortunately, there is no place that uses {net_}get_random_once_wait, so we can simply remove them. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: wuchi <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Cc: David S. Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | writeback: cleanup bdi_sched_wait() | Xiu Jianfeng | 1 | -6/+0
bdi_sched_wait() is no longer used since commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm, hugetlb: skip irrelevant nodes in show_free_areas() | Gang Li | 1 | -2/+2
show_free_areas() allows filtering out node-specific data which is irrelevant to the allocation request. But hugetlb_show_meminfo() still shows hugetlb on all nodes, which is redundant and unnecessary. Use show_mem_node_skip() to skip irrelevant nodes. And replace hugetlb_show_meminfo() with hugetlb_show_meminfo_node(nid). Before-and-after sample output of OOM:

before:
```
[  214.362453] Node 1 active_anon:148kB inactive_anon:4050920kB active_file:112kB inactive_file:100kB
[  214.375429] Node 1 Normal free:45100kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
[  214.388334] lowmem_reserve[]: 0 0 0 0 0
[  214.390251] Node 1 Normal: 423*4kB (UE) 320*8kB (UME) 187*16kB (UE) 117*32kB (UE) 57*64kB (UME) 20
[  214.397626] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  214.401518] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
```

after:
```
[  145.069705] Node 1 active_anon:128kB inactive_anon:4049412kB active_file:56kB inactive_file:84kB u
[  145.110319] Node 1 Normal free:45424kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
[  145.152315] lowmem_reserve[]: 0 0 0 0 0
[  145.155244] Node 1 Normal: 470*4kB (UME) 373*8kB (UME) 247*16kB (UME) 168*32kB (UE) 86*64kB (UME)
[  145.164119] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
```

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gang Li <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm/huge_memory: fix comment of page_deferred_list | Miaohe Lin | 1 | -2/+2
The current comment is confusing: if the global or memcg deferred list in the second tail page is occupied by compound_head, why do we still use page[2].deferred_list here? I think it wants to say that the global or memcg deferred list in the first tail page is occupied by compound_mapcount and compound_pincount, so we use the second tail page's deferred_list instead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm/huge_memory: check pmd_present first in is_huge_zero_pmd | Miaohe Lin | 1 | -1/+1
When the pmd is non-present, pmd_pfn() returns an insane value. So check pmd_present() first to avoid acquiring such an insane value, and also to avoid touching the possibly cold huge_zero_pfn cache line when the pmd isn't present. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
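With the fix, the helper looks roughly like this (a sketch based on the description; huge_zero_pfn is the cached pfn of the huge zero page):

```c
static inline bool is_huge_zero_pmd(pmd_t pmd)
{
	/*
	 * Check presence first: pmd_pfn() is meaningless on a non-present
	 * pmd, and short-circuiting here also avoids touching the possibly
	 * cold huge_zero_pfn cache line in that case.
	 */
	return pmd_present(pmd) && READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd);
}
```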
2022-07-17 | mm/mmap: drop ARCH_HAS_VM_GET_PAGE_PROT | Anshuman Khandual | 1 | -3/+0
Now all the platforms enable ARCH_HAS_VM_GET_PAGE_PROT. They define and export their own vm_get_page_prot(), whether custom or via the standard DECLARE_VM_GET_PAGE_PROT. Hence there is no need for a default generic fallback for vm_get_page_prot(). Just drop this fallback and also the ARCH_HAS_VM_GET_PAGE_PROT mechanism. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm/mmap: build protect protection_map[] with ARCH_HAS_VM_GET_PAGE_PROT | Anshuman Khandual | 1 | -1/+1
Now that protection_map[] has been moved inside those platforms that enable ARCH_HAS_VM_GET_PAGE_PROT, the generic protection_map[] array can be protected with CONFIG_ARCH_HAS_VM_GET_PAGE_PROT instead of __P000. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm/mmap: define DECLARE_VM_GET_PAGE_PROT | Anshuman Khandual | 1 | -0/+28
This just converts the generic vm_get_page_prot() implementation into a new macro, i.e. DECLARE_VM_GET_PAGE_PROT, which can later be used across platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT. This does not create any functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
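A sketch of what the macro plausibly looks like, reconstructed from the description (the exact line-continuation layout is an assumption):

```c
/*
 * Wraps the old generic vm_get_page_prot() so a platform can
 * instantiate it next to its own protection_map[] definition.
 */
#define DECLARE_VM_GET_PAGE_PROT					\
pgprot_t vm_get_page_prot(unsigned long vm_flags)			\
{									\
	return protection_map[vm_flags &				\
		(VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];		\
}									\
EXPORT_SYMBOL(vm_get_page_prot);
```

A platform that needs no custom behaviour simply places DECLARE_VM_GET_PAGE_PROT next to its protection_map[] array.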
2022-07-17 | mm/mmap: build protect protection_map[] with __P000 | Anshuman Khandual | 1 | -0/+2
Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms", v7. The __SXXX/__PXXX macros are an unnecessary abstraction layer in creating the generic protection_map[] array, which is used for vm_get_page_prot(). This abstraction layer can be avoided if the platforms just define the protection_map[] array for all possible vm_flags access permission combinations and also export a vm_get_page_prot() implementation. This series drops the __SXXX/__PXXX macros from across platforms in the tree. First it build-protects the generic protection_map[] array with '#ifdef __P000' and moves it inside the platforms which enable ARCH_HAS_VM_GET_PAGE_PROT. Later it build-protects the same array with '#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves it inside the remaining platforms while enabling ARCH_HAS_VM_GET_PAGE_PROT. This adds a new macro, DECLARE_VM_GET_PAGE_PROT, defining the current generic vm_get_page_prot(), in order for it to be reused on platforms that do not require a custom implementation. Finally, ARCH_HAS_VM_GET_PAGE_PROT can just be dropped, as all platforms now define and export vm_get_page_prot() via looking up a private and static protection_map[] array. The protection_map[] data type has been changed to 'static const' on all platforms that do not change it during boot. This patch (of 26): Build protect the generic protection_map[] array with __P000, so that it can be moved inside all the platforms one after the other. Otherwise there will be build failures during this process. CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose, as only certain platforms enable this config now. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Suggested-by: Christophe Leroy <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm/page_alloc: protect PCP lists with a spinlock | Mel Gorman | 1 | -0/+1
Currently the PCP lists are protected by using local_lock_irqsave to prevent migration and IRQ reentrancy, but this is inconvenient. Remote draining of the lists is impossible and a workqueue is required, and every task allocation/free must disable then enable interrupts, which is expensive. As preparation for dealing with both of those problems, protect the lists with a spinlock. The IRQ-unsafe version of the lock is used because IRQs are already disabled by local_lock_irqsave. spin_trylock is used in combination with local_lock_irqsave() but will later be replaced with a spin_trylock_irqsave when the local_lock is removed. The per_cpu_pages still fits within the same number of cache lines after this patch relative to before the series:

    struct per_cpu_pages {
        spinlock_t lock;                 /*     0     4 */
        int count;                       /*     4     4 */
        int high;                        /*     8     4 */
        int batch;                       /*    12     4 */
        short int free_factor;           /*    16     2 */
        short int expire;                /*    18     2 */

        /* XXX 4 bytes hole, try to pack */

        struct list_head lists[13];      /*    24   208 */

        /* size: 256, cachelines: 4, members: 7 */
        /* sum members: 228, holes: 1, sum holes: 4 */
        /* padding: 24 */
    } __attribute__((__aligned__(64)));

There is overhead in the fast path due to acquiring the spinlock even though the spinlock is per-cpu and uncontended in the common case. Page Fault Test (PFT) reported the following results on a 1-socket machine:

                              5.19.0-rc3               5.19.0-rc3
                                 vanilla      mm-pcpspinirq-v5r16
    Hmean  faults/sec-1   869275.7381 (  0.00%)   874597.5167 *  0.61%*
    Hmean  faults/sec-3  2370266.6681 (  0.00%)  2379802.0362 *  0.40%*
    Hmean  faults/sec-5  2701099.7019 (  0.00%)  2664889.7003 * -1.34%*
    Hmean  faults/sec-7  3517170.9157 (  0.00%)  3491122.8242 * -0.74%*
    Hmean  faults/sec-8  3965729.6187 (  0.00%)  3939727.0243 * -0.66%*

There is a small hit in the number of faults per second, but given that the results are more stable, it's borderline noise. [[email protected]: add missing local_unlock_irqrestore() on contention path] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Yu Zhao <[email protected]> Reviewed-by: Nicolas Saenz Julienne <[email protected]> Tested-by: Nicolas Saenz Julienne <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
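A heavily simplified sketch of the locking pattern this patch introduces; the lock and variable names here are illustrative assumptions, not verbatim kernel code:

```c
unsigned long flags;
struct per_cpu_pages *pcp;

/*
 * IRQs are already disabled by the local_lock, so the IRQ-unsafe
 * spinlock variant suffices; spin_trylock keeps the fast path cheap
 * and is slated to become spin_trylock_irqsave once the local_lock
 * is removed.
 */
local_lock_irqsave(&pagesets.lock, flags);
pcp = this_cpu_ptr(zone->per_cpu_pageset);
if (spin_trylock(&pcp->lock)) {
	/* add or remove pages on pcp->lists[...] */
	spin_unlock(&pcp->lock);
}
local_unlock_irqrestore(&pagesets.lock, flags);
```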
2022-07-17 | mm/page_alloc: use only one PCP list for THP-sized allocations | Mel Gorman | 1 | -4/+7
The per_cpu_pages is cache-aligned on a standard x86-64 distribution configuration, but a later patch will add a new field which would push the structure into the next cache line. Use only one list to store THP-sized pages on the per-cpu list. This assumes that the vast majority of THP-sized allocations are GFP_MOVABLE, but even if it was another type, it would not contribute to serious fragmentation that potentially causes a later THP allocation failure. Align per_cpu_pages on the cacheline boundary to ensure there is no false cache sharing. After this patch, the structure sizing is:

    struct per_cpu_pages {
        int count;                       /*     0     4 */
        int high;                        /*     4     4 */
        int batch;                       /*     8     4 */
        short int free_factor;           /*    12     2 */
        short int expire;                /*    14     2 */
        struct list_head lists[13];      /*    16   208 */

        /* size: 256, cachelines: 4, members: 6 */
        /* padding: 32 */
    } __attribute__((__aligned__(64)));

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Minchan Kim <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm/page_alloc: add page->buddy_list and page->pcp_list | Mel Gorman | 1 | -0/+5
Patch series "Drain remote per-cpu directly", v5. Some setups, notably NOHZ_FULL CPUs, may be running realtime or latency-sensitive applications that cannot tolerate interference due to per-cpu drain work queued by __drain_all_pages(). Introduce a new mechanism to remotely drain the per-cpu lists. It is made possible by remotely locking 'struct per_cpu_pages' new per-cpu spinlocks. This has two advantages: the time to drain is more predictable, and other unrelated tasks are not interrupted. This series has the same intent as Nicolas' series "mm/page_alloc: Remote per-cpu lists drain support" -- avoid interference of a high priority task due to a workqueue item draining per-cpu page lists. While many workloads can tolerate a brief interruption, it may cause a real-time task running on a NOHZ_FULL CPU to miss a deadline, and at minimum, the draining is non-deterministic. Currently an IRQ-safe local_lock protects the page allocator per-cpu lists. The local_lock on its own prevents migration, and the IRQ disabling protects from corruption due to an interrupt arriving while a page allocation is in progress. This series adjusts the locking: a spinlock is added to struct per_cpu_pages to protect the list contents, while local_lock_irq is ultimately replaced by just the spinlock in the final patch. This allows a remote CPU to safely drain a remote per-cpu list. Follow-on work should allow the spin_lock_irqsave to be converted to spin_lock to avoid IRQs being disabled/enabled in most cases. The follow-on patch will be one kernel release later, as it is relatively high risk and it'll make bisections more clear if there are any problems.

Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages and when it is storing per-cpu pages.
Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking this is not necessary, but it avoids per_cpu_pages consuming another cache line.
Patch 3 is a preparation patch to avoid code duplication.
Patch 4 is a minor correction.
Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still relying on local_lock to prevent migration, stabilise the pcp lookup and prevent IRQ reentrancy.
Patch 6 remotely drains per-cpu pages directly instead of using a workqueue.
Patch 7 uses a normal spinlock instead of local_lock for remote draining.

This patch (of 7): The page allocator uses page->lru for storing pages on either buddy or PCP lists. Create page->buddy_list and page->pcp_list as a union with page->lru. This is simply to clarify what type of list a page is on in the page allocator. No functional change intended. [[email protected]: fix page lru fields in macros] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Minchan Kim <[email protected]> Acked-by: Minchan Kim <[email protected]> Reviewed-by: Nicolas Saenz Julienne <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | hugetlb: do not update address in huge_pmd_unshare | Mike Kravetz | 1 | -2/+2
As an optimization for loops sequentially processing hugetlb address ranges, huge_pmd_unshare would update a passed address if it unshared a pmd. Updating a loop control variable outside the loop like this is generally a bad idea. These loops are now using hugetlb_mask_last_page to optimize scanning when non-present ptes are discovered. The same can be done when huge_pmd_unshare returns 1 indicating a pmd was unshared. Remove address update from huge_pmd_unshare. Change the passed argument type and update all callers. In loops sequentially processing addresses use hugetlb_mask_last_page to update address if pmd is unshared. [[email protected]: fix an unused variable warning/error] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Acked-by: Muchun Song <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: James Houghton <[email protected]> Cc: kernel test robot <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: Will Deacon <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
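The signature change described above amounts to roughly the following (a sketch; the parameter order is assumed from the description):

```c
/* Before: huge_pmd_unshare() could rewrite the caller's address */
int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
		     unsigned long *addr, pte_t *ptep);

/*
 * After: the address is passed by value; loops advance it themselves,
 * e.g. via hugetlb_mask_last_page(), when a pmd was unshared.
 */
int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
		     unsigned long addr, pte_t *ptep);
```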
2022-07-17 | hugetlb: skip to end of PT page mapping when pte not present | Mike Kravetz | 1 | -0/+1
Patch series "hugetlb: speed up linear address scanning", v2. At unmap, fork and remap time, hugetlb address ranges are linearly scanned. We can optimize these scans if the ranges are sparsely populated. Also, enable page table "lazy copy" for hugetlb at fork. NOTE: Architectures not defining CONFIG_ARCH_WANT_GENERAL_HUGETLB need to add an arch-specific version of hugetlb_mask_last_page() to take advantage of the sparse address scanning improvements. Baolin Wang added the routine for arm64. Other architectures which could be optimized are: ia64, mips, parisc, powerpc, s390, sh and sparc. This patch (of 4): HugeTLB address ranges are linearly scanned during fork, unmap and remap operations. If a non-present entry is encountered, the code currently continues to the next huge-page-aligned address. However, a non-present entry implies that the page table page for that entry is not present. Therefore, the linear scan can skip to the end of the range mapped by the page table page. This can speed up operations on large sparsely populated hugetlb mappings. Create a new routine hugetlb_mask_last_page() that will return an address mask. When the mask is ORed with an address, the result will be the address of the last huge page mapped by the associated page table page. Use this mask to update addresses in routines which linearly scan hugetlb address ranges when a non-present pte is encountered. hugetlb_mask_last_page() is related to the implementation of huge_pte_offset(), as hugetlb_mask_last_page() is called when huge_pte_offset() returns NULL. This patch only provides a complete hugetlb_mask_last_page() implementation when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined. Architectures which provide their own versions of huge_pte_offset() can also provide their own version of hugetlb_mask_last_page(). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Tested-by: Baolin Wang <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Acked-by: Muchun Song <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Xu <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: James Houghton <[email protected]> Cc: Mina Almasry <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm: khugepaged: reorg some khugepaged helpers | Yang Shi | 2 | -14/+8
The khugepaged_{enabled|always|req_madv} helpers are not khugepaged-only anymore, so move them to huge_mm.h and rename them to hugepage_flags_xxx; remove khugepaged_req_madv since it has no users. Also move khugepaged_defrag to khugepaged.c: its only caller is in that file, so it doesn't have to be in a header file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Reviewed-by: Zach O'Keefe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm: thp: kill __transhuge_page_enabled() | Yang Shi | 1 | -55/+2
The page fault path checks THP eligibility with __transhuge_page_enabled(), which does a similar thing to hugepage_vma_check(), so use hugepage_vma_check() instead. However, the page fault path allows DAX and !anon_vma cases, so add a new flag, in_pf, to hugepage_vma_check() to make page faults work correctly. The in_pf flag is also used to skip shmem and file THP for page faults, since shmem handles THP in its own shmem_fault() and file THP allocation on fault is not supported yet. Also remove hugepage_vma_enabled(): hugepage_vma_check() is its only caller now, so a separate helper is no longer necessary. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Reviewed-by: Zach O'Keefe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm: thp: kill transparent_hugepage_active() | Yang Shi | 2 | -8/+10
transparent_hugepage_active() was introduced to show the THP eligibility bit in smaps in proc, and smaps is the only user. But it actually does a similar check to hugepage_vma_check(), which is used by khugepaged. We definitely don't have to maintain two similar checks, so kill transparent_hugepage_active(). This patch also fixes the wrong behavior for VM_NO_KHUGEPAGED vmas. Also move hugepage_vma_check() to huge_memory.c and huge_mm.h, since it is not only for khugepaged anymore. [[email protected]: check vma->vm_mm, per Zach] [[email protected]: add comment to vdso check] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Reviewed-by: Zach O'Keefe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm: thp: consolidate vma size check to transhuge_vma_suitable | Yang Shi | 2 | -14/+11
There are a couple of places that check whether the vma size is OK for THP or whether the address fits; they are open-coded and duplicated, so use transhuge_vma_suitable() to do the job by passing in (vma->vm_end - HPAGE_PMD_SIZE). Move the vma size check into hugepage_vma_check(). This makes khugepaged_enter() the same as khugepaged_enter_vma(). There is just one caller of khugepaged_enter(), so replace it with khugepaged_enter_vma() and remove khugepaged_enter(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Reviewed-by: Zach O'Keefe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | fsdax: dedup file range to use a compare function | Shiyang Ruan | 2 | -4/+16
With DAX we cannot deal with readpage() etc., so we create a DAX comparison function which is similar to vfs_dedupe_file_range_compare(), and introduce dax_remap_file_range_prep() for filesystem use. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Goldwyn Rodrigues <[email protected]> Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Goldwyn Rodrigues <[email protected]> Cc: Jane Chu <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Ritesh Harjani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | fsdax: set a CoW flag when associate reflink mappings | Shiyang Ruan | 1 | -0/+6
Introduce a PAGE_MAPPING_DAX_COW flag to support association with CoW file mappings. In this case, since the dax-rmap has already taken responsibility for looking up shared files by a given dax page, page->mapping is no longer used for rmap but for marking that this dax page is shared. And to make sure disassociation works fine, use page->index as a refcount and clear page->mapping back to its initial state when page->index drops to 0. With the help of this new flag, it is possible to distinguish the normal case from the CoW case, and keep the warning in the normal case. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Goldwyn Rodrigues <[email protected]> Cc: Goldwyn Rodrigues <[email protected]> Cc: Jane Chu <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Ritesh Harjani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17 | mm: introduce mf_dax_kill_procs() for fsdax case | Shiyang Ruan | 1 | -0/+2
This new function is a variant of mf_generic_kill_procs that accepts a file, offset pair instead of a struct to support multiple files sharing a DAX mapping. It is intended to be called by the file systems as part of the memory_failure handler after the file system performed a reverse mapping from the storage address to the file and file offset. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Goldwyn Rodrigues <[email protected]> Cc: Goldwyn Rodrigues <[email protected]> Cc: Jane Chu <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Ritesh Harjani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>