author    Dan Williams <[email protected]>      2016-01-15 16:55:49 -0800
committer Linus Torvalds <[email protected]>  2016-01-15 17:56:32 -0800
commit    52db400fcd50216dd8511a0880b936235755836f
tree      3446f0bb96bf1b2532d38d86403d2b59895f7e93
parent    bd56086f10186e2c205429cc12b16e43aacb1c7e
pmem, dax: clean up clear_pmem()
To date, we have implemented two I/O usage models for persistent memory:
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() walks the ptes, it keys
off _PAGE_DEVMAP to keep the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device-driver-established page mapping.

This need for "struct page" for persistent memory in turn requires
memory capacity to store the memmap array.  Given that the memmap array
for a large pool of persistent memory may exhaust available DRAM,
introduce a mechanism to allocate the memmap from persistent memory
itself.  The new "struct vmem_altmap *" parameter to
devm_memremap_pages() enables arch_add_memory() to use reserved pmem
capacity rather than the page allocator.

This patch (of 25):

Both __dax_pmd_fault() and clear_pmem() were taking special steps to
clear memory a page at a time, in order to take advantage of
non-temporal clear_page() implementations.  However, x86_64 does not
use non-temporal instructions for clear_page(), and arch_clear_pmem()
was always incurring the cost of __arch_wb_cache_pmem().  Clean up the
assumption that doing clear_pmem() a page at a time is more performant.

Signed-off-by: Dan Williams <[email protected]>
Reported-by: Dave Hansen <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
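[A minimal sketch of the usage model the cover letter describes,
assuming the struct vmem_altmap and pfn_t APIs as they stand once the
full 25-patch series lands; the helper names pmem_map_pages() and
pmem_to_pfn_t() and the nr_memmap_pages parameter are invented here for
illustration, and error handling is elided:]

#include <linux/memremap.h>
#include <linux/pfn_t.h>

/*
 * Illustrative only: establish "struct page" coverage for a pmem
 * resource, setting aside part of the pmem range itself to hold the
 * memmap so that a large pool does not exhaust DRAM.
 */
static void *pmem_map_pages(struct device *dev, struct resource *res,
		unsigned long nr_memmap_pages)
{
	struct vmem_altmap altmap = {
		.base_pfn = res->start >> PAGE_SHIFT,
		.free = nr_memmap_pages,	/* pmem pages usable for memmap */
	};

	return devm_memremap_pages(dev, res, NULL, &altmap);
}

/*
 * The driver then hands DAX pfns tagged as device-owned (PFN_DEV) and
 * page-backed (PFN_MAP), which is what lets get_user_pages() take
 * references on the underlying pages.
 */
static pfn_t pmem_to_pfn_t(struct resource *res, resource_size_t offset)
{
	return phys_to_pfn_t(res->start + offset, PFN_DEV | PFN_MAP);
}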
 arch/x86/include/asm/pmem.h | 7 +------
 fs/dax.c                    | 4 +---
 2 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index d8ce3ec816ab..1544fabcd7f9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -132,12 +132,7 @@ static inline void arch_clear_pmem(void __pmem *addr, size_t size)
 {
 	void *vaddr = (void __force *)addr;
 
-	/* TODO: implement the zeroing via non-temporal writes */
-	if (size == PAGE_SIZE && ((unsigned long)vaddr & ~PAGE_MASK) == 0)
-		clear_page(vaddr);
-	else
-		memset(vaddr, 0, size);
-
+	memset(vaddr, 0, size);
 	__arch_wb_cache_pmem(vaddr, size);
 }
 
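[For illustration, the simplified helper keeps the same caller contract
as before: zero the range, then make the stores durable. A fragment
only, mirroring the clear_pmem()/wmb_pmem() pairing visible in the
fs/dax.c hunk below:]

	clear_pmem(addr, size);	/* memset() plus cache writeback */
	wmb_pmem();		/* commit the stores to the persistence domain */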
diff --git a/fs/dax.c b/fs/dax.c
index 43671b68220e..19492cc65a30 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -641,9 +641,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		goto fallback;
 
 	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-		int i;
-		for (i = 0; i < PTRS_PER_PMD; i++)
-			clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
+		clear_pmem(kaddr, PMD_SIZE);
 		wmb_pmem();
 		count_vm_event(PGMAJFAULT);
 		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
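[The single call covers exactly the range the removed loop did: on
x86_64, PTRS_PER_PMD is 512 and PAGE_SIZE is 4096 bytes, so
PTRS_PER_PMD * PAGE_SIZE equals 2 MiB, which is PMD_SIZE. A
hypothetical build-time assertion of that equivalence, not part of this
patch:]

	/* the removed per-page loop and clear_pmem(kaddr, PMD_SIZE)
	 * zero the same number of bytes */
	BUILD_BUG_ON(PTRS_PER_PMD * PAGE_SIZE != PMD_SIZE);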