aboutsummaryrefslogtreecommitdiff
path: root/drivers/xen/gntdev.c
AgeCommit message (Collapse)AuthorFilesLines
2019-05-14xen/gntdev.c: convert to use vm_map_pages()Souptick Joarder1-7/+4
Convert to use vm_map_pages() to map range of kernel memory to user vma. map->count is passed to vm_map_pages() and internal API verify map->count against count ( count = vma_pages(vma)) for page array boundary overrun condition. Link: http://lkml.kernel.org/r/88e56e82d2db98705c2d842e9c9806c00b366d67.1552921225.git.jrdr.linux@gmail.com Signed-off-by: Souptick Joarder <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: David Airlie <[email protected]> Cc: Heiko Stuebner <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Kyungmin Park <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oleksandr Andrushchenko <[email protected]> Cc: Pawel Osciak <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Russell King <[email protected]> Cc: Sandy Huang <[email protected]> Cc: Stefan Richter <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Thierry Reding <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14mm/mmu_notifier: convert user range->blockable to helper functionJérôme Glisse1-3/+3
Use the mmu_notifier_range_blockable() helper function instead of directly dereferencing the range->blockable field. This is done to make it easier to change the mmu_notifier range field. This patch is the outcome of the following coccinelle patch: %<------------------------------------------------------------------- @@ identifier I1, FN; @@ FN(..., struct mmu_notifier_range *I1, ...) { <... -I1->blockable +mmu_notifier_range_blockable(I1) ...> } ------------------------------------------------------------------->% spatch --in-place --sp-file blockable.spatch --dir . Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jérôme Glisse <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Christian König <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Jan Kara <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Peter Xu <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Dan Williams <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Christian Koenig <[email protected]> Cc: John Hubbard <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny1-1/+1
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ira Weiny <[email protected]> Reviewed-by: Mike Marshall <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dan Williams <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-18xen/gntdev: Do not destroy context while dma-bufs are in useOleksandr Andrushchenko1-1/+1
If there are exported DMA buffers which are still in use and grant device is closed by either normal user-space close or by a signal this leads to the grant device context to be destroyed, thus making it not possible to correctly destroy those exported buffers when they are returned back to gntdev and makes the module crash: [ 339.617540] [<ffff00000854c0d8>] dmabuf_exp_ops_release+0x40/0xa8 [ 339.617560] [<ffff00000867a6e8>] dma_buf_release+0x60/0x190 [ 339.617577] [<ffff0000082211f0>] __fput+0x88/0x1d0 [ 339.617589] [<ffff000008221394>] ____fput+0xc/0x18 [ 339.617607] [<ffff0000080ed4e4>] task_work_run+0x9c/0xc0 [ 339.617622] [<ffff000008089714>] do_notify_resume+0xfc/0x108 Fix this by referencing gntdev on each DMA buffer export and unreferencing on buffer release. Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Boris [email protected]> Signed-off-by: Juergen Gross <[email protected]>
2018-12-28mm/mmu_notifier: use structure for invalidate_range_start/end callbackJérôme Glisse1-6/+6
Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [[email protected]: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jérôme Glisse <[email protected]> Acked-by: Jan Kara <[email protected]> Acked-by: Jason Gunthorpe <[email protected]> [infiniband] Cc: Matthew Wilcox <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Dan Williams <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Christian Koenig <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-09-14xen/gntdev: fix up blockable calls to mn_invl_range_startMichal Hocko1-11/+15
Patch series "mmu_notifiers follow ups". Tetsuo has noticed some fallouts from 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers"). One of them has been fixed and picked up by AMD/DRM maintainer [1]. XEN issue is fixed by patch 1. I have also clarified expectations about blockable semantic of invalidate_range_end. Finally the last patch removes MMU_INVALIDATE_DOES_NOT_BLOCK which is no longer used nor needed. [1] http://lkml.kernel.org/r/[email protected] This patch (of 3): 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has introduced blockable parameter to all mmu_notifiers and the notifier has to back off when called in !blockable case and it could block down the road. The above commit implemented that for mn_invl_range_start but both in_range checks are done unconditionally regardless of the blockable mode and as such they would fail all the time for regular calls. Fix this by checking blockable parameter as well. Once we are there we can remove the stale TODO. The lock has to be sleepable because we wait for completion down in gnttab_unmap_refs_sync. Link: http://lkml.kernel.org/r/[email protected] Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") Signed-off-by: Michal Hocko <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Tetsuo Handa <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2018-08-22mm, oom: distinguish blockable mode for mmu notifiersMichal Hocko1-9/+35
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [[email protected]: coding style fixes] [[email protected]: minor code simplification] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Christian König <[email protected]> # AMD notifiers Acked-by: Leon Romanovsky <[email protected]> # mlx and umem_odp Reported-by: David Rientjes <[email protected]> Cc: "David (ChunMing) Zhou" <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Alex Deucher <[email protected]> Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Mike Marciniszyn <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Sudeep Dutt <[email protected]> Cc: Ashutosh Dixit <[email protected]> Cc: Dimitri Sivanich <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Jérôme Glisse" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Felix Kuehling <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-07-26xen/gntdev: Add initial support for dma-buf UAPIOleksandr Andrushchenko1-0/+31
Add UAPI and IOCTLs for dma-buf grant device driver extension: the extension allows userspace processes and kernel modules to use Xen backed dma-buf implementation. With this extension grant references to the pages of an imported dma-buf can be exported for other domain use and grant references coming from a foreign domain can be converted into a local dma-buf for local export. Implement basic initialization and stubs for Xen DMA buffers' support. Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2018-07-26xen/gntdev: Make private routines/structures accessibleOleksandr Andrushchenko1-91/+43
This is in preparation for adding support of DMA buffer functionality: make map/unmap related code and structures, used privately by gntdev, ready for dma-buf extension, which will re-use these. Rename corresponding structures as those become non-private to gntdev now. Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2018-07-26xen/gntdev: Allow mappings for DMA buffersOleksandr Andrushchenko1-2/+97
Allow mappings for DMA backed buffers if grant table module supports such: this extends grant device to not only map buffers made of balloon pages, but also from buffers allocated with dma_alloc_xxx. Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2018-01-10xen/gntdev: Fix partial gntdev_mmap() cleanupRoss Lagerwall1-1/+3
When cleaning up after a partially successful gntdev_mmap(), unmap the successfully mapped grant pages otherwise Xen will kill the domain if in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will kill the process and emit "BUG: Bad page map in process" if Xen is in release mode. This is only needed when use_ptemod is true because gntdev_put_map() will unmap grant pages itself when use_ptemod is false. Signed-off-by: Ross Lagerwall <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2018-01-10xen/gntdev: Fix off-by-one error when unmapping with holesRoss Lagerwall1-3/+1
If the requested range has a hole, the calculation of the number of pages to unmap is off by one. Fix it. Signed-off-by: Ross Lagerwall <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2017-10-25xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()Juergen Gross1-1/+1
In case gntdev_mmap() succeeds only partially in mapping grant pages it will leave some vital information uninitialized needed later for cleanup. This will lead to an out of bounds array access when unmapping the already mapped pages. So just initialize the data needed for unmapping the pages a little bit earlier. Cc: <[email protected]> Reported-by: Arthur Borsboom <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2017-08-31xen/gntdev: update to new mmu_notifier semanticJérôme Glisse1-8/+0
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Roger Pau Monné <[email protected]> Cc: [email protected] (moderated for non-subscribers) Cc: Kirill A. Shutemov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-03-13drivers, xen: convert grant_map.users from atomic_t to refcount_tElena Reshetova1-5/+6
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/mm.h> We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/mm.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. The APIs that are going to be moved first are: mm_alloc() __mmdrop() mmdrop() mmdrop_async_fn() mmdrop_async() mmget_not_zero() mmput() mmput_async() get_task_mm() mm_access() mm_release() Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-11-28xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancingBoris Ostrovsky1-1/+1
Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to NUMA balancing") set VM_IO flag to prevent grant maps from being subjected to NUMA balancing. It was discovered recently that this flag causes get_user_pages() to always fail with -EFAULT. check_vma_flags __get_user_pages __get_user_pages_locked __get_user_pages_unlocked get_user_pages_fast iov_iter_get_pages dio_refill_pages do_direct_IO do_blockdev_direct_IO do_blockdev_direct_IO ext4_direct_IO_read generic_file_read_iter aio_run_iocb (which can happen if guest's vdisk has direct-io-safe option). To avoid this let's use VM_MIXEDMAP flag instead --- it prevents NUMA balancing just as VM_IO does and has no effect on check_vma_flags(). Cc: [email protected] Reported-by: Olaf Hering <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]> Acked-by: Hugh Dickins <[email protected]> Tested-by: Olaf Hering <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2016-07-06xen: use vma_pages().Muhammad Falak R Wani1-1/+1
Replace explicit computation of vma page count by a call to vma_pages(). Signed-off-by: Muhammad Falak R Wani <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2016-05-24xen/gntdev: reduce copy batch size to 16David Vrabel1-1/+1
IOCTL_GNTDEV_GRANT_COPY batches copy operations to reduce the number of hypercalls. The stack is used to avoid a memory allocation in a hot path. However, a batch size of 24 requires more than 1024 bytes of stack which in some configurations causes a compiler warning. xen/gntdev.c: In function ‘gntdev_ioctl_grant_copy’: xen/gntdev.c:949:1: warning: the frame size of 1248 bytes is larger than 1024 bytes [-Wframe-larger-than=] This is a harmless warning as there is still plenty of stack spare, but people keep trying to "fix" it. Reduce the batch size to 16 to reduce stack usage to less than 1024 bytes. This should have minimal impact on performance. Signed-off-by: David Vrabel <[email protected]>
2016-01-07xen/gntdev: add ioctl for grant copyDavid Vrabel1-0/+203
Add IOCTL_GNTDEV_GRANT_COPY to allow applications to copy between user space buffers and grant references. This interface is similar to the GNTTABOP_copy hypercall ABI except the local buffers are provided using a virtual address (instead of a GFN and offset). To avoid userspace from having to page align its buffers the driver will use two or more ops if required. If the ioctl returns 0, the application must check the status of each segment with the segments status field. If the ioctl returns a -ve error code (EINVAL or EFAULT), the status of individual ops is undefined. Signed-off-by: David Vrabel <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]>
2015-12-21xen/gntdev: constify mmu_notifier_ops structuresJulia Lawall1-1/+1
This mmu_notifier_ops structure is never modified, so declare it as const, like the other mmu_notifier_ops structures. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-11-26xen/gntdev: Grant maps should not be subject to NUMA balancingBoris Ostrovsky1-1/+1
Doing so will cause the grant to be unmapped and then, during fault handling, the fault to be mistakenly treated as NUMA hint fault. In addition, even if those maps could partcipate in NUMA balancing, it wouldn't provide any benefit since we are unable to determine physical page's node (even if/when VNUMA is implemented). Marking grant maps' VMAs as VM_IO will exclude them from being part of NUMA balancing. Signed-off-by: Boris Ostrovsky <[email protected]> Cc: [email protected] Signed-off-by: David Vrabel <[email protected]>
2015-09-10mm: mark most vm_operations_struct constKirill A. Shutemov1-1/+1
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30xen/gntdevt: Fix race condition in gntdev_release()Marek Marczykowski-Górecki1-0/+2
While gntdev_release() is called the MMU notifier is still registered and can traverse priv->maps list even if no pages are mapped (which is the case -- gntdev_release() is called after all). But gntdev_release() will clear that list, so make sure that only one of those things happens at the same time. Signed-off-by: Marek Marczykowski-Górecki <[email protected]> Cc: <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-06-17xen: Include xen/page.h rather than asm/xen/page.hJulien Grall1-1/+1
Using xen/page.h will be necessary later for using common xen page helpers. As xen/page.h already include asm/xen/page.h, always use the later. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: David Vrabel <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Ian Campbell <[email protected]> Cc: Wei Liu <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: [email protected] Signed-off-by: David Vrabel <[email protected]>
2015-04-27xen/grant: introduce func gnttab_unmap_refs_sync()Bob Liu1-25/+3
There are several place using gnttab async unmap and wait for completion, so move the common code to a function gnttab_unmap_refs_sync(). Signed-off-by: Bob Liu <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Acked-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-01-28xen/gntdev: provide find_special_page VMA operationDavid Vrabel1-0/+11
For a PV guest, use the find_special_page op to find the right page. To handle VMAs being split, remember the start of the original VMA so the correct index in the pages array can be calculated. Signed-off-by: David Vrabel <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2015-01-28xen/gntdev: mark userspace PTEs as special on x86 PV guestsDavid Vrabel1-0/+34
In an x86 PV guest, get_user_pages_fast() on a userspace address range containing foreign mappings does not work correctly because the M2P lookup of the MFN from a userspace PTE may return the wrong page. Force get_user_pages_fast() to fail on such addresses by marking the PTEs as special. If Xen has XENFEAT_gnttab_map_avail_bits (available since at least 4.0), we can do so efficiently in the grant map hypercall. Otherwise, it needs to be done afterwards. This is both inefficient and racy (the mapping is visible to the task before we fixup the PTEs), but will be fine for well-behaved applications that do not use the mapping until after the mmap() system call returns. Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH) do not need this since get_user_pages() has always worked correctly for them. Signed-off-by: David Vrabel <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2015-01-28xen/gntdev: safely unmap grants in case they are still in useJennifer Herbert1-5/+31
Use gnttab_unmap_refs_async() to wait until the mapped pages are no longer in use before unmapping them. This allows userspace programs to safely use Direct I/O and AIO to a network filesystem which may retain refs to pages in queued skbs after the filesystem I/O has completed. Signed-off-by: Jennifer Herbert <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-01-28xen/gntdev: convert priv->lock to a mutexDavid Vrabel1-20/+20
Unmapping may require sleeping and we unmap while holding priv->lock, so convert it to a mutex. Signed-off-by: David Vrabel <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2015-01-28xen/grant-table: add helpers for allocating pagesDavid Vrabel1-2/+2
Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages suitable to for granted maps. Signed-off-by: David Vrabel <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2015-01-28xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()David Vrabel1-6/+14
When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2014-02-03Revert "xen/grant-table: Avoid m2p_override during mapping"Konrad Rzeszutek Wilk1-8/+5
This reverts commit 08ece5bb2312b4510b161a6ef6682f37f4eac8a1. As it breaks ARM builds and needs more attention on the ARM side. Acked-by: David Vrabel <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2014-01-31xen/grant-table: Avoid m2p_override during mappingZoltan Kiss1-5/+8
The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page->index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page->private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss <[email protected]> Suggested-by: David Vrabel <[email protected]> Acked-by: David Vrabel <[email protected]> Acked-by: Stefano Stabellini <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2014-01-06xen/pvh: Piggyback on PVHVM for grant driver (v4)Konrad Rzeszutek Wilk1-1/+1
In PVH the shared grant frame is the PFN and not MFN, hence its mapped via the same code path as HVM. The allocation of the grant frame is done differently - we do not use the early platform-pci driver and have an ioremap area - instead we use balloon memory and stitch all of the non-contingous pages in a virtualized area. That means when we call the hypervisor to replace the GMFN with a XENMAPSPACE_grant_table type, we need to lookup the old PFN for every iteration instead of assuming a flat contingous PFN allocation. Lastly, we only use v1 for grants. This is because PVHVM is not able to use v2 due to no XENMEM_add_to_physmap calls on the error status page (see commit 69e8f430e243d657c2053f097efebc2e2cd559f0 xen/granttable: Disable grant v2 for HVM domains.) Until that is implemented this workaround has to be in place. Also per suggestions by Stefano utilize the PVHVM paths as they share common functionality. v2 of this patch moves most of the PVH code out in the arch/x86/xen/grant-table driver and touches only minimally the generic driver. v3, v4: fixes us some of the code due to earlier patches. Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Acked-by: Stefano Stabellini <[email protected]>
2013-08-20xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mappingStefano Stabellini1-9/+2
GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0 mapping instead of reinstating the original mapping. Doing so separately would be racy. To unmap a grant and reinstate the original mapping atomically we use GNTTABOP_unmap_and_replace. GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so don't use it for kmaps. GNTTABOP_unmap_and_replace zeroes the mapping passed in new_addr so we have to reinstate it, however that is a per-cpu mapping only used for balloon scratch pages, so we can be sure that it's not going to be accessed while the mapping is not valid. Signed-off-by: Stefano Stabellini <[email protected]> Reviewed-by: David Vrabel <[email protected]> Acked-by: Konrad Rzeszutek Wilk <[email protected]> CC: [email protected] CC: [email protected] [v1: Konrad fixed up the conflicts] Conflicts: arch/x86/xen/p2m.c
2013-06-28xen: Convert printks to pr_<level>Joe Perches1-3/+5
Convert printks to pr_<level> (excludes printk(KERN_DEBUG...) to be more consistent throughout the xen subsystem. Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME Coalesce formats and add missing word spaces Add missing newlines Align arguments and reflow to 80 columns Remove DRV_NAME from formats as pr_fmt adds the same content This does change some of the prefixes of these messages but it also does make them more consistent. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2013-01-15xen/gntdev: remove erronous use of copy_to_userDaniel De Graaf1-10/+3
Since there is now a mapping of granted pages in kernel address space in both PV and HVM, use it for UNMAP_NOTIFY_CLEAR_BYTE instead of accessing memory via copy_to_user and triggering sleep-in-atomic warnings. Signed-off-by: Daniel De Graaf <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2013-01-15xen/gntdev: correctly unmap unlinked maps in mmu notifierDaniel De Graaf1-29/+63
If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping it, the entry is removed from priv->maps and the later call to mn_invl_range_start won't find it to do the unmapping. Fix this by creating another list of freeable maps that the mmu notifier can search and use to unmap grants. Signed-off-by: Daniel De Graaf <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2013-01-15xen/gntdev: fix unsafe vma accessDaniel De Graaf1-5/+24
In gntdev_ioctl_get_offset_for_vaddr, we need to hold mmap_sem while calling find_vma() to avoid potentially having the result freed out from under us. Similarly, the MMU notifier functions need to synchronize with gntdev_vma_close to avoid map->vma being freed during their iteration. Signed-off-by: Daniel De Graaf <[email protected]> Reported-by: Al Viro <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2012-10-30xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REFDavid Vrabel1-17/+19
map->kmap_ops allocated in gntdev_alloc_map() wasn't freed by gntdev_put_map(). Add a gntdev_free_map() helper function to free everything allocated by gntdev_alloc_map(). Signed-off-by: David Vrabel <[email protected]> Cc: [email protected] Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2012-10-09mm: kill vma flag VM_RESERVED and mm->reserved_vm counterKonstantin Khlebnikov1-1/+1
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [[email protected]: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Carsten Otte <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Eric Paris <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Baron <[email protected]> Cc: Kentaro Takeda <[email protected]> Cc: Matt Helsley <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Venkatesh Pallipadi <[email protected]> Acked-by: Linus Torvalds <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-10-02Merge tag 'stable/for-linus-3.7-x86-tag' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: - When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. - Cleanup Xen SWIOTLB. - Support pages out grants from HVM domains in the backends. - Support wild cards in xen-pciback.hide=(BDF) arguments. - Update grant status updates with upstream hypervisor. - Boot PV guests with more than 128GB. - Cleanup Xen MMU code/add comments. - Obtain XENVERS using a preferred method. - Lay out generic changes to support Xen ARM. - Allow privcmd ioctl for HVM (used to do only PV). - Do v2 of mmap_batch for privcmd ioctls. - If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: - More fixes to Xen PCI backend for various calls/FLR/etc. - With more than 4GB in a 64-bit PV guest disable native SWIOTLB. - Fix up smatch warnings. - Fix up various return values in privmcmd and mm." * tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (48 commits) xen/pciback: Restore the PCI config space after an FLR. xen-pciback: properly clean up after calling pcistub_device_find() xen/vga: add the xen EFI video mode support xen/x86: retrieve keyboard shift status flags from hypervisor. xen/gndev: Xen backend support for paged out grant targets V4. xen-pciback: support wild cards in slot specifications xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. xen/arm: compile and run xenbus xen: resynchronise grant table status codes with upstream xen/privcmd: return -EFAULT on error xen/privcmd: Fix mmap batch ioctl error status copy back. xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl xen/mm: return more precise error from xen_remap_domain_range() xen/mmu: If the revector fails, don't attempt to revector anything else. ...
2012-09-12xen/m2p: do not reuse kmap_op->dev_bus_addrStefano Stabellini1-2/+3
If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: [email protected] Reported-and-Tested-By: Sander Eikelenboom <[email protected]> Signed-off-by: Stefano Stabellini <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2012-08-21xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static.Konrad Rzeszutek Wilk1-1/+1
There is no need for those functions/variables to be visible. Make them static and also fix the compile warnings of this sort: drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static? Some of them just require including the header file that declares the functions. Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2012-04-17xen/gntdev: do not set VM_PFNMAPStefano Stabellini1-1/+1
Since we are using the m2p_override we do have struct pages corresponding to the user vma mmap'ed by gntdev. Removing the VM_PFNMAP flag makes get_user_pages work on that vma. An example test case would be using a Xen userspace block backend (QDISK) on a file on NFS using O_DIRECT. CC: [email protected] Signed-off-by: Stefano Stabellini <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-12-20xen/grant-table: Support mappings required by blkbackDaniel De Graaf1-1/+2
Add support for mappings without GNTMAP_contains_pte. This was not supported because the unmap operation assumed that this flag was being used; adding a parameter to the unmap operation to allow the PTE clearing to be disabled is sufficient to make unmap capable of supporting either mapping type. Signed-off-by: Daniel De Graaf <[email protected]> [v1: Fix cleanpatch warnings] Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-12-20Merge commit 'v3.2-rc3' into stable/for-linus-3.3Konrad Rzeszutek Wilk1-5/+5
* commit 'v3.2-rc3': (412 commits) Linux 3.2-rc3 virtio-pci: make reset operation safer virtio-mmio: Correct the name of the guest features selector virtio: add HAS_IOMEM dependency to MMIO platform bus driver eCryptfs: Extend array bounds for all filename chars eCryptfs: Flush file in vma close eCryptfs: Prevent file create race condition regulator: TPS65910: Fix VDD1/2 voltage selector count i2c: Make i2cdev_notifier_call static i2c: Delete ANY_I2C_BUS i2c: Fix device name for 10-bit slave address i2c-algo-bit: Generate correct i2c address sequence for 10-bit target drm: integer overflow in drm_mode_dirtyfb_ioctl() Revert "of/irq: of_irq_find_parent: check for parent equal to child" drivers/gpu/vga/vgaarb.c: add missing kfree drm/radeon/kms/atom: unify i2c gpio table handling drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real ttm: Don't return the bo reserved on error path mount_subtree() pointless use-after-free iio: fix a leak due to improper use of anon_inode_getfd() ...
2011-11-21xen/gnt{dev,alloc}: reserve event channels for notifyDaniel De Graaf1-1/+30
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-11-16xen-gntdev: integer overflow in gntdev_alloc_map()Dan Carpenter1-5/+5
The multiplications here can overflow resulting in smaller buffer sizes than expected. "count" comes from a copy_from_user(). Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>