Age | Commit message (Collapse) | Author | Files | Lines |
|
Merge misc updates from Andrew Morton:
- a few misc things and hotfixes
- ocfs2
- almost all of MM
* emailed patches from Andrew Morton <[email protected]>: (139 commits)
kernel/memremap.c: remove the unused device_private_entry_fault() export
mm: delete find_get_entries_tag
mm/huge_memory.c: make __thp_get_unmapped_area static
mm/mprotect.c: fix compilation warning because of unused 'mm' variable
mm/page-writeback: introduce tracepoint for wait_on_page_writeback()
mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags
mm/Kconfig: update "Memory Model" help text
mm/vmscan.c: don't disable irq again when count pgrefill for memcg
mm: memblock: make keeping memblock memory opt-in rather than opt-out
hugetlbfs: always use address space in inode for resv_map pointer
mm/z3fold.c: support page migration
mm/z3fold.c: add structure for buddy handles
mm/z3fold.c: improve compression by extending search
mm/z3fold.c: introduce helper functions
mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist
mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig
mm/vmscan.c: simplify shrink_inactive_list()
fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
xen/privcmd-buf.c: convert to use vm_map_pages_zero()
xen/gntdev.c: convert to use vm_map_pages()
...
|
|
I removed the only user of this and hadn't noticed it was now unused.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__thp_get_unmapped_area is only used in mm/huge_memory.c. Make it static.
Tested by building and booting the kernel.
Link: http://lkml.kernel.org/r/20190504102353.GA22525@bharath12345-Inspiron-5559
Signed-off-by: Bharath Vedartham <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since 0cbe3e26abe0 ("mm: update ptep_modify_prot_start/commit to take
vm_area_struct as arg") the only place that uses the local 'mm' variable
in change_pte_range() is the call to set_pte_at().
Many architectures define set_pte_at() as macro that does not use the 'mm'
parameter, which generates the following compilation warning:
CC mm/mprotect.o
mm/mprotect.c: In function 'change_pte_range':
mm/mprotect.c:42:20: warning: unused variable 'mm' [-Wunused-variable]
struct mm_struct *mm = vma->vm_mm;
^~
Fix it by passing vma->mm to set_pte_at() and dropping the local 'mm'
variable in change_pte_range().
[[email protected]: fix missed conversions]
Link: http://lkml.kernel.org/r/CAPhsuW6wcQgYLHNdBdw6m0YiR4RWsS4XzfpSKU7wBLLeOCTbpw@mail.gmail.comLink: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Song Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Recently there have been some hung tasks on our server due to
wait_on_page_writeback(), and we want to know the details of this
PG_writeback, i.e. this page is writing back to which device. But it is
not so convenient to get the details.
I think it would be better to introduce a tracepoint for diagnosing the
writeback details.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The help describing the memory model selection is outdated. It still says
that SPARSEMEM is experimental and DISCONTIGMEM is a preferred over
SPARSEMEM.
Update the help text for the relevant options:
* add a generic help for the "Memory Model" prompt
* add description for FLATMEM
* reduce the description of DISCONTIGMEM and add a deprecation note
* prefer SPARSEMEM over DISCONTIGMEM
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can use __count_memcg_events() directly because this callsite is alreay
protected by spin_lock_irq().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Most architectures do not need the memblock memory after the page
allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the
arch Kconfig.
Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
logic makes it clear which architectures actually use memblock after
system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
to the architectures that are still missing that option.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Eric Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Continuing discussion about 58b6e5e8f1ad ("hugetlbfs: fix memory leak for
resv_map") brought up the issue that inode->i_mapping may not point to the
address space embedded within the inode at inode eviction time. The
hugetlbfs truncate routine handles this by explicitly using inode->i_data.
However, code cleaning up the resv_map will still use the address space
pointed to by inode->i_mapping. Luckily, private_data is NULL for address
spaces in all such cases today but, there is no guarantee this will
continue.
Change all hugetlbfs code getting a resv_map pointer to explicitly get it
from the address space embedded within the inode. In addition, add more
comments in the code to indicate why this is being done.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Reported-by: Yufen Yu <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that we are not using page address in handles directly, we can make
z3fold pages movable to decrease the memory fragmentation z3fold may
create over time.
This patch starts advertising non-headless z3fold pages as movable and
uses the existing kernel infrastructure to implement moving of such pages
per memory management subsystem's request. It thus implements 3 required
callbacks for page migration:
* isolation callback: z3fold_page_isolate(): try to isolate the page by
removing it from all lists. Pages scheduled for some activity and
mapped pages will not be isolated. Return true if isolation was
successful or false otherwise
* migration callback: z3fold_page_migrate(): re-check critical
conditions and migrate page contents to the new page provided by the
memory subsystem. Returns 0 on success or negative error code otherwise
* putback callback: z3fold_page_putback(): put back the page if
z3fold_page_migrate() for it failed permanently (i. e. not with
-EAGAIN code).
[[email protected]: z3fold_page_isolate() can be static]
Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vitaly Wool <[email protected]>
Signed-off-by: kbuild test robot <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For z3fold to be able to move its pages per request of the memory
subsystem, it should not use direct object addresses in handles. Instead,
it will create abstract handles (3 per page) which will contain pointers
to z3fold objects. Thus, it will be possible to change these pointers
when z3fold page is moved.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vitaly Wool <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current z3fold implementation only searches this CPU's page lists for
a fitting page to put a new object into. This patch adds quick search for
very well fitting pages (i. e. those having exactly the required number
of free space) on other CPUs too, before allocating a new page for that
object.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vitaly Wool <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "z3fold: support page migration", v2.
This patchset implements page migration support and slightly better buddy
search. To implement page migration support, z3fold has to move away from
the current scheme of handle encoding. i. e. stop encoding page address
in handles. Instead, a small per-page structure is created which will
contain actual addresses for z3fold objects, while pointers to fields of
that structure will be used as handles.
Thus, it will be possible to change the underlying addresses to reflect
page migration.
To support migration itself, 3 callbacks will be implemented:
1: isolation callback: z3fold_page_isolate(): try to isolate the page
by removing it from all lists. Pages scheduled for some activity and
mapped pages will not be isolated. Return true if isolation was
successful or false otherwise
2: migration callback: z3fold_page_migrate(): re-check critical
conditions and migrate page contents to the new page provided by the
system. Returns 0 on success or negative error code otherwise
3: putback callback: z3fold_page_putback(): put back the page if
z3fold_page_migrate() for it failed permanently (i. e. not with
-EAGAIN code).
To make sure an isolated page doesn't get freed, its kref is incremented
in z3fold_page_isolate() and decremented during post-migration compaction,
if migration was successful, or by z3fold_page_putback() in the other
case.
Since the new handle encoding scheme implies slight memory consumption
increase, better buddy search (which decreases memory consumption) is
included in this patchset.
This patch (of 4):
Introduce a separate helper function for object allocation, as well as 2
smaller helpers to add a buddy to the list and to get a pointer to the
pool from the z3fold header. No functional changes here.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vitaly Wool <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Because rmqueue_pcplist() is only called when order is 0, we don't need to
use order as a parameter.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add 2 new Kconfig variables that are not used by anyone. I check that
various make ARCH=somearch allmodconfig do work and do not complain. This
new Kconfig needs to be added first so that device drivers that depend on
HMM can be updated.
Once drivers are updated then I can update the HMM Kconfig to depend on
this new Kconfig in a followup patch.
This is about solving Kconfig for HMM given that device driver are
going through their own tree we want to avoid changing them from the mm
tree. So plan is:
1 - Kernel release N add the new Kconfig to mm/Kconfig (this patch)
2 - Kernel release N+1 update driver to depend on new Kconfig ie
stop using ARCH_HASH_HMM and start using ARCH_HAS_HMM_MIRROR
and ARCH_HAS_HMM_DEVICE (one or the other or both depending
on the driver)
3 - Kernel release N+2 remove ARCH_HASH_HMM and do final Kconfig
update in mm/Kconfig
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This merges together duplicated patterns of code. Also, replace
count_memcg_events() with its irq-careless namesake, because they are
already called in interrupts disabled context.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Use vm_map_pages() and vm_map_pages_zero() API", v5.
This patch (of 5):
Previouly drivers have their own way of mapping range of kernel
pages/memory into user vma and this was done by invoking vm_insert_page()
within a loop.
As this pattern is common across different drivers, it can be generalized
by creating new functions and using them across the drivers.
vm_map_pages() is the API which can be used to map kernel memory/pages in
drivers which have considered vm_pgoff
vm_map_pages_zero() is the API which can be used to map a range of kernel
memory/pages in drivers which have not considered vm_pgoff. vm_pgoff is
passed as default 0 for those drivers.
We _could_ then at a later "fix" these drivers which are using
vm_map_pages_zero() to behave according to the normal vm_pgoff offsetting
simply by removing the _zero suffix on the function name and if that
causes regressions, it gives us an easy way to revert.
Tested on Rockchip hardware and display is working, including talking to
Lima via prime.
Link: http://lkml.kernel.org/r/751cb8a0f4c3e67e95c58a3b072937617f338eea.1552921225.git.jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder <[email protected]>
Suggested-by: Russell King <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Thierry Reding <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Stefan Richter <[email protected]>
Cc: Sandy Huang <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Oleksandr Andrushchenko <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Pawel Osciak <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.
Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:
...
One side effect of (and the main motivation for) this change is making
the following two definitions behave exactly the same:
config FOO
bool
config FOO
bool
default n
With this change, neither of these will generate a
'# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
That might make it clearer to people that a bare 'default n' is
redundant.
...
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the default overcommit==guess we occasionally run into mmap
rejections despite plenty of memory that would get dropped under
pressure but just isn't accounted reclaimable. One example of this is
dying cgroups pinned by some page cache. A previous case was auxiliary
path name memory associated with dentries; we have since annotated
those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
treat indirectly reclaimable memory as free in overcommit logic")).
But trying to classify all allocated memory reliably as reclaimable
and unreclaimable is a bit of a fool's errand. There could be a myriad
of dependencies that constantly change with kernel versions.
It becomes even more questionable of an effort when considering how
this estimate of available memory is used: it's not compared to the
system-wide allocated virtual memory in any way. It's not even
compared to the allocating process's address space. It's compared to
the single allocation request at hand!
So we have an elaborate left-hand side of the equation that tries to
assess the exact breathing room the system has available down to a
page - and then compare it to an isolated allocation request with no
additional context. We could fail an allocation of N bytes, but for
two allocations of N/2 bytes we'd do this elaborate dance twice in a
row and then still let N bytes of virtual memory through. This doesn't
make a whole lot of sense.
Let's take a step back and look at the actual goal of the
heuristic. From the documentation:
Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a
seriously wild allocation fails while allowing overcommit to
reduce swap usage. root is allowed to allocate slightly more
memory in this mode. This is the default.
If all we want to do is catch clearly bogus allocation requests
irrespective of the general virtual memory situation, the physical
memory counter-part doesn't need to be that complicated, either.
When in GUESS mode, catch wild allocations by comparing their request
size to total amount of ram and swap in the system.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All callers of arch_remove_memory() ignore errors. And we should really
try to remove any errors from the memory removal path. No more errors are
reported from __remove_pages(). BUG() in s390x code in case
arch_remove_memory() is triggered. We may implement that properly later.
WARN in case powerpc code failed to remove the section mapping, which is
better than ignoring the error completely right now.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Stefan Agner <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's just warn in case a section is not valid instead of failing to
remove somewhere in the middle of the process, returning an error that
will be mostly ignored by callers.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Stefan Agner <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Failing while removing memory is mostly ignored and cannot really be
handled. Let's treat errors in unregister_memory_section() in a nice way,
warning, but continuing.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Stefan Agner <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/memory_hotplug: Better error handling when removing
memory", v1.
Error handling when removing memory is somewhat messed up right now. Some
errors result in warnings, others are completely ignored. Memory unplug
code can essentially not deal with errors properly as of now.
remove_memory() will never fail.
We have basically two choices:
1. Allow arch_remov_memory() and friends to fail, propagating errors via
remove_memory(). Might be problematic (e.g. DIMMs consisting of multiple
pieces added/removed separately).
2. Don't allow the functions to fail, handling errors in a nicer way.
It seems like most errors that can theoretically happen are really corner
cases and mostly theoretical (e.g. "section not valid"). However e.g.
aborting removal of sections while all callers simply continue in case of
errors is not nice.
If we can gurantee that removal of memory always works (and WARN/skip in
case of theoretical errors so we can figure out what is going on), we can
go ahead and implement better error handling when adding memory.
E.g. via add_memory():
arch_add_memory()
ret = do_stuff()
if (ret) {
arch_remove_memory();
goto error;
}
Handling here that arch_remove_memory() might fail is basically
impossible. So I suggest, let's avoid reporting errors while removing
memory, warning on theoretical errors instead and continuing instead of
aborting.
This patch (of 4):
__add_pages() doesn't add the memory resource, so __remove_pages()
shouldn't remove it. Let's factor it out. Especially as it is a special
case for memory used as system memory, added via add_memory() and friends.
We now remove the resource after removing the sections instead of doing it
the other way around. I don't think this change is problematic.
add_memory()
register memory resource
arch_add_memory()
remove_memory
arch_remove_memory()
release memory resource
While at it, explain why we ignore errors and that it only happeny if
we remove memory in a different granularity as we added it.
[[email protected]: fix printk warning]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Stefan Agner <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laurent Dufour <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
arch_add_memory, __add_pages take a want_memblock which controls whether
the newly added memory should get the sysfs memblock user API (e.g.
ZONE_DEVICE users do not want/need this interface). Some callers even
want to control where do we allocate the memmap from by configuring
altmap.
Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains flags which contains additional features
to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and
altmap for alternative memmap allocator.
This patch shouldn't introduce any functional change.
[[email protected]: build fix]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
check_pages_isolated_cb currently accounts the whole pfn range as being
offlined if test_pages_isolated suceeds on the range. This is based on
the assumption that all pages in the range are freed which is currently
the case in most cases but it won't be with later changes, as pages marked
as vmemmap won't be isolated.
Move the offlined pages counting to offline_isolated_pages_cb and rely on
__offline_isolated_pages to return the correct value.
check_pages_isolated_cb will still do it's primary job and check the pfn
range.
While we are at it remove check_pages_isolated and offline_isolated_pages
and use directly walk_system_ram_range as do in online_pages.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add yet another iterator, for_each_free_mem_range_in_zone_from, and then
use it to support initializing and freeing pages in groups no larger than
MAX_ORDER_NR_PAGES. By doing this we can greatly improve the cache
locality of the pages while we do several loops over them in the init and
freeing process.
We are able to tighten the loops further as a result of the "from"
iterator as we can perform the initial checks for first_init_pfn in our
first call to the iterator, and continue without the need for those checks
via the "from" iterator. I have added this functionality in the function
called deferred_init_mem_pfn_range_in_zone that primes the iterator and
causes us to exit if we encounter any failure.
On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 1.85s to 1.38s as a result of this patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Duyck <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Introduce a new iterator for_each_free_mem_pfn_range_in_zone.
This iterator will take care of making sure a given memory range provided
is in fact contained within a zone. It takes are of all the bounds
checking we were doing in deferred_grow_zone, and deferred_init_memmap.
In addition it should help to speed up the search a bit by iterating until
the end of a range is greater than the start of the zone pfn range, and
will exit completely if the start is beyond the end of the zone.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Duyck <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As best as I can tell the meminit_pfn_in_nid call is completely redundant.
The deferred memory initialization is already making use of
for_each_free_mem_range which in turn will call into __next_mem_range
which will only return a memory range if it matches the node ID provided
assuming it is not NUMA_NO_NODE.
I am operating on the assumption that there are no zones or pgdata_t
structures that have a NUMA node of NUMA_NO_NODE associated with them. If
that is the case then __next_mem_range will never return a memory range
that doesn't match the zone's node ID and as such the check is redundant.
So one piece I would like to verify on this is if this works for ia64.
Technically it was using a different approach to get the node ID, but it
seems to have the node ID also encoded into the memblock. So I am
assuming this is okay, but would like to get confirmation on that.
On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 2.80s to 1.85s as a result of this patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Duyck <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have the pra.mapcount already, and there is no need to call the
page_mapped() which may do some complicated computing for compound page.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Recently I messed up the error handling in filemap_fault() because of an
unexpected ENOMEM (related to cgroup memory limits) in add_to_page_cache.
Enable error injection at this point so I can add a testcase to xfstests
to verify I don't mess this up again.
[[email protected]: include linux/error-injection.h]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Helper to test if a range is updated to read only (it is still valid to
read from the range). This is useful for device driver or anyone who wish
to optimize out update when they know that they already have the range map
read only.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krcmar <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This updates each existing invalidation to use the correct mmu notifier
event that represent what is happening to the CPU page table. See the
patch which introduced the events to see the rational behind this.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krcmar <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
a result of kernel activities (memory compression, reclaim, migration,
...).
Users of mmu notifier API track changes to the CPU page table and take
specific action for them. While current API only provide range of virtual
address affected by the change, not why the changes is happening.
This patchset do the initial mechanical convertion of all the places that
calls mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP
event as well as the vma if it is know (most invalidation happens against
a given vma). Passing down the vma allows the users of mmu notifier to
inspect the new vma page protection.
The MMU_NOTIFY_UNMAP is always the safe default as users of mmu notifier
should assume that every for the range is going away when that event
happens. A latter patch do convert mm call path to use a more appropriate
events for each call.
This is done as 2 patches so that no call site is forgotten especialy
as it uses this following coccinelle patch:
%<----------------------------------------------------------------------
@@
identifier I1, I2, I3, I4;
@@
static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
+enum mmu_notifier_event event,
+unsigned flags,
+struct vm_area_struct *vma,
struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }
@@
@@
-#define mmu_notifier_range_init(range, mm, start, end)
+#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)
@@
expression E1, E3, E4;
identifier I1;
@@
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, I1,
I1->vm_mm, E3, E4)
...>
@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(..., struct vm_area_struct *VMA, ...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...> }
@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(...) {
struct vm_area_struct *VMA;
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...> }
@@
expression E1, E2, E3, E4;
identifier FN;
@@
FN(...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, NULL,
E2, E3, E4)
...> }
---------------------------------------------------------------------->%
Applied with:
spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
spatch --sp-file mmu-notifier.spatch --dir mm --in-place
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krcmar <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the mmu_notifier_range_blockable() helper function instead of directly
dereferencing the range->blockable field. This is done to make it easier
to change the mmu_notifier range field.
This patch is the outcome of the following coccinelle patch:
%<-------------------------------------------------------------------
@@
identifier I1, FN;
@@
FN(..., struct mmu_notifier_range *I1, ...) {
<...
-I1->blockable
+mmu_notifier_range_blockable(I1)
...>
}
------------------------------------------------------------------->%
spatch --in-place --sp-file blockable.spatch --dir .
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krcmar <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Convert hmm_pfn_* to device_entry_* as here we are dealing with device
driver specific entry format and hmm provide helpers to allow differents
components (including HMM) to create/parse device entry.
We keep wrapper with the old name so that we can convert driver to use the
new API in stages in each device driver tree. This will get remove once
all driver are converted.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is a all in one helper that fault pages in a range and map them to a
device so that every single device driver do not have to re-implement this
common pattern.
This is taken from ODP RDMA in preparation of ODP RDMA convertion. It
will be use by nouveau and other drivers.
[[email protected]: Was using wrong field and wrong enum]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
HMM mirror is a device driver helpers to mirror range of virtual address.
It means that the process jobs running on the device can access the same
virtual address as the CPU threads of that process. This patch adds
support for mirroring mapping of file that are on a DAX block device (ie
range of virtual address that is an mmap of a file in a filesystem on a
DAX block device). There is no reason to not support such case when
mirroring virtual address on a device.
Note that unlike GUP code we do not take page reference hence when we
back-off we have nothing to undo.
[[email protected]: move THP and hugetlbfs code path behind #if KCONFIG]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
HMM mirror is a device driver helpers to mirror range of virtual address.
It means that the process jobs running on the device can access the same
virtual address as the CPU threads of that process. This patch adds
support for hugetlbfs mapping (ie range of virtual address that are mmap
of a hugetlbfs).
[[email protected]: fix initial PFN for hugetlbfs pages]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Signed-off-by: Ralph Campbell <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The HMM mirror API can be use in two fashions. The first one where the
HMM user coalesce multiple page faults into one request and set flags per
pfns for of those faults. The second one where the HMM user want to
pre-fault a range with specific flags. For the latter one it is a waste
to have the user pre-fill the pfn arrays with a default flags value.
This patch adds a default flags value allowing user to set them for a
range without having to pre-fill the pfn array.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A common use case for HMM mirror is user trying to mirror a range and
before they could program the hardware it get invalidated by some core mm
event. Instead of having user re-try right away to mirror the range
provide a completion mechanism for them to wait for any active
invalidation affecting the range.
This also changes how hmm_range_snapshot() and hmm_range_fault() works by
not relying on vma so that we can drop the mmap_sem when waiting and
lookup the vma again on retry.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Minor optimization around hmm_pte_need_fault(). Rename for consistency
between code, comments and documentation. Also improves the comments on
all the possible returns values. Improve the function by returning the
number of populated entries in pfns array.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename for consistency between code, comments and documentation. Also
improves the comments on all the possible returns values. Improve the
function by returning the number of populated entries in pfns array.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Users of HMM might be using the snapshot information to do preparatory
step like dma mapping pages to a device before checking for invalidation
through hmm_vma_range_done() so do not erase that information and assume
users will do the right thing.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Every time I read the code to check that the HMM structure does not vanish
before it should thanks to the many lock protecting its removal i get a
headache. Switch to reference counting instead it is much easier to
follow and harder to break. This also remove some code that is no longer
needed with refcounting.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To avoid random config build issue, select mmu notifier when HMM is
selected. In any cases when HMM get selected it will be by users that
will also wants the mmu notifier.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently. The key for shared and private mappings is
different. Shared keys off address_space and file index. Private keys
off mm and virtual address. Consider a private mappings of a populated
hugetlbfs file. A fault will map the page from the file and if needed
do a COW to map a writable page.
Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages. It uses the address_space file index key. However, private
mappings will use a different key and could race with this code to map
the file page. This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped. A sample stack is:
page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914ebf7
("mm, hugetlb: improve page-fault scalability").
Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings. This
results in potentially more hash collisions. However, this should not
be the common case.
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation. When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored. PagePrivate
being set at free huge page time mostly happens on error paths.
When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem. If so,
pages are also reserved within the filesystem. The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem. However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd. Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G -4.0M 4.1G - /opt/hugepool
To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.
I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The code comment above sparse_add_one_section() is obsolete and incorrect.
Clean it up and write a new one.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is no function named munlock_vma_pages(). Correct it to
munlock_vma_page().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Fan <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|