Age | Commit message (Collapse) | Author | Files | Lines |
|
We used to include an alignment overhead into a search length, in that
case we guarantee that a found area will definitely fit after applying a
specific alignment that user specifies. From the other hand we do not
guarantee that an area has the lowest address if an alignment is >=
PAGE_SIZE.
It means that, when a user specifies a special alignment together with a
range that corresponds to an exact requested size then an allocation
will fail. This is what happens to KASAN, it wants the free block that
exactly matches a specified range during onlining memory banks:
[root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory82/state
[root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory83/state
[root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory85/state
[root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory84/state
vmap allocation for size 16777216 failed: use vmalloc=<size> to increase size
bash: vmalloc: allocation failure: 16777216 bytes, mode:0x6000c0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 4 PID: 1644 Comm: bash Kdump: loaded Not tainted 4.18.0-339.el8.x86_64+debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8e/0xd0
warn_alloc.cold.90+0x8a/0x1b2
? zone_watermark_ok_safe+0x300/0x300
? slab_free_freelist_hook+0x85/0x1a0
? __get_vm_area_node+0x240/0x2c0
? kfree+0xdd/0x570
? kmem_cache_alloc_node_trace+0x157/0x230
? notifier_call_chain+0x90/0x160
__vmalloc_node_range+0x465/0x840
? mark_held_locks+0xb7/0x120
Fix it by making sure that find_vmap_lowest_match() returns lowest start
address with any given alignment value, i.e. for alignments bigger then
PAGE_SIZE the algorithm rolls back toward parent nodes checking right
sub-trees if the most left free block did not fit due to alignment
overhead.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reported-by: Ping Fang <[email protected]>
Tested-by: David Hildenbrand <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If last va found in vmap_area_list does not have a vm pointer,
vmallocinfo.s_show() returns 0, and show_purge_info() is not called as
it should.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: dd3b8353bae7 ("mm/vmalloc: do not keep unpurged areas in the busy tree")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Pengfei Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
show_numa_info() can be slightly faster, by skipping over hugepages
directly.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The vmalloc guard pages are added on top of each allocation, thereby
isolating any two allocations from one another. The top guard of the
lower allocation is the bottom guard guard of the higher allocation etc.
Therefore VM_NO_GUARD is dangerous; it breaks the basic premise of
isolating separate allocations.
There are only two in-tree users of this flag, neither of which use it
through the exported interface. Ensure it stays this way.
Link: https://lkml.kernel.org/r/YUMfdA36fuyZ+/[email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit f255935b9767 ("mm: cleanup the gfp_mask handling in
__vmalloc_area_node") added __GFP_NOWARN to gfp_mask unconditionally
however it disabled all output inside warn_alloc() call. This patch
saves original gfp_mask and provides it to all warn_alloc() calls.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f255935b9767 ("mm: cleanup the gfp_mask handling in __vmalloc_area_node")
Signed-off-by: Vasily Averin <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
By using DECLARE_EVENT_CLASS and TRACE_EVENT_FN, we can save a lot of
space from duplicate code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gang Li <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Ftrace core will add newline automatically on printing, so using it in
TP_printkcreates a blank line.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gang Li <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The fallback was introduced in commit 80c33624e472 ("io-mapping: Fixup
for different names of writecombine") to fix the build on microblaze.
5 years later, it seems all archs now provide a pgprot_writecombine(),
so just remove the other possible fallbacks. For microblaze,
pgprot_writecombine() is available since commit 97ccedd793ac
("microblaze: Provide pgprot_device/writecombine macros for nommu").
This is build-tested on microblaze with a hack to always build
mm/io-mapping.o and without DIYing on an x86-only macro
(_PAGE_CACHE_MASK)
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All this vm_unacct_memory(charged) dance seems to complicate the life
without a good reason. Furthermore, it seems not always done right on
error-pathes in mremap_to(). And worse than that: this `charged'
difference is sometimes double-accounted for growing MREMAP_DONTUNMAP
mremap()s in move_vma():
if (security_vm_enough_memory_mm(mm, new_len >> PAGE_SHIFT))
Let's not do this. Account memory in mremap() fast-path for growing
VMAs or in move_vma() for actually moving things. The same simpler way
as it's done by vm_stat_account(), but with a difference to call
security_vm_enough_memory_mm() before copying/adjusting VMA.
Originally noticed by Chen Wandun:
https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <[email protected]>
Acked-by: Brian Geffon <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen Wandun <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yongjun <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After adjustment, the repeated assignment of "prev" is avoided, and the
readability of the code is improved.
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Liu Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove
functions") defines CONFIG_MEM_BLOCK_SIZE, but this has never been
utilized anywhere.
It is a good practice to keep the CONFIG_* defines exclusively for the
Kbuild system. So, drop this unused definition.
This issue was noticed due to running ./scripts/checkkconfigsymbols.py.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch follows the discussions on previous documentation patch
threads [1][2]. It presents the exception case of shared memory
management from the pagemap's point of view. It briefly describes what
is missing, why it is missing and alternatives to the pagemap for page
info retrieval in user space.
In short, the kernel does not keep track of PTEs for swapped out shared
pages within the processes that references them. Thus, the
proc/pid/pagemap tool cannot print the swap destination of the shared
memory pages, instead setting the pagemap entry to zero for both
non-allocated and swapped out pages. This can create confusion for
users who need information on swapped out pages.
The reasons why maintaining the PTEs of all swapped out shared pages
among all processes while maintaining similar performance is not a
trivial task, or a desirable change, have been discussed extensively
[1][3][4][5]. There are also arguments for why this arguably missing
information should eventually be exposed to the user in either a future
pagemap patch, or by an alternative tool.
[1]: https://marc.info/?m=162878395426774
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/lkml/[email protected]/
[4]: https://lore.kernel.org/lkml/[email protected]/
[5]: https://lore.kernel.org/lkml/[email protected]/
Mention the current missing information in the pagemap and alternatives
on how to retrieve it, in case someone stumbles upon unexpected
behaviour.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiberiu A Georgescu <[email protected]>
Reviewed-by: Ivan Teterevkov <[email protected]>
Reviewed-by: Florian Schmidt <[email protected]>
Reviewed-by: Carl Waldspurger <[email protected]>
Reviewed-by: Jonathan Davies <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The smp_wmb() which is in the __pte_alloc() is used to ensure all ptes
setup is visible before the pte is made visible to other CPUs by being
put into page tables. We only need this when the pte is actually
populated, so move it to pmd_install(). __pte_alloc_kernel(),
__p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar to this case.
We can also defer smp_wmb() to the place where the pmd entry is really
populated by preallocated pte. There are two kinds of user of
preallocated pte, one is filemap & finish_fault(), another is THP. The
former does not need another smp_wmb() because the smp_wmb() has been
done by pmd_install(). Fortunately, the latter also does not need
another smp_wmb() because there is already a smp_wmb() before populating
the new pte when the THP uses a preallocated pte to split a huge pmd.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttila <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Do some code cleanups related to mm", v3.
This patch (of 2):
Currently we have three times the same few lines repeated in the code.
Deduplicate them by newly introduced pmd_install() helper.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Mika Penttila <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper for the checks. Rename "check_mapping" into
"zap_mapping" because "check_mapping" looks like a bool but in fact it
stores the mapping itself. When it's set, we check the mapping (it must
be non-NULL). When it's cleared we skip the check, which works like the
old way.
Move the duplicated comments to the helper too.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The first_index/last_index parameters in zap_details are actually only
used in unmap_mapping_range_tree(). At the meantime, this function is
only called by unmap_mapping_pages() once.
Instead of passing these two variables through the whole stack of page
zapping code, remove them from zap_details and let them simply be
parameters of unmap_mapping_range_tree(), which is inlined.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Liam Howlett <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pte_unmap_same() will always unmap the pte pointer. After the unmap,
vmf->pte will not be valid any more, we should clear it.
It was safe only because no one is accessing vmf->pte after
pte_unmap_same() returns, since the only caller of pte_unmap_same() (so
far) is do_swap_page(), where vmf->pte will in most cases be overwritten
very soon.
Directly pass in vmf into pte_unmap_same() and then we can also avoid
the long parameter list too, which should be a nice cleanup.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Liam Howlett <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: A few cleanup patches around zap, shmem and uffd", v4.
IMHO all of them are very nice cleanups to existing code already,
they're all small and self-contained. They'll be needed by uffd-wp
coming series.
This patch (of 4):
It was conditionally done previously, as there's one shmem special case
that we use SetPageDirty() instead. However that's not necessary and it
should be easier and cleaner to do it unconditionally in
mfill_atomic_install_pte().
The most recent discussion about this is here, where Hugh explained the
history of SetPageDirty() and why it's possible that it's not required
at all:
https://lore.kernel.org/lkml/[email protected]/
Currently mfill_atomic_install_pte() has three callers:
1. shmem_mfill_atomic_pte
2. mcopy_atomic_pte
3. mcontinue_atomic_pte
After the change: case (1) should have its SetPageDirty replaced by the
dirty bit on pte (so we unify them together, finally), case (2) should
have no functional change at all as it has page_in_cache==false, case
(3) may add a dirty bit to the pte. However since case (3) is
UFFDIO_CONTINUE for shmem, it's merely 100% sure the page is dirty after
all because UFFDIO_CONTINUE normally requires another process to modify
the page cache and kick the faulted thread, so should not make a real
difference either.
This should make it much easier to follow on which case will set dirty
for uffd, as we'll simply set it all now for all uffd related ioctls.
Meanwhile, no special handling of SetPageDirty() if there's no need.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Axel Rasmussen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Annotating a pointer from __user to kernel and then back again might
confuse sparse. In copy_huge_page_from_user() it can be avoided by
removing the intermediate variable since it is never used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Amit Daniel Kachhap <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is defined in the same file just a few lines above.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rolf Eike Beer <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The variable mm->total_vm could be accessed concurrently during mmaping
and system accounting as noticed by KCSAN,
BUG: KCSAN: data-race in __acct_update_integrals / mmap_region
read-write to 0xffffa40267bd14c8 of 8 bytes by task 15609 on cpu 3:
mmap_region+0x6dc/0x1400
do_mmap+0x794/0xca0
vm_mmap_pgoff+0xdf/0x150
ksys_mmap_pgoff+0xe1/0x380
do_syscall_64+0x37/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xa9
read to 0xffffa40267bd14c8 of 8 bytes by interrupt on cpu 2:
__acct_update_integrals+0x187/0x1d0
acct_account_cputime+0x3c/0x40
update_process_times+0x5c/0x150
tick_sched_timer+0x184/0x210
__run_hrtimer+0x119/0x3b0
hrtimer_interrupt+0x350/0xaa0
__sysvec_apic_timer_interrupt+0x7b/0x220
asm_call_irq_on_stack+0x12/0x20
sysvec_apic_timer_interrupt+0x4d/0x80
asm_sysvec_apic_timer_interrupt+0x12/0x20
smp_call_function_single+0x192/0x2b0
perf_install_in_context+0x29b/0x4a0
__se_sys_perf_event_open+0x1a98/0x2550
__x64_sys_perf_event_open+0x63/0x70
do_syscall_64+0x37/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 2 PID: 15610 Comm: syz-executor.3 Not tainted 5.10.0+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
In vm_stat_account which called by mmap_region, increase total_vm, and
__acct_update_integrals may read total_vm at the same time. This will
cause a data race which lead to undefined behaviour. To avoid potential
bad read/write, volatile property and barrier are both used to avoid
undefined behaviour.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit. It is assumed that the amount of the memory charged by those
tasks is bound and most of the memory will get released while the task
is exiting. This is resembling a heuristic for the global OOM situation
when tasks get access to memory reserves. There is no global memory
shortage at the memcg level so the memcg heuristic is more relieved.
The above assumption is overly optimistic though. E.g. vmalloc can
scale to really large requests and the heuristic would allow that. We
used to have an early break in the vmalloc allocator for killed tasks
but this has been reverted by commit b8c8a338f75e ("Revert "vmalloc:
back off when the current task is killed""). There are likely other
similar code paths which do not check for fatal signals in an
allocation&charge loop. Also there are some kernel objects charged to a
memcg which are not bound to a process life time.
It has been observed that it is not really hard to trigger these
bypasses and cause global OOM situation.
One potential way to address these runaways would be to limit the amount
of excess (similar to the global OOM with limited oom reserves). This
is certainly possible but it is not really clear how much of an excess
is desirable and still protects from global OOMs as that would have to
consider the overall memcg configuration.
This patch is addressing the problem by removing the heuristic
altogether. Bypass is only allowed for requests which either cannot
fail or where the failure is not desirable while excess should be still
limited (e.g. atomic requests). Implementation wise a killed or dying
task fails to charge if it has passed the OOM killer stage. That should
give all forms of reclaim chance to restore the limit before the failure
(ENOMEM) and tell the caller to back off.
In addition, this patch renames should_force_charge() helper to
task_is_dying() because now its use is not associated witch forced
charging.
This patch depends on pagefault_out_of_memory() to not trigger
out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM
and cause a global OOM killer.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Any allocation failure during the #PF path will return with VM_FAULT_OOM
which in turn results in pagefault_out_of_memory. This can happen for 2
different reasons. a) Memcg is out of memory and we rely on
mem_cgroup_oom_synchronize to perform the memcg OOM handling or b)
normal allocation fails.
The latter is quite problematic because allocation paths already trigger
out_of_memory and the page allocator tries really hard to not fail
allocations. Anyway, if the OOM killer has been already invoked there
is no reason to invoke it again from the #PF path. Especially when the
OOM condition might be gone by that time and we have no way to find out
other than allocate.
Moreover if the allocation failed and the OOM killer hasn't been invoked
then we are unlikely to do the right thing from the #PF context because
we have already lost the allocation context and restictions and
therefore might oom kill a task from a different NUMA domain.
This all suggests that there is no legitimate reason to trigger
out_of_memory from pagefault_out_of_memory so drop it. Just to be sure
that no #PF path returns with VM_FAULT_OOM without allocation print a
warning that this is happening before we restart the #PF.
[VvS: #PF allocation can hit into limit of cgroup v1 kmem controller.
This is a local problem related to memcg, however, it causes unnecessary
global OOM kills that are repeated over and over again and escalate into a
real disaster. This has been broken since kmem accounting has been
introduced for cgroup v1 (3.8). There was no kmem specific reclaim for
the separate limit so the only way to handle kmem hard limit was to return
with ENOMEM. In upstream the problem will be fixed by removing the
outdated kmem limit, however stable and LTS kernels cannot do it and are
still affected. This patch fixes the problem and should be backported
into stable/LTS.]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Vasily Averin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3.
Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit. It can be misused and allowed to trigger global OOM from inside
a memcg-limited container. On the other hand if memcg fails allocation,
called from inside #PF handler it triggers global OOM from inside
pagefault_out_of_memory().
To prevent these problems this patchset:
(a) removes execution of out_of_memory() from
pagefault_out_of_memory(), becasue nobody can explain why it is
necessary.
(b) allow memcg to fail allocation of dying/killed tasks.
This patch (of 3):
Any allocation failure during the #PF path will return with VM_FAULT_OOM
which in turn results in pagefault_out_of_memory which in turn executes
out_out_memory() and can kill a random task.
An allocation might fail when the current task is the oom victim and
there are no memory reserves left. The OOM killer is already handled at
the page allocator level for the global OOM and at the charging level
for the memcg one. Both have much more information about the scope of
allocation/charge request. This means that either the OOM killer has
been invoked properly and didn't lead to the allocation success or it
has been skipped because it couldn't have been invoked. In both cases
triggering it from here is pointless and even harmful.
It makes much more sense to let the killed task die rather than to wake
up an eternally hungry oom-killer and send him to choose a fatter victim
for breakfast.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The non-memcg-aware lru is always skiped when traversing the global lru
list, which is not efficient. We can only add the memcg-aware lru to
the global lru list instead to make traversing more efficient.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now the kmem states is only used to indicate whether the kmem is
offline. However, we can set ->kmemcg_id to -1 to indicate whether the
kmem is offline. Finally, we can remove the kmem states to simplify the
code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since slab objects and kmem pages are charged to object cgroup instead
of memory cgroup, memcg_reparent_objcgs() will reparent this cgroup and
all its descendants to its parent cgroup. This already makes further
list_lru_add()'s add elements to the parent's list. So it is
unnecessary to change kmemcg_id of an offline cgroup to its parent's id.
It just wastes CPU cycles. Just remove the redundant code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 2788cf0c401c ("memcg: reparent list_lrus and free kmemcg_id
on css offline"), ->nr_items can be negative during memory cgroup
reparenting. In this case, list_lru_count_one() will return an unusual
and huge value, which can surprise users. At least for now it hasn't
affected any users. But it is better to let list_lru_count_ont()
returns zero when ->nr_items is negative.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit e5bc3af7734f ("rcu: Consolidate PREEMPT and !PREEMPT
synchronize_rcu()"), the critical section of spin lock can serve as an
RCU read-side critical section which already allows readers that hold
nlru->lock to avoid taking rcu lock. So just remove holding lock.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The deprecation process of kmem.limit_in_bytes started with the commit
0158115f702 ("memcg, kmem: deprecate kmem.limit_in_bytes") which also
explains in detail the motivation behind the deprecation. To summarize,
it is the unexpected behavior on hitting the kmem limit. This patch
moves the deprecation process to the next stage by disallowing to set
the kmem limit. In future we might just remove the kmem.limit_in_bytes
file completely.
[[email protected]: s/ENOTSUPP/EOPNOTSUPP/]
[[email protected]: mark cancel_charge() inline]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Vasily Averin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing.
This could lead to values wrapping around and a smaller allocation being
made than the caller was expecting. Using those allocations could lead
to linear overflows of heap memory and other misbehaviors.
So, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kvmalloc() functions.
Also, take the opportunity to refactor the memcpy() call to use the
flex_array_size() helper.
This code was detected with the help of Coccinelle and audited and fixed
manually.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Len Baker <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit d648bcc7fe65 ("mm: kmem: make memcg_kmem_enabled()
irreversible"), the only thing memcg_free_kmem() does is to call
memcg_offline_kmem() when the memcg is still online which can happen
when online_css() fails due to -ENOMEM.
However, the name memcg_free_kmem() is confusing and it is more clear
and straight forward to call memcg_offline_kmem() directly from
mem_cgroup_css_free().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Suggested-by: Roman Gushchin <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The memcg stats can be flushed in multiple context and potentially in
parallel too. For example multiple parallel user space readers for
memcg stats will contend on the rstat locks with each other. There is
no need for that. We just need one flusher and everyone else can
benefit.
In addition after aa48e47e3906 ("memcg: infrastructure to flush memcg
stats") the kernel periodically flush the memcg stats from the root, so,
the other flushers will potentially have much less work to do.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Michal Koutný" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
At the moment, the kernel flushes the memcg stats on every refault and
also on every reclaim iteration. Although rstat maintains per-cpu
update tree but on the flush the kernel still has to go through all the
cpu rstat update tree to check if there is anything to flush. This
patch adds the tracking on the stats update side to make flush side more
clever by skipping the flush if there is no update.
The stats update codepath is very sensitive performance wise for many
workloads and benchmarks. So, we can not follow what the commit
aa48e47e3906 ("memcg: infrastructure to flush memcg stats") did which
was triggering async flush through queue_work() and caused a lot
performance regression reports. That got reverted by the commit
1f828223b799 ("memcg: flush lruvec stats in the refault").
In this patch we kept the stats update codepath very minimal and let the
stats reader side to flush the stats only when the updates are over a
specific threshold. For now the threshold is (nr_cpus * CHARGE_BATCH).
To evaluate the impact of this patch, an 8 GiB tmpfs file is created on
a system with swap-on-zram and the file was pushed to swap through
memory.force_empty interface. On reading the whole file, the memcg stat
flush in the refault code path is triggered. With this patch, we
observed 63% reduction in the read time of 8 GiB file.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Reviewed-by: "Michal Koutný" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is unused after the rework of commit f5df8635c5a3 ("mm: use
find_get_incore_page in memcontrol").
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of calling put_page() one page at a time, pop pages off the list
if their refcount was too high and pass the remainder to
put_unref_page_list(). This should be a speed improvement, but I have
no measurements to support that. Current callers do not care about
performance, but I hope to add some which do.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Anthony Yznaga <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This one is just a minor nuisance for people going through /proc/swaps
if any of their swapareas is bigger than, or equal to 1073741824 pages
(4TB).
seq_printf() format string casts as uint the conversion from pages to
KB, and that will overflow in the aforementioned case.
Albeit being almost unthinkable that someone would actually set up such
big of a single swaparea, there is a ticket recently filed against RHEL:
https://bugzilla.redhat.com/show_bug.cgi?id=2008812
Given that all other codesites that use format strings for the same swap
pages-to-KB conversion do cast it as ulong, this patch just follows
suit.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rafael Aquini <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The request_queue pointer returned from bdev_get_queue() shall never be
NULL, so the null check is unnecessary, just remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xu Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 6401c4eb57f9 ("mm: gup: fix potential pgmap refcnt leak in
__gup_device_huge()") simplified the return paths, but didn't go quite
far enough, as discussed in [1].
Remove the "ret" variable entirely, because there is enough information
already available to provide the return value.
[1] https://lore.kernel.org/r/CAHk-=wgQTRX=5SkCmS+zfmpqubGHGJvXX_HgnPG8JSpHKHBMeg@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The fast path here is not needing any writeback, yet we spend time
setting up the xarray lookup data upfront. Move the part that actually
needs to iterate the address space mapping into a separate helper,
saving ~30% of the time here.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is not safe to check page->index without holding the page lock. It
can be changed if the page is moved between the swap cache and the page
cache for a shmem file, for example. There is a VM_BUG_ON below which
checks page->index is correct after taking the page lock.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We always go through i_size_read(), and we rarely end up needing it.
Push the read to down where we need to check it, which avoids it for
most cases.
It looks like we can even remove this check entirely, which might be
worth pursuing. But at least this takes it out of the hot path.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Acked-by: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move grabbing and releasing the bdi refcount out of the common
wb_init/wb_exit helpers into code that is only used for the non-default
memcg driven bdi_writeback structures.
[[email protected]: add comment]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix typo]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All BDI users now unregister explicitly.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a new SB_I_ flag to mark superblocks that have an ephemeral bdi
associated with them, and unregister it when the superblock is shut
down.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Call bdi_unregister explicitly instead of relying on the automatic
unregistration.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "simplify bdi unregistation".
This series simplifies the BDI code to get rid of the magic
auto-unregister feature that hid a recent block layer refcounting bug.
This patch (of 5):
To wind down the magic auto-unregister semantics we'll need to push this
into modular code.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Under some circumstances, filemap_read() will allocate sufficient pages
to read to the end of the file, call readahead/readpages on them and
copy the data over - and then it will allocate another page at the EOF
and call readpage on that and then ignore it. This is unnecessary and a
waste of time and resources.
filemap_read() *does* check for this, but only after it has already done
the allocation and I/O. Fix this by checking before calling
filemap_get_pages() also.
Link: https://lkml.kernel.org/r/163472463105.3126792.7056099385135786492.stgit@warthog.procyon.org.uk
Link: https://lore.kernel.org/r/160588481358.3465195.16552616179674485179.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163456863216.2614702.6384850026368833133.stgit@warthog.procyon.org.uk/
Signed-off-by: David Howells <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I have noticed that the previous macro is #ifndef CONFIG_SPARSEMEM. I
think the comment of #else should be CONFIG_SPARSEMEM.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yinan Zhang <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As already done in GrapheneOS, add the __alloc_size attribute for
appropriate percpu allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.
Note that due to the implementation of the percpu API, this is unlikely
to ever actually provide compile-time checking beyond very simple
non-SMP builds. But, since they are technically allocators, mark them
as such.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|