|
Print throughput information in KB/s after every completed transfer,
including information on whether DMA is used or not.
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Tested-by: Alan Mikhak <[email protected]>
|
|
Use dmaengine API and add support for transferring data using DMA.
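For context, a minimal sketch of the standard dmaengine memcpy flow such a change builds on (an illustrative helper, not the pci_endpoint_test code; the caller is assumed to have already requested a memcpy-capable channel and mapped the buffers):
        #include <linux/dmaengine.h>
        #include <linux/completion.h>
        #include <linux/errno.h>

        static void example_dma_done(void *arg)
        {
                complete(arg);
        }

        /* Copy 'len' bytes from DMA address 'src' to 'dst' on 'chan'. */
        static int example_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
                                      dma_addr_t src, size_t len)
        {
                struct dma_async_tx_descriptor *tx;
                DECLARE_COMPLETION_ONSTACK(done);
                dma_cookie_t cookie;

                tx = dmaengine_prep_dma_memcpy(chan, dst, src, len,
                                               DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
                if (!tx)
                        return -EIO;

                tx->callback = example_dma_done;
                tx->callback_param = &done;

                cookie = dmaengine_submit(tx);
                if (dma_submit_error(cookie))
                        return -EIO;

                dma_async_issue_pending(chan);  /* start the transfer */
                wait_for_completion(&done);     /* wait for the DMA callback */
                return 0;
        }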
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Tested-by: Alan Mikhak <[email protected]>
|
|
The variable err is being initialized with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>
|
|
There are no callers of the 32bit versions of rtc_time conversion
functions, drop them.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>
|
|
Let the rtc core check the date/time against the RTC range.
Tested-by: Paul Kocialkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>
|
|
Call the 64bit versions of rtc_tm time conversion.
Tested-by: Paul Kocialkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>
|
|
It's even more important to check that we don't have a tail page when
calling hpage_nr_pages() when THP are disabled.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the following
build failure is encountered:
In file included from arch/powerpc/mm/fault.c:33:0:
include/linux/hugetlb.h: In function 'hstate_inode':
include/linux/hugetlb.h:477:9: error: implicit declaration of function 'HUGETLBFS_SB' [-Werror=implicit-function-declaration]
return HUGETLBFS_SB(i->i_sb)->hstate;
^
include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 'int')
return HUGETLBFS_SB(i->i_sb)->hstate;
^
Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE.
Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Nishanth Aravamudan <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Adam Litke <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/7e8c3a3c9a587b9cd8a2f146df32a421b961f3a2.1584432148.git.christophe.leroy@c-s.fr
Link: https://patchwork.ozlabs.org/patch/1255548/#2386036
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page
size in map_hugetlb") added the possibility to change the size of memory
mapped for the test, but left the read and write test using the default
value. This is unnoticed when mapping a length greater than the default
one, but segfaults otherwise.
Fix read_bytes() and write_bytes() by giving them the real length.
Also fix the call to munmap().
Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Leonardo Bras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/9a404a13c871c4bd0ba9ede68f69a1225180dd7e.1580978385.git.christophe.leroy@c-s.fr
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit f1e61557f023 ("mm: pack compound_dtor and compound_order into one
word in struct page") changed compound_dtor from a pointer to an array
index in order to pack it. To check if a page has the hugetlbfs
compound_dtor, we can just compare the index directly without fetching the
function pointer. Said commit did that with PageHuge() and we can do the
same with PageHeadHuge() to make the code a bit smaller and faster.
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Neha Agarwal <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Previously, the variable 'check_addr' was initialized but never read
before being reassigned, so the initialization can be removed.
Signed-off-by: Mateusz Nosek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add docs for how to use hugetlb_cgroup reservations, and their behavior.
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The tests use both shared and private mapped hugetlb memory, and monitors
the hugetlb usage counter as well as the hugetlb reservation counter.
They test different configurations such as hugetlb memory usage via
hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without
MAP_POPULATE.
Also add test for hugetlb reservation reparenting, since this is a subtle
issue.
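For reference, a hypothetical standalone program (not the selftest source) showing the mapping styles the tests exercise:
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <sys/ipc.h>
        #include <sys/shm.h>

        #ifndef MAP_HUGETLB
        #define MAP_HUGETLB 0x40000
        #endif
        #ifndef SHM_HUGETLB
        #define SHM_HUGETLB 04000
        #endif

        #define LEN (2UL * 1024 * 1024)  /* one 2MB huge page */

        int main(void)
        {
                /* Anonymous huge page mapping; MAP_POPULATE faults it in at
                 * mmap() time, so usage and reservation move together. */
                void *map = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
                                 MAP_POPULATE, -1, 0);
                if (map == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
                memset(map, 0, LEN);
                munmap(map, LEN);

                /* Shared huge pages via SysV shared memory. */
                int shmid = shmget(IPC_PRIVATE, LEN,
                                   SHM_HUGETLB | IPC_CREAT | 0600);
                if (shmid < 0) {
                        perror("shmget");
                        return 1;
                }
                void *shm = shmat(shmid, NULL, 0);
                if (shm != (void *)-1) {
                        memset(shm, 0, LEN);
                        shmdt(shm);
                }
                shmctl(shmid, IPC_RMID, NULL);
                return 0;
        }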
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Sandipan Das <[email protected]> [powerpc64]
Acked-by: Mike Kravetz <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
An earlier patch in this series disabled file_region coalescing in order
to hang the hugetlb_cgroup uncharge info on the file_region entries.
This patch re-adds support for coalescing of file_region entries.
Essentially, every time we add an entry, we call a recursive function that
tries to coalesce the added region with the regions next to it. The worst
case call depth for this function is 3: one to coalesce with the region
next to it, one to coalesce to the region prev, and one to reach the base
case.
This is an important performance optimization as private mappings add
their entries page by page, and we could incur big performance costs for
large mappings with lots of file_region entries in their resv_map.
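A toy model of the coalescing step described above (hypothetical types and names, not the mm/hugetlb.c implementation; nodes are assumed to be heap-allocated elsewhere):
        #include <stdlib.h>

        struct region {
                long from, to;
                void *cgroup_info;              /* stand-in for uncharge info */
                struct region *prev, *next;
        };

        static int mergeable(struct region *a, struct region *b)
        {
                return a && b && a->to == b->from &&
                       a->cgroup_info == b->cgroup_info;
        }

        /* Merge the region with its successor, then recurse on the
         * predecessor; the call depth is bounded as in the changelog. */
        static void coalesce(struct region *rg)
        {
                if (mergeable(rg, rg->next)) {
                        struct region *nxt = rg->next;

                        rg->to = nxt->to;
                        rg->next = nxt->next;
                        if (nxt->next)
                                nxt->next->prev = rg;
                        free(nxt);
                }
                if (mergeable(rg->prev, rg))
                        coalesce(rg->prev);     /* folds rg into rg->prev */
        }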
[[email protected]: fix CONFIG_CGROUP_HUGETLB ifdefs]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: remove check_coalesce_bug debug code]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Randy Dunlap <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Support MAP_NORESERVE accounting as part of the new counter.
For each hugepage allocation, at allocation time we check if there is a
reservation for this allocation or not. If there is a reservation for
this allocation, then this allocation was charged at reservation time, and
we don't re-account it. If there is no reservation for this allocation,
we charge the appropriate hugetlb_cgroup.
The hugetlb_cgroup to uncharge for this allocation is stored in
page[3].private. We use new APIs added in an earlier patch to set this
pointer.
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
in the resv_map entries, in file_region->reservation_counter.
After a call to region_chg, we charge the appropriate hugetlb_cgroup, and
if successful, we pass on the hugetlb_cgroup info to a follow up
region_add call. When a file_region entry is added to the resv_map via
region_add, we put the pointer to that cgroup in
file_region->reservation_counter. If charging doesn't succeed, we report
the error to the caller, so that the kernel fails the reservation.
On region_del, which is when the hugetlb memory is unreserved, we also
uncharge the file_region->reservation_counter.
[[email protected]: forward declare struct file_region]
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A follow up patch in this series adds hugetlb cgroup uncharge info to the
file_region entries in resv->regions. The cgroup uncharge info may differ
for different regions, so they can no longer be coalesced at region_add
time. So, disable region coalescing in region_add in this patch.
Behavior change:
Say a resv_map exists like this [0->1], [2->3], and [5->6].
Then a region_chg/add call comes in region_chg/add(f=0, t=5).
Old code would generate resv->regions: [0->5], [5->6].
New code would generate resv->regions: [0->1], [1->2], [2->3], [3->5],
[5->6].
Special care needs to be taken to handle the resv->adds_in_progress
variable correctly. In the past, only 1 region would be added for every
region_chg and region_add call. But now, each call may add multiple
regions, so we can no longer increment adds_in_progress by 1 in
region_chg, or decrement adds_in_progress by 1 after region_add or
region_abort. Instead, region_chg calls add_reservation_in_range() to
count the number of regions needed and allocates those, and that info is
passed to region_add and region_abort to decrement adds_in_progress
correctly.
We've also modified the assumption that region_add after region_chg never
fails. region_chg now pre-allocates at least 1 region for region_add. If
region_add needs more regions than region_chg has allocated for it, then
it may fail.
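The behavior change can be illustrated with a small standalone sketch (hypothetical code, not mm/hugetlb.c) that prints the per-gap regions added for the example above:
        #include <stdio.h>

        struct range { long from, to; };

        /* Fill the request [f, t): add one region per uncovered gap
         * instead of merging with the neighbouring entries. */
        static void add_range_no_coalesce(const struct range *existing, int n,
                                          long f, long t)
        {
                long last = f;

                for (int i = 0; i < n && last < t; i++) {
                        if (existing[i].to <= last)
                                continue;
                        if (existing[i].from >= t)
                                break;
                        if (existing[i].from > last)    /* gap before region */
                                printf("add [%ld->%ld]\n", last,
                                       existing[i].from);
                        last = existing[i].to;
                }
                if (last < t)                           /* trailing gap */
                        printf("add [%ld->%ld]\n", last, t);
        }

        int main(void)
        {
                /* resv_map from the changelog: [0->1], [2->3], [5->6] */
                struct range resv[] = { {0, 1}, {2, 3}, {5, 6} };

                add_range_no_coalesce(resv, 3, 0, 5);
                /* Prints: add [1->2], add [3->5]. Together with the existing
                 * entries the map becomes [0->1],[1->2],[2->3],[3->5],[5->6]. */
                return 0;
        }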
[[email protected]: fix file_region entry allocations]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Normally the pointer to the cgroup to uncharge hangs off the struct page,
and gets queried when it's time to free the page. With hugetlb_cgroup
reservations, this is not possible, because a page can be reserved by one
task and actually faulted in by another task.
The best place to put the hugetlb_cgroup pointer to uncharge for
reservations is in the resv_map. But, because the resv_map has different
semantics for private and shared mappings, the code path to
charge/uncharge shared and private mappings is different. This patch
implements charging and uncharging for private mappings.
For private mappings, the counter to uncharge is in
resv_map->reservation_counter. On initializing the resv_map this is set
to NULL. On reservation of a region in a private mapping, the task's
hugetlb_cgroup is charged and the hugetlb_cgroup is placed in
resv_map->reservation_counter.
On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter.
[[email protected]: forward declare struct resv_map]
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge
hugetlb reservations") mistakenly doesn't handle the migration of *both*
the reservation hugetlb_cgroup and the fault hugetlb_cgroup correctly.
What should happen is that both cgroups should be queried from the old
page, then both set to NULL on the old page, then both inserted into the
new page.
The mistake also creates the following warning:
mm/hugetlb_cgroup.c: In function 'hugetlb_cgroup_migrate':
mm/hugetlb_cgroup.c:777:25: warning: variable 'h_cg' set but not used
[-Wunused-but-set-variable]
struct hugetlb_cgroup *h_cg;
^~~~
The solution is to add the missing steps, namely setting the reservation
hugetlb_cgroup to NULL on the old page, and setting the fault
hugetlb_cgroup on the new page.
Fixes: c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Augments hugetlb_cgroup_charge_cgroup to be able to charge hugetlb usage
or hugetlb reservation counter.
Adds a new interface to uncharge a hugetlb_cgroup counter via
hugetlb_cgroup_uncharge_counter.
Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init,
hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline.
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Mike Kravetz <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These counters will track hugetlb reservations rather than hugetlb memory
faulted in. This patch only adds the counter, following patches add the
charging and uncharging of the counter.
This is patch 1 of a 9-patch series.
Problem:
Currently tasks attempting to reserve more hugetlb memory than is
available get a failure at mmap/shmget time. This is thanks to Hugetlbfs
Reservations [1]. However, if a task attempts to reserve more hugetlb
memory than its hugetlb_cgroup limit allows, the kernel will allow the
mmap/shmget call, but will SIGBUS the task when it attempts to fault in
the excess memory.
We have users hitting their hugetlb_cgroup limits and thus we've been
looking at this failure mode. We'd like to improve this behavior such
that users violating the hugetlb_cgroup limits get an error on mmap/shmget
time, rather than getting SIGBUS'd when they try to fault the excess
memory in. This gives the user an opportunity to fallback more gracefully
to non-hugetlbfs memory for example.
The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time. Thus,
enforcing the hugetlb_cgroup limit only happens at fault time, and the
offending task gets SIGBUS'd.
Proposed Solution:
A new page counter named
'hugetlb.xMB.rsvd.[limit|usage|max_usage]_in_bytes'. This counter has
slightly different semantics than
'hugetlb.xMB.[limit|usage|max_usage]_in_bytes':
- While usage_in_bytes tracks all *faulted* hugetlb memory,
rsvd.usage_in_bytes tracks all *reserved* hugetlb memory and hugetlb
memory faulted in without a prior reservation.
- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than rsvd.limit_in_bytes, the kernel will fail this
reservation.
This proposal is implemented in this patch series, with tests to verify
functionality and show the usage.
Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to the
existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
duplication with hugetlb_cgroup. Keeping hugetlb related page counters
under hugetlb_cgroup seemed cleaner as well.
2. Instead of adding a new counter, we considered adding a sysctl that
modifies the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do
accounting at reservation time rather than fault time. Adding a new
page_counter seems better as userspace could, if it wants, choose to
enforce different cgroups differently: one via limit_in_bytes, and
another via rsvd.limit_in_bytes. This could be very useful if you're
transitioning how hugetlb memory is partitioned on your system one
cgroup at a time, for example. Also, someone may find usage for both
limit_in_bytes and rsvd.limit_in_bytes concurrently, and this approach
gives them the option to do so.
Testing:
- Added tests passing.
- Used libhugetlbfs for regression testing.
[1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
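As a hypothetical userspace illustration of when the two counters move (counter names taken from the changelog above; 2MB pages assumed):
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>

        #ifndef MAP_HUGETLB
        #define MAP_HUGETLB 0x40000
        #endif

        #define LEN (2UL * 1024 * 1024)

        int main(void)
        {
                void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                               -1, 0);
                if (p == MAP_FAILED) {
                        /* With the new counter, exceeding
                         * hugetlb.2MB.rsvd.limit_in_bytes is reported here,
                         * as an mmap() failure... */
                        perror("mmap");
                        return 1;
                }

                /* ...whereas exceeding hugetlb.2MB.limit_in_bytes (the
                 * fault-time counter) would SIGBUS the task on this first
                 * touch of the reserved memory. */
                memset(p, 0, LEN);

                munmap(p, LEN);
                return 0;
        }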
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Sandipan Das <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straightforward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.
Instead of writing the required complicated code for this rare
occurrence, just eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem in
write mode when modifying i_size. In this way, truncation can not
proceed when page faults are being processed. In addition, i_size
will not change during fault processing so a single check can be made
to ensure faults are not beyond (proposed) end of file. Faults can
still race with hole punch, but that race is handled by existing code
and the use of hugetlb_fault_mutex.
With this modification, checks for races with truncation in the page
fault path can be simplified and removed. remove_inode_hugepages no
longer needs to take hugetlb_fault_mutex in the case of truncation.
Comments are expanded to explain reasoning behind locking.
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Prakash Sangappa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.
While discussing the issue with huge_pte_offset [1], I remembered that
there were more outstanding hugetlb races. These issues are:
1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
invalid via a call to huge_pmd_unshare by another thread.
2) hugetlbfs page faults can race with truncation causing invalid global
reserve counts and state.
A previous attempt was made to use i_mmap_rwsem in this manner as
described at [2]. However, those patches were reverted starting with [3]
due to locking issues.
To effectively use i_mmap_rwsem to address the above issues it needs to be
held (in read mode) during page fault processing. However, during fault
processing we need to lock the page we will be adding. Lock ordering
requires we take page lock before i_mmap_rwsem. Waiting until after
taking the page lock is too late in the fault process for the
synchronization we want to do.
To address this lock ordering issue, the following patches change the lock
ordering for hugetlb pages. This is not too invasive as hugetlbfs
processing is done separate from core mm in many places. However, I don't
really like this idea. Much ugliness is contained in the new routine
hugetlb_page_mapping_lock_write() of patch 1.
The only other way I can think of to address these issues is by catching
all the races. After catching a race, cleanup, backout, retry ... etc,
as needed. This can get really ugly, especially for huge page
reservations. At one time, I started writing some of the reservation
backout code for page faults and it got so ugly and complicated I went
down the path of adding synchronization to avoid the races. Any other
suggestions would be welcome.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]
[4] https://lore.kernel.org/linux-mm/[email protected]/
[5] https://lore.kernel.org/lkml/[email protected]/
This patch (of 2):
While looking at BUGs associated with invalid huge page map counts, it was
discovered and observed that a huge pte pointer could become 'invalid' and
point to another task's page table. Consider the following:
A task takes a page fault on a shared hugetlbfs file and calls
huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
shared pmd.
Now, another task truncates the hugetlbfs file. As part of truncation, it
unmaps everyone who has the file mapped. If the range being truncated is
covered by a shared pmd, huge_pmd_unshare will be called. For all but the
last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
to the pmd. If the task in the middle of the page fault is not the last
user, the ptep returned by huge_pte_alloc now points to another task's
page table or worse. This leads to bad things such as incorrect page
map/reference counts or invalid memory references.
To fix, expand the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
huge_pmd_share is only called via huge_pte_alloc, so callers of
huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
One problem with this scheme is that it requires taking i_mmap_rwsem
before taking the page lock during page faults. This is not the order
specified in the rest of mm code. Handling of hugetlbfs pages is mostly
isolated today. Therefore, we use this alternative locking order for
PageHuge() pages.
  mapping->i_mmap_rwsem
    hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
      page->flags PG_locked (lock_page)
To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
introduced to write lock the i_mmap_rwsem associated with a page.
In most cases it is easy to get address_space via vma->vm_file->f_mapping.
However, in the case of migration or memory errors for anon pages we do
not have an associated vma. A new routine _get_hugetlb_page_mapping()
will use anon_vma to get address_space in these cases.
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Prakash Sangappa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The variable max_addr is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Addresses-Coverity: ("Unused value")
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Using an empty (malformed) nodelist that is not caught during mount option
parsing leads to a stack-out-of-bounds access.
The option string that was used was: "mpol=prefer:,". However,
MPOL_PREFERRED requires a single node number, which is not being provided
here.
Add a check that 'nodes' is not empty after parsing for MPOL_PREFERRED's
nodeid.
Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
Reported-by: Entropy Moe <[email protected]>
Reported-by: [email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: [email protected]
Cc: Lee Schermerhorn <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
VM_BUG_ON() is already used by queue_pages_test_walk(); it is better to
dump more debug information by using VM_BUG_ON_VMA() to help debugging.
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: "Li Xinhai" <[email protected]>
Cc: Qian Cai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
vma_migratable() is called to check if pages in a vma can be migrated
before going ahead with further actions. Currently it is used in the below
code paths:
- task_numa_work
- mbind
- move_pages
For hugetlb mapping, whether vma is migratable or not is determined by:
- CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
- arch_hugetlb_migration_supported
Issue: the current code only checks CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
alone, and no code should use it directly. (Note that the current code in
vma_migratable() doesn't cause a failure or bug because
unmap_and_move_huge_page() will catch unsupported hugepages and handle them
properly.)
This patch checks both factors via hugepage_migration_supported(),
improving code logic and robustness. It enables early bail-out of the
hugepage migration procedure, but because all architectures currently
supporting hugepage migration support all page sizes, we would not see a
performance gain with this patch applied.
vma_migratable() is moved to mm/mempolicy.c, because the circular
reference between mempolicy.h and hugetlb.h makes defining it inline
infeasible.
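A simplified sketch of the shape of the check after this change (only the hugetlb-related portion; the function name below is a placeholder, not the exact mm/mempolicy.c hunk):
        #include <linux/mm.h>
        #include <linux/hugetlb.h>

        /* Mirrors the shape of vma_migratable() after this patch. */
        static bool example_vma_migratable(struct vm_area_struct *vma)
        {
                if (vma->vm_flags & (VM_IO | VM_PFNMAP))
                        return false;

                /* Consult both the Kconfig switch and the per-arch,
                 * per-page-size hook via hugepage_migration_supported(). */
                if (is_vm_hugetlb_page(vma) &&
                    !hugepage_migration_supported(hstate_vma(vma)))
                        return false;

                /* ... remaining checks unchanged ... */
                return true;
        }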
Signed-off-by: Li Xinhai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
MPOL_MF_STRICT is used in mbind() for two purposes:
(1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL, to check if there is a misplaced page and return -EIO;
(2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to
check if there is a misplaced page which failed to be isolated, or a page
which was successfully isolated but failed to be moved, and return -EIO.
For non-hugepage mappings, (1) and (2) are implemented as expected. For
hugepage mappings, (1) is not implemented, and in (2) the part about
failing to isolate and reporting -EIO is not implemented.
This patch implements the missing parts for hugepage mappings. Benefits
with it applied:
- User space can apply the same code logic to handle mbind() on hugepage
and non-hugepage mappings;
- MPOL_MF_STRICT alone can be used reliably to check whether there are
misplaced pages when binding a policy on an address range, especially for
address ranges which contain both hugepage and non-hugepage mappings.
Analysis of potential impact to existing users:
- If MPOL_MF_STRICT alone was previously used, hugetlb pages not
following the memory policy would not cause an EIO error. After this
change, hugetlb pages are treated like all other pages. If
MPOL_MF_STRICT alone is used and hugetlb pages do not follow memory
policy an EIO error will be returned.
- For users using MPOL_MF_STRICT with MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL, the semantics around pages that could not be moved are
not changed by this patch, because failing to isolate and failing to move
have the same effect for users, so their existing code will not be
impacted.
In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page
mappings' can be removed after this patch is applied.
Mike:
: The current behavior with MPOL_MF_STRICT and hugetlb pages is inconsistent
: and does not match documentation (as described above). The special
: behavior for hugetlb pages ideally should have been removed when hugetlb
: page migration was introduced. It is unlikely that anyone relies on
: today's inconsistent behavior, and removing one more case of special
: handling for hugetlb pages is a good thing.
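As a userspace illustration of purpose (1) above, MPOL_MF_STRICT alone reporting misplaced pages with -EIO, consider the following hypothetical program (it assumes libnuma's <numaif.h> for the mbind() prototype, linking with -lnuma, and a machine that has a node 1):
        #include <stdio.h>
        #include <string.h>
        #include <errno.h>
        #include <numaif.h>
        #include <sys/mman.h>

        #ifndef MAP_HUGETLB
        #define MAP_HUGETLB 0x40000
        #endif

        #define LEN (2UL * 1024 * 1024)

        int main(void)
        {
                void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                               -1, 0);
                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
                memset(p, 0, LEN);              /* fault the page in somewhere */

                unsigned long nodemask = 1UL << 1;  /* bind to node 1 only */
                if (mbind(p, LEN, MPOL_BIND, &nodemask, 8 * sizeof(nodemask),
                          MPOL_MF_STRICT) && errno == EIO)
                        printf("page is misplaced relative to the policy\n");

                munmap(p, LEN);
                return 0;
        }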
Signed-off-by: Li Xinhai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: linux-man <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Previously 0 was assigned to variable 'last_migrated_pfn'. But the
variable is not read after that, so the assignment can be removed.
Signed-off-by: Mateusz Nosek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages and compact them by default. On
-RT even minor page faults are problematic because it may take a few
hundred microseconds to resolve them, and until then the task is blocked.
Make compact_unevictable_allowed = 0 default and issue a warning on RT if
it is changed.
[[email protected]: v5]
Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The proc file `compact_unevictable_allowed' should allow 0 and 1 only.
The `extra*' attributes have been set properly, but without
proc_dointvec_minmax() as the `proc_handler' the limit will not be
enforced.
Use proc_dointvec_minmax() as the `proc_handler' to enforce the
specified range.
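For reference, a sketch of the sysctl table shape involved (illustrative names and values, not the exact kernel hunk):
        #include <linux/sysctl.h>

        static int example_compact_unevictable_allowed = 1;

        static struct ctl_table example_vm_table[] = {
                {
                        .procname     = "compact_unevictable_allowed",
                        .data         = &example_compact_unevictable_allowed,
                        .maxlen       = sizeof(int),
                        .mode         = 0644,
                        /* minmax handler is what makes extra1/extra2 matter */
                        .proc_handler = proc_dointvec_minmax,
                        .extra1       = SYSCTL_ZERO,
                        .extra2       = SYSCTL_ONE,
                },
                { }
        };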
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Mel Gorman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dan reports:
The patch 5e1f0f098b46: "mm, compaction: capture a page under direct
compaction" from Mar 5, 2019, leads to the following Smatch complaint:
mm/compaction.c:2321 compact_zone_order()
error: we previously assumed 'capture' could be null (see line 2313)
mm/compaction.c
2288 static enum compact_result compact_zone_order(struct zone *zone, int order,
2289 gfp_t gfp_mask, enum compact_priority prio,
2290 unsigned int alloc_flags, int classzone_idx,
2291 struct page **capture)
^^^^^^^
2313 if (capture)
^^^^^^^
Check for NULL
2314 current->capture_control = &capc;
2315
2316 ret = compact_zone(&cc, &capc);
2317
2318 VM_BUG_ON(!list_empty(&cc.freepages));
2319 VM_BUG_ON(!list_empty(&cc.migratepages));
2320
2321 *capture = capc.page;
^^^^^^^^
Unchecked dereference.
2322 current->capture_control = NULL;
2323
In practice this is not an issue, as the only caller path passes non-NULL
capture:
__alloc_pages_direct_compact()
struct page *page = NULL;
try_to_compact_pages(capture = &page);
compact_zone_order(capture = capture);
So let's remove the unnecessary check, which should also make Smatch happy.
Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Reported-by: Dan Carpenter <[email protected]>
Suggested-by: Andrew Morton <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The code to implement THP migrations already exists, and the code for CMA
to clear out a region of memory already exists.
Only a few small tweaks are needed to allow CMA to move THP memory when
attempting an allocation from alloc_contig_range.
With these changes, migrating THPs from a CMA area works when allocating a
1GB hugepage from CMA memory.
[[email protected]: fix hugetlbfs pages per Mike, cleanup per Vlastimil]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "fix THP migration for CMA allocations", v2.
Transparent huge pages are allocated with __GFP_MOVABLE, and can end up in
CMA memory blocks. Transparent huge pages also have most of the
infrastructure in place to allow migration.
However, a few pieces were missing, causing THP migration to fail when
attempting to use CMA to allocate 1GB hugepages.
With these patches in place, THP migration from CMA blocks seems to work,
both for anonymous THPs and for tmpfs/shmem THPs.
This patch (of 2):
Add information to struct compact_control to indicate that the allocator
would really like to clear out this specific part of memory, used by for
example CMA.
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It was noticed that mlock2 tests are failing after 9c4e6b1a7027f ("mm,
mlock, vmscan: no more skipping pagevecs") because the patch has changed
the timing on when the page is added to the unevictable LRU list and thus
gains the unevictable page flag.
The test was just too dependent on the implementation details which were
true at the time when it was introduced. Page flags and the timing when
they are set is something no userspace should ever depend on. The test
should be testing only for the user observable contract of the tested
syscalls. Those are defined pretty well for the mlock and there are other
means for testing them. In fact this is already done and testing for page
flags can be safely dropped to achieve the aimed purpose. Present bits
can be checked by /proc/<pid>/smaps RSS field and the locking state by
VmFlags although I would argue that Locked: field would be more
appropriate.
Drop all the page flag machinery and considerably simplify the test. This
should be more robust for future kernel changes while checking the
promised contract is still valid.
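A minimal sketch of the kind of user-observable check described above (a hypothetical program, not the actual selftest): mlock() a mapping and look for the "lo" flag in the VmFlags line of /proc/self/smaps.
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>

        int main(void)
        {
                size_t len = 4096;
                void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED || mlock(p, len)) {
                        perror("mmap/mlock");
                        return 1;
                }

                FILE *f = fopen("/proc/self/smaps", "r");
                char line[256];
                int in_vma = 0, locked = 0;

                while (f && fgets(line, sizeof(line), f)) {
                        unsigned long start, end;

                        /* VMA header lines look like "start-end perms ..." */
                        if (sscanf(line, "%lx-%lx ", &start, &end) == 2)
                                in_vma = ((unsigned long)p >= start &&
                                          (unsigned long)p < end);
                        else if (in_vma && !strncmp(line, "VmFlags:", 8))
                                locked = strstr(line, " lo") != NULL;
                }
                if (f)
                        fclose(f);

                printf("mapping is %slocked according to VmFlags\n",
                       locked ? "" : "NOT ");
                munlock(p, len);
                return 0;
        }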
Fixes: 9c4e6b1a7027f ("mm, mlock, vmscan: no more skipping pagevecs")
Reported-by: Rafael Aquini <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The assignment in the sc->memcg_low_skipped branch resets
skipped_deactivate to 0, but this is not needed as this code path is never
reachable with skipped_deactivate != 0 due to the previous
sc->skipped_deactivate branch.
[[email protected]: rewrite changelog]
Signed-off-by: Mateusz Nosek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This gives some size improvement:
$size mm/vmscan.o (before)
text data bss dec hex filename
53670 24123 12 77805 12fed mm/vmscan.o
$size mm/vmscan.o (after)
text data bss dec hex filename
53648 24123 12 77783 12fd7 mm/vmscan.o
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Previously 0 was assigned to variable 'lruvec_size', but the variable was
never read later. So the assignment can be removed.
Fixes: f87bccde6a7d ("mm/vmscan: remove unused lru_pages argument")
Signed-off-by: Mateusz Nosek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pgdat->kswapd_classzone_idx could be accessed concurrently in
wakeup_kswapd(). Plain writes and reads without any lock protection
result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
well as saving a branch (compilers might well optimize the original code
in an unintentional way anyway). While at it, also take care of
pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim(). The
data races were reported by KCSAN,
BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd
write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
wakeup_kswapd+0xf1/0x400
wakeup_kswapd at mm/vmscan.c:3967
wake_all_kswapds+0x59/0xc0
wake_all_kswapds at mm/page_alloc.c:4241
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_slowpath at mm/page_alloc.c:4512
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x16e/0x6f0
__handle_mm_fault+0xcd5/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
1 lock held by mtest01/7454:
#0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at:
do_page_fault+0x143/0x6f9
do_user_addr_fault at arch/x86/mm/fault.c:1405
(inlined by) do_page_fault at arch/x86/mm/fault.c:1539
irq event stamp: 6944085
count_memcg_event_mm+0x1a6/0x270
count_memcg_event_mm+0x119/0x270
__do_softirq+0x34c/0x57c
irq_exit+0xa2/0xc0
read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
wakeup_kswapd+0xc8/0x400
wake_all_kswapds+0x59/0xc0
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x16e/0x6f0
__handle_mm_fault+0xcd5/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
1 lock held by mtest01/7472:
#0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at:
do_page_fault+0x143/0x6f9
irq event stamp: 6793561
count_memcg_event_mm+0x1a6/0x270
count_memcg_event_mm+0x119/0x270
__do_softirq+0x34c/0x57c
irq_exit+0xa2/0xc0
BUG: KCSAN: data-race in kswapd / wakeup_kswapd
write to 0xffff90973ffff2dc of 4 bytes by task 820 on cpu 6:
kswapd+0x27c/0x8d0
kthread+0x1e0/0x200
ret_from_fork+0x27/0x50
read to 0xffff90973ffff2dc of 4 bytes by task 6299 on cpu 0:
wakeup_kswapd+0xf3/0x450
wake_all_kswapds+0x59/0xc0
__alloc_pages_slowpath+0xdcc/0x1290
__alloc_pages_nodemask+0x3bb/0x450
alloc_pages_vma+0x8a/0x2c0
do_anonymous_page+0x170/0x700
__handle_mm_fault+0xc9f/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
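The general pattern applied by the fix looks like the following sketch (hypothetical variable and function names; the real change marks accesses to pgdat->kswapd_classzone_idx and pgdat->kswapd_order):
        #include <linux/compiler.h>

        /* A shared hint updated by several wakers and read by kswapd without
         * a lock; plain accesses here are data races, marked ones are not. */
        static int example_classzone_hint;

        static void example_update_hint(int classzone_idx)
        {
                int cur = READ_ONCE(example_classzone_hint); /* one untorn read */

                /* Publish the larger value with a single untorn write; reading
                 * into a local also avoids re-fetching a changing value. */
                if (cur < classzone_idx)
                        WRITE_ONCE(example_classzone_hint, classzone_idx);
        }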
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kswapd kernel thread starts either with a CPU affinity set to the full cpu
mask of its target node or without any affinity at all if the node is
CPUless. There is a cpu hotplug callback (kswapd_cpu_online) that
implements an elaborate way to update this mask when a cpu is onlined.
It is not really clear whether there is any actual benefit from this
scheme. Completely CPU-less NUMA nodes rarely gain a new CPU during
runtime. Drop the code for that reason. If there is a real usecase then
we can resurrect and simplify the code.
[[email protected] rewrite changelog]
Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The commit 98fa15f34cb3 ("mm: replace all open encodings for
NUMA_NO_NODE") did the replacement across the kernel tree, but we got
some more in vmscan.c since then.
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: David Rientjes <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use mem_cgroup_is_root() API to check if memcg is root memcg instead of
open coding.
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Rientjes <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kstrndup() fails, no memory has been allocated and we can exit directly.
[[email protected]: reword changelog]
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: David Rientjes <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Simplify page_is_buddy() to reduce the redundant code for better code
readability.
Signed-off-by: chenqiwu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Previously, if the branch condition was false, the assignment was not
executed. The assignment can safely be executed even when the condition is
false, as it assigns the value of 'nodemask' to 'ac.nodemask', which
already holds the same value.
Since the assignment can be executed unconditionally, the branch can be
removed.
Signed-off-by: Mateusz Nosek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use free_area_empty() API to replace list_empty() for better code
readability.
Signed-off-by: chenqiwu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch makes ALLOC_KSWAPD equal to __GFP_KSWAPD_RECLAIM (cast to int).
Thanks to that, code like:
        if (gfp_mask & __GFP_KSWAPD_RECLAIM)
                alloc_flags |= ALLOC_KSWAPD;
can be changed to:
        alloc_flags |= (__force int) (gfp_mask & __GFP_KSWAPD_RECLAIM);
Thanks to this, one branch less is generated in the assembly.
In case of ALLOC_KSWAPD flag two branches are saved, first one in code
that always executes in the beginning of page allocation and the second
one in loop in page allocator slowpath.
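To make the effect concrete, here is a small standalone illustration of the aliasing trick (the flag names and bit values below are hypothetical stand-ins, not the kernel's actual definitions):
        #include <stdio.h>

        #define GFP_KSWAPD_RECLAIM      0x400u  /* stand-in gfp flag */
        #define ALLOC_KSWAPD            0x400u  /* deliberately the same bit */

        static unsigned int gfp_to_alloc_flags(unsigned int gfp_mask)
        {
                unsigned int alloc_flags = 0;

                /* Old form (branch):
                 *   if (gfp_mask & GFP_KSWAPD_RECLAIM)
                 *           alloc_flags |= ALLOC_KSWAPD;
                 * New form (branchless mask, valid because the bits match): */
                alloc_flags |= gfp_mask & GFP_KSWAPD_RECLAIM;

                return alloc_flags;
        }

        int main(void)
        {
                printf("%#x\n", gfp_to_alloc_flags(GFP_KSWAPD_RECLAIM)); /* 0x400 */
                printf("%#x\n", gfp_to_alloc_flags(0));                  /* 0 */
                return 0;
        }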
Signed-off-by: Mateusz Nosek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, the vm.min_free_kbytes sysctl value is capped at a hardcoded
64M in init_per_zone_wmark_min (unless it is overridden by khugepaged
initialization).
This value has not been modified since 2005, and enterprise-grade systems
now frequently have hundreds of GB of RAM and multiple 10, 40, or even 100
GB NICs. We have seen page allocation failures on heavily loaded systems
related to NIC drivers. These issues were resolved by an increase to
vm.min_free_kbytes.
This patch increases the hardcoded value by a factor of 4 as a temporary
solution.
Further work to make the calculation of vm.min_free_kbytes more consistent
throughout the kernel would be desirable.
As an example of the inconsistency of the current method, this value is
recalculated by init_per_zone_wmark_min() in the case of memory hotplug
which will override the value set by set_recommended_min_free_kbytes()
called during khugepaged initialization even if khugepaged remains
enabled, however an on/off toggle of khugepaged will then recalculate and
set the value via set_recommended_min_free_kbytes().
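For illustration, a standalone sketch of the clamp being changed (the real logic lives in init_per_zone_wmark_min(); the constants below follow from the changelog's 64M cap and factor-of-4 increase, and the sqrt-based formula is a simplification):
        #include <stdio.h>
        #include <math.h>

        static long compute_min_free_kbytes(long lowmem_kbytes)
        {
                long kbytes = (long)sqrt((double)lowmem_kbytes * 16);

                if (kbytes < 128)
                        kbytes = 128;
                if (kbytes > 262144)    /* cap was 65536 (64M) before */
                        kbytes = 262144;
                return kbytes;
        }

        int main(void)
        {
                /* A 1 TB machine has about 1073741824 kB of memory;
                 * sqrt(16 * that) is 131072 kB, which is no longer clipped
                 * down to the old 64 MB cap. */
                printf("%ld kB\n", compute_min_free_kbytes(1024L * 1024 * 1024));
                return 0;
        }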
Signed-off-by: Joel Savitz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Rafael Aquini <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Test a negative size in memmove() in order to verify that it correctly
triggers a KASAN report.
Casting a negative number to size_t turns it into a very large size_t, so
the access is out of bounds and is detected by KASAN.
[[email protected]: fix -Wstringop-overflow warning]
Link: http://lkml.kernel.org/r/[email protected]
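A sketch of the kind of test case added (the KASAN selftest in lib/test_kasan.c has the real version; the function name below is illustrative):
        #include <linux/kernel.h>
        #include <linux/slab.h>
        #include <linux/string.h>

        static noinline void example_memmove_invalid_size(void)
        {
                char *ptr;
                size_t size = 64;
                volatile size_t invalid_size = -2;  /* wraps to a huge size_t */

                ptr = kmalloc(size, GFP_KERNEL);
                if (!ptr)
                        return;

                memset(ptr, 0, 64);
                /* KASAN should report an out-of-bounds access here. */
                memmove(ptr, ptr + 4, invalid_size);
                kfree(ptr);
        }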
Signed-off-by: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: kernel test robot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "fix the missing underflow in memory operation function", v4.
The patchset helps to produce a KASAN report when size is negative in
memory operation functions. It is helpful for programmers to solve
undefined behavior issues. Patch 1 is based on Dmitry's review and
suggestion, and patch 2 is a test to verify patch 1.
[1]https://bugzilla.kernel.org/show_bug.cgi?id=199341
[2]https://lore.kernel.org/linux-arm-kernel/[email protected]/
This patch (of 2):
KASAN missed detecting when size is a negative number in memset(),
memcpy(), and memmove(); this causes an out-of-bounds bug, so it needs to
be detected by KASAN.
If size is a negative number, there is reason to classify it as an
out-of-bounds bug: casting a negative number to size_t turns it into a
large size_t whose value is larger than ULONG_MAX/2, so it qualifies as
out-of-bounds.
KASAN report is shown below:
BUG: KASAN: out-of-bounds in kmalloc_memmove_invalid_size+0x70/0xa0
Read of size 18446744073709551608 at addr ffffff8069660904 by task cat/72
CPU: 2 PID: 72 Comm: cat Not tainted 5.4.0-rc1-next-20191004ajb-00001-gdb8af2f372b2-dirty #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x288
show_stack+0x14/0x20
dump_stack+0x10c/0x164
print_address_description.isra.9+0x68/0x378
__kasan_report+0x164/0x1a0
kasan_report+0xc/0x18
check_memory_region+0x174/0x1d0
memmove+0x34/0x88
kmalloc_memmove_invalid_size+0x70/0xa0
[1] https://bugzilla.kernel.org/show_bug.cgi?id=199341
[[email protected]: fix -Wdeclaration-after-statement warn]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix objtool warning]
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Suggested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Walter Wu <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|