aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-12-01mm/rmap.c: don't reuse anon_vma if we just want a copyWei Yang1-9/+15
Before commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy"), anon_vma_clone() doesn't change dst->anon_vma. While after this commit, anon_vma_clone() will try to reuse an exist one on forking. But this commit go a little bit further for the case not forking. anon_vma_clone() is called from __vma_split(), __split_vma(), copy_vma() and anon_vma_fork(). For the first three places, the purpose here is get a copy of src and we don't expect to touch dst->anon_vma even it is NULL. While after that commit, it is possible to reuse an anon_vma when dst->anon_vma is NULL. This is not we intend to have. This patch stops reuse of anon_vma for non-fork cases. Link: http://lkml.kernel.org/r/[email protected] Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Signed-off-by: Wei Yang <[email protected]> Acked-by: Konstantin Khlebnikov <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: "Jérôme Glisse" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Qian Cai <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/mmap.c: rb_parent is not necessary in __vma_link_list()Wei Yang4-9/+5
Now we use rb_parent to get next, while this is not necessary. When prev is NULL, this means vma should be the first element in the list. Then next should be current first one (mm->mmap), no matter whether we have parent or not. After removing it, the code shows the beauty of symmetry. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Acked-by: Andrew Morton <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/mmap.c: extract __vma_unlink_list() as counterpart for __vma_link_list()Wei Yang4-18/+17
Just make the code a little easier to read. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/mmap.c: __vma_unlink_prev() is not necessary nowWei Yang1-8/+1
The third parameter of __vma_unlink_common() could differentiate these two types. __vma_unlink_prev() is not necessary now. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/mmap.c: prev could be retrieved from vma->vm_prevWei Yang1-13/+7
Currently __vma_unlink_common handles two cases: * has_prev * or not When has_prev is false, it is obvious prev is calculated from vma->vm_prev in __vma_unlink_common. When has_prev is true, the prev is passed through from __vma_unlink_prev in __vma_adjust for non-case 8. And at the beginning next is calculated from vma->vm_next, which implies vma is next->vm_prev. The above statement sounds a little complicated, while to think in another point of view, no matter whether vma and next is swapped, the mmap link list still preserves its property. It is proper to access vma->vm_prev. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/swap.c: piggyback lru_add_drain_all() callsKonstantin Khlebnikov1-1/+15
This is a very slow operation. Right now POSIX_FADV_DONTNEED is the top user because it has to freeze page references when removing it from the cache. invalidate_bdev() calls it for the same reason. Both are triggered from userspace, so it's easy to generate a storm. mlock/mlockall no longer calls lru_add_drain_all - I've seen here serious slowdown on older kernels. There are some less obvious paths in memory migration/CMA/offlining which shouldn't call frequently. The worst case requires a non-trivial workload because lru_add_drain_all() skips cpus where vectors are empty. Something must constantly generate a flow of pages for each cpu. Also cpus must be busy to make scheduling per-cpu works slower. And the machine must be big enough (64+ cpus in our case). In our case that was a massive series of mlock calls in map-reduce while other tasks write logs (and generates flows of new pages in per-cpu vectors). Mlock calls were serialized by mutex and accumulated latency up to 10 seconds or more. The kernel does not call lru_add_drain_all on mlock paths since 4.15, but the same scenario could be triggered by fadvise(POSIX_FADV_DONTNEED) or any other remaining user. There is no reason to do the drain again if somebody else already drained all the per-cpu vectors while we waited for the lock. Piggyback on a drain starting and finishing while we wait for the lock: all pages pending at the time of our entry were drained from the vectors. Callers like POSIX_FADV_DONTNEED retry their operations once after draining per-cpu vectors when pages have unexpected references. Link: http://lkml.kernel.org/r/157019456205.3142.3369423180908482020.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/mmap.c: remove a never-triggered warning in __vma_adjust()Wei Yang1-2/+0
The upper level of "if" makes sure (end >= next->vm_end), which means there are only two possibilities: 1) end == next->vm_end 2) end > next->vm_end remove_next is assigned to be (1 + end > next->vm_end). This means if remove_next is 1, end must equal to next->vm_end. The VM_WARN_ON will never trigger. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01rss_stat: add support to detect RSS updates of external mmJoel Fernandes (Google)5-20/+66
When a process updates the RSS of a different process, the rss_stat tracepoint appears in the context of the process doing the update. This can confuse userspace that the RSS of process doing the update is updated, while in reality a different process's RSS was updated. This issue happens in reclaim paths such as with direct reclaim or background reclaim. This patch adds more information to the tracepoint about whether the mm being updated belongs to the current process's context (curr field). We also include a hash of the mm pointer so that the process who the mm belongs to can be uniquely identified (mm_id field). Also vsprintf.c is refactored a bit to allow reuse of hashing code. [[email protected]: remove unused local `str'] [[email protected]: inline call to ptr_to_hashval] Link: http://lore.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Fernandes (Google) <[email protected]> Reported-by: Ioannis Ilkos <[email protected]> Acked-by: Petr Mladek <[email protected]> [lib/vsprintf.c] Cc: Tim Murray <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Carmen Jackson <[email protected]> Cc: Mayank Gupta <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm: emit tracepoint when RSS changesJoel Fernandes (Google)3-3/+38
Useful to track how RSS is changing per TGID to detect spikes in RSS and memory hogs. Several Android teams have been using this patch in various kernel trees for half a year now. Many reported to me it is really useful so I'm posting it upstream. Initial patch developed by Tim Murray. Changes I made from original patch: o Prevent any additional space consumed by mm_struct. Regarding the fact that the RSS may change too often thus flooding the traces - note that, there is some "hysterisis" with this already. That is - We update the counter only if we receive 64 page faults due to SPLIT_RSS_ACCOUNTING. However, during zapping or copying of pte range, the RSS is updated immediately which can become noisy/flooding. In a previous discussion, we agreed that BPF or ftrace can be used to rate limit the signal if this becomes an issue. Also note that I added wrappers to trace_rss_stat to prevent compiler errors where linux/mm.h is included from tracing code, causing errors such as: CC kernel/trace/power-traces.o In file included from ./include/trace/define_trace.h:102, from ./include/trace/events/kmem.h:342, from ./include/linux/mm.h:31, from ./include/linux/ring_buffer.h:5, from ./include/linux/trace_events.h:6, from ./include/trace/events/power.h:12, from kernel/trace/power-traces.c:15: ./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type struct trace_entry ent; \ Link: http://lore.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Co-developed-by: Tim Murray <[email protected]> Signed-off-by: Tim Murray <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Carmen Jackson <[email protected]> Cc: Mayank Gupta <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01shmem: pin the file in shmem_fault() if mmap_sem is droppedKirill A. Shutemov1-5/+6
syzbot found the following crash: BUG: KASAN: use-after-free in perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13 Read of size 8 at addr ffff8880a5cf2c50 by task syz-executor.0/26173 CPU: 0 PID: 26173 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #146 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13 trace_lock_acquire include/trace/events/lock.h:13 [inline] lock_acquire+0x2de/0x410 kernel/locking/lockdep.c:4411 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:338 [inline] shmem_fault+0x5ec/0x7b0 mm/shmem.c:2034 __do_fault+0x111/0x540 mm/memory.c:3083 do_shared_fault mm/memory.c:3535 [inline] do_fault mm/memory.c:3613 [inline] handle_pte_fault mm/memory.c:3840 [inline] __handle_mm_fault+0x2adf/0x3f20 mm/memory.c:3964 handle_mm_fault+0x1b5/0x6b0 mm/memory.c:4001 do_user_addr_fault arch/x86/mm/fault.c:1441 [inline] __do_page_fault+0x536/0xdd0 arch/x86/mm/fault.c:1506 do_page_fault+0x38/0x590 arch/x86/mm/fault.c:1530 page_fault+0x39/0x40 arch/x86/entry/entry_64.S:1202 It happens if the VMA got unmapped under us while we dropped mmap_sem and inode got freed. Pinning the file if we drop mmap_sem fixes the issue. Link: http://lkml.kernel.org/r/20190927083908.rhifa4mmaxefc24r@box Signed-off-by: Kirill A. Shutemov <[email protected]> Reported-by: [email protected] Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Josef Bacik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm: drop mmap_sem before calling balance_dirty_pages() in write faultJohannes Weiner3-32/+48
One of our services is observing hanging ps/top/etc under heavy write IO, and the task states show this is an mmap_sem priority inversion: A write fault is holding the mmap_sem in read-mode and waiting for (heavily cgroup-limited) IO in balance_dirty_pages(): balance_dirty_pages+0x724/0x905 balance_dirty_pages_ratelimited+0x254/0x390 fault_dirty_shared_page.isra.96+0x4a/0x90 do_wp_page+0x33e/0x400 __handle_mm_fault+0x6f0/0xfa0 handle_mm_fault+0xe4/0x200 __do_page_fault+0x22b/0x4a0 page_fault+0x45/0x50 Somebody tries to change the address space, contending for the mmap_sem in write-mode: call_rwsem_down_write_failed_killable+0x13/0x20 do_mprotect_pkey+0xa8/0x330 SyS_mprotect+0xf/0x20 do_syscall_64+0x5b/0x100 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 The waiting writer locks out all subsequent readers to avoid lock starvation, and several threads can be seen hanging like this: call_rwsem_down_read_failed+0x14/0x30 proc_pid_cmdline_read+0xa0/0x480 __vfs_read+0x23/0x140 vfs_read+0x87/0x130 SyS_read+0x42/0x90 do_syscall_64+0x5b/0x100 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 To fix this, do what we do for cache read faults already: drop the mmap_sem before calling into anything IO bound, in this case the balance_dirty_pages() function, and return VM_FAULT_RETRY. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01Documentation/admin-guide/cgroup-v2.rst: document why inactive_X + active_X ↵Chris Down1-1/+6
may not equal X This has confused a significant number of people using cgroups inside Facebook, and some of those outside as well judging by posts like this[0] (although it's not a problem unique to cgroup v2). If shmem handling in particular becomes more coherent at some point in the future -- although that seems unlikely now -- we can change the wording here. [0]: https://unix.stackexchange.com/q/525092/10762 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chris Down <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm: vmscan: memcontrol: remove mem_cgroup_select_victim_node()Shakeel Butt3-129/+5
Since commit 1ba6fc9af35b ("mm: vmscan: do not share cgroup iteration between reclaimers"), the memcg reclaim does not bail out earlier based on sc->nr_reclaimed and will traverse all the nodes. All the reclaimable pages of the memcg on all the nodes will be scanned relative to the reclaim priority. So, there is no need to maintain state regarding which node to start the memcg reclaim from. This patch effectively reverts the commit 889976dbcb12 ("memcg: reclaim memory from nodes in round-robin order") and commit 453a9bf347f1 ("memcg: fix numa scan information update to be triggered by memory event"). [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Greg Thelen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01include/linux/memcontrol.h: fix comments based on per-node memcgHao Lee1-3/+2
These comments should be updated as memcg limit enforcement has been moved from zones to nodes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hao Lee <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm: memcontrol: try harder to set a new memory.highJohannes Weiner1-6/+24
Setting a memory.high limit below the usage makes almost no effort to shrink the cgroup to the new target size. While memory.high is a "soft" limit that isn't supposed to cause OOM situations, we should still try harder to meet a user request through persistent reclaim. For example, after setting a 10M memory.high on an 800M cgroup full of file cache, the usage shrinks to about 350M: + cat /cgroup/workingset/memory.current 841568256 + echo 10M + cat /cgroup/workingset/memory.current 355729408 This isn't exactly what the user would expect to happen. Setting the value a few more times eventually whittles the usage down to what we are asking for: + echo 10M + cat /cgroup/workingset/memory.current 104181760 + echo 10M + cat /cgroup/workingset/memory.current 31801344 + echo 10M + cat /cgroup/workingset/memory.current 10440704 To improve this, add reclaim retry loops to the memory.high write() callback, similar to what we do for memory.max, to make a reasonable effort that the usage meets the requested size after the call returns. Afterwards, a single write() to memory.high is enough in all but extreme cases: + cat /cgroup/workingset/memory.current 841609216 + echo 10M + cat /cgroup/workingset/memory.current 10182656 790M is not a reasonable reclaim target to ask of a single reclaim invocation. And it wouldn't be reasonable to optimize the reclaim code for it. So asking for the full size but retrying is not a bad choice here: we express our intent, and benefit if reclaim becomes better at handling larger requests, but we also acknowledge that some of the deltas we can encounter in memory_high_write() are just too ridiculously big for a single reclaim invocation to manage. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm: memcontrol: remove dead code from memory_max_write()Johannes Weiner1-3/+1
When the reclaim loop in memory_max_write() is ^C'd or similar, we set err to -EINTR. But we don't return err. Once the limit is set, we always return success (nbytes). Delete the dead code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm, memcg: clean up reclaim iter arrayYafang Shao2-10/+4
The mem_cgroup_reclaim_cookie is only used in memcg softlimit reclaim now, and the priority of the reclaim is always 0. We don't need to define the iter in struct mem_cgroup_per_node as an array any more. That could make the code more clear and save some space. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yafang Shao <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/swap.c: trivial mark_page_accessed() cleanupFengguang Wu1-4/+9
This avoids duplicated PageReferenced() calls. No behavior change. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Fengguang Wu <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Liu Jingqi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm, swap: disallow swapon() on zoned block devicesNaohiro Aota1-0/+7
A zoned block device consists of a number of zones. Zones are either conventional and accepting random writes or sequential and requiring that writes be issued in LBA order from each zone write pointer position. For the write restriction, zoned block devices are not suitable for a swap device. Disallow swapon on them. [[email protected]: reflow and reword comment, per Christoph] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Naohiro Aota <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: "Theodore Y. Ts'o" <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/gup.c: fix comments of __get_user_pages() and get_user_pages_remote()Liu Xiang1-10/+22
Fix comments of __get_user_pages() and get_user_pages_remote(), make them more clear. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Xiang <[email protected]> Suggested-by: John Hubbard <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/gup.c: allow CMA migration to propagate errors back to callerzhong jiang1-3/+5
check_and_migrate_cma_pages() was recording the result of __get_user_pages_locked() in an unsigned "nr_pages" variable. Because __get_user_pages_locked() returns a signed value that can include negative errno values, this had the effect of hiding errors. Change check_and_migrate_cma_pages() implementation so that it uses a signed variable instead, and propagates the results back to the caller just as other gup internal functions do. This was discovered with the help of unsigned_lesser_than_zero.cocci. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Suggested-by: John Hubbard <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/filemap.c: warn if stale pagecache is left after direct writeKonstantin Khlebnikov1-3/+5
generic_file_direct_write() tries to invalidate pagecache after O_DIRECT write. Unlike to similar code in dio_complete() this silently ignores error returned from invalidate_inode_pages2_range(). According to comment this code here because not all filesystems call dio_complete() to do proper invalidation after O_DIRECT write. Noticeable example is a blkdev_direct_IO(). This patch calls dio_warn_stale_pagecache() if invalidation fails. Link: http://lkml.kernel.org/r/157270038294.4812.2238891109785106069.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01fs/direct-io.c: keep dio_warn_stale_pagecache() when CONFIG_BLOCK=nKonstantin Khlebnikov3-22/+26
This helper prints warning if direct I/O write failed to invalidate cache, and set EIO at inode to warn usersapce about possible data corruption. See also commit 5a9d929d6e13 ("iomap: report collisions between directio and buffered writes to userspace"). Direct I/O is supported by non-disk filesystems, for example NFS. Thus generic code needs this even in kernel without CONFIG_BLOCK. Link: http://lkml.kernel.org/r/157270038074.4812.7980855544557488880.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/filemap.c: remove redundant cache invalidation after async direct-io writeKonstantin Khlebnikov1-2/+4
generic_file_direct_write() invalidates cache at entry. Second time this should be done when request completes. But this function calls second invalidation at exit unconditionally even for async requests. This patch skips second invalidation for async requests (-EIOCBQUEUED). Link: http://lkml.kernel.org/r/157270037850.4812.15036239021726025572.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/slub.c: clean up validate_slab()Yu Zhao1-13/+8
The function doesn't need to return any value, and the check can be done in one pass. There is a behavior change: before the patch, we stop at the first invalid free object; after the patch, we stop at the first invalid object, free or in use. This shouldn't matter because the original behavior isn't intended anyway. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm/slub.c: update commentsYu Zhao1-4/+2
Slub doesn't use PG_active and PG_error anymore. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm: slub: print the offset of fault addressesMiles Chen1-3/+6
With commit ad67b74d2469 ("printk: hash addresses printed with %p"), it is a little bit harder to match the fault addresses printed by check_bytes_and_report() or slab_pad_check() in the dump because the fault addresses may not show up in the dump. Print the offset of the fault addresses to make it easier to match the incorrect poison or padding values in the dump. Before: We have to search the "63" in the dump. If we want to get the offset of 63, we have to count it from the start of Object dump. ============================================================= BUG kmalloc-128 (Not tainted): Poison overwritten ------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x00000000570da294-0x00000000570da294. First byte 0x63 instead of 0x6b ... INFO: Object 0x000000006ebb3b9e @offset=14208 fp=0x0000000065862488 Redzone 00000000a6abccff: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000741c16f0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000061ad278f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000000467c1bd: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000008812766b: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000003d9b8f25: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000000d80c33: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000867b0d90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Object 000000006ebb3b9e: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000005ea59a9f: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000003ef8bddc: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000008190375d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000006df7fb32: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000069474eae: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000008073b7d: 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 00000000b45ae74d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 After: We know the fault address is at @offset=1508, and the Object is at @offset=1408, so we know the fault address is at offset=100 within the object. ========================================================= BUG kmalloc-128 (Not tainted): Poison overwritten --------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x00000000638ec1d1-0x00000000638ec1d1 @offset=1508. First byte 0x63 instead of 0x6b ... INFO: Object 0x000000008171818d @offset=1408 fp=0x0000000066dae230 Redzone 00000000e2697ab6: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000064b6a381: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000e413a234: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000004c1dfeb: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000009ad24d42: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000002a196a23: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000a7b8468a: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000088db6da3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Object 000000008171818d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000007c4035d4: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000004dd281a4: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000079121dff: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 00000000756682a9: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000053b7e541: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000091f8d530: 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000009c76035c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Miles Chen <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc cachesPengfei Li1-2/+3
The type of local variable *type* of new_kmalloc_cache() should be enum kmalloc_cache_type instead of int, so correct it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pengfei Li <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm, slab: remove unused kmalloc_size()Pengfei Li3-25/+5
The size of kmalloc can be obtained from kmalloc_info[], so remove kmalloc_size() that will not be used anymore. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pengfei Li <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01mm, slab: make kmalloc_info[] contain all types of namesPengfei Li3-44/+51
Patch series "mm, slab: Make kmalloc_info[] contain all types of names", v6. There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). Patch1 predefines the names of all types of kmalloc to save the time spent dynamically generating names. These changes make sense, and the time spent by new_kmalloc_cache() has been reduced by approximately 36.3%. Time spent by new_kmalloc_cache() (CPU cycles) 5.3-rc7 66264 5.3-rc7+patch 42188 This patch (of 3): There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Besides, remove the kmalloc_cache_name() that is no longer used. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pengfei Li <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01fs/buffer.c: include internal.h for missing declarationsBen Dooks1-0/+2
The declarations of __block_write_begin_int and guard_bio_eod are needed from internal.h so include it to fix the following sparse warnings: fs/buffer.c:1930:5: warning: symbol '__block_write_begin_int' was not declared. Should it be static? fs/buffer.c:2994:6: warning: symbol 'guard_bio_eod' was not declared. Should it be static? Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ben Dooks <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01fs/buffer.c: fix use true/false for bool typeSaurav Girepunje1-2/+2
Use true/false for bool return type of has_bh_in_lru(). Link: http://lkml.kernel.org/r/20191029040529.GA7625@saurav Signed-off-by: Saurav Girepunje <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01ocfs2: fix passing zero to 'PTR_ERR' warningDing Xiang1-2/+2
Fix a static code checker warning: fs/ocfs2/acl.c:331 ocfs2_acl_chmod() warn: passing zero to 'PTR_ERR' Link: http://lkml.kernel.org/r/[email protected] Fixes: 5ee0fbd50fd ("ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang") Signed-off-by: Ding Xiang <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01scripts/spelling.txt: add more spellings to spelling.txtColin Ian King1-0/+28
Here are some of the more common spelling mistakes and typos that I've found while fixing up spelling mistakes in the kernel since July 2019. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-30mailbox: imx: add support for imx v1 muRichard Zhu1-17/+38
There is a version 1.0 MU on i.MX7ULP platform. One new version ID register is added, and it's offset is 0. TRn registers are defined at the offset 0x20 ~ 0x2C. RRn registers are defined at the offset 0x40 ~ 0x4C. SR/CR registers are defined at 0x60/0x64. Extend this driver to support it. Signed-off-by: Richard Zhu <[email protected]> Suggested-by: Oleksij Rempel <[email protected]> Reviewed-by: Dong Aisheng <[email protected]> Reviewed-by: Oleksij Rempel <[email protected]> Reviewed-by: Daniel Baluta <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2019-11-30dt-bindings: mailbox: imx-mu: add imx7ulp MU supportRichard Zhu1-0/+2
There is a version 1.0 MU on imx7ulp, use "fsl,imx7ulp-mu" compatible to support it. Signed-off-by: Richard Zhu <[email protected]> Reviewed-by: Dong Aisheng <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2019-11-30mailbox: imx: Clear the right interrupts at shutdownDaniel Baluta1-2/+13
Make sure to only clear enabled interrupts keeping count of the connection type. Suggested-by: Oleksij Rempel <[email protected]> Signed-off-by: Daniel Baluta <[email protected]> Signed-off-by: Richard Zhu <[email protected]> Reviewed-by: Dong Aisheng <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2019-11-30mailbox: imx: Fix Tx doorbell shutdown pathDaniel Baluta1-1/+3
Tx doorbell is handled by txdb_tasklet and doesn't have an associated IRQ. Anyhow, imx_mu_shutdown ignores this and tries to free an IRQ that wasn't requested for Tx DB resulting in the following warning: [ 1.967644] Trying to free already-free IRQ 26 [ 1.972108] WARNING: CPU: 2 PID: 157 at kernel/irq/manage.c:1708 __free_irq+0xc0/0x358 [ 1.980024] Modules linked in: [ 1.983088] CPU: 2 PID: 157 Comm: kworker/2:1 Tainted: G [ 1.993524] Hardware name: Freescale i.MX8QXP MEK (DT) [ 1.998668] Workqueue: events deferred_probe_work_func [ 2.003812] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 2.008607] pc : __free_irq+0xc0/0x358 [ 2.012364] lr : __free_irq+0xc0/0x358 [ 2.016111] sp : ffff00001179b7e0 [ 2.019422] x29: ffff00001179b7e0 x28: 0000000000000018 [ 2.024736] x27: ffff000011233000 x26: 0000000000000004 [ 2.030053] x25: 000000000000001a x24: ffff80083bec74d4 [ 2.035369] x23: 0000000000000000 x22: ffff80083bec7588 [ 2.040686] x21: ffff80083b1fe8d8 x20: ffff80083bec7400 [ 2.046003] x19: 0000000000000000 x18: ffffffffffffffff [ 2.051320] x17: 0000000000000000 x16: 0000000000000000 [ 2.056637] x15: ffff0000111296c8 x14: ffff00009179b517 [ 2.061953] x13: ffff00001179b525 x12: ffff000011142000 [ 2.067270] x11: ffff000011129f20 x10: ffff0000105da970 [ 2.072587] x9 : 00000000ffffffd0 x8 : 0000000000000194 [ 2.077903] x7 : 612065657266206f x6 : ffff0000111e7b09 [ 2.083220] x5 : 0000000000000003 x4 : 0000000000000000 [ 2.088537] x3 : 0000000000000000 x2 : 00000000ffffffff [ 2.093854] x1 : 28b70f0a2b60a500 x0 : 0000000000000000 [ 2.099173] Call trace: [ 2.101618] __free_irq+0xc0/0x358 [ 2.105021] free_irq+0x38/0x98 [ 2.108170] imx_mu_shutdown+0x90/0xb0 [ 2.111921] mbox_free_channel.part.2+0x24/0xb8 [ 2.116453] mbox_free_channel+0x18/0x28 This bug is present from the beginning of times. Cc: Oleksij Rempel <[email protected]> Signed-off-by: Daniel Baluta <[email protected]> Signed-off-by: Richard Zhu <[email protected]> Reviewed-by: Dong Aisheng <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2019-11-30mailbox: stm32-ipcc: Update wakeup managementFabien Dessenne1-29/+7
The wakeup specific IRQ management is no more needed to wake up the stm32 platform. A relationship has been established between the EXTI and the RX IRQ, just need to declare the EXTI interrupt instead of the IPCC RX IRQ. Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Fabien Dessenne <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2019-11-30Merge tag 'seccomp-v5.5-rc1' of ↵Linus Torvalds11-17/+170
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE, but there are cleanups as well. - implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner) - fixes to selftests (Christian Brauner) - remove secure_computing() argument (Christian Brauner)" * tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test seccomp: simplify secure_computing() seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE seccomp: avoid overflow in implicit constant conversion
2019-11-30Merge tag 'audit-pr-20191126' of ↵Linus Torvalds4-11/+18
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Audit is back for v5.5, albeit with only two patches: - Allow for the auditing of suspicious O_CREAT usage via the new AUDIT_ANOM_CREAT record. - Remove a redundant if-conditional check found during code analysis. It's a minor change, but when the pull request is only two patches long, you need filler in the pull request email" [ Heh on the pull request filler. I wish more people tried to write better pull request messages, even if maybe it's not worth it for the trivial cases ;^) - Linus ] * tag 'audit-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: remove redundant condition check in kauditd_thread() audit: Report suspicious O_CREAT usage
2019-11-30Merge tag 'selinux-pr-20191126' of ↵Linus Torvalds9-5/+74
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "Only three SELinux patches for v5.5: - Remove the size limit on SELinux policies, the limitation was a lingering vestige and no longer necessary. - Allow file labeling before the policy is loaded. This should ease some of the burden when the policy is initially loaded (no need to relabel files), but it should also help enable some new system concepts which dynamically create the root filesystem in the initrd. - Add support for the "greatest lower bound" policy construct which is defined as the intersection of the MLS range of two SELinux labels" * tag 'selinux-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: default_range glblub implementation selinux: allow labeling before policy is loaded selinux: remove load size limit
2019-11-30Merge tag 'kgdb-5.5-rc1' of ↵Linus Torvalds5-177/+208
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "The major change here is the work from Douglas Anderson that reworks the way kdb stack traces are handled on SMP systems. The effect is to allow all CPUs to issue their stack trace which reduced the need for architecture specific code to support stack tracing. Also included are general of clean ups from Doug and myself: - Remove some unused variables or arguments. - Tidy up the kdb escape handling code and fix a couple of odd corner cases. - Better ignore escape characters that do not form part of an escape sequence. This mostly benefits vi users since they are most likely to press escape as a nervous habit but it won't harm anyone else" * tag 'kgdb-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Tweak escape handling for vi users kdb: Improve handling of characters from different input sources kdb: Remove special case logic from kdb_read() kdb: Simplify code to fetch characters from console kdb: Tidy up code to handle escape sequences kdb: Avoid array subscript warnings on non-SMP builds kdb: Fix stack crawling on 'running' CPUs that aren't the master kdb: Fix "btc <cpu>" crash if the CPU didn't round up kdb: Remove unused "argcount" param from kdb_bt1(); make btaprompt bool kgdb: Remove unused DCPU_SSTEP definition
2019-11-30Merge tag 'hyperv-next-signed' of ↵Linus Torvalds27-116/+1386
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V updates from Sasha Levin: - support for new VMBus protocols (Andrea Parri) - hibernation support (Dexuan Cui) - latency testing framework (Branden Bonaby) - decoupling Hyper-V page size from guest page size (Himadri Pandya) * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits) Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic drivers/hv: Replace binary semaphore with mutex drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86 HID: hyperv: Add the support of hibernation hv_balloon: Add the support of hibernation x86/hyperv: Implement hv_is_hibernation_supported() Drivers: hv: balloon: Remove dependencies on guest page size Drivers: hv: vmbus: Remove dependencies on guest page size x86: hv: Add function to allocate zeroed page for Hyper-V Drivers: hv: util: Specify ring buffer size using Hyper-V page size Drivers: hv: Specify receive buffer size using Hyper-V page size tools: hv: add vmbus testing tool drivers: hv: vmbus: Introduce latency testing video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host hv_netvsc: Add the support of hibernation hv_sock: Add the support of hibernation video: hyperv_fb: Add the support of hibernation scsi: storvsc: Add the support of hibernation Drivers: hv: vmbus: Add module parameter to cap the VMBus version ...
2019-11-30Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds1-5/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Borislav Petkov: "One urgent fix for the thermal throttling machinery: the recent change reworking the thermal notifications forgot to mask out read-only and reserved bits in the thermal status MSRs, leading to exceptions while writing those MSRs. The fix takes care of masking out those bits first" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/therm_throt: Mask out read-only and reserved MSR bits
2019-11-30Merge branch 'parisc-5.5-1' of ↵Linus Torvalds3-52/+55
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "Just trivial small updates: An assembler register optimization in the inlined networking checksum functions, a compiler warning fix and don't unneccesary print a runtime warning on machines which wouldn't be affected anyway" * 'parisc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Avoid spurious inequivalent alias kernel error messages kexec: Fix pointer-to-int-cast warnings parisc: Do not hardcode registers in checksum functions
2019-11-30Merge tag 'powerpc-5.5-1' of ↵Linus Torvalds215-2463/+4569
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Infrastructure for secure boot on some bare metal Power9 machines. The firmware support is still in development, so the code here won't actually activate secure boot on any existing systems. - A change to xmon (our crash handler / pseudo-debugger) to restrict it to read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop into xmon and modify kernel data, such as the lockdown state. - Support for KASLR on 32-bit BookE machines (Freescale / NXP). - Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work with memory ranges >4GB. - Some reworks of the pseries CMM (Cooperative Memory Management) driver to make it behave more like other balloon drivers and enable some cleanups of generic mm code. - A series of fixes to our hardware breakpoint support to properly handle unaligned watchpoint addresses. Plus a bunch of other smaller improvements, fixes and cleanups. Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing" * tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits) powerpc/fixmap: fix crash with HIGHMEM x86/efi: remove unused variables powerpc: Define arch_is_kernel_initmem_freed() for lockdep powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp powerpc: Avoid clang warnings around setjmp and longjmp powerpc: Don't add -mabi= flags when building with Clang powerpc: Fix Kconfig indentation powerpc/fixmap: don't clear fixmap area in paging_init() selftests/powerpc: spectre_v2 test must be built 64-bit powerpc/powernv: Disable native PCIe port management powerpc/kexec: Move kexec files into a dedicated subdir. powerpc/32: Split kexec low level code out of misc_32.S powerpc/sysdev: drop simple gpio powerpc/83xx: map IMMR with a BAT. powerpc/32s: automatically allocate BAT in setbat() powerpc/ioremap: warn on early use of ioremap() powerpc: Add support for GENERIC_EARLY_IOREMAP powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt() powerpc/8xx: use the fixmapped IMMR in cpm_reset() powerpc/8xx: add __init to cpm1 init functions ...
2019-11-30Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds59-428/+322
Pull ARM updates from Russell King: - improve ARM implementation of pfn_valid() - various sparse fixes - spelling fixes - add further ARMv8 debug architecture versions - clang fix for decompressor - update to generic vDSO - remove Brahma-B53 from spectre hardening - initialise broadcast hrtimer device - use correct nm executable in decompressor - remove old mcount et.al. * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (26 commits) ARM: 8940/1: ftrace: remove mcount(),ftrace_caller_old() and ftrace_call_old() ARM: 8939/1: kbuild: use correct nm executable ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device ARM: 8937/1: spectre-v2: remove Brahma-B53 from hardening ARM: 8933/1: replace Sun/Solaris style flag on section directive ARM: 8932/1: Add clock_gettime64 entry point ARM: 8931/1: Add clock_getres entry point ARM: 8930/1: Add support for generic vDSO ARM: 8929/1: use APSR_nzcv instead of r15 as mrc operand ARM: 8927/1: ARM/hw_breakpoint: add more ARMv8 debug architecture versions support ARM: 8918/2: only build return_address() if needed ARM: 8928/1: ARM_ERRATA_775420: Spelling s/date/data/ ARM: 8925/1: tcm: include <asm/tcm.h> for missing declarations ARM: 8924/1: tcm: make dtcm_end and itcm_end static ARM: 8923/1: mm: include <asm/vga.h> for vga_base ARM: 8922/1: parse_dt_topology() rate is pointer to __be32 ARM: 8920/1: share get_signal_page from signal.c to process.c ARM: 8919/1: make unexported functions static ARM: 8917/1: mm: include <asm/set_memory.h> ARM: 8916/1: mm: make set_section_perms() static ...
2019-11-30Merge tag 'nds32-for-linus-5.5-rc1' of ↵Linus Torvalds4-7/+6
git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux Pull nds32 updates from Greentime Hu: - code clean up - add a nds32 maintainer * tag 'nds32-for-linus-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux: MAINTAINERS: add nds32 maintainer nds32: Move static keyword to the front of declaration nds32: Fix typo in Kconfig.cpu nds32: remove unneeded clean-files for DTB
2019-11-30Merge tag 'notifications-pipe-prep-20191115' of ↵Linus Torvalds13-334/+532
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull pipe rework from David Howells: "This is my set of preparatory patches for building a general notification queue on top of pipes. It makes a number of significant changes: - It removes the nr_exclusive argument from __wake_up_sync_key() as this is always 1. This prepares for the next step: - Adds wake_up_interruptible_sync_poll_locked() so that poll can be woken up from a function that's holding the poll waitqueue spinlock. - Change the pipe buffer ring to be managed in terms of unbounded head and tail indices rather than bounded index and length. This means that reading the pipe only needs to modify one index, not two. - A selection of helper functions are provided to query the state of the pipe buffer, plus a couple to apply updates to the pipe indices. - The pipe ring is allowed to have kernel-reserved slots. This allows many notification messages to be spliced in by the kernel without allowing userspace to pin too many pages if it writes to the same pipe. - Advance the head and tail indices inside the pipe waitqueue lock and use wake_up_interruptible_sync_poll_locked() to poke poll without having to take the lock twice. - Rearrange pipe_write() to preallocate the buffer it is going to write into and then drop the spinlock. This allows kernel notifications to then be added the ring whilst it is filling the buffer it allocated. The read side is stalled because the pipe mutex is still held. - Don't wake up readers on a pipe if there was already data in it when we added more. - Don't wake up writers on a pipe if the ring wasn't full before we removed a buffer" * tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: pipe: Remove sync on wake_ups pipe: Increase the writer-wakeup threshold to reduce context-switch count pipe: Check for ring full inside of the spinlock in pipe_write() pipe: Remove redundant wakeup from pipe_write() pipe: Rearrange sequence in pipe_write() to preallocate slot pipe: Conditionalise wakeup in pipe_read() pipe: Advance tail pointer inside of wait spinlock in pipe_read() pipe: Allow pipes to have kernel-reserved slots pipe: Use head and tail pointers for the ring, not cursor and length Add wake_up_interruptible_sync_poll_locked() Remove the nr_exclusive argument from __wake_up_sync_key() pipe: Reduce #inclusion of pipe_fs_i.h