Age  Commit message  Author  Files  Lines
2017-10-03memremap: add scheduling point to devm_memremap_pagesMichal Hocko1-1/+3
devm_memremap_pages is initializing struct pages in for_each_device_pfn and that can take quite some time. We have even seen a soft lockup triggering on a non-preemptible kernel:

  NMI watchdog: BUG: soft lockup - CPU#61 stuck for 22s! [kworker/u641:11:1808]
  [...]
  RIP: 0010:[<ffffffff8118b6b7>] [<ffffffff8118b6b7>] devm_memremap_pages+0x327/0x430
  [...]
  Call Trace:
    pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
    nvdimm_bus_probe+0x64/0x110 [libnvdimm]
    driver_probe_device+0x1f7/0x420
    bus_for_each_drv+0x52/0x80
    __device_attach+0xb0/0x130
    bus_probe_device+0x87/0xa0
    device_add+0x3fc/0x5f0
    nd_async_device_register+0xe/0x40 [libnvdimm]
    async_run_entry_fn+0x43/0x150
    process_one_work+0x14e/0x410
    worker_thread+0x116/0x490
    kthread+0xc7/0xe0
    ret_from_fork+0x3f/0x70

Fix this by adding cond_resched every 1024 pages.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Johannes Thumshirn <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
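The pattern described above, a long per-page loop that offers to yield once per fixed batch, can be sketched in user space as follows. This is only an illustration under stated assumptions: cond_resched_stub(), init_one_page(), init_pfn_range() and RESCHED_BATCH are hypothetical names, and sched_yield() stands in for the kernel's cond_resched(). It is not the code from the patch.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical batch size; the commit message says "every 1024 pages". */
#define RESCHED_BATCH 1024UL

/* Counts how often the loop offered to yield; for illustration only. */
static unsigned long resched_points;

/* User-space stand-in for the kernel's cond_resched(). */
static void cond_resched_stub(void)
{
	resched_points++;
	sched_yield();
}

/* Hypothetical per-page initialisation work. */
static void init_one_page(unsigned long pfn, unsigned char *state)
{
	state[pfn] = 1;
}

/* Initialise pages [start, end) with one scheduling point per batch. */
static void init_pfn_range(unsigned long start, unsigned long end,
			   unsigned char *state)
{
	unsigned long pfn, done = 0;

	assert(start <= end);
	for (pfn = start; pfn < end; pfn++) {
		init_one_page(pfn, state);
		if (++done % RESCHED_BATCH == 0)
			cond_resched_stub();
	}
}
```

The per-iteration cost of the counter check is small compared with the page initialisation itself, which is presumably why a fixed batch is preferred over yielding on every page.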
2017-10-03mm, page_alloc: add scheduling point to memmap_init_zoneMichal Hocko1-0/+1
memmap_init_zone gets a pfn range to initialize and it can be really large, resulting in a soft lockup on non-preemptible kernels:

  NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]
  [...]
  task: ffff88ecd7e902c0 ti: ffff88eca4e50000 task.ti: ffff88eca4e50000
  RIP: move_pfn_range_to_zone+0x185/0x1d0
  [...]
  Call Trace:
    devm_memremap_pages+0x2c7/0x430
    pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
    nvdimm_bus_probe+0x64/0x110 [libnvdimm]
    driver_probe_device+0x1f7/0x420
    bus_for_each_drv+0x52/0x80
    __device_attach+0xb0/0x130
    bus_probe_device+0x87/0xa0
    device_add+0x3fc/0x5f0
    nd_async_device_register+0xe/0x40 [libnvdimm]
    async_run_entry_fn+0x43/0x150
    process_one_work+0x14e/0x410
    worker_thread+0x116/0x490
    kthread+0xc7/0xe0
    ret_from_fork+0x3f/0x70

Fix this by adding a scheduling point once per page block.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Johannes Thumshirn <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm, memory_hotplug: add scheduling point to __add_pagesMichal Hocko1-0/+1
Patch series "mm, memory_hotplug: fix few soft lockups in memory hotadd".

Johannes noticed a few soft lockups when adding a large nvdimm device. All of them were caused by a long loop without any explicit cond_resched, which is a problem for !PREEMPT kernels. The fix is quite straightforward: just make sure that cond_resched gets called from time to time.

This patch (of 3):

__add_pages gets a pfn range to add and there is no upper bound for a single call. This is usually a memory block aligned size for the regular memory hotplug (smaller sizes are usual for memory ballooning drivers), or the whole NUMA node for physical memory online. There is no explicit scheduling point in that code path though.

This can lead to long latencies while __add_pages is executed, and we have even seen a soft lockup report during nvdimm initialization with a !PREEMPT kernel:

  NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [kworker/u641:3:832]
  [...]
  Workqueue: events_unbound async_run_entry_fn
  task: ffff881809270f40 ti: ffff881809274000 task.ti: ffff881809274000
  RIP: _raw_spin_unlock_irqrestore+0x11/0x20
  RSP: 0018:ffff881809277b10 EFLAGS: 00000286
  [...]
  Call Trace:
    sparse_add_one_section+0x13d/0x18e
    __add_pages+0x10a/0x1d0
    arch_add_memory+0x4a/0xc0
    devm_memremap_pages+0x29d/0x430
    pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
    nvdimm_bus_probe+0x64/0x110 [libnvdimm]
    driver_probe_device+0x1f7/0x420
    bus_for_each_drv+0x52/0x80
    __device_attach+0xb0/0x130
    bus_probe_device+0x87/0xa0
    device_add+0x3fc/0x5f0
    nd_async_device_register+0xe/0x40 [libnvdimm]
    async_run_entry_fn+0x43/0x150
    process_one_work+0x14e/0x410
    worker_thread+0x116/0x490
    kthread+0xc7/0xe0
    ret_from_fork+0x3f/0x70
  DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

Fix this by adding cond_resched once per memory section in the given pfn range. Each section is a constant amount of work which itself is not too expensive, but many of them will just add up.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Johannes Thumshirn <[email protected]> Tested-by: Johannes Thumshirn <[email protected]> Cc: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03lib/idr.c: fix comment for idr_replace()Eric Biggers1-2/+2
idr_replace() returns the old value on success, not 0. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm: memcontrol: use vmalloc fallback for large kmem memcg arraysJohannes Weiner2-13/+21
For quick per-memcg indexing, slab caches and list_lru structures maintain linear arrays of descriptors. As the number of concurrent memory cgroups in the system goes up, this requires large contiguous allocations (8k cgroups = order-5, 16k cgroups = order-6 etc.) for every existing slab cache and list_lru, which can easily fail on loaded systems. E.g.: mkdir: page allocation failure: order:5, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null) CPU: 1 PID: 6399 Comm: mkdir Not tainted 4.13.0-mm1-00065-g720bbe532b7c-dirty #481 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 Call Trace: ? __alloc_pages_direct_compact+0x4c/0x110 __alloc_pages_nodemask+0xf50/0x1430 alloc_pages_current+0x60/0xc0 kmalloc_order_trace+0x29/0x1b0 __kmalloc+0x1f4/0x320 memcg_update_all_list_lrus+0xca/0x2e0 mem_cgroup_css_alloc+0x612/0x670 cgroup_apply_control_enable+0x19e/0x360 cgroup_mkdir+0x322/0x490 kernfs_iop_mkdir+0x55/0x80 vfs_mkdir+0xd0/0x120 SyS_mkdirat+0x6c/0xe0 SyS_mkdir+0x14/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Mem-Info: active_anon:2965 inactive_anon:19 isolated_anon:0 active_file:100270 inactive_file:98846 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 slab_reclaimable:7328 slab_unreclaimable:16402 mapped:771 shmem:52 pagetables:278 bounce:0 free:13718 free_pcp:0 free_cma:0 This output is from an artificial reproducer, but we have repeatedly observed order-7 failures in production in the Facebook fleet. These systems become useless as they cannot run more jobs, even though there is plenty of memory to allocate 128 individual pages. Use kvmalloc and kvzalloc to fall back to vmalloc space if these arrays prove too large for allocating them physically contiguous. 
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
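The "order-7" and "128 individual pages" figures above are related by simple arithmetic: an order-N allocation is 2^N physically contiguous pages. A small sketch of that relationship follows, assuming 4 KiB pages. The helper names order_for_size() and pages_in_order() are hypothetical and only loosely mirror what the kernel's get_order() computes.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Number of base pages in one block of the given order. */
static unsigned long pages_in_order(unsigned int order)
{
	return 1UL << order;
}
```

A request that needs order-7 therefore fails unless 128 contiguous pages are free, even when far more than 128 scattered pages are available. That is the situation a vmalloc fallback avoids, because vmalloc space only needs the pages to be virtually contiguous.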
2017-10-03kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()Luis R. Rodriguez1-2/+0
do_proc_douintvec_conv() has two UINT_MAX checks, we can remove one. This has no functional changes other than fixing a compiler warning: kernel/sysctl.c:2190]: (warning) Identical condition '*lvalp>UINT_MAX', second condition is always false Fixes: 4f2fec00afa60 ("sysctl: simplify unsigned int support") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Luis R. Rodriguez <[email protected]> Reported-by: David Binderman <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
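For illustration, a conversion helper of this kind needs only one range check before narrowing the value. The sketch below is a user-space approximation with a hypothetical name and signature (wide_to_uint()); it is not the body of do_proc_douintvec_conv().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: narrow a wide unsigned value to unsigned int.
 * A single UINT_MAX comparison is sufficient; repeating it would be
 * dead code, which is what the compiler warning pointed out.
 */
static int wide_to_uint(unsigned long long val, unsigned int *out)
{
	assert(out);
	if (val > UINT_MAX)
		return -EINVAL;
	*out = (unsigned int)val;
	return 0;
}
```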
2017-10-03include/linux/bitfield.h: remove 32bit from FIELD_GET comment blockMasahiro Yamada1-1/+1
I do not see anything that restricts this macro to 32 bit width. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
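A mask-and-shift field extraction works the same way at any integer width, which is the point of the comment fix. The sketch below is a user-space function rather than the kernel macro: field_get64() is a hypothetical name, and it relies on the GCC/Clang builtin __builtin_ctzll to find the shift from the mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the field selected by a contiguous, non-zero mask.
 * __builtin_ctzll() returns the index of the lowest set bit, which is
 * the amount the masked value must be shifted right.
 */
static uint64_t field_get64(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);
	return (reg & mask) >> __builtin_ctzll(mask);
}
```

The first assertion in a typical use, extracting bits 40..47 of a 64-bit register value, shows a field that a 32-bit-only helper could not reach.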
2017-10-03lib/lz4: make arrays static const, reduces object code sizeColin Ian King1-2/+2
Don't populate the read-only arrays dec32table and dec64table on the stack, instead make them both static const. Makes the object code smaller by over 10K bytes: Before: text data bss dec hex filename 31500 0 0 31500 7b0c lib/lz4/lz4_decompress.o After: text data bss dec hex filename 20237 176 0 20413 4fbd lib/lz4/lz4_decompress.o (gcc version 7.2.0 x86_64) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Sven Schmidt <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
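The general technique is sketched below: a read-only lookup table declared static const at file scope lives in read-only data and is never rebuilt on the stack at function entry. The table name, its values and lookup_step() are illustrative assumptions and are not claimed to match dec32table or dec64table in lib/lz4.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; placed in read-only data, built once. */
static const unsigned int step_table[8] = { 0, 1, 2, 1, 4, 4, 4, 4 };

/* Look up a step without any per-call table initialisation. */
static unsigned int lookup_step(size_t offset)
{
	assert(offset < sizeof(step_table) / sizeof(step_table[0]));
	return step_table[offset];
}
```

With a non-static local array, the compiler may emit code to fill the array on every call, which is consistent with the text-size reduction reported above.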
2017-10-03exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] arrayOleg Nesterov1-9/+5
After the previous change "fmt" can't go away, we can kill iname/iname_addr and use fmt->interpreter. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Ben Woodard <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jim Foraker <[email protected]> Cc: <[email protected]> Cc: Travis Gummels <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03exec: binfmt_misc: fix race between load_misc_binary() and kill_node()Oleg Nesterov1-4/+8
load_misc_binary() makes a local copy of fmt->interpreter under entries_lock to avoid the race with kill_node() but this is not enough; the whole Node can be freed after we drop entries_lock, not only the ->interpreter string. Add dget/dput(fmt->dentry) to ensure bm_evict_inode() can't destroy/free this Node. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Ben Woodard <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jim Foraker <[email protected]> Cc: Travis Gummels <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
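The idea of pinning an object with a reference before dropping the lock that protects the lookup can be sketched as follows. This is a single-threaded user-space illustration: struct node, node_new(), node_get() and node_put() are hypothetical stand-ins for the Node/dentry pair and dget()/dput(), and a plain int replaces the atomic reference count a real implementation would need.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int refs;			/* plain int: single-threaded sketch only */
	int *freed;			/* set to 1 when the node is released */
	const char *interpreter;
};

/* Create a node holding one reference, owned by the registry. */
static struct node *node_new(const char *interpreter, int *freed)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->refs = 1;
	n->freed = freed;
	n->interpreter = interpreter;
	*freed = 0;
	return n;
}

/* Take an extra reference, e.g. while still holding the lookup lock. */
static struct node *node_get(struct node *n)
{
	assert(n->refs > 0);
	n->refs++;
	return n;
}

/* Drop a reference; the last put frees the whole node. */
static void node_put(struct node *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0) {
		*n->freed = 1;
		free(n);
	}
}
```

A user that calls node_get() before releasing the lock can keep reading n->interpreter even if the registry drops its own reference in the meantime; the memory is released only by the final node_put().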
2017-10-03exec: binfmt_misc: remove the confusing e->interp_file != NULL checksOleg Nesterov1-2/+2
If MISC_FMT_OPEN_FILE flag is set e->interp_file must be valid or we have a bug which should not be silently ignored. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Ben Woodard <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jim Foraker <[email protected]> Cc: <[email protected]> Cc: Travis Gummels <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()Oleg Nesterov1-6/+6
To ensure that load_misc_binary() can't use the partially destroyed Node, see also the next patch. The current logic looks wrong in any case, once we close interp_file it doesn't make any sense to delay kfree(inode->i_private), this Node is no longer valid. Even if the MISC_FMT_OPEN_FILE/interp_file checks were not racy (they are), load_misc_binary() should not try to reopen ->interpreter if MISC_FMT_OPEN_FILE is set but ->interp_file is NULL. And I can't understand why we use filp_close(), not fput(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Ben Woodard <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jim Foraker <[email protected]> Cc: <[email protected]> Cc: Travis Gummels <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03exec: binfmt_misc: don't nullify Node->dentry in kill_node()Oleg Nesterov1-13/+9
kill_node() nullifies/checks Node->dentry to avoid double free. This complicates the next changes and this is very confusing:

- we do not need to check dentry != NULL under entries_lock, kill_node() is always called under inode_lock(d_inode(root)) and we rely on this inode_lock() anyway, without this lock the MISC_FMT_OPEN_FILE cleanup could race with itself.

- if kill_node() was already called and ->dentry == NULL we should not even try to close e->interp_file.

We can change bm_entry_write() to simply check !list_empty(list) before kill_node(). Again, we rely on inode_lock(), in particular it saves us from the race with bm_status_write(), another caller of kill_node().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Ben Woodard <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jim Foraker <[email protected]>
Cc: <[email protected]>
Cc: Travis Gummels <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] arrayOleg Nesterov3-10/+11
Patch series "exec: binfmt_misc: fix use-after-free, kill iname[BINPRM_BUF_SIZE]". It looks like this code was always wrong, then commit 948b701a607f ("binfmt_misc: add persistent opened binary handler for containers") added more problems. This patch (of 6): load_script() can simply use i_name instead, it points into bprm->buf[] and nobody can change this memory until we call prepare_binprm(). The only complication is that we need to also change the signature of bprm_change_interp() but this change looks good too. While at it, do whitespace/style cleanups. NOTE: the real motivation for this change is that people want to increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but this looks more complicated because afaics it is very buggy. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Travis Gummels <[email protected]> Cc: Ben Woodard <[email protected]> Cc: Jim Foraker <[email protected]> Cc: <[email protected]> Cc: Al Viro <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03userfaultfd: non-cooperative: fix fork use after freeAndrea Arcangeli1-10/+56
When reading the event from the uffd, we put it on a temporary fork_event list to detect if we can still access it after releasing and retaking the event_wqh.lock. If fork aborts and removes the event from the fork_event all is fine as long as we're still in the userfault read context and fork_event head is still alive. We have to put the event allocated in the fork kernel stack, back from fork_event list-head to the event_wqh head, before returning from userfaultfd_ctx_read, because the fork_event head lifetime is limited to the userfaultfd_ctx_read stack lifetime. Forgetting to move the event back to its event_wqh place then results in __remove_wait_queue(&ctx->event_wqh, &ewq->wq); in userfaultfd_event_wait_completion to remove it from a head that has already been freed from the reader stack. This could only happen if resolve_userfault_fork failed (for example if there are no file descriptors available to allocate the fork uffd). If it succeeded it was put back correctly. Furthermore, after find_userfault_evt receives a fork event, the forked userfault context in fork_nctx and uwq->msg.arg.reserved.reserved1 can be released by the fork thread as soon as the event_wqh.lock is released. Taking a reference on the fork_nctx before dropping the lock prevents a use-after-free in resolve_userfault_fork(). If the fork side aborted and it already released everything, we still try to succeed resolve_userfault_fork(), if possible. Fixes: 893e26e61d04eac9 ("userfaultfd: non-cooperative: Add fork() event") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrea Arcangeli <[email protected]> Reported-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm/device-public-memory: fix edge case in _vm_normal_page()Reza Arbab1-1/+1
With device public pages at the end of my memory space, I'm getting output from _vm_normal_page(): BUG: Bad page map in process migrate_pages pte:c0800001ffff0d06 pmd:f95d3000 addr:00007fff89330000 vm_flags:00100073 anon_vma:c0000000fa899320 mapping: (null) index:7fff8933 file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 0 PID: 13963 Comm: migrate_pages Tainted: P B OE 4.14.0-rc1-wip #155 Call Trace: dump_stack+0xb0/0xf4 (unreliable) print_bad_pte+0x28c/0x340 _vm_normal_page+0xc0/0x140 zap_pte_range+0x664/0xc10 unmap_page_range+0x318/0x670 unmap_vmas+0x74/0xe0 exit_mmap+0xe8/0x1f0 mmput+0xac/0x1f0 do_exit+0x348/0xcd0 do_group_exit+0x5c/0xf0 SyS_exit_group+0x1c/0x20 system_call+0x58/0x6c The pfn causing this is the very last one. Correct the bounds check accordingly. Fixes: df6ad69838fc ("mm/device-public-memory: device memory cache coherent with CPU") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Reza Arbab <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
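The edge case described above is the classic inclusive-versus-exclusive upper bound. A minimal sketch, assuming the limit names the last valid pfn, is shown below. pfn_is_mapped() and its parameters are hypothetical names and do not reproduce the actual check in _vm_normal_page().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: is pfn within [lowest_pfn, highest_pfn], where
 * highest_pfn is itself a valid page frame number?
 */
static bool pfn_is_mapped(unsigned long pfn, unsigned long lowest_pfn,
			  unsigned long highest_pfn)
{
	assert(lowest_pfn <= highest_pfn);
	/* "<=" rather than "<": the very last pfn is still a valid page. */
	return pfn >= lowest_pfn && pfn <= highest_pfn;
}
```

With a strict "<" comparison the final page would be reported as out of range, which matches the symptom of the very last pfn triggering the bad page map report.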
2017-10-03mm: fix data corruption caused by lazyfree pageShaohua Li1-0/+11
MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears SwapBacked). There is no lock to prevent page reclaim from adding the page to the swap cache between these two steps. If page reclaim finds such a page, it will simply add the page to the swap cache without paging it out to swap, because the page is marked as clean. Next time, the page fault will read data from the swap slot, which doesn't have the original data, so we have a data corruption. To fix this, we mark the page dirty and page it out. However, we shouldn't dirty every page that is clean and in the swap cache: a swapped-in page is in the swap cache and clean too. So we only dirty a page that is added into the swap cache in page reclaim, which cannot be a swapped-in page. As Minchan suggested, simply dirtying the page in add_to_swap does the job. Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Link: http://lkml.kernel.org/r/08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Reported-by: Artem Savkov <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> [4.12+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm: avoid marking swap cached page as lazyfreeShaohua Li1-2/+2
MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears SwapBacked). There is no lock to prevent page reclaim from adding the page to the swap cache between these two steps. Page reclaim could add the page to the swap cache and unmap the page. After page reclaim, the page is added back to the lru. At that time, we probably start draining the per-cpu pagevec and mark the page lazyfree. So the page could be in a state with SwapBacked cleared and PG_swapcache set. Next time there is a refault at the virtual address, do_swap_page can find the page in the swap cache, but PageSwapCache is false for the page because SwapBacked isn't set, so do_swap_page will bail out and do nothing. The task will keep running into the fault handler. Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Reported-by: Artem Savkov <[email protected]> Tested-by: Artem Savkov <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> [4.12+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPCJeff Layton1-0/+8
Eryu noticed that he could sometimes get a leftover error reported when it shouldn't be on fsync with ext2 and non-journalled ext4. The problem is that writeback_single_inode still uses filemap_fdatawait. That picks up a previously set AS_EIO flag, which would ordinarily have been cleared before. Since we're mostly using this function as a replacement for filemap_check_errors, have filemap_check_and_advance_wb_err clear AS_EIO and AS_ENOSPC when reporting an error. That should allow the new function to better emulate the behavior of the old with respect to these flags. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Layton <[email protected]> Reported-by: Eryu Guan <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
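The "report once, then clear" behaviour that the old flag-based interface provided can be sketched as a test-and-clear on a flags word. The bit positions, the SK_ prefixed names and check_and_clear_errors() below are assumptions made for illustration; this is a simplification and not the body of filemap_check_and_advance_wb_err().

```c
#include <assert.h>
#include <errno.h>

#define SK_AS_EIO	(1UL << 0)	/* hypothetical bit positions */
#define SK_AS_ENOSPC	(1UL << 1)

/* Return a pending error, if any, and clear both flags so that the
 * same error is not picked up again by a later flag-based check. */
static int check_and_clear_errors(unsigned long *flags)
{
	int err = 0;

	assert(flags);
	if (*flags & SK_AS_ENOSPC)
		err = -ENOSPC;
	if (*flags & SK_AS_EIO)
		err = -EIO;
	*flags &= ~(SK_AS_EIO | SK_AS_ENOSPC);
	return err;
}
```

If the flags were left set after the error had been reported, a later caller that still inspects them would see a stale error, which is the leftover-error symptom described above.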
2017-10-03m32r: define CPU_BIG_ENDIANSudip Mukherjee1-0/+4
The build of m32r allmodconfig is giving lots of build warnings about: include/linux/byteorder/big_endian.h:7:2: warning: #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN [-Wcpp] #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN Define CPU_BIG_ENDIAN like the way CPU_LITTLE_ENDIAN is defined. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03zram: fix null dereference of handleMinchan Kim1-24/+12
In testing I found that the handle passed to zs_map_object in __zram_bvec_read is NULL, so the kernel oopses in pin_object(). The reason is that there is no routine to check whether the slot has been freed after getting the slot's lock. This patch fixes it. [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: 1f7319c74275 ("zram: partial IO refactoring") Signed-off-by: Minchan Kim <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm: fix RODATA_TEST failure "rodata_test: test data was not read only"Christophe Leroy1-1/+1
On powerpc, RODATA_TEST fails with the following messages:

  Freeing unused kernel memory: 528K
  rodata_test: test data was not read only

This is because GCC allocates it to the .data section:

  c0695034 g O .data 00000004 rodata_test_data

Since commit 056b9d8a7692 ("mm: remove rodata_test_data export, add pr_fmt"), rodata_test_data is used only inside rodata_test.c.

By declaring it static, it gets properly allocated into the .rodata section instead of .data:

  c04df710 l O .rodata 00000004 rodata_test_data

Fixes: 056b9d8a7692 ("mm: remove rodata_test_data export, add pr_fmt")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Jinbum Park <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Cc: David Laight <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03rapidio: remove global irq spinlocks from the subsystemIoan Nicu4-37/+35
Locking of config and doorbell operations should be done only if the underlying hardware requires it. This patch removes the global spinlocks from the rapidio subsystem and moves them to the mport drivers (fsl_rio and tsi721), only to the necessary places. For example, local config space read and write operations (lcread/lcwrite) are atomic in all existing drivers, so there should be no need for locking, while the cread/cwrite operations which generate maintenance transactions need to be synchronized with a lock. Later, each driver could choose to use a per-port lock instead of a global one, or even more granular locking. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ioan Nicu <[email protected]> Signed-off-by: Frank Kunz <[email protected]> Acked-by: Alexandre Bounine <[email protected]> Cc: Matt Porter <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm: meminit: mark init_reserved_page as __meminitArnd Bergmann1-1/+1
The function is called from __meminit context and calls other __meminit functions but is not itself marked as such today: WARNING: vmlinux.o(.text.unlikely+0x4516): Section mismatch in reference from the function init_reserved_page() to the function .meminit.text:early_pfn_to_nid() The function init_reserved_page() references the function __meminit early_pfn_to_nid(). This is often because init_reserved_page lacks a __meminit annotation or the annotation of early_pfn_to_nid is wrong. On most compilers, we don't notice this because the function gets inlined all the time. Adding __meminit here fixes the harmless warning for the old versions and is generally the correct annotation. Link: http://lkml.kernel.org/r/[email protected] Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd") Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03z3fold: fix stale list handlingVitaly Wool1-4/+2
Fix the situation when clear_bit() is called for page->private before the page pointer is actually assigned. While at it, remove work_busy() check because it is costly and does not give 100% guarantee anyway. Signed-off-by: Vitaly Wool <[email protected]> Cc: Dan Streetman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm,compaction: serialize waitqueue_active() checks (for real)Davidlohr Bueso1-8/+5
Andrea brought to my attention that the L->{L,S} guarantees are completely bogus for this case. I was looking at the diagram, from the offending commit, when that _is_ the race, we had the load reordered already. What we need is at least S->L semantics, thus simply use wq_has_sleeper() to serialize the call for good. Link: http://lkml.kernel.org/r/[email protected] Fixes: 46acef048a6 (mm,compaction: serialize waitqueue_active() checks) Signed-off-by: Davidlohr Bueso <[email protected]> Reported-by: Andrea Parri <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03android: binder: drop lru lock in isolate callbackSherry Yang3-6/+36
Drop the global lru lock in the isolate callback before calling zap_page_range, which calls cond_resched, and re-acquire the global lru lock before returning. Also change the return code to LRU_REMOVED_RETRY. Use mmput_async when failing to acquire mmap_sem in an atomic context. Fix "BUG: sleeping function called from invalid context" errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. Also restore mmput_async, which was initially introduced in commit ec8d7c14ea14 ("mm, oom_reaper: do not mmput synchronously from the oom reaper context"), and was removed in commit 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently"). Link: http://lkml.kernel.org/r/[email protected] Fixes: f2517eb76f1f2 ("android: binder: Add global lru shrinker to binder") Signed-off-by: Sherry Yang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Reported-by: Kyle Yan <[email protected]> Acked-by: Arve Hjønnevåg <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Martijn Coenen <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Riley Andrews <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Hoeun Ryu <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm/memcg: avoid page count check for zone deviceJérôme Glisse1-1/+2
Fix for 4.14: zone device pages always have an elevated refcount of one, and thus the page count sanity check in uncharge_page() is inappropriate for them. [[email protected]: nano-optimize VM_BUG_ON in uncharge_page] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jérôme Glisse <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Reported-by: Evgeny Baskakov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm, memcg: remove hotplug locking from try_chargeMichal Hocko1-5/+15
The following lockdep splat has been noticed during LTP testing ====================================================== WARNING: possible circular locking dependency detected 4.13.0-rc3-next-20170807 #12 Not tainted ------------------------------------------------------ a.out/4771 is trying to acquire lock: (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: lock_acquire+0xc9/0x230 __might_fault+0x70/0xa0 _copy_to_user+0x23/0x70 filldir+0xa7/0x110 xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs] xfs_readdir+0x1fa/0x2c0 [xfs] xfs_file_readdir+0x30/0x40 [xfs] iterate_dir+0x17a/0x1a0 SyS_getdents+0xb0/0x160 entry_SYSCALL_64_fastpath+0x1f/0xbe -> #2 (&type->i_mutex_dir_key#3){++++++}: lock_acquire+0xc9/0x230 down_read+0x51/0xb0 lookup_slow+0xde/0x210 walk_component+0x160/0x250 link_path_walk+0x1a6/0x610 path_openat+0xe4/0xd50 do_filp_open+0x91/0x100 file_open_name+0xf5/0x130 filp_open+0x33/0x50 kernel_read_file_from_path+0x39/0x80 _request_firmware+0x39f/0x880 request_firmware_direct+0x37/0x50 request_microcode_fw+0x64/0xe0 reload_store+0xf7/0x180 dev_attr_store+0x18/0x30 sysfs_kf_write+0x44/0x60 kernfs_fop_write+0x113/0x1a0 __vfs_write+0x37/0x170 vfs_write+0xc7/0x1c0 SyS_write+0x58/0xc0 do_syscall_64+0x6c/0x1f0 return_from_SYSCALL_64+0x0/0x7a -> #1 (microcode_mutex){+.+.+.}: lock_acquire+0xc9/0x230 __mutex_lock+0x88/0x960 mutex_lock_nested+0x1b/0x20 microcode_init+0xbb/0x208 do_one_initcall+0x51/0x1a9 kernel_init_freeable+0x208/0x2a7 kernel_init+0xe/0x104 ret_from_fork+0x2a/0x40 -> #0 (cpu_hotplug_lock.rw_sem){++++++}: __lock_acquire+0x153c/0x1550 lock_acquire+0xc9/0x230 cpus_read_lock+0x4b/0x90 drain_all_stock.part.35+0x18/0x140 try_charge+0x3ab/0x6e0 mem_cgroup_try_charge+0x7f/0x2c0 
shmem_getpage_gfp+0x25f/0x1050 shmem_fault+0x96/0x200 __do_fault+0x1e/0xa0 __handle_mm_fault+0x9c3/0xe00 handle_mm_fault+0x16e/0x380 __do_page_fault+0x24a/0x530 do_page_fault+0x30/0x80 page_fault+0x28/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&type->i_mutex_dir_key#3); lock(&mm->mmap_sem); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** 2 locks held by a.out/4771: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530 #1: (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0 The problem is very similar to the one fixed by commit a459eeb7b852 ("mm, page_alloc: do not depend on cpu hotplug locks inside the allocator"). We are taking hotplug locks while we can be sitting on top of basically arbitrary locks. This just calls for problems. We can get rid of {get,put}_online_cpus, fortunately. We do not have to be worried about races with memory hotplug because drain_local_stock, which is called from both the WQ draining and the memory hotplug contexts, is always operating on the local cpu stock with IRQs disabled. The only thing to be careful about is that the target memcg doesn't vanish while we are still in drain_all_stock so take a reference on it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Artem Savkov <[email protected]> Tested-by: Artem Savkov <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm, oom_reaper: skip mm structs with mmu notifiersMichal Hocko2-0/+21
Andrea has noticed that the oom_reaper doesn't invalidate the range via mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can corrupt the memory of the kvm guest for example. tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not sufficient as per Andrea: "mmu_notifier_invalidate_range cannot be used in replacement of mmu_notifier_invalidate_range_start/end. For KVM mmu_notifier_invalidate_range is a noop and rightfully so. A MMU notifier implementation has to implement either ->invalidate_range method or the invalidate_range_start/end methods, not both. And if you implement invalidate_range_start/end like KVM is forced to do, calling mmu_notifier_invalidate_range in common code is a noop for KVM. For those MMU notifiers that can get away only implementing ->invalidate_range, the ->invalidate_range is implicitly called by mmu_notifier_invalidate_range_end(). And only those secondary MMUs that share the same pagetable with the primary MMU (like AMD iommuv2) can get away only implementing ->invalidate_range" As the callback is allowed to sleep and the implementation is out of hand of the MM it is safer to simply bail out if there is an mmu notifier registered. In order to not fail too early make the mm_has_notifiers check under the oom_lock and have a little nap before failing to give the current oom victim some more time to exit. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Michal Hocko <[email protected]> Reported-by: Andrea Arcangeli <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03z3fold: fix potential race in z3fold_reclaim_pageVitaly Wool1-1/+3
It is possible that on a (partially) unsuccessful page reclaim, kref_put() called in z3fold_reclaim_page() does not yield page release, but the page is released shortly afterwards by another thread. Then z3fold_reclaim_page() would try to list_add() that (released) page again which is obviously a bug. To avoid that, spin_lock() has to be taken before that kref_put() call rather than after it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Cc: Dan Streetman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03sh: sh7269: remove nonexistent GPIO_PH[0-7] to fix pinctrl registrationGeert Uytterhoeven1-3/+1
Pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array initializers, where the GPIO_* enums serve as indices. If enum values are defined, but never used, pinmux_pins[] contains (zero-filled) holes. Such entries are treated as pin zero, which was registered before, thus leading to pinctrl registration failures, as seen on sh7722: sh-pfc pfc-sh7722: pin 0 already registered sh-pfc pfc-sh7722: error during pin registration sh-pfc pfc-sh7722: could not register: -22 sh-pfc: probe of pfc-sh7722 failed with error -22 Remove GPIO_PH[0-7] from the enum to fix this. Link: http://lkml.kernel.org/r/[email protected] Fixes: ef0fa5331a73e479 ("sh: Add pinmux for sh7269") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Magnus Damm <[email protected]> Cc: Yoshihiro Shimoda <[email protected]> Cc: Jacopo Mondi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
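The zero-filled hole described in the message above can be reproduced in a few lines of plain C. This is only a sketch under stated assumptions: the enum, the `DEMO_GPIO()` macro, `demo_pins[]` and `first_duplicate_pin()` are hypothetical stand-ins invented for illustration, not the sh-pfc driver's real definitions or the pinctrl core's real check.

```c
#include <stddef.h>

/* Hypothetical, much-reduced pin table; names are illustrative only. */
enum demo_gpio_id {
	DEMO_GPIO_PA0,	/* 0 */
	DEMO_GPIO_PA1,	/* 1 */
	DEMO_GPIO_PH0,	/* 2: defined in the enum, never used below */
	DEMO_GPIO_PB0,	/* 3 */
};

struct demo_pin_desc {
	unsigned int pin;	/* pin number handed to the registration code */
};

/* Stand-in for PINMUX_GPIO(): a designated initializer indexed by the enum. */
#define DEMO_GPIO(id, nr)	[id] = { .pin = (nr) }

const struct demo_pin_desc demo_pins[] = {
	DEMO_GPIO(DEMO_GPIO_PA0, 0),
	DEMO_GPIO(DEMO_GPIO_PA1, 1),
	/* no entry for DEMO_GPIO_PH0: that slot is implicitly zero-filled */
	DEMO_GPIO(DEMO_GPIO_PB0, 3),
};

/*
 * Return the index of the first entry whose pin number was already seen
 * earlier in the table, or -1 if every pin number is unique.
 */
int first_duplicate_pin(const struct demo_pin_desc *pins, size_t n)
{
	for (size_t i = 1; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (pins[i].pin == pins[j].pin)
				return (int)i;
	return -1;
}
```

Because the array is sized by its highest designated index, the unused enum value still gets a slot, and that slot reads back as pin 0. Removing the unused value from the enum, as the patch does, removes the slot.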
2017-10-03sh: sh7264: remove nonexistent GPIO_PH[0-7] to fix pinctrl registrationGeert Uytterhoeven1-3/+1
Pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array initializers, where the GPIO_* enums serve as indices. If enum values are defined, but never used, pinmux_pins[] contains (zero-filled) holes. Such entries are treated as pin zero, which was registered before, thus leading to pinctrl registration failures, as seen on sh7722: sh-pfc pfc-sh7722: pin 0 already registered sh-pfc pfc-sh7722: error during pin registration sh-pfc pfc-sh7722: could not register: -22 sh-pfc: probe of pfc-sh7722 failed with error -22 Remove GPIO_PH[0-7] from the enum to fix this. Link: http://lkml.kernel.org/r/[email protected] Fixes: 41797f75486d8ca3 ("sh: Add pinmux for sh7264") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Cc: Jacopo Mondi <[email protected]> Cc: Magnus Damm <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshihiro Shimoda <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03sh: sh7757: remove nonexistent GPIO_PT[JLNQ]7_RESV to fix pinctrl registrationGeert Uytterhoeven1-4/+4
Commit 3810e96056ff ("sh: modify pinmux for SH7757 2nd cut") renamed GPIO_PT[JLNQ]7 to GPIO_PT[JLNQ]7_RESV, and removed the existing users from the pinmux_pins[] array. However, pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array initializers, where the GPIO_* enums serve as indices. Hence entries were not really removed, but replaced by (zero-filled) holes. Such entries are treated as pin zero, which was registered before, thus leading to pinctrl registration failures, as seen on sh7722: sh-pfc pfc-sh7722: pin 0 already registered sh-pfc pfc-sh7722: error during pin registration sh-pfc pfc-sh7722: could not register: -22 sh-pfc: probe of pfc-sh7722 failed with error -22 Remove GPIO_PT[JLNQ]7_RESV from the enum to fix this. Link: http://lkml.kernel.org/r/[email protected] Fixes: 3810e96056ffddf6 ("sh: modify pinmux for SH7757 2nd cut") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Cc: Jacopo Mondi <[email protected]> Cc: Magnus Damm <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshihiro Shimoda <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03sh: sh7722: remove nonexistent GPIO_PTQ7 to fix pinctrl registrationGeert Uytterhoeven1-1/+1
Patch series "sh: sh7722/sh7757i/sh7264/sh7269: Fix pinctrl registration", v2. Magnus Damm reported that on sh7722/Migo-R, pinctrl registration fails with: sh-pfc pfc-sh7722: pin 0 already registered sh-pfc pfc-sh7722: error during pin registration sh-pfc pfc-sh7722: could not register: -22 sh-pfc: probe of pfc-sh7722 failed with error -22 pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array initializers, where the GPIO_* enums serve as indices. Apparently GPIO_PTQ7 was defined in the enum, but never used. If enum values are defined, but never used, pinmux_pins[] contains (zero-filled) holes. Hence such entries are treated as pin zero, which was registered before, and pinctrl registration fails. I can't see how this ever worked, as at the time of commit f5e25ae52fef ("sh-pfc: Add sh7722 pinmux support"), pinmux_gpios[] in drivers/pinctrl/sh-pfc/pfc-sh7722.c already had the hole, and drivers/pinctrl/core.c already had the check. Some scripting revealed a few more broken drivers: - sh7757 has four holes, due to nonexistent GPIO_PT[JLNQ]7_RESV. - sh7264 and sh7269 define GPIO_PH[0-7], but don't use them with PINMUX_GPIO(). Patch 1 fixes the issue on sh7722, and was tested. Patches 2-4 should fix the issue on the other 3 SoCs, but were untested due to lack of hardware. This patch (of 4): On sh7722/Migo-R, pinctrl registration fails with: sh-pfc pfc-sh7722: pin 0 already registered sh-pfc pfc-sh7722: error during pin registration sh-pfc pfc-sh7722: could not register: -22 sh-pfc: probe of pfc-sh7722 failed with error -22 pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array initializers, where the GPIO_* enums serve as indices. As GPIO_PTQ7 is defined in the enum, but never used, pinmux_pins[] contains a (zero-filled) hole. Hence this entry is treated as pin zero, which was registered before, and pinctrl registration fails. According to the datasheet, port PTQ7 does not exist. Hence remove GPIO_PTQ7 from the enum to fix this. 
Link: http://lkml.kernel.org/r/[email protected] Fixes: 8d7b5b0af7e070b9 ("sh: Add sh7722 pinmux code") Signed-off-by: Geert Uytterhoeven <[email protected]> Reported-by: Magnus Damm <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Tested-by: Jacopo Mondi <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshihiro Shimoda <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03mm, hugetlb, soft_offline: save compound page order before page migrationAlexandru Moise1-2/+10
This fixes a bug in madvise() when soft offlining a hugepage. While walking the address range, madvise() computed the offset of the next page from the compound order of a page that had already been dissolved and was therefore no longer a compound page (huge pages are dissolved after migration since commit c3114a84f7f9: "mm: hugetlb: soft-offline: dissolve source hugepage after successful migration"), so it ended up using the wrong page offset. As a result I ended up with all my free pages except one being offlined. Link: http://lkml.kernel.org/r/[email protected] Fixes: c3114a84f7f9 ("mm: hugetlb: soft-offline: dissolve source hugepage after successful migration") Signed-off-by: Alexandru Moise <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: David Rientjes <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
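The effect of reading the compound order too late, as described in the soft-offline message above, can be modelled in user space. The following is a rough sketch, not the kernel code: `struct demo_page`, `demo_soft_offline()` and both walk functions are hypothetical names, and the model assumes 4 KiB base pages and a single huge page that is dissolved (order drops to 0) as soon as it is offlined.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096UL	/* assumed base page size */

/* Hypothetical stand-in for the page backing the walked range. */
struct demo_page {
	unsigned int order;	/* 9 for a 2 MiB huge page of 4 KiB pages */
};

/* Models a successful soft offline: the source huge page is dissolved. */
void demo_soft_offline(struct demo_page *page)
{
	page->order = 0;
}

/* Buggy walk: the step is derived from the order *after* offlining. */
size_t walk_reading_order_late(unsigned long start, unsigned long end,
			       unsigned int order)
{
	struct demo_page page = { .order = order };
	size_t calls = 0;

	while (start < end) {
		demo_soft_offline(&page);
		calls++;
		start += DEMO_PAGE_SIZE << page.order;	/* order is 0 by now */
	}
	return calls;
}

/* Fixed walk: the order is saved before the page can be dissolved. */
size_t walk_saving_order_first(unsigned long start, unsigned long end,
			       unsigned int order)
{
	struct demo_page page = { .order = order };
	size_t calls = 0;

	while (start < end) {
		unsigned int saved = page.order;

		demo_soft_offline(&page);
		calls++;
		start += DEMO_PAGE_SIZE << saved;
	}
	return calls;
}
```

Over one 2 MiB range the late read takes 512 small steps and calls the offline routine each time, while the saved order covers the range in a single step. In this simplified model, those extra calls correspond to the extra pages being offlined that the commit message reports.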
2017-10-03ksm: fix unlocked iteration over vmas in cmp_and_merge_page()Kirill Tkhai1-1/+4
At this point the mm is unlocked, so the vmas or the vma list may change. Take mmap_sem for reading to protect them from modification. Link: http://lkml.kernel.org/r/150512788393.10691.8868381099691121308.stgit@localhost.localdomain Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring") Signed-off-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: Minchan Kim <[email protected]> Cc: zhong jiang <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Claudio Imbrenda <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03include/linux/mm.h: fix typo in VM_MPX definitionKirill A. Shutemov1-1/+1
There's a typo in recent change of VM_MPX definition. We want it to be VM_HIGH_ARCH_4, not VM_HIGH_ARCH_BIT_4. This bug does cause visible regressions. In arch_vma_name the vmflags are tested against VM_MPX. With the incorrect value of VM_MPX, a number of vmas (such as the stack) test positive and end up being marked as "[mpx]" in /proc/N/maps instead of their correct names. This confuses tools like rr which expect to be able to find familiar vmas. Fixes: df3735c5b40f ("x86,mpx: make mpx depend on x86-64 to free up VMA flag") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kyle Huey <[email protected]> Cc: <[email protected]> [4.14+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
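Why the VM_MPX typo above makes ordinary vmas such as the stack test positive can be seen from the bit arithmetic. The sketch below is illustrative only. The flag values are written from memory of include/linux/mm.h and should be treated as assumptions, and the `DEMO_` names and `looks_like_mpx()` are hypothetical helpers rather than kernel symbols.

```c
#include <stdint.h>

/* Flag values assumed to match include/linux/mm.h; verify before relying on them. */
#define DEMO_VM_READ		0x00000001ULL
#define DEMO_VM_WRITE		0x00000002ULL
#define DEMO_VM_MAYREAD		0x00000010ULL
#define DEMO_VM_MAYWRITE	0x00000020ULL
#define DEMO_VM_MAYEXEC		0x00000040ULL
#define DEMO_VM_GROWSDOWN	0x00000100ULL

#define DEMO_VM_HIGH_ARCH_BIT_4	36	/* a bit *number* */
#define DEMO_VM_HIGH_ARCH_4	(1ULL << DEMO_VM_HIGH_ARCH_BIT_4)	/* a bit *mask* */

/* The typo used the bit number (36 == 0x24) where a mask was expected. */
#define DEMO_VM_MPX_TYPO	((uint64_t)DEMO_VM_HIGH_ARCH_BIT_4)
#define DEMO_VM_MPX_FIXED	DEMO_VM_HIGH_ARCH_4

/* Flags a typical stack vma might carry in this simplified model. */
#define DEMO_STACK_FLAGS	(DEMO_VM_READ | DEMO_VM_WRITE | DEMO_VM_MAYREAD | \
				 DEMO_VM_MAYWRITE | DEMO_VM_MAYEXEC | DEMO_VM_GROWSDOWN)

/* Mirrors the shape of the test described for arch_vma_name(). */
int looks_like_mpx(uint64_t vm_flags, uint64_t vm_mpx)
{
	return (vm_flags & vm_mpx) != 0;
}
```

Under these assumed values, 36 is 0x24, which overlaps the 0x20 and 0x04 flag bits, so any vma carrying either of them matches the mistyped definition. The corrected mask only matches bit 36 itself.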
2017-10-03scripts/spelling.txt: add more spelling mistakes to spelling.txtColin Ian King1-0/+33
Here are some more of the spelling mistakes and typos that I've found while fixing up spelling mistakes in kernel error message text over the past eight weeks. [[email protected]: s/|/||/, per Joe] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Joe Perches <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03kernel/params.c: align add_sysfs_param documentation with codeJean Delvare1-1/+1
This parameter is named kp, so the documentation should use that. Fixes: 9b473de87209 ("param: Fix duplicate module prefixes") Link: http://lkml.kernel.org/r/20170919142656.64aea59e@endymion Signed-off-by: Jean Delvare <[email protected]> Acked-by: Rusty Russell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03alpha: fix build failuresSudip Mukherjee1-0/+1
The build of alpha allmodconfig fails with: arch/alpha/include/asm/mmu_context.h: In function 'ev5_switch_mm': arch/alpha/include/asm/mmu_context.h:160:2: error: implicit declaration of function 'task_thread_info'; did you mean 'init_thread_info'? [-Werror=implicit-function-declaration] The file 'mmu_context.h' needed an extra header file. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sudip Mukherjee <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-10-03bpf: fix bpf_tail_call() x64 JITAlexei Starovoitov3-4/+4
- bpf prog_array just like all other types of bpf array accepts 32-bit index. Clarify that in the comment. - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes - tighten corresponding check in the interpreter to stay consistent The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag in commit 96eabe7a40aa in 4.14. Before that the map_flags would stay zero and though JIT code is wrong it will check bounds correctly. Hence two fixes tags. All other JITs don't have this problem. Signed-off-by: Alexei Starovoitov <[email protected]> Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper") Acked-by: Daniel Borkmann <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
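The message above explains that loading 8 bytes instead of 4 only became harmful once map_flags could be non-zero. A small user-space model makes that concrete. It is a sketch under assumptions: `struct demo_map` is a hypothetical two-field layout in which map_flags directly follows max_entries, the 8-byte load is modelled arithmetically as a little-endian machine would see it, and `DEMO_F_NUMA_NODE` is assumed to have the value `1U << 2`.

```c
#include <stdint.h>

#define DEMO_F_NUMA_NODE	(1U << 2)	/* assumed flag value */

/* Hypothetical layout: map_flags sits right after max_entries. */
struct demo_map {
	uint32_t max_entries;
	uint32_t map_flags;
};

/*
 * Value a little-endian 8-byte load at &max_entries would produce:
 * max_entries in the low half, the neighbouring map_flags in the high half.
 */
uint64_t load8_at_max_entries(const struct demo_map *m)
{
	return ((uint64_t)m->map_flags << 32) | m->max_entries;
}

/* Bounds check as the too-wide load would perform it. */
int index_rejected_wide(const struct demo_map *m, uint32_t index)
{
	return index >= load8_at_max_entries(m);
}

/* Bounds check against the 4-byte max_entries only. */
int index_rejected_narrow(const struct demo_map *m, uint32_t index)
{
	return index >= m->max_entries;
}
```

With map_flags equal to zero the two checks agree, which matches the statement that the wrong code still checked bounds correctly before the flag existed. Once a flag bit is set, the wide comparison sees a huge limit and lets an out-of-range index through.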
2017-10-03net: stmmac: dwmac-rk: Add RK3128 GMAC supportDavid Wu2-0/+113
Add constants and callback functions for the dwmac on rk3128 soc. As can be seen, the base structure is the same, only registers and the bits in them moved slightly. Signed-off-by: David Wu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-10-03blk-mq-debugfs: fix device sched directory for default schedulerOmar Sandoval1-1/+5
In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched directories if a default scheduler was already configured by blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do the same for the device-wide sched directory. Fix it. Fixes: d332ce091813 ("blk-mq-debugfs: allow schedulers to register debugfs attributes") Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-10-03null_blk: change configfs dependency to selectJens Axboe1-1/+1
A recent commit made null_blk depend on configfs, which is kind of annoying since you now have to find this dependency and enable that as well. Discovered this since I no longer had null_blk available on a box I needed to debug, since it got killed when the config updated after the configfs change was merged. Fixes: 3bf2bd20734e ("nullb: add configfs interface") Reviewed-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-10-03blk-throttle: fix possible io stall when upgrade to maxJoseph Qi1-2/+2
There is a case which can lead to an io stall. The hierarchy is as follows:

/test1
  |-subtest1
/test2
  |-subtest2

subtest1 and subtest2 each have 32 queued bios already. Now upgrade to max. In throtl_upgrade_state, it will try to dispatch bios as follows:

1) tg=subtest1, do nothing;
2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending left, no need to schedule next dispatch;
3) tg=subtest2, do nothing;
4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending left, no need to schedule next dispatch;
5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from test2 to /, 8 queued bios from test1 to /, and 8 queued bios from test2 to /; note that test1 and test2 each still have 16 queued bios left;
6) tg=/, try to schedule next dispatch, but since disptime is now (updated in tg_update_disptime, wait=0), the pending timer is not in fact scheduled;
7) In total, throtl_upgrade_state dispatches 32 queued bios and leaves 32 queued: test1 and test2 each have 16 queued bios;
8) throtl_pending_timer_fn sees the leftover bios, but can do nothing, because throtl_select_dispatch returns 0, and test1/test2 have no pending tg.

The blktrace shows the following:

8,32 0 0 2.539007641 0 m N throtl upgrade to max
8,32 0 0 2.539072267 0 m N throtl /test2 dispatch nr_queued=16 read=0 write=16
8,32 7 0 2.539077142 0 m N throtl /test1 dispatch nr_queued=16 read=0 write=16

So force schedule dispatch if there are pending children.

Reviewed-by: Shaohua Li <[email protected]>
Signed-off-by: Joseph Qi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2017-10-03rndis_host: support Novatel Verizon USB730LAleksander Morgado2-1/+14
Treat the ef/04/01 interface class/subclass/protocol combination used by the Novatel Verizon USB730L (1410:9030) as a possible RNDIS interface. T: Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#= 17 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 3 P: Vendor=1410 ProdID=9030 Rev=03.10 S: Manufacturer=Novatel Wireless S: Product=MiFi USB730L S: SerialNumber=0123456789ABCDEF C: #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host I: If#= 2 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid Once the network interface is brought up, the user just needs to run a DHCP client to get IP address and routing setup. As a side note, other Novatel Verizon USB730L models with the same vid:pid end up exposing a standard ECM interface which doesn't require any other kernel update to make it work. Signed-off-by: Aleksander Morgado <[email protected]> Reviewed-by: Bjørn Mork <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-10-03drm/i915: Fix DDI PHY init if it was already onImre Deak2-21/+2
The common lane power down flag of a DPIO PHY has a funky semantic: after the initial enabling of the PHY (so from a disabled state) this flag will be clear. It will be set only after the PHY will be used for the first time (for instance due to enabling the corresponding pipe) and then become unused (due to disabling the pipe). During the initial PHY enablement we don't know which of the above phases we are in, so move the check for the flag where this is known, the HW readout code. This is where the rest of lane power down status checks are done anyway. This fixes at least a problem on GLK where after module reloading, the common lane power down flag of PHY1 is set, but the PHY is actually powered-on and properly set up. The GRC readout code for other PHYs will hence think that PHY1 is not powered initially and disable it after the GRC readout. This will cause the AUX power well related to PHY1 to get disabled in a stuck state, timing out when we try to enable it later. Cc: Ville Syrjala <[email protected]> Fixes: e93da0a0137b ("drm/i915/bxt: Sanitiy check the PHY lane power down status") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102777 Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit e19c1eb885ac4186e64c7e484424124f3145318e) Signed-off-by: Rodrigo Vivi <[email protected]>
2017-10-03ide: fix IRQ assignment for PCI bus order probingLorenzo Pieralisi1-4/+9
We used to assign IRQs for all devices at boot-time, before any drivers claimed devices. The following commits: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") changed this so we now call pci_assign_irq() from pci_device_probe() when we call a driver's probe method. The ide_scan_pcibus() path (enabled by CONFIG_IDEPCI_PCIBUS_ORDER) bypasses pci_device_probe() so it can guarantee devices are claimed in order of PCI bus address. It calls the driver's probe method directly, so it misses the pci_assign_irq() call (and other PCI initialization functions), which causes failures like this: ide0: disabled, no IRQ ide0: failed to initialize IDE interface ide0: disabling port cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07) CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057] cmd64x 0000:00:02.0: can't reserve resources CMD64x_IDE: probe of 0000:00:02.0 failed with error -16 ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x94/0xd0 sysfs: cannot create duplicate filename '/class/ide_port/ide0' ... 
Trace: [<fffffc000048c9f4>] sysfs_warn_dup+0x94/0xd0 [<fffffc0000330928>] warn_slowpath_fmt+0x58/0x70 [<fffffc000048c9f4>] sysfs_warn_dup+0x94/0xd0 [<fffffc0000486d40>] kernfs_path_from_node+0x30/0x60 [<fffffc00004874ac>] kernfs_put+0x16c/0x2c0 [<fffffc00004874ac>] kernfs_put+0x16c/0x2c0 [<fffffc000048d010>] sysfs_do_create_link_sd.isra.2+0x100/0x120 [<fffffc00005b9d64>] device_add+0x2a4/0x7c0 [<fffffc00005ba5cc>] device_create_groups_vargs+0x14c/0x170 [<fffffc00005ba518>] device_create_groups_vargs+0x98/0x170 [<fffffc00005ba690>] device_create+0x50/0x70 [<fffffc00005df36c>] ide_host_register+0x48c/0xa00 [<fffffc00005df330>] ide_host_register+0x450/0xa00 [<fffffc00005ba2a0>] device_register+0x20/0x50 [<fffffc00005df330>] ide_host_register+0x450/0xa00 [<fffffc00005df944>] ide_host_add+0x64/0xe0 [<fffffc000079b41c>] kobject_uevent_env+0x16c/0x710 [<fffffc0000310288>] do_one_initcall+0x68/0x260 [<fffffc00007b13bc>] kernel_init+0x1c/0x1a0 ... ---[ end trace 24a70433c3e4d374 ]--- ide0: disabling port Fix the IRQ allocation issue by calling pci_assign_irq() from ide_scan_pcidev() before probing the IDE PCI drivers, so that IRQs for a given PCI device are allocated for the IDE PCI drivers to use them for device configuration. Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Link: http://lkml.kernel.org/r/[email protected] Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]> Acked-by: David S. Miller <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]>
2017-10-03ide: pci: free PCI BARs on initialization failureBartlomiej Zolnierkiewicz1-23/+40
Recent pci_assign_irq() changes uncovered a problem with missing freeing of PCI BARs on PCI IDE host initialization failure: ide0: disabled, no IRQ ide0: failed to initialize IDE interface ide0: disabling port cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07) CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057] cmd64x 0000:00:02.0: can't reserve resources CMD64x_IDE: probe of 0000:00:02.0 failed with error -16 Fix the problem by adding missing freeing of PCI BARs to ide_setup_pci_controller() and ide_pci_init_two(). Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Link: http://lkml.kernel.org/r/[email protected] Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]> [bhelgaas: add Fixes:] Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]>