aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-11-18mm/memblock.c: fix a typo in __next_mem_pfn_range() commentsChen Chang1-1/+1
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chen Chang <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mike Rapoport <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mm, page_alloc: check for max order in hot pathMichal Hocko1-11/+9
Konstantin has noticed that kvmalloc might trigger the following warning: WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60 [...] Call Trace: fragmentation_index+0x76/0x90 compaction_suitable+0x4f/0xf0 shrink_node+0x295/0x310 node_reclaim+0x205/0x250 get_page_from_freelist+0x649/0xad0 __alloc_pages_nodemask+0x12a/0x2a0 kmalloc_large_node+0x47/0x90 __kmalloc_node+0x22b/0x2e0 kvmalloc_node+0x3e/0x70 xt_alloc_table_info+0x3a/0x80 [x_tables] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables] nf_setsockopt+0x44/0x60 SyS_setsockopt+0x6f/0xc0 do_syscall_64+0x67/0x120 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 the problem is that we only check for an out of bound order in the slow path and the node reclaim might happen from the fast path already. This is fixable by making sure that kvmalloc doesn't ever use kmalloc for requests that are larger than KMALLOC_MAX_SIZE but this also shows that the code is rather fragile. A recent UBSAN report just underlines that by the following report UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 shift exponent 51 is too large for 32-bit type 'int' CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xd2/0x148 lib/dump_stack.c:113 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 zone_watermark_fast mm/page_alloc.c:3216 [inline] get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 alloc_pages include/linux/gfp.h:509 [inline] __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 raw_cmd_copyin drivers/block/floppy.c:3159 [inline] raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 __blkdev_driver_ioctl block/ioctl.c:303 [inline] blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 block_ioctl+0x105/0x150 fs/block_dev.c:1883 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 __do_sys_ioctl fs/ioctl.c:709 [inline] __se_sys_ioctl fs/ioctl.c:707 [inline] __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Note that this is not a kvmalloc path. It is just that the fast path really depends on having sanitzed order as well. Therefore move the order check to the fast path. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Konstantin Khlebnikov <[email protected]> Reported-by: Kyungtae Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Byoungyoung Lee <[email protected]> Cc: "Dae R. Jeong" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18scripts/spdxcheck.py: make python3 compliantUwe Kleine-König1-1/+0
Without this change the following happens when using Python3 (3.6.6): $ echo "GPL-2.0" | python3 scripts/spdxcheck.py - FAIL: 'str' object has no attribute 'decode' Traceback (most recent call last): File "scripts/spdxcheck.py", line 253, in <module> parser.parse_lines(sys.stdin, args.maxlines, '-') File "scripts/spdxcheck.py", line 171, in parse_lines line = line.decode(locale.getpreferredencoding(False), errors='ignore') AttributeError: 'str' object has no attribute 'decode' So as the line is already a string, there is no need to decode it and the line can be dropped. /usr/bin/python on Arch is Python 3. So this would indeed be worth going into 4.19. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uwe Kleine-König <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Joe Perches <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offsetYufen Yu1-3/+1
Other filesystems such as ext4, f2fs and ubifs all return ENXIO when lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset. man 2 lseek says : EINVAL whence is not valid. Or: the resulting file offset would be : negative, or beyond the end of a seekable device. : : ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond : the end of the file. Make tmpfs return ENXIO under these circumstances as well. After this, tmpfs also passes xfstests's generic/448. [[email protected]: rewrite changelog] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yufen Yu <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturnArnd Bergmann1-2/+1
gcc-8 complains about the prototype for this function: lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes] This is actually a GCC's bug. In GCC internals __ubsan_handle_builtin_unreachable() declared with both 'noreturn' and 'const' attributes instead of only 'noreturn': https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210 Workaround this by removing the noreturn attribute. [aryabinin: add information about GCC bug in changelog] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Andrey Ryabinin <[email protected]> Acked-by: Olof Johansson <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mm/vmstat.c: fix NUMA statistics updatesJanne Huttunen1-3/+4
Scan through the whole array to see if an update is needed. While we're at it, use sizeof() to be safe against any possible type changes in the future. The bug here is that we wouldn't sync per-cpu counters into global ones if there was an update of numa_stats for higher cpus. Highly theoretical one though because it is much more probable that zone_stats are updated so we would refresh anyway. So I wouldn't bother to mark this for stable, yet something nice to fix. [[email protected]: changelog enhancement] Link: http://lkml.kernel.org/r/[email protected] Fixes: 1d90ca897cb0 ("mm: update NUMA counter threshold size") Signed-off-by: Janne Huttunen <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mm/gup.c: fix follow_page_mask() kerneldoc commentMike Rapoport1-2/+8
Commit df06b37ffe5a ("mm/gup: cache dev_pagemap while pinning pages") modified the signature of follow_page_mask() but left the parameter description behind. Update the description to make the code and comments agree again. While at it, update formatting of the return value description to match Documentation/doc-guide/kernel-doc.rst guidelines. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18ocfs2: free up write context when direct IO failedWengang Wang2-2/+19
The write context should also be freed even when direct IO failed. Otherwise a memory leak is introduced and entries remain in oi->ip_unwritten_list causing the following BUG later in unlink path: ERROR: bug expression: !list_empty(&oi->ip_unwritten_list) ERROR: Clear inode of 215043, inode has unwritten extents ... Call Trace: ? __set_current_blocked+0x42/0x68 ocfs2_evict_inode+0x91/0x6a0 [ocfs2] ? bit_waitqueue+0x40/0x33 evict+0xdb/0x1af iput+0x1a2/0x1f7 do_unlinkat+0x194/0x28f SyS_unlinkat+0x1b/0x2f do_syscall_64+0x79/0x1ae entry_SYSCALL_64_after_hwframe+0x151/0x0 This patch also logs, with frequency limit, direct IO failures. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> Reviewed-by: Changwei Ge <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18scripts/faddr2line: fix location of start_kernel in commentRandy Dunlap1-1/+1
Fix a source file reference location to the correct path name. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mm: don't reclaim inodes with many attached pagesRoman Gushchin1-2/+5
Spock reported that commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects") leads to a regression on his setup: periodically the majority of the pagecache is evicted without an obvious reason, while before the change the amount of free memory was balancing around the watermark. The reason behind is that the mentioned above change created some minimal background pressure on the inode cache. The problem is that if an inode is considered to be reclaimed, all belonging pagecache page are stripped, no matter how many of them are there. So, if a huge multi-gigabyte file is cached in the memory, and the goal is to reclaim only few slab objects (unused inodes), we still can eventually evict all gigabytes of the pagecache at once. The workload described by Spock has few large non-mapped files in the pagecache, so it's especially noticeable. To solve the problem let's postpone the reclaim of inodes, which have more than 1 attached page. Let's wait until the pagecache pages will be evicted naturally by scanning the corresponding LRU lists, and only then reclaim the inode structure. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reported-by: Spock <[email protected]> Tested-by: Spock <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: <[email protected]> [4.19.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mm, memory_hotplug: check zone_movable in has_unmovable_pagesMichal Hocko1-0/+8
Page state checks are racy. Under a heavy memory workload (e.g. stress -m 200 -t 2h) it is quite easy to hit a race window when the page is allocated but its state is not fully populated yet. A debugging patch to dump the struct page state shows has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0 page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1 flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked) Note that the state has been checked for both PageLRU and PageSwapBacked already. Closing this race completely would require some sort of retry logic. This can be tricky and error prone (think of potential endless or long taking loops). Workaround this problem for movable zones at least. Such a zone should only contain movable pages. Commit 15c30bc09085 ("mm, memory_hotplug: make has_unmovable_pages more robust") has told us that this is not strictly true though. Bootmem pages should be marked reserved though so we can move the original check after the PageReserved check. Pages from other zones are still prone to races but we even do not pretend that memory hotremove works for those so pre-mature failure doesn't hurt that much. Link: http://lkml.kernel.org/r/[email protected] Fixes: 15c30bc09085 ("mm, memory_hotplug: make has_unmovable_pages more robust") Signed-off-by: Michal Hocko <[email protected]> Reported-by: Baoquan He <[email protected]> Tested-by: Baoquan He <[email protected]> Acked-by: Baoquan He <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mm/swapfile.c: use kvzalloc for swap_info_struct allocationVasily Averin1-3/+3
Commit a2468cc9bfdf ("swap: choose swap device according to numa node") changed 'avail_lists' field of 'struct swap_info_struct' to an array. In popular linux distros it increased size of swap_info_struct up to 40 Kbytes and now swap_info_struct allocation requires order-4 page. Switch to kvzmalloc allows to avoid unexpected allocation failures. Link: http://lkml.kernel.org/r/[email protected] Fixes: a2468cc9bfdf ("swap: choose swap device according to numa node") Signed-off-by: Vasily Averin <[email protected]> Acked-by: Aaron Lu <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Huang Ying <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18MAINTAINERS: update OMAP MMC entryAaro Koskinen2-2/+6
Jarkko's e-mail address hasn't worked for a long time. We still want to keep this driver working as it is critical for some of the OMAP boards. I use and test this driver frequently, so change myself as a maintainer with "Odd Fixes" status. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aaro Koskinen <[email protected]> Acked-by: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!Mike Kravetz1-4/+19
This bug has been experienced several times by the Oracle DB team. The BUG is in remove_inode_hugepages() as follows: /* * If page is mapped, it was faulted in after being * unmapped in caller. Unmap (again) now after taking * the fault mutex. The mutex will prevent faults * until we finish removing the page. * * This race can only happen in the hole punch case. * Getting here in a truncate operation is a bug. */ if (unlikely(page_mapped(page))) { BUG_ON(truncate_op); In this case, the elevated map count is not the result of a race. Rather it was incorrectly incremented as the result of a bug in the huge pmd sharing code. Consider the following: - Process A maps a hugetlbfs file of sufficient size and alignment (PUD_SIZE) that a pmd page could be shared. - Process B maps the same hugetlbfs file with the same size and alignment such that a pmd page is shared. - Process B then calls mprotect() to change protections for the mapping with the shared pmd. As a result, the pmd is 'unshared'. - Process B then calls mprotect() again to chage protections for the mapping back to their original value. pmd remains unshared. - Process B then forks and process C is created. During the fork process, we do dup_mm -> dup_mmap -> copy_page_range to copy page tables. Copying page tables for hugetlb mappings is done in the routine copy_hugetlb_page_range. In copy_hugetlb_page_range(), the destination pte is obtained by: dst_pte = huge_pte_alloc(dst, addr, sz); If pmd sharing is possible, the returned pointer will be to a pte in an existing page table. In the situation above, process C could share with either process A or process B. Since process A is first in the list, the returned pte is a pointer to a pte in process A's page table. However, the check for pmd sharing in copy_hugetlb_page_range is: /* If the pagetables are shared don't copy or take references */ if (dst_pte == src_pte) continue; Since process C is sharing with process A instead of process B, the above test fails. The code in copy_hugetlb_page_range which follows assumes dst_pte points to a huge_pte_none pte. It copies the pte entry from src_pte to dst_pte and increments this map count of the associated page. This is how we end up with an elevated map count. To solve, check the dst_pte entry for huge_pte_none. If !none, this implies PMD sharing so do not copy. Link: http://lkml.kernel.org/r/[email protected] Fixes: c5c99429fa57 ("fix hugepages leak due to pagetable page sharing") Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Prakash Sangappa <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18kernel/sched/psi.c: simplify cgroup_move_task()Olof Johansson1-21/+22
The existing code triggered an invalid warning about 'rq' possibly being used uninitialized. Instead of doing the silly warning suppression by initializa it to NULL, refactor the code to bail out early instead. Warning was: kernel/sched/psi.c: In function `cgroup_move_task': kernel/sched/psi.c:639:13: warning: `rq' may be used uninitialized in this function [-Wmaybe-uninitialized] Link: http://lkml.kernel.org/r/[email protected] Fixes: 2ce7135adc9ad ("psi: cgroup support") Signed-off-by: Olof Johansson <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18z3fold: fix possible reclaim racesVitaly Wool1-39/+62
Reclaim and free can race on an object which is basically fine but in order for reclaim to be able to map "freed" object we need to encode object length in the handle. handle_to_chunks() is then introduced to extract object length from a handle and use it during mapping. Moreover, to avoid racing on a z3fold "headless" page release, we should not try to free that page in z3fold_free() if the reclaim bit is set. Also, in the unlikely case of trying to reclaim a page being freed, we should not proceed with that page. While at it, fix the page accounting in reclaim function. This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups". Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Signed-off-by: Jongseok Kim <[email protected]> Reported-by-by: Jongseok Kim <[email protected]> Reviewed-by: Snild Dolkow <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-18mtd: rawnand: qcom: Namespace prefix some commandsOlof Johansson1-16/+16
PAGE_READ is used by RISC-V arch code included through mm headers, and it makes sense to bring in a prefix on these in the driver. drivers/mtd/nand/raw/qcom_nandc.c:153: warning: "PAGE_READ" redefined #define PAGE_READ 0x2 In file included from include/linux/memremap.h:7, from include/linux/mm.h:27, from include/linux/scatterlist.h:8, from include/linux/dma-mapping.h:11, from drivers/mtd/nand/raw/qcom_nandc.c:17: arch/riscv/include/asm/pgtable.h:48: note: this is the location of the previous definition Caught by riscv allmodconfig. Signed-off-by: Olof Johansson <[email protected]> Reviewed-by: Miquel Raynal <[email protected]> Signed-off-by: Boris Brezillon <[email protected]>
2018-11-18mtd: rawnand: atmel: fix OF child-node lookupJohan Hovold1-4/+7
Use the new of_get_compatible_child() helper to lookup the nfc child node instead of using of_find_compatible_node(), which searches the entire tree from a given start node and thus can return an unrelated (i.e. non-child) node. This also addresses a potential use-after-free (e.g. after probe deferral) as the tree-wide helper drops a reference to its first argument (i.e. the node of the device being probed). While at it, also fix a related nfc-node reference leak. Fixes: f88fc122cc34 ("mtd: nand: Cleanup/rework the atmel_nand driver") Cc: stable <[email protected]> # 4.11 Cc: Nicolas Ferre <[email protected]> Cc: Josh Wu <[email protected]> Cc: Boris Brezillon <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Boris Brezillon <[email protected]>
2018-11-17tipc: don't assume linear buffer when reading ancillary dataJon Maloy1-4/+11
The code for reading ancillary data from a received buffer is assuming the buffer is linear. To make this assumption true we have to linearize the buffer before message data is read. Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17tipc: fix lockdep warning when reinitilaizing socketsJon Maloy3-18/+48
We get the following warning: [ 47.926140] 32-bit node address hash set to 2010a0a [ 47.927202] [ 47.927433] ================================ [ 47.928050] WARNING: inconsistent lock state [ 47.928661] 4.19.0+ #37 Tainted: G E [ 47.929346] -------------------------------- [ 47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes: [ 47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0 [ 47.930116] {SOFTIRQ-ON-W} state was registered at: [ 47.930116] _raw_spin_lock+0x29/0x60 [ 47.930116] rht_deferred_worker+0x556/0x810 [ 47.930116] process_one_work+0x1f5/0x540 [ 47.930116] worker_thread+0x64/0x3e0 [ 47.930116] kthread+0x112/0x150 [ 47.930116] ret_from_fork+0x3a/0x50 [ 47.930116] irq event stamp: 14044 [ 47.930116] hardirqs last enabled at (14044): [<ffffffff9a07fbba>] __local_bh_enable_ip+0x7a/0xf0 [ 47.938117] hardirqs last disabled at (14043): [<ffffffff9a07fb81>] __local_bh_enable_ip+0x41/0xf0 [ 47.938117] softirqs last enabled at (14028): [<ffffffff9a0803ee>] irq_enter+0x5e/0x60 [ 47.938117] softirqs last disabled at (14029): [<ffffffff9a0804a5>] irq_exit+0xb5/0xc0 [ 47.938117] [ 47.938117] other info that might help us debug this: [ 47.938117] Possible unsafe locking scenario: [ 47.938117] [ 47.938117] CPU0 [ 47.938117] ---- [ 47.938117] lock(&(&ht->lock)->rlock); [ 47.938117] <Interrupt> [ 47.938117] lock(&(&ht->lock)->rlock); [ 47.938117] [ 47.938117] *** DEADLOCK *** [ 47.938117] [ 47.938117] 2 locks held by swapper/3/0: [ 47.938117] #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280 [ 47.938117] #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc] [ 47.938117] [ 47.938117] stack backtrace: [ 47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 4.19.0+ #37 [ 47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 47.938117] Call Trace: [ 47.938117] <IRQ> [ 47.938117] dump_stack+0x5e/0x8b [ 47.938117] print_usage_bug+0x1ed/0x1ff [ 47.938117] mark_lock+0x5b5/0x630 [ 47.938117] __lock_acquire+0x4c0/0x18f0 [ 47.938117] ? lock_acquire+0xa6/0x180 [ 47.938117] lock_acquire+0xa6/0x180 [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0 [ 47.938117] _raw_spin_lock+0x29/0x60 [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0 [ 47.938117] rhashtable_walk_enter+0x36/0xb0 [ 47.938117] tipc_sk_reinit+0xb0/0x410 [tipc] [ 47.938117] ? mark_held_locks+0x6f/0x90 [ 47.938117] ? __local_bh_enable_ip+0x7a/0xf0 [ 47.938117] ? lockdep_hardirqs_on+0x20/0x1a0 [ 47.938117] tipc_net_finalize+0xbf/0x180 [tipc] [ 47.938117] tipc_disc_timeout+0x509/0x540 [tipc] [ 47.938117] ? call_timer_fn+0x5/0x280 [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] call_timer_fn+0xa1/0x280 [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] run_timer_softirq+0x1f2/0x4d0 [ 47.938117] __do_softirq+0xfc/0x413 [ 47.938117] irq_exit+0xb5/0xc0 [ 47.938117] smp_apic_timer_interrupt+0xac/0x210 [ 47.938117] apic_timer_interrupt+0xf/0x20 [ 47.938117] </IRQ> [ 47.938117] RIP: 0010:default_idle+0x1c/0x140 [ 47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 <65> 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b [ 47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001 [ 47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200 [ 47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000 [ 47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200 [ 47.938117] ? default_idle+0x1a/0x140 [ 47.938117] do_idle+0x1bc/0x280 [ 47.938117] cpu_startup_entry+0x19/0x20 [ 47.938117] start_secondary+0x187/0x1c0 [ 47.938117] secondary_startup_64+0xa4/0xb0 The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is calling the function rhashtable_walk_enter() within a timer interrupt. We fix this by executing tipc_net_finalize() in work queue context. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17net-gro: reset skb->pkt_type in napi_reuse_skb()Eric Dumazet1-0/+4
eth_type_trans() assumes initial value for skb->pkt_type is PACKET_HOST. This is indeed the value right after a fresh skb allocation. However, it is possible that GRO merged a packet with a different value (like PACKET_OTHERHOST in case macvlan is used), so we need to make sure napi->skb will have pkt_type set back to PACKET_HOST. Otherwise, valid packets might be dropped by the stack because their pkt_type is not PACKET_HOST. napi_reuse_skb() was added in commit 96e93eab2033 ("gro: Add internal interfaces for VLAN"), but this bug always has been there. Fixes: 96e93eab2033 ("gro: Add internal interfaces for VLAN") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17Merge branch 'tdc-fixes'David S. Miller1-5/+13
Lucas Bates says: ==================== Prevent uncaught exceptions in tdc This patch series addresses two potential bugs in tdc that can cause exceptions to be raised in certain circumstances. These exceptions are generally not handled, so instead we will prevent them from being raised. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-11-17tc-testing: tdc.py: Guard against lack of returncode in executed commandBrenda J. Butler1-3/+11
Add some defensive coding in case one of the subprocesses created by tdc returns nothing. If no object is returned from exec_cmd, then tdc will halt with an unhandled exception. Signed-off-by: Brenda J. Butler <[email protected]> Signed-off-by: Lucas Bates <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17tc-testing: tdc.py: ignore errors when decoding stdout/stderrLucas Bates1-2/+2
Prevent exceptions from being raised while decoding output from an executed command. There is no impact on tdc's execution and the verify command phase would fail the pattern match. Signed-off-by: Lucas Bates <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17ip_tunnel: don't force DF when MTU is lockedSabrina Dubroca1-1/+1
The various types of tunnels running over IPv4 can ask to set the DF bit to do PMTU discovery. However, PMTU discovery is subject to the threshold set by the net.ipv4.route.min_pmtu sysctl, and is also disabled on routes with "mtu lock". In those cases, we shouldn't set the DF bit. This patch makes setting the DF bit conditional on the route's MTU locking state. This issue seems to be older than git history. Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17MAINTAINERS: Add entry for CAKE qdiscToke Høiland-Jørgensen1-0/+6
We would like the existing community to be kept in the loop for any new developments on CAKE; and I certainly plan to keep maintaining it. Reflect this in MAINTAINERS. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17net: bridge: fix vlan stats use-after-free on destructionNikolay Aleksandrov2-1/+9
Syzbot reported a use-after-free of the global vlan context on port vlan destruction. When I added per-port vlan stats I missed the fact that the global vlan context can be freed before the per-port vlan rcu callback. There're a few different ways to deal with this, I've chosen to add a new private flag that is set only when per-port stats are allocated so we can directly check it on destruction without dereferencing the global context at all. The new field in net_bridge_vlan uses a hole. v2: cosmetic change, move the check to br_process_vlan_info where the other checks are done v3: add change log in the patch, add private (in-kernel only) flags in a hole in net_bridge_vlan struct and use that instead of mixing user-space flags with private flags Fixes: 9163a0fc1f0c ("net: bridge: add support for per-port vlan stats") Reported-by: [email protected] Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17socket: do a generic_file_splice_read when proto_ops has no splice_readSlavomir Kaslev1-1/+1
splice(2) fails with -EINVAL when called reading on a socket with no splice_read set in its proto_ops (such as vsock sockets). Switch this to fallbacks to a generic_file_splice_read instead. Signed-off-by: Slavomir Kaslev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17net: phy: mdio-gpio: Fix working over slow can_sleep GPIOsMartin Schiller1-5/+5
Up until commit 7e5fbd1e0700 ("net: mdio-gpio: Convert to use gpiod functions where possible"), the _cansleep variants of the gpio_ API was used. After that commit and the change to gpiod_ API, the _cansleep() was dropped. This then results in WARN_ON() when used with GPIO devices which do sleep. Add back the _cansleep() to avoid this. Fixes: 7e5fbd1e0700 ("net: mdio-gpio: Convert to use gpiod functions where possible") Signed-off-by: Martin Schiller <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-17dax: Fix huge page faultsMatthew Wilcox1-8/+4
Using xas_load() with a PMD-sized xa_state would work if either a PMD-sized entry was present or a PTE sized entry was present in the first 64 entries (of the 512 PTEs in a PMD on x86). If there was no PTE in the first 64 entries, grab_mapping_entry() would believe there were no entries present, allocate a PMD-sized entry and overwrite the PTE in the page cache. Use xas_find_conflict() instead which turns out to simplify both get_unlocked_entry() and grab_mapping_entry(). Also remove a WARN_ON_ONCE from grab_mapping_entry() as it will have already triggered in get_unlocked_entry(). Fixes: cfc93c6c6c96 ("dax: Convert dax_insert_pfn_mkwrite to XArray") Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-17dax: Fix dax_unlock_mapping_entry for PMD pagesMatthew Wilcox1-9/+8
Device DAX PMD pages do not set the PageHead bit for compound pages. Fix for now by retrieving the PMD bit from the entry, but eventually we will be passed the page size by the caller. Reported-by: Dan Williams <[email protected]> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-16Revert "net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs"David S. Miller1-9/+5
This reverts commit dfa0d55ff6be64e7b6881212a291cb95f8da3b08. Discussion still ongoing, I shouldn't have applied this. Signed-off-by: David S. Miller <[email protected]>
2018-11-16Merge tag 'batadv-net-for-davem-20181114' of git://git.open-mesh.org/linux-mergeDavid S. Miller2-3/+5
Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: - Explicitly pad short ELP packets with zeros, by Sven Eckelmann - Fix packet size calculation when merging fragments, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <[email protected]>
2018-11-16net: phy: mdio-gpio: Fix working over slow can_sleep GPIOsMartin Schiller1-5/+9
This commit re-enables support for slow GPIO pins. It was initially introduced by commit 2d6c9091ab76 ("net: mdio-gpio: support access that may sleep") and got lost by commit 7e5fbd1e0700 ("net: mdio-gpio: Convert to use gpiod functions where possible"). Also add a warning about slow GPIO pins like it is done in i2c-gpio. Signed-off-by: Martin Schiller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-16net/sched: act_pedit: fix memory leak when IDR allocation failsDavide Caratti1-1/+2
tcf_idr_check_alloc() can return a negative value, on allocation failures (-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases. Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Signed-off-by: Davide Caratti <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-16net: lantiq: Fix returned value in case of error in 'xrx200_probe()'Christophe JAILLET1-2/+3
Return 'err' in the error handling path instead of 0. Return explicitly 0 in the normal path, instead of 'err', which is known to be 0 at this point. Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver") Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-16ipv6: fix a dst leak when removing its exceptionXin Long1-4/+3
These is no need to hold dst before calling rt6_remove_exception_rt(). The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(), which has been removed in Commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes"). Otherwise, it will cause a dst leak. This patch is to simply remove the dst_hold_safe() call before calling rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt(). It's safe, because the removal of the exception that holds its dst's refcnt is protected by rt6_exception_lock. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Fixes: 23fb93a4d3f1 ("net/ipv6: Cleanup exception and cache route handling") Reported-by: Li Shuang <[email protected]> Signed-off-by: Xin Long <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-16net: mvneta: Don't advertise 2.5G modesMaxime Chevallier1-9/+3
Using 2.5G speed relies on the SerDes lanes being configured accordingly. The lanes have to be reconfigured to switch between 1G and 2.5G, and for now only the bootloader does this configuration. In the case we add a Comphy driver to handle switching the lanes dynamically, it's better for now to stick with supporting only 1G and add advertisement for 2.5G once we really are capable of handling both speeds without problem. Since the interface mode is initialy taken from the DT, we want to make sure that adding comphy support won't break boards that don't update their dtb. Fixes: da58a931f248 ("net: mvneta: Add support for 2500Mbps SGMII") Reported-by: Andrew Lunn <[email protected]> Reported-by: Russell King <[email protected]> Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-11-16MAINTAINERS: Do maintain Intel GPIO drivers via separate treeAndy Shevchenko1-12/+21
We would like to consolidate Intel pure GPIO drivers, including PMICs and some old x86 platforms, in one tree which is maintained by Intel. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2018-11-16gpio: mockup: fix indicated directionBartosz Golaszewski1-3/+3
Commit 3edfb7bd76bd ("gpiolib: Show correct direction from the beginning") fixed an existing issue but broke libgpiod tests by changing the default direction of dummy lines to output. We don't break user-space so make gpio-mockup behave as before. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2018-11-16gpio: pxa: fix legacy non pinctrl aware builds againRobert Jarzmik1-2/+2
As pointed out by Gregor, spitz keyboard matrix is broken, with or without CONFIG_PINCTRL set, quoting : "The gpio matrix keypard on the Zaurus C3x00 (see spitz.c) does not work properly. Noticeable are that rshift+c does nothing where as lshift+c creates C. Opposite it is for rshift+a vs lshift+a, here only rshift works. This affects a few other combinations using the rshift or lshift buttons." As a matter of fact, as for platform_data based builds CONFIG_PINCTRL=n is required for now (as opposed for devicetree builds where it should be set), this means gpio driver should change the direction, which is what was attempted by commit c4e5ffb6f224 ("gpio: pxa: fix legacy non pinctrl aware builds"). Unfortunately, the input case was inverted, and the direction change was never done. This wasn't seen up until now because the initial platform setup (MFP) was setting this direction. Yet in Gregory's case, the matrix-keypad driver changes back and forth the direction dynamically, and this is why he's the first to report it. Fixes: c4e5ffb6f224 ("gpio: pxa: fix legacy non pinctrl aware builds") Tested-by: Greg <[email protected]> Signed-off-by: Robert Jarzmik <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2018-11-16dax: Reinstate RCU protection of inodeMatthew Wilcox1-3/+19
For the device-dax case, it is possible that the inode can go away underneath us. The rcu_read_lock() was there to prevent it from being freed, and not (as I thought) to protect the tree. Bring back the rcu_read_lock() protection. Also add a little kernel-doc; while this function is not exported to modules, it is used from outside dax.c Reported-by: Dan Williams <[email protected]> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-16dax: Make sure the unlocking entry isn't lockedMatthew Wilcox1-0/+1
I wrote the semantics in the commit message, but didn't document it in the source code. Use a BUG_ON instead (if any code does do this, it's really buggy; we can't recover and it's worth taking the machine down). Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-16dax: Remove optimisation from dax_lock_mapping_entryMatthew Wilcox1-5/+2
Skipping some of the revalidation after we sleep can lead to returning a mapping which has already been freed. Just drop this optimisation. Reported-by: Dan Williams <[email protected]> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-16XArray tests: Correct some 64-bit assumptionsMatthew Wilcox1-2/+2
The test-suite caught these two mistakes when compiled for 32-bit. I had only been running the test-suite in 64-bit mode. Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-16XArray: Correct xa_store_rangeMatthew Wilcox1-2/+3
The explicit '64' should have been BITS_PER_LONG, but while looking at this code I realised I meant to use __ffs(), not ilog2(). Signed-off-by: Matthew Wilcox <[email protected]>
2018-11-16Merge tag 'fsnotify_for_v4.20-rc3' of ↵Linus Torvalds2-7/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "One small fsnotify fix for duplicate events" * tag 'fsnotify_for_v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix handling of events on child sub-directory
2018-11-16Merge tag 'gfs2-4.20.fixes3' of ↵Linus Torvalds2-28/+29
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull bfs2 fixes from Andreas Gruenbacher: "Fix two bugs leading to leaked buffer head references: - gfs2: Put bitmap buffers in put_super - gfs2: Fix iomap buffer head reference counting bug And one bug leading to significant slow-downs when deleting large files: - gfs2: Fix metadata read-ahead during truncate (2)" * tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix iomap buffer head reference counting bug gfs2: Fix metadata read-ahead during truncate (2) gfs2: Put bitmap buffers in put_super
2018-11-16gfs2: Fix iomap buffer head reference counting bugAndreas Gruenbacher1-23/+17
GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to gfs2_iomap_end in iomap->private. It sets that private pointer in gfs2_iomap_get. Users of gfs2_iomap_get other than gfs2_iomap_begin would have to release iomap->private, but this isn't done correctly, leading to a leak of buffer head references. To fix this, move the code for setting iomap->private from gfs2_iomap_get to gfs2_iomap_begin. Fixes: 64bc06bb32 ("gfs2: iomap buffered write support") Cc: [email protected] # v4.19+ Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-16Merge branch 'linus' of ↵Linus Torvalds4-25/+50
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Potential memory overwrite in simd - Kernel info leaks in crypto_user - NULL dereference and use-after-free in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: user - Zeroize whole structure given to user space crypto: user - fix leaking uninitialized memory to userspace crypto: simd - correctly take reqsize of wrapped skcipher into account crypto: hisilicon - Fix reference after free of memories on error path crypto: hisilicon - Fix NULL dereference for same dst and src