aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-26kdb: Switch to use safer dbg_io_ops over console APIsSumit Garg5-22/+24
In kgdb context, calling console handlers aren't safe due to locks used in those handlers which could in turn lead to a deadlock. Although, using oops_in_progress increases the chance to bypass locks in most console handlers but it might not be sufficient enough in case a console uses more locks (VT/TTY is good example). Currently when a driver provides both polling I/O and a console then kdb will output using the console. We can increase robustness by using the currently active polling I/O driver (which should be lockless) instead of the corresponding console. For several common cases (e.g. an embedded system with a single serial port that is used both for console output and debugger I/O) this will result in no console handler being used. In order to achieve this we need to reverse the order of preference to use dbg_io_ops (uses polling I/O mode) over console APIs. So we just store "struct console" that represents debugger I/O in dbg_io_ops and while emitting kdb messages, skip console that matches dbg_io_ops console in order to avoid duplicate messages. After this change, "is_console" param becomes redundant and hence removed. Suggested-by: Daniel Thompson <[email protected]> Signed-off-by: Sumit Garg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Douglas Anderson <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Daniel Thompson <[email protected]>
2020-06-26SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()Chuck Lever1-0/+4
@subbuf is an output parameter of xdr_buf_subsegment(). A survey of call sites shows that @subbuf is always uninitialized before xdr_buf_segment() is invoked by callers. There are some execution paths through xdr_buf_subsegment() that do not set all of the fields in @subbuf, leaving some pointer fields containing garbage addresses. Subsequent processing of that buffer then results in a page fault. Signed-off-by: Chuck Lever <[email protected]> Cc: <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2020-06-26NFSv4 fix CLOSE not waiting for direct IO compeletionOlga Kornievskaia2-4/+10
Figuring out the root case for the REMOVE/CLOSE race and suggesting the solution was done by Neil Brown. Currently what happens is that direct IO calls hold a reference on the open context which is decremented as an asynchronous task in the nfs_direct_complete(). Before reference is decremented, control is returned to the application which is free to close the file. When close is being processed, it decrements its reference on the open_context but since directIO still holds one, it doesn't sent a close on the wire. It returns control to the application which is free to do other operations. For instance, it can delete a file. Direct IO is finally releasing its reference and triggering an asynchronous close. Which races with the REMOVE. On the server, REMOVE can be processed before the CLOSE, failing the REMOVE with EACCES as the file is still opened. Signed-off-by: Olga Kornievskaia <[email protected]> Suggested-by: Neil Brown <[email protected]> CC: [email protected] Signed-off-by: Anna Schumaker <[email protected]>
2020-06-26pNFS/flexfiles: Fix list corruption if the mirror count changesTrond Myklebust1-4/+7
If the mirror count changes in the new layout we pick up inside ff_layout_pg_init_write(), then we can end up adding the request to the wrong mirror and corrupting the mirror->pg_list. Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application") Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2020-06-26nfs: Fix memory leak of export_pathTom Rix1-0/+1
The try_location function is called within a loop by nfs_follow_referral. try_location calls nfs4_pathname_string to created the export_path. nfs4_pathname_string allocates the memory. export_path is stored in the nfs_fs_context/fs_context structure similarly as hostname and source. But whereas the ctx hostname and source are freed before assignment, export_path is not. So if there are multiple loops, the new export_path will overwrite the old without the old being freed. So call kfree for export_path. Signed-off-by: Tom Rix <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2020-06-26sunrpc: fixed rollback in rpc_gssd_dummy_populate()Vasily Averin1-0/+1
__rpc_depopulate(gssd_dentry) was lost on error path cc: [email protected] Fixes: commit 4b9a445e3eeb ("sunrpc: create a new dummy pipe for gssd to hold open") Signed-off-by: Vasily Averin <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2020-06-26Merge branch 'linus' into x86/entry, to resolve conflictsIngo Molnar501-2937/+4535
Conflicts: arch/x86/kernel/traps.c Signed-off-by: Ingo Molnar <[email protected]>
2020-06-26i2c: core: check returned size of emulated smbus block readMans Rullgard1-0/+7
If the i2c bus driver ignores the I2C_M_RECV_LEN flag (as some of them do), it is possible for an I2C_SMBUS_BLOCK_DATA read issued on some random device to return an arbitrary value in the first byte (and nothing else). When this happens, i2c_smbus_xfer_emulated() will happily write past the end of the supplied data buffer, thus causing Bad Things to happen. To prevent this, check the size before copying the data block and return an error if it is too large. Fixes: 209d27c3b167 ("i2c: Emulate SMBus block read over I2C") Signed-off-by: Mans Rullgard <[email protected]> [wsa: use better errno] Signed-off-by: Wolfram Sang <[email protected]>
2020-06-26MAINTAINERS: update info for sparseLuc Van Oostenryck1-1/+3
Update the info for sparse. More specifically: - change W entry to point to sparse.docs.kernel.org - add Q & B entry (patchwork & bugzilla) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Luc Van Oostenryck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/memory_hotplug.c: fix false softlockup during pfn range removalBen Widawsky1-2/+11
When working with very large nodes, poisoning the struct pages (for which there will be very many) can take a very long time. If the system is using voluntary preemptions, the software watchdog will not be able to detect forward progress. This patch addresses this issue by offering to give up time like __remove_pages() does. This behavior was introduced in v5.6 with: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()") Alternately, init_page_poison could do this cond_resched(), but it seems to me that the caller of init_page_poison() is what actually knows whether or not it should relax its own priority. Based on Dan's notes, I think this is perfectly safe: commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}") Aside from fixing the lockup, it is also a friendlier thing to do on lower core systems that might wipe out large chunks of hotplug memory (probably not a very common case). Fixes this kind of splat: watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922] irq event stamp: 138450 hardirqs last enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456 softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0 CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30 Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019 RIP: 0010:memset_erms+0x9/0x10 Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01 Call Trace: remove_pfn_range_from_zone+0x3a/0x380 memunmap_pages+0x17f/0x280 release_nodes+0x22a/0x260 __device_release_driver+0x172/0x220 device_driver_detach+0x3e/0xa0 unbind_store+0x113/0x130 kernfs_fop_write+0xdc/0x1c0 vfs_write+0xde/0x1d0 ksys_write+0x58/0xd0 do_syscall_64+0x5a/0x120 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Built 2 zonelists, mobility grouping on. Total pages: 49050381 Policy zone: Normal Built 3 zonelists, mobility grouping on. Total pages: 49312525 Policy zone: Normal David said: "It really only is an issue for devmem. Ordinary hotplugged system memory is not affected (onlined/offlined in memory block granularity)." Link: http://lkml.kernel.org/r/[email protected] Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()") Signed-off-by: Ben Widawsky <[email protected]> Reported-by: "Scargall, Steve" <[email protected]> Reported-by: Ben Widawsky <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Dan Williams <[email protected]> Cc: Vishal Verma <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm: remove vmalloc_execChristoph Hellwig4-39/+3
Merge vmalloc_exec into its only caller. Note that for !CONFIG_MMU __vmalloc_node_range maps to __vmalloc, which directly clears the __GFP_HIGHMEM added by the vmalloc_exec stub anyway. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dexuan Cui <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wei Liu <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26arm64: use PAGE_KERNEL_ROX directly in alloc_insn_pageChristoph Hellwig1-9/+3
Use PAGE_KERNEL_ROX directly instead of allocating RWX and setting the page read-only just after the allocation. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dexuan Cui <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wei Liu <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26x86/hyperv: allocate the hypercall page with only read and execute bitsChristoph Hellwig2-1/+5
Patch series "fix a hyperv W^X violation and remove vmalloc_exec" Dexuan reported a W^X violation due to the fact that the hyper hypercall page due switching it to be allocated using vmalloc_exec. The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually sets writable permissions in the pte. This series fixes the issue by switching to the low-level __vmalloc_node_range interface that allows specifing more detailed permissions instead. It then also open codes the other two callers and removes the somewhat confusing vmalloc_exec interface. Peter noted that the hyper hypercall page allocation also has another long standing issue in that it shouldn't use the full vmalloc but just the module space. This issue is so far theoretical as the allocation is done early in the boot process. I plan to fix it with another bigger series for 5.9. This patch (of 3): Avoid a W^X violation cause by the fact that PAGE_KERNEL_EXEC includes the writable bit. For this resurrect the removed PAGE_KERNEL_RX definition, but as PAGE_KERNEL_ROX to match arm64 and powerpc. Link: http://lkml.kernel.org/r/[email protected] Fixes: 78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page") Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Dexuan Cui <[email protected]> Tested-by: Vitaly Kuznetsov <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Jessica Yu <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/memory: fix IO cost for anonymous pageJoonsoo Kim1-0/+8
With synchronous IO swap device, swap-in is directly handled in fault code. Since IO cost notation isn't added there, with synchronous IO swap device, LRU balancing could be wrongly biased. Fix it to count it in fault code. Link: http://lkml.kernel.org/r/[email protected] Fixes: 314b57fb0460001 ("mm: balance LRU lists based on relative thrashing cache sizing") Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/swap: fix for "mm: workingset: age nonresident information alongside ↵Joonsoo Kim1-2/+1
anonymous pages" Non-file-lru page could also be activated in mark_page_accessed() and we need to count this activation for nonresident_age. Note that it's better for this patch to be squashed into the patch "mm: workingset: age nonresident information alongside anonymous pages". Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm: workingset: age nonresident information alongside anonymous pagesJohannes Weiner4-21/+33
Patch series "fix for "mm: balance LRU lists based on relative thrashing" patchset" This patchset fixes some problems of the patchset, "mm: balance LRU lists based on relative thrashing", which is now merged on the mainline. Patch "mm: workingset: let cache workingset challenge anon fix" is the result of discussion with Johannes. See following link. http://lkml.kernel.org/r/[email protected] And, the other two are minor things which are found when I try to rebase my patchset. This patch (of 3): After ("mm: workingset: let cache workingset challenge anon fix"), we compare refault distances to active_file + anon. But age of the non-resident information is only driven by the file LRU. As a result, we may overestimate the recency of any incoming refaults and activate them too eagerly, causing unnecessary LRU churn in certain situations. Make anon aging drive nonresident age as well to address that. Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: 34e58cac6d8f2a ("mm: workingset: let cache workingset challenge anon") Reported-by: Joonsoo Kim <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Joonsoo Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26doc: THP CoW fault no longer allocate THPYang Shi2-4/+3
Since commit 3917c80280c9 ("thp: change CoW semantics for anon-THP"), THP CoW page fault is rewritten. Now it just splits pmd then fallback to base page fault, it doesn't try to allocate THP anymore. So it is no longer counted in THP_FAULT_ALLOC. Remove the obsolete statement in documentation about THP CoW allocation to avoid confusion. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Zi Yan <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26docs: mm/gup: minor documentation updateSouptick Joarder1-1/+1
Now there are 5 cases. Updated the same. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Souptick Joarder <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/memcontrol.c: prevent missed memory.low load tearsChris Down1-2/+3
Looks like one of these got missed when massaging in f86b810c2610 ("mm, memcg: prevent memory.low load/store tearing") with other linux-mm changes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chris Down <[email protected]> Reported-by: Michal Koutny <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/memcontrol.c: add missed css_put()Muchun Song1-1/+3
We should put the css reference when memory allocation failed. Link: http://lkml.kernel.org/r/[email protected] Fixes: f0a3a24b532d ("mm: memcg/slab: rework non-root kmem_cache lifecycle management") Signed-off-by: Muchun Song <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Qian Cai <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm: memcontrol: handle div0 crash race condition in memory.lowJohannes Weiner1-2/+7
Tejun reports seeing rare div0 crashes in memory.low stress testing: RIP: 0010:mem_cgroup_calculate_protection+0xed/0x150 Code: 0f 46 d1 4c 39 d8 72 57 f6 05 16 d6 42 01 40 74 1f 4c 39 d8 76 1a 4c 39 d1 76 15 4c 29 d1 4c 29 d8 4d 29 d9 31 d2 48 0f af c1 <49> f7 f1 49 01 c2 4c 89 96 38 01 00 00 5d c3 48 0f af c7 31 d2 49 RSP: 0018:ffffa14e01d6fcd0 EFLAGS: 00010246 RAX: 000000000243e384 RBX: 0000000000000000 RCX: 0000000000008f4b RDX: 0000000000000000 RSI: ffff8b89bee84000 RDI: 0000000000000000 RBP: ffffa14e01d6fcd0 R08: ffff8b89ca7d40f8 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000006422f7 R12: 0000000000000000 R13: ffff8b89d9617000 R14: ffff8b89bee84000 R15: ffffa14e01d6fdb8 FS: 0000000000000000(0000) GS:ffff8b8a1f1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f93b1fc175b CR3: 000000016100a000 CR4: 0000000000340ea0 Call Trace: shrink_node+0x1e5/0x6c0 balance_pgdat+0x32d/0x5f0 kswapd+0x1d7/0x3d0 kthread+0x11c/0x160 ret_from_fork+0x1f/0x30 This happens when parent_usage == siblings_protected. We check that usage is bigger than protected, which should imply parent_usage being bigger than siblings_protected. However, we don't read (or even update) these values atomically, and they can be out of sync as the memory state changes under us. A bit of fluctuation around the target protection isn't a big deal, but we need to handle the div0 case. Check the parent state explicitly to make sure we have a reasonable positive value for the divisor. Link: http://lkml.kernel.org/r/[email protected] Fixes: 8a931f801340 ("mm: memcontrol: recursive memory.low protection") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Tejun Heo <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Chris Down <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/vmalloc.c: fix a warning while make xmldocsMasanari Iida1-1/+0
This patch fixes following warning while "make xmldocs" mm/vmalloc.c:1877: warning: Excess function parameter 'prot' description in 'vm_map_ram' This warning started since commit d4efd79a81ab ("mm: remove the prot argument from vm_map_ram"). Link: http://lkml.kernel.org/r/[email protected] Fixes: d4efd79a81ab ("mm: remove the prot argument from vm_map_ram") Signed-off-by: Masanari Iida <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26media: omap3isp: remove cacheflush.hNathan Chancellor2-3/+0
After mm.h was removed from the asm-generic version of cacheflush.h, s390 allyesconfig shows several warnings of the following nature: In file included from arch/s390/include/generated/asm/cacheflush.h:1, from drivers/media/platform/omap3isp/isp.c:42: include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration As Geert and Laurent point out, this driver does not need this header in the two files that include it. Remove it so there are no warnings. Link: http://lkml.kernel.org/r/[email protected] Fixes: e0cf615d725c ("asm-generic: don't include <linux/mm.h> in cacheflush.h") Signed-off-by: Nathan Chancellor <[email protected]> Suggested-by: Geert Uytterhoeven <[email protected]> Suggested-by: Laurent Pinchart <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26make asm-generic/cacheflush.h more standaloneStephen Rothwell1-0/+5
Some s390 builds get these warnings: include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:22:46: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:28:45: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:36:44: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:44:45: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:52:50: warning: 'struct address_space' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:58:52: warning: 'struct address_space' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:75:17: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:74:45: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:82:16: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration include/asm-generic/cacheflush.h:81:50: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration Forward declare the named structs to get rid of these. Link: http://lkml.kernel.org/r/[email protected] Fixes: e0cf615d725c ("asm-generic: don't include <linux/mm.h> in cacheflush.h") Signed-off-by: Stephen Rothwell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/debug_vm_pgtable: fix build failure with powerpc 8xxChristophe Leroy1-2/+2
Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses"), READ_ONCE() cannot be used anymore to read complex page table entries. This leads to: CC mm/debug_vm_pgtable.o In file included from ./include/asm-generic/bug.h:5, from ./arch/powerpc/include/asm/bug.h:109, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/gfp.h:5, from mm/debug_vm_pgtable.c:13: In function 'pte_clear_tests', inlined from 'debug_vm_pgtable' at mm/debug_vm_pgtable.c:363:2: ./include/linux/compiler.h:392:38: error: Unsupported access size for {READ,WRITE}_ONCE(). mm/debug_vm_pgtable.c:249:14: note: in expansion of macro 'READ_ONCE' 249 | pte_t pte = READ_ONCE(*ptep); | ^~~~~~~~~ make[2]: *** [mm/debug_vm_pgtable.o] Error 1 Fix it by using the recently added ptep_get() helper. Link: http://lkml.kernel.org/r/6ca8c972e6c920dc4ae0d4affbed9703afa4d010.1592490570.git.christophe.leroy@csgroup.eu Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Will Deacon <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages()Arjun Roy1-10/+11
Calls to pte_offset_map() in vm_insert_pages() are erroneously not matched with a call to pte_unmap(). This would cause problems on architectures where that is not a no-op. This patch does away with the non-traditional locking in the existing code, and instead uses pte_offset_map_lock/unlock() as usual, incrementing PTE as necessary. The PTE pointer is kept within bounds since we clamp it with PTRS_PER_PTE. Link: http://lkml.kernel.org/r/[email protected] Fixes: 8cd3984d81d5 ("mm/memory.c: add vm_insert_pages()") Signed-off-by: Arjun Roy <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm: fix swap cache node allocation maskHugh Dickins1-2/+2
Chris Murphy reports that a slightly overcommitted load, testing swap and zram along with i915, splats and keeps on splatting, when it had better fail less noisily: gnome-shell: page allocation failure: order:0, mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1 Call Trace: dump_stack+0x64/0x88 warn_alloc.cold+0x75/0xd9 __alloc_pages_slowpath.constprop.0+0xcfa/0xd30 __alloc_pages_nodemask+0x2df/0x320 alloc_slab_page+0x195/0x310 allocate_slab+0x3c5/0x440 ___slab_alloc+0x40c/0x5f0 __slab_alloc+0x1c/0x30 kmem_cache_alloc+0x20e/0x220 xas_nomem+0x28/0x70 add_to_swap_cache+0x321/0x400 __read_swap_cache_async+0x105/0x240 swap_cluster_readahead+0x22c/0x2e0 shmem_swapin+0x8e/0xc0 shmem_swapin_page+0x196/0x740 shmem_getpage_gfp+0x3a2/0xa60 shmem_read_mapping_page_gfp+0x32/0x60 shmem_get_pages+0x155/0x5e0 [i915] __i915_gem_object_get_pages+0x68/0xa0 [i915] i915_vma_pin+0x3fe/0x6c0 [i915] eb_add_vma+0x10b/0x2c0 [i915] i915_gem_do_execbuffer+0x704/0x3430 [i915] i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915] drm_ioctl_kernel+0x86/0xd0 [drm] drm_ioctl+0x206/0x390 [drm] ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported on 5.7, but it goes back really to 3.1: when shmem_read_mapping_page_gfp() was implemented for use by i915, and allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but missed swapin's "& GFP_KERNEL" mask for page tree node allocation in __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits from what page cache uses, but GFP_RECLAIM_MASK is now what's needed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085 Link: http://lkml.kernel.org/r/[email protected] Fixes: 68da9f055755 ("tmpfs: pass gfp to shmem_getpage_gfp") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: Chris Murphy <[email protected]> Analyzed-by: Vlastimil Babka <[email protected]> Analyzed-by: Matthew Wilcox <[email protected]> Tested-by: Chris Murphy <[email protected]> Cc: <[email protected]> [3.1+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26slub: cure list_slab_objects() from double fixSebastian Andrzej Siewior1-15/+4
According to Christopher Lameter two fixes have been merged for the same problem. As far as I can tell, the code does not acquire the list_lock and invoke kmalloc(). list_slab_objects() misses an unlock (the counterpart to get_map()) and the memory allocated in free_partial() isn't used. Revert the mentioned commit. Link: http://lkml.kernel.org/r/[email protected] Fixes: aa456c7aebb14 ("slub: remove kmalloc under list_lock from list_slab_objects() V2") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm/slab: use memzero_explicit() in kzfree()Waiman Long1-1/+1
The kzfree() function is normally used to clear some sensitive information, like encryption keys, in the buffer before freeing it back to the pool. Memset() is currently used for buffer clearing. However unlikely, there is still a non-zero probability that the compiler may choose to optimize away the memory clearing especially if LTO is being used in the future. To make sure that this optimization will never happen, memzero_explicit(), which is introduced in v3.18, is now used in kzfree() to future-proof it. Link: http://lkml.kernel.org/r/[email protected] Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") Signed-off-by: Waiman Long <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Howells <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Joe Perches <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Jason A . Donenfeld" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm, slab: fix sign conversion problem in memcg_uncharge_slab()Waiman Long1-2/+2
It was found that running the LTP test on a PowerPC system could produce erroneous values in /proc/meminfo, like: MemTotal: 531915072 kB MemFree: 507962176 kB MemAvailable: 1100020596352 kB Using bisection, the problem is tracked down to commit 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()"). In memcg_uncharge_slab() with a "int order" argument: unsigned int nr_pages = 1 << order; : mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages); The mod_lruvec_state() function will eventually call the __mod_zone_page_state() which accepts a long argument. Depending on the compiler and how inlining is done, "-nr_pages" may be treated as a negative number or a very large positive number. Apparently, it was treated as a large positive number in that PowerPC system leading to incorrect stat counts. This problem hasn't been seen in x86-64 yet, perhaps the gcc compiler there has some slight difference in behavior. It is fixed by making nr_pages a signed value. For consistency, a similar change is applied to memcg_charge_slab() as well. Link: http://lkml.kernel.org/r/[email protected] Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()"). Signed-off-by: Waiman Long <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26lib: fix test_hmm.c reference after freeRandy Dunlap1-2/+1
Coccinelle scripts report the following errors: lib/test_hmm.c:523:20-26: ERROR: reference preceded by free on line 521 lib/test_hmm.c:524:21-27: ERROR: reference preceded by free on line 521 lib/test_hmm.c:523:28-35: ERROR: devmem is NULL but dereferenced. lib/test_hmm.c:524:29-36: ERROR: devmem is NULL but dereferenced. Fix these by using the local variable 'res' instead of devmem. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Cc: Jérôme Glisse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26ocfs2: fix value of OCFS2_INVALID_SLOTJunxiao Bi1-1/+1
In the ocfs2 disk layout, slot number is 16 bits, but in ocfs2 implementation, slot number is 32 bits. Usually this will not cause any issue, because slot number is converted from u16 to u32, but OCFS2_INVALID_SLOT was defined as -1, when an invalid slot number from disk was obtained, its value was (u16)-1, and it was converted to u32. Then the following checking in get_local_system_inode will be always skipped: static struct inode **get_local_system_inode(struct ocfs2_super *osb, int type, u32 slot) { BUG_ON(slot == OCFS2_INVALID_SLOT); ... } Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junxiao Bi <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26ocfs2: fix panic on nfs server over ocfs2Junxiao Bi1-3/+6
The following kernel panic was captured when running nfs server over ocfs2, at that time ocfs2_test_inode_bit() was checking whether one inode locating at "blkno" 5 was valid, that is ocfs2 root inode, its "suballoc_slot" was OCFS2_INVALID_SLOT(65535) and it was allocted from //global_inode_alloc, but here it wrongly assumed that it was got from per slot inode alloctor which would cause array overflow and trigger kernel panic. BUG: unable to handle kernel paging request at 0000000000001088 IP: [<ffffffff816f6898>] _raw_spin_lock+0x18/0xf0 PGD 1e06ba067 PUD 1e9e7d067 PMD 0 Oops: 0002 [#1] SMP CPU: 6 PID: 24873 Comm: nfsd Not tainted 4.1.12-124.36.1.el6uek.x86_64 #2 Hardware name: Huawei CH121 V3/IT11SGCA1, BIOS 3.87 02/02/2018 RIP: _raw_spin_lock+0x18/0xf0 RSP: e02b:ffff88005ae97908 EFLAGS: 00010206 RAX: ffff88005ae98000 RBX: 0000000000001088 RCX: 0000000000000000 RDX: 0000000000020000 RSI: 0000000000000009 RDI: 0000000000001088 RBP: ffff88005ae97928 R08: 0000000000000000 R09: ffff880212878e00 R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000001088 R13: ffff8800063c0aa8 R14: ffff8800650c27d0 R15: 000000000000ffff FS: 0000000000000000(0000) GS:ffff880218180000(0000) knlGS:ffff880218180000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001088 CR3: 00000002033d0000 CR4: 0000000000042660 Call Trace: igrab+0x1e/0x60 ocfs2_get_system_file_inode+0x63/0x3a0 [ocfs2] ocfs2_test_inode_bit+0x328/0xa00 [ocfs2] ocfs2_get_parent+0xba/0x3e0 [ocfs2] reconnect_path+0xb5/0x300 exportfs_decode_fh+0xf6/0x2b0 fh_verify+0x350/0x660 [nfsd] nfsd4_putfh+0x4d/0x60 [nfsd] nfsd4_proc_compound+0x3d3/0x6f0 [nfsd] nfsd_dispatch+0xe0/0x290 [nfsd] svc_process_common+0x412/0x6a0 [sunrpc] svc_process+0x123/0x210 [sunrpc] nfsd+0xff/0x170 [nfsd] kthread+0xcb/0xf0 ret_from_fork+0x61/0x90 Code: 83 c2 02 0f b7 f2 e8 18 dc 91 ff 66 90 eb bf 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 48 89 fb ba 00 00 02 00 <f0> 0f c1 17 89 d0 45 31 e4 45 31 ed c1 e8 10 66 39 d0 41 89 c6 RIP _raw_spin_lock+0x18/0xf0 CR2: 0000000000001088 ---[ end trace 7264463cd1aac8f9 ]--- Kernel panic - not syncing: Fatal exception Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junxiao Bi <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Joel Becker <[email protected]> Cc: Jun Piao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26ocfs2: load global_inode_allocJunxiao Bi1-1/+1
Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE, that will make it load during mount. It can be used to test whether some global/system inodes are valid. One use case is that nfsd will test whether root inode is valid. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junxiao Bi <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Joel Becker <[email protected]> Cc: Jun Piao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26ocfs2: avoid inode removal while nfsd is accessing itJunxiao Bi2-1/+17
Patch series "ocfs2: fix nfsd over ocfs2 issues", v2. This is a series of patches to fix issues on nfsd over ocfs2. patch 1 is to avoid inode removed while nfsd access it patch 2 & 3 is to fix a panic issue. This patch (of 4): When nfsd is getting file dentry using handle or parent dentry of some dentry, one cluster lock is used to avoid inode removed from other node, but it still could be removed from local node, so use a rw lock to avoid this. Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junxiao Bi <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Joel Becker <[email protected]> Cc: Jun Piao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26kexec: do not verify the signature without the lockdown or mandatory signatureLianbo Jiang1-28/+6
Signature verification is an important security feature, to protect system from being attacked with a kernel of unknown origin. Kexec rebooting is a way to replace the running kernel, hence need be secured carefully. In the current code of handling signature verification of kexec kernel, the logic is very twisted. It mixes signature verification, IMA signature appraising and kexec lockdown. If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of signature, the supported crypto, and key, we don't think this is wrong, Unless kexec lockdown is executed. IMA is considered as another kind of signature appraising method. If kexec kernel image has signature/crypto/key, it has to go through the signature verification and pass. Otherwise it's seen as verification failure, and won't be loaded. Seems kexec kernel image with an unqualified signature is even worse than those w/o signature at all, this sounds very unreasonable. E.g. If people get a unsigned kernel to load, or a kernel signed with expired key, which one is more dangerous? So, here, let's simplify the logic to improve code readability. If the KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification is mandated. Otherwise, we lift the bar for any kernel image. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lianbo Jiang <[email protected]> Reviewed-by: Jiri Bohac <[email protected]> Acked-by: Dave Young <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: James Morris <[email protected]> Cc: Matthew Garrett <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm, compaction: make capture control handling safe wrt interruptsVlastimil Babka1-3/+14
Hugh reports: "While stressing compaction, one run oopsed on NULL capc->cc in __free_one_page()'s task_capc(zone): compact_zone_order() had been interrupted, and a page was being freed in the return from interrupt. Though you would not expect it from the source, both gccs I was using (4.8.1 and 7.5.0) had chosen to compile compact_zone_order() with the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before callq compact_zone - long after the "current->capture_control = &capc". An interrupt in between those finds capc->cc NULL (zeroed by an earlier rep stos). This could presumably be fixed by a barrier() before setting current->capture_control in compact_zone_order(); but would also need more care on return from compact_zone(), in order not to risk leaking a page captured by interrupt just before capture_control is reset. Maybe that is the preferable fix, but I felt safer for task_capc() to exclude the rather surprising possibility of capture at interrupt time" I have checked that gcc10 also behaves the same. The advantage of fix in compact_zone_order() is that we don't add another test in the page freeing hot path, and that it might prevent future problems if we stop exposing pointers to uninitialized structures in current task. So this patch implements the suggestion for compact_zone_order() with barrier() (and WRITE_ONCE() to prevent store tearing) for setting current->capture_control, and prevents page leaking with WRITE_ONCE/READ_ONCE in the proper order. Link: http://lkml.kernel.org/r/[email protected] Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Hugh Dickins <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Alex Shi <[email protected]> Cc: Li Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> [5.1+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26mm: do_swap_page(): fix up the error codeMichal Hocko1-1/+3
do_swap_page() returns error codes from the VM_FAULT* space. try_charge() might return -ENOMEM, though, and then do_swap_page() simply returns 0 which means a success. We almost never return ENOMEM for GFP_KERNEL single page charge. Except for async OOM handling (oom_disabled v1). So this needs translation to VM_FAULT_OOM otherwise the the page fault path will not notify the userspace and wait for an action. Link: http://lkml.kernel.org/r/[email protected] Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation") Signed-off-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Alex Shi <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26openrisc: fix boot oops when DEBUG_VM is enabledStafford Horne1-0/+5
Since v5.8-rc1 OpenRISC Linux fails to boot when DEBUG_VM is enabled. This has been bisected to commit 42fc541404f2 ("mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()"). The added locking checks exposed the issue that OpenRISC was not taking this mmap lock when during page walks for DMA operations. This patch locks and unlocks the mmap lock for page walking. Link: http://lkml.kernel.org/r/[email protected] Fixes: 42fc541404f2 ("mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()" Signed-off-by: Stafford Horne <[email protected]> Reviewed-by: Michel Lespinasse <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Steven Price <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Daniel Jordan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-26Merge tag 'drm-misc-fixes-2020-06-25' of ↵Dave Airlie11-26/+83
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull (less than what git shortlog provides): * In mcde, set up fbdev after device registration and removde the last access to dev->dev_private. Fixes an error message and a segmentation fault. * Set the connector type for LogicPT Type 28 and newhaven_nhd_43_480272ef_atxl panels. * In uvesafb, fix the handling of the noblank option. * Fix panel orientation for Asus T101HA and Acer S1003. * Fix DMA configuration for sun4i if IOMMU is present. * Fix regression in VT restoration. Unbreaks userspace (i.e., Xorg) VT handling. Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20200625082717.GA14856@linux-uq9g
2020-06-26selftests/powerpc: Fix build failure in ebb testsHarish1-1/+1
We use OUTPUT directory as TMPOUT for checking no-pie option. Since commit f2f02ebd8f38 ("kbuild: improve cc-option to clean up all temporary files") when building powerpc/ from selftests directory, the OUTPUT directory points to powerpc/pmu/ebb/ and gets removed when checking for -no-pie option in try-run routine, subsequently build fails with the following: $ make -C powerpc ... TARGET=ebb; BUILD_TARGET=$OUTPUT/$TARGET; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C $TARGET all make[2]: Entering directory '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb' make[2]: *** No rule to make target 'Makefile'. make[2]: Failed to remake makefile 'Makefile'. make[2]: *** No rule to make target 'ebb.c', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'. make[2]: *** No rule to make target 'ebb_handler.S', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'. make[2]: *** No rule to make target 'trace.c', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'. make[2]: *** No rule to make target 'busy_loop.S', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'. make[2]: Target 'all' not remade because of errors. Fix this by adding a suffix to the OUTPUT directory so that the failure is avoided. Fixes: 9686813f6e9d ("selftests/powerpc: Fix try-run when source tree is not writable") Signed-off-by: Harish <[email protected]> [mpe: Mention that commit that triggered the breakage] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds223-1156/+1973
Pull networking fixes from David Miller: 1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen. 2) The default crypto algorithm selection in Kconfig for IPSEC is out of touch with modern reality, fix this up. From Eric Biggers. 3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii Nakryiko. 4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from Hangbin Liu. 5) Adjust packet alignment handling in ax88179_178a driver to match what the hardware actually does. From Jeremy Kerr. 6) register_netdevice can leak in the case one of the notifiers fail, from Yang Yingliang. 7) Use after free in ip_tunnel_lookup(), from Taehee Yoo. 8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir Oltean. 9) tg3 driver can sleep forever when we get enough EEH errors, fix from David Christensen. 10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet drivers, from Ciara Loftus. 11) Fix scanning loop break condition in of_mdiobus_register(), from Florian Fainelli. 12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon. 13) Endianness fix in mlxsw, from Ido Schimmel. 14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen. 15) Missing bridge mrp configuration validation, from Horatiu Vultur. 16) Fix circular netns references in wireguard, from Jason A. Donenfeld. 17) PTP initialization on recovery is not done properly in qed driver, from Alexander Lobakin. 18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong, from Rahul Lakkireddy. 19) Don't clear bound device TX queue of socket prematurely otherwise we get problems with ktls hw offloading, from Tariq Toukan. 20) ipset can do atomics on unaligned memory, fix from Russell King. 21) Align ethernet addresses properly in bridging code, from Thomas Martitz. 22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set, from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits) rds: transport module should be auto loaded when transport is set sch_cake: fix a few style nits sch_cake: don't call diffserv parsing code when it is not needed sch_cake: don't try to reallocate or unshare skb unconditionally ethtool: fix error handling in linkstate_prepare_data() wil6210: account for napi_gro_receive never returning GRO_DROP hns: do not cast return value of napi_gro_receive to null socionext: account for napi_gro_receive never returning GRO_DROP wireguard: receive: account for napi_gro_receive never returning GRO_DROP vxlan: fix last fdb index during dump of fdb with nhid sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket tc-testing: avoid action cookies with odd length. bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT net: dsa: sja1105: fix tc-gate schedule with single element net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules net: dsa: sja1105: unconditionally free old gating config net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top net: macb: free resources on failure path of at91ether_open() net: macb: call pm_runtime_put_sync on failure path ...
2020-06-25rds: transport module should be auto loaded when transport is setRao Shoaib2-10/+20
This enhancement auto loads transport module when the transport is set via SO_RDS_TRANSPORT socket option. Reviewed-by: Ka-Cheong Poon <[email protected]> Reviewed-by: Håkon Bugge <[email protected]> Signed-off-by: Rao Shoaib <[email protected]> Signed-off-by: Somasundaram Krishnasamy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-25Merge branch 'sched-A-couple-of-fixes-for-sch_cake'David S. Miller1-17/+41
Toke Høiland-Jørgensen says: ==================== sched: A couple of fixes for sch_cake This series contains a couple of fixes for diffserv handling in sch_cake that provide a nice speedup (with a somewhat pedantic nit fix tacked on to the end). Not quite sure about whether this should go to stable; it does provide a nice speedup, but it's not strictly a fix in the "correctness" sense. I lean towards including this in stable as well, since our most important consumer of that (OpenWrt) is likely to backport the series anyway. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-06-25sch_cake: fix a few style nitsToke Høiland-Jørgensen1-2/+2
I spotted a few nits when comparing the in-tree version of sch_cake with the out-of-tree one: A redundant error variable declaration shadowing an outer declaration, and an indentation alignment issue. Fix both of these. Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-25sch_cake: don't call diffserv parsing code when it is not neededToke Høiland-Jørgensen1-4/+9
As a further optimisation of the diffserv parsing codepath, we can skip it entirely if CAKE is configured to neither use diffserv-based classification, nor to zero out the diffserv bits. Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-25sch_cake: don't try to reallocate or unshare skb unconditionallyIlya Ponetayev1-11/+30
cake_handle_diffserv() tries to linearize mac and network header parts of skb and to make it writable unconditionally. In some cases it leads to full skb reallocation, which reduces throughput and increases CPU load. Some measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable() reallocates skb, if skb was allocated in ethernet driver via so-called 'build skb' method from page cache (it was discovered by strange increase of kmalloc-2048 slab at first). Obtain DSCP value via read-only skb_header_pointer() call, and leave linearization only for DSCP bleaching or ECN CE setting. And, as an additional optimisation, skip diffserv parsing entirely if it is not needed by the current configuration. Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Ilya Ponetayev <[email protected]> [ fix a few style issues, reflow commit message ] Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-25ethtool: fix error handling in linkstate_prepare_data()Michal Kubecek1-6/+5
When getting SQI or maximum SQI value fails in linkstate_prepare_data(), we must not return without calling ethnl_ops_complete(dev) as that could result in imbalance between ethtool_ops ->begin() and ->complete() calls. Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)") Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-25Merge tag 'trace-v5.8-rc2' of ↵Linus Torvalds3-6/+27
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Four small fixes: - Fix a ringbuffer bug for nested events having time go backwards - Fix a config dependency for boot time tracing to depend on synthetic events instead of histograms. - Fix trigger format parsing to handle multiple spaces - Fix bootconfig to handle failures in multiple events" * tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/boottime: Fix kprobe multiple events tracing: Fix event trigger to accept redundant spaces tracing/boot: Fix config dependency for synthedic event ring-buffer: Zero out time extend if it is nested and not absolute
2020-06-25Merge branch 'napi_gro_receive-caller-return-value-cleanups'David S. Miller4-39/+17
Jason A. Donenfeld says: ==================== napi_gro_receive caller return value cleanups In 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()"), the GRO_NORMAL case stopped calling netif_receive_skb_internal, checking its return value, and returning GRO_DROP in case it failed. Instead, it calls into netif_receive_skb_list_internal (after a bit of indirection), which doesn't return any error. Therefore, napi_gro_receive will never return GRO_DROP, making handling GRO_DROP dead code. I emailed the author of 6570bc79c0df on netdev [1] to see if this change was intentional, but the dlink.ru email address has been disconnected, and looking a bit further myself, it seems somewhat infeasible to start propagating return values backwards from the internal machinations of netif_receive_skb_list_internal. Taking a look at all the callers of napi_gro_receive, it appears that three are checking the return value for the purpose of comparing it to the now never-happening GRO_DROP, and one just casts it to (void), a likely historical leftover. Every other of the 120 callers does not bother checking the return value. And it seems like these remaining 116 callers are doing the right thing: after calling napi_gro_receive, the packet is now in the hands of the upper layers of the newtworking, and the device driver itself has no business now making decisions based on what the upper layers choose to do. Incrementing stats counters on GRO_DROP seems like a mistake, made by these three drivers, but not by the remaining 117. It would seem, therefore, that after rectifying these four callers of napi_gro_receive, that I should go ahead and just remove returning the value from napi_gro_receive all together. However, napi_gro_receive has a function event tracer, and being able to introspect into the networking stack to see how often napi_gro_receive is returning whatever interesting GRO status (aside from _DROP) remains an interesting data point worth keeping for debugging. So, this series simply gets rid of the return value checking for the four useless places where that check never evaluates to anything meaningful. [1] https://lore.kernel.org/netdev/[email protected]/ ==================== Acked-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>