aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-10-19mm: memblock: do not enforce current limit for memblock_phys* familyMike Rapoport1-3/+3
Until commit 92d12f9544b7 ("memblock: refactor internal allocation functions") the maximal address for memblock allocations was forced to memblock.current_limit only for the allocation functions returning virtual address. The changes introduced by that commit moved the limit enforcement into the allocation core and as a result the allocation functions returning physical address also started to limit allocations to memblock.current_limit. This caused breakage of etnaviv GPU driver: etnaviv etnaviv: bound 130000.gpu (ops gpu_ops) etnaviv etnaviv: bound 134000.gpu (ops gpu_ops) etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops) etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108 etnaviv-gpu 130000.gpu: command buffer outside valid memory window etnaviv-gpu 134000.gpu: model: GC320, revision: 5007 etnaviv-gpu 134000.gpu: command buffer outside valid memory window etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215 etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0 Restore the behaviour of memblock_phys* family so that these functions will not enforce memblock.current_limit. Link: http://lkml.kernel.org/r/[email protected] Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions") Signed-off-by: Mike Rapoport <[email protected]> Reported-by: Adam Ford <[email protected]> Tested-by: Adam Ford <[email protected]> [imx6q-logicpd] Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Fabio Estevam <[email protected]> Cc: Lucas Stach <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm: memcg: get number of pages on the LRU list in memcgroup base on ↵Honglei Wang1-4/+5
lru_zone_size Commit 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()") has made lruvec_page_state to use per-cpu counters instead of calculating it directly from lru_zone_size with an idea that this would be more effective. Tim has reported that this is not really the case for their database benchmark which is showing an opposite results where lruvec_page_state is taking up a huge chunk of CPU cycles (about 25% of the system time which is roughly 7% of total cpu cycles) on 5.3 kernels. The workload is running on a larger machine (96cpus), it has many cgroups (500) and it is heavily direct reclaim bound. Tim Chen said: : The problem can also be reproduced by running simple multi-threaded : pmbench benchmark with a fast Optane SSD swap (see profile below). : : : 6.15% 3.08% pmbench [kernel.vmlinux] [k] lruvec_lru_size : | : |--3.07%--lruvec_lru_size : | | : | |--2.11%--cpumask_next : | | | : | | --1.66%--find_next_bit : | | : | --0.57%--call_function_interrupt : | | : | --0.55%--smp_call_function_interrupt : | : |--1.59%--0x441f0fc3d009 : | _ops_rdtsc_init_base_freq : | access_histogram : | page_fault : | __do_page_fault : | handle_mm_fault : | __handle_mm_fault : | | : | --1.54%--do_swap_page : | swapin_readahead : | swap_cluster_readahead : | | : | --1.53%--read_swap_cache_async : | __read_swap_cache_async : | alloc_pages_vma : | __alloc_pages_nodemask : | __alloc_pages_slowpath : | try_to_free_pages : | do_try_to_free_pages : | shrink_node : | shrink_node_memcg : | | : | |--0.77%--lruvec_lru_size : | | : | --0.76%--inactive_list_is_low : | | : | --0.76%--lruvec_lru_size : | : --1.50%--measure_read : page_fault : __do_page_fault : handle_mm_fault : __handle_mm_fault : do_swap_page : swapin_readahead : swap_cluster_readahead : | : --1.48%--read_swap_cache_async : __read_swap_cache_async : alloc_pages_vma : __alloc_pages_nodemask : __alloc_pages_slowpath : try_to_free_pages : do_try_to_free_pages : shrink_node : shrink_node_memcg : | : |--0.75%--inactive_list_is_low : | | : | --0.75%--lruvec_lru_size : | : --0.73%--lruvec_lru_size The likely culprit is the cache traffic the lruvec_page_state_local generates. Dave Hansen says: : I was thinking purely of the cache footprint. If it's reading : pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192 : bytes of cache *96 CPUs = 18k of data, mostly read-only. 1 cgroup would : be 18k of data for the whole system and the caching would be pretty : efficient and all 18k would probably survive a tight page fault loop in : the L1. 500 cgroups would be ~90k of data per CPU thread which doesn't : fit in the L1 and probably wouldn't survive a tight page fault loop if : both logical threads were banging on different cgroups. : : It's just a theory, but it's why I noted the number of cgroups when I : initially saw this show up in profiles Fix the regression by partially reverting the said commit and calculate the lru size explicitly. Link: http://lkml.kernel.org/r/[email protected] Fixes: 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()") Signed-off-by: Honglei Wang <[email protected]> Reported-by: Tim Chen <[email protected]> Acked-by: Tim Chen <[email protected]> Tested-by: Tim Chen <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Dave Hansen <[email protected]> Cc: <[email protected]> [5.2+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm/gup: fix a misnamed "write" argument, and a related bugJohn Hubbard1-6/+8
In several routines, the "flags" argument is incorrectly named "write". Change it to "flags". Also, in one place, the misnaming led to an actual bug: "flags & FOLL_WRITE" is required, rather than just "flags". (That problem was flagged by krobot, in v1 of this patch.) Also, change the flags argument from int, to unsigned int. You can see that this was a simple oversight, because the calling code passes "flags" to the fifth argument: gup_pgd_range(): ... if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr, PGDIR_SHIFT, next, flags, pages, nr)) ...which, until this patch, the callees referred to as "write". Also, change two lines to avoid checkpatch line length complaints, and another line to fix another oversight that checkpatch called out: missing "int" on pdshift. Link: http://lkml.kernel.org/r/[email protected] Fixes: b798bec4741b ("mm/gup: change write parameter to flags in fast walk") Signed-off-by: John Hubbard <[email protected]> Reported-by: kbuild test robot <[email protected]> Suggested-by: Kirill A. Shutemov <[email protected]> Suggested-by: Ira Weiny <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Keith Busch <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm/gup_benchmark: add a missing "w" to getopt stringJohn Hubbard1-1/+1
Even though gup_benchmark.c has code to handle the -w command-line option, the "w" is not part of the getopt string. It looks as if it has been missing the whole time. On my machine, this leads naturally to the following predictable result: $ sudo ./gup_benchmark -w ./gup_benchmark: invalid option -- 'w' ...which is fixed with this commit. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Keith Busch <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: kbuild test robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19ocfs2: fix error handling in ocfs2_setattr()Chengguang Xu1-0/+2
Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before jumping to do dqput(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chengguang Xu <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer ↵Roman Gushchin1-4/+5
release Karsten reported the following panic in __free_slab() happening on a s390x machine: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14 Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8 0000000000000000 000000009e3291c1 0000000000000000 0000000000000000 0000000000000003 0000000000000008 000000002b478b00 000003d080a97600 0000000000000003 0000000000000008 000000002b478b00 000003d080a97600 000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78 Krnl Code: 00000000003cada6: e310a1400004 lg %r1,320(%r10) 00000000003cadac: c0e50046c286 brasl %r14,ca32b8 #00000000003cadb2: a7f4fe36 brc 15,3caa1e >00000000003cadb6: e32060800024 stg %r2,128(%r6) 00000000003cadbc: a7f4fd9e brc 15,3ca8f8 00000000003cadc0: c0e50046790c brasl %r14,c99fd8 00000000003cadc6: a7f4fe2c brc 15,3caa 00000000003cadc6: a7f4fe2c brc 15,3caa1e 00000000003cadca: ecb1ffff00d9 aghik %r11,%r1,-1 Call Trace: (<00000000003cabcc> __free_slab+0x49c/0x6b0) <00000000001f5886> rcu_core+0x5a6/0x7e0 <0000000000ca2dea> __do_softirq+0xf2/0x5c0 <0000000000152644> irq_exit+0x104/0x130 <000000000010d222> do_IRQ+0x9a/0xf0 <0000000000ca2344> ext_int_handler+0x130/0x134 <0000000000103648> enabled_wait+0x58/0x128 (<0000000000103634> enabled_wait+0x44/0x128) <0000000000103b00> arch_cpu_idle+0x40/0x58 <0000000000ca0544> default_idle_call+0x3c/0x68 <000000000018eaa4> do_idle+0xec/0x1c0 <000000000018ee0e> cpu_startup_entry+0x36/0x40 <000000000122df34> arch_call_rest_init+0x5c/0x88 <0000000000000000> 0x0 INFO: lockdep is turned off. Last Breaking-Event-Address: <00000000003ca8f4> __free_slab+0x1c4/0x6b0 Kernel panic - not syncing: Fatal exception in interrupt The kernel panics on an attempt to dereference the NULL memcg pointer. When shutdown_cache() is called from the kmem_cache_destroy() context, a memcg kmem_cache might have empty slab pages in a partial list, which are still charged to the memory cgroup. These pages are released by free_partial() at the beginning of shutdown_cache(): either directly or by scheduling a RCU-delayed work (if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag). The latter case is when the reported panic can happen: memcg_unlink_cache() is called immediately after shrinking partial lists, without waiting for scheduled RCU works. It sets the kmem_cache->memcg_params.memcg pointer to NULL, and the following attempt to dereference it by __free_slab() from the RCU work context causes the panic. To fix the issue, let's postpone the release of the memcg pointer to destroy_memcg_params(). It's called from a separate work context by slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier. This guarantees that all scheduled page release RCU works will complete before the memcg pointer will be zeroed. Big thanks for Karsten for the perfect report containing all necessary information, his help with the analysis of the problem and testing of the fix. Link: http://lkml.kernel.org/r/[email protected] Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal") Signed-off-by: Roman Gushchin <[email protected]> Reported-by: Karsten Graul <[email protected]> Tested-by: Karsten Graul <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Karsten Graul <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm/memunmap: don't access uninitialized memmap in memunmap_pages()Aneesh Kumar K.V1-4/+7
Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6. This series fixes the access of uninitialized memmaps when shrinking zones/nodes and when removing memory. Also, it contains all fixes for crashes that can be triggered when removing certain namespace using memunmap_pages() - ZONE_DEVICE, reported by Aneesh. We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be more involved (we don't have SECTION_IS_ONLINE as an indicator), and shrinking is only of limited use (set_zone_contiguous() cannot detect the ZONE_DEVICE as contiguous). We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of code to a minimum. Shrinking is especially necessary to keep zone->contiguous set where possible, especially, on memory unplug of DIMMs at zone boundaries. -------------------------------------------------------------------------- Zones are now properly shrunk when offlining memory blocks or when onlining failed. This allows to properly shrink zones on memory unplug even if the separate memory blocks of a DIMM were onlined to different zones or re-onlined to a different zone after offlining. Example: :/# cat /proc/zoneinfo Node 1, zone Movable spanned 0 present 0 managed 0 :/# echo "online_movable" > /sys/devices/system/memory/memory41/state :/# echo "online_movable" > /sys/devices/system/memory/memory43/state :/# cat /proc/zoneinfo Node 1, zone Movable spanned 98304 present 65536 managed 65536 :/# echo 0 > /sys/devices/system/memory/memory43/online :/# cat /proc/zoneinfo Node 1, zone Movable spanned 32768 present 32768 managed 32768 :/# echo 0 > /sys/devices/system/memory/memory41/online :/# cat /proc/zoneinfo Node 1, zone Movable spanned 0 present 0 managed 0 This patch (of 10): With an altmap, the memmap falling into the reserved altmap space are not initialized and, therefore, contain a garbage NID and a garbage zone. Make sure to read the NID/zone from a memmap that was initialized. This fixes a kernel crash that is observed when destroying a namespace: kernel BUG at include/linux/mm.h:1107! cpu 0x1: Vector: 700 (Program Check) at [c000000274087890] pc: c0000000004b9728: memunmap_pages+0x238/0x340 lr: c0000000004b9724: memunmap_pages+0x234/0x340 ... pid = 3669, comm = ndctl kernel BUG at include/linux/mm.h:1107! devm_action_release+0x30/0x50 release_nodes+0x268/0x2d0 device_release_driver_internal+0x174/0x240 unbind_store+0x13c/0x190 drv_attr_store+0x44/0x60 sysfs_kf_write+0x70/0xa0 kernfs_fop_write+0x1ac/0x290 __vfs_write+0x3c/0x70 vfs_write+0xe4/0x200 ksys_write+0x7c/0x140 system_call+0x5c/0x68 The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f4833 ("mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I think we will never have driver reserved memory with MEMORY_DEVICE_PRIVATE (no altmap AFAIKS). [[email protected]: minimze code changes, rephrase description] Link: http://lkml.kernel.org/r/[email protected] Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Damian Tometzki <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Halil Pasic <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jun Yao <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qian Cai <[email protected]> Cc: Rich Felker <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Steve Capper <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Yu Zhao <[email protected]> Cc: <[email protected]> [5.0+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()David Hildenbrand1-57/+15
We might use the nid of memmaps that were never initialized. For example, if the memmap was poisoned, we will crash the kernel in pfn_to_nid() right now. Let's use the calculated boundaries of the separate zones instead. This now also avoids having to iterate over a whole bunch of subsections again, after shrinking one zone. Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug"), the memmap was initialized to 0 and the node was set to the right value. After that commit, the node might be garbage. We'll have to fix shrink_zone_span() next. Link: http://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [d0dc12e86b319] Signed-off-by: David Hildenbrand <[email protected]> Reported-by: Aneesh Kumar K.V <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Dan Williams <[email protected]> Cc: Wei Yang <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Damian Tometzki <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Halil Pasic <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jun Yao <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qian Cai <[email protected]> Cc: Rich Felker <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Steve Capper <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Yu Zhao <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm/page_owner: don't access uninitialized memmaps when reading ↵Qian Cai1-2/+3
/proc/pagetypeinfo Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get touched. For example, when not onlining a memory block that is spanned by a zone and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and CONFIG_PAGE_POISONING, we can trigger a kernel BUG: :/# echo 1 > /sys/devices/system/memory/memory40/online :/# echo 1 > /sys/devices/system/memory/memory42/online :/# cat /proc/pagetypeinfo > test.file page:fffff2c585200000 is uninitialized and poisoned raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) There is not page extension available. ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:1107! invalid opcode: 0000 [#1] SMP NOPTI Please note that this change does not affect ZONE_DEVICE, because pagetypeinfo_showmixedcount_print() is called from mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and ZONE_DEVICE is never populated (zone->present_pages always 0). [[email protected]: move check to outer loop, add comment, rephrase description] Link: http://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319 Signed-off-by: Qian Cai <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Miles Chen <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Qian Cai <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is setJoel Colledge2-16/+25
When CONFIG_PRINTK_CALLER is set, struct printk_log contains an additional member caller_id. This affects the offset of the log text. Account for this by using the type information from gdb to determine all the offsets instead of using hardcoded values. This fixes following error: (gdb) lx-dmesg Python Exception <class 'ValueError'> embedded null character: Error occurred in Python command: embedded null character The read_u* utility functions now take an offset argument to make them easier to use. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Colledge <[email protected]> Reviewed-by: Jan Kiszka <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Leonard Crestez <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()David Hildenbrand1-6/+8
We should check for pfn_to_online_page() to not access uninitialized memmaps. Reshuffle the code so we don't have to duplicate the error message. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Acked-by: Naoya Horiguchi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.cDavid Hildenbrand1-12/+16
There are three places where we access uninitialized memmaps, namely: - /proc/kpagecount - /proc/kpageflags - /proc/kpagecgroup We have initialized memmaps either when the section is online or when the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. For example, not onlining a DIMM during boot and calling /proc/kpagecount with CONFIG_PAGE_POISONING: :/# cat /proc/kpagecount > tmp.test BUG: unable to handle page fault for address: fffffffffffffffe #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 114616067 P4D 114616067 PUD 114618067 PMD 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 RIP: 0010:kpagecount_read+0xce/0x1e0 Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480 RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202 RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000 RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08 FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0 Call Trace: proc_reg_read+0x3c/0x60 vfs_read+0xc5/0x180 ksys_read+0x68/0xe0 do_syscall_64+0x5c/0xa0 entry_SYSCALL_64_after_hwframe+0x49/0xbe For now, let's drop support for ZONE_DEVICE from the three pseudo files in order to fix this. To distinguish offline memory (with garbage memmap) from ZONE_DEVICE memory with properly initialized memmaps, we would have to check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage of both (especially, special casing devmem) is frowned upon and needs to be reworked. The fundamental issue we have is: if (pfn_to_online_page(pfn)) { /* memmap initialized */ } else if (pfn_valid(pfn)) { /* * ??? * a) offline memory. memmap garbage. * b) devmem: memmap initialized to ZONE_DEVICE. * c) devmem: reserved for driver. memmap garbage. * (d) devmem: memmap currently initializing - garbage) */ } We'll leave the pfn_zone_device_reserved() check in stable_page_flags() in place as that function is also used from memory failure. We now no longer dump information about pages that are not in use anymore - offline. Link: http://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <[email protected]> Reported-by: Qian Cai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Toshiki Fukasawa <[email protected]> Cc: Pankaj gupta <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Anthony Yznaga <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-19drivers/base/memory.c: don't access uninitialized memmaps in ↵David Hildenbrand1-0/+3
soft_offline_page_store() Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get touched. Right now, when trying to soft-offline a PFN that resides on a memory block that was never onlined, one gets a misleading error with CONFIG_PAGE_POISONING: :/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page [ 23.097167] soft offline: 0x150000 page already poisoned But the actual result depends on the garbage in the memmap. soft_offline_page() can only work with online pages, it returns -EIO in case of ZONE_DEVICE. Make sure to only forward pages that are online (iow, managed by the buddy) and, therefore, have an initialized memmap. Add a check against pfn_to_online_page() and similarly return -EIO. Link: http://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-18Merge tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-blockLinus Torvalds13-117/+266
Pull block fixes from Jens Axboe: - NVMe pull request from Keith that address deadlocks, double resets, memory leaks, and other regression. - Fixup elv_support_iosched() for bio based devices (Damien) - Fixup for the ahci PCS quirk (Dan) - Socket O_NONBLOCK handling fix for io_uring (me) - Timeout sequence io_uring fixes (yangerkun) - MD warning fix for parameter default_layout (Song) - blkcg activation fixes (Tejun) - blk-rq-qos node deletion fix (Tejun) * tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block: nvme-pci: Set the prp2 correctly when using more than 4k page io_uring: fix logic error in io_timeout io_uring: fix up O_NONBLOCK handling for sockets md/raid0: fix warning message for parameter default_layout libata/ahci: Fix PCS quirk application blk-rq-qos: fix first node deletion of rq_qos_del() blkcg: Fix multiple bugs in blkcg_activate_policy() io_uring: consider the overflow of sequence for timeout req nvme-tcp: fix possible leakage during error flow nvmet-loop: fix possible leakage during error flow block: Fix elv_support_iosched() nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL nvme: Wait for reset state when required nvme: Prevent resets during paused controller state nvme: Restart request timers in resetting state nvme: Remove ADMIN_ONLY state nvme-pci: Free tagset if no IO queues nvme: retain split access workaround for capability reads nvme: fix possible deadlock when nvme_update_formats fails
2019-10-18Merge tag 'riscv/for-v5.4-rc4' of ↵Linus Torvalds4-23/+20
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Some RISC-V fixes: - Fix the virtual memory layout so the fixaddr region doesn't overlap with other regions. (This was originally intended to go in as part of an earlier patch, but I inadvertently dropped it during a rebase) - Add the DT chosen/stdout-path property to the HiFive Unleashed DT file. This is so "earlycon" can be specified with no arguments on the kernel command line, and the correct UART will be automatically selected. And two cleanup patches: - Simplify the code in our breakpoint trap handler. - Drop a comment in our TLB flush code that has caused some confusion" * tag 'riscv/for-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: fix virtual address overlapped in FIXADDR_START and VMEMMAP_START riscv: tlbflush: remove confusing comment on local_flush_tlb_all() riscv: dts: HiFive Unleashed: add default chosen/stdout-path riscv: remove the switch statement in do_trap_break()
2019-10-18drm/i915: prettify MST debug messageLucas De Marchi1-1/+1
s/?/:/ so it gets correctly colored by dmesg. Signed-off-by: Lucas De Marchi <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915: add pipe id/name to pipe mismatch logsLucas De Marchi1-15/+19
This way it's easier to figure out what didn't match when we have multiple pipes enabled. v2: pass drm_crtc and use the more common [CRTC:%d:%s] format (Ville) v3: use struct intel_crtc type to pass crtc around (Ville) Signed-off-by: Lucas De Marchi <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915: remove extra new line on pipe_config mismatchLucas De Marchi1-11/+11
The new line is already added by pipe_config_mismatch(), so the callers shouldn't add it. Signed-off-by: Lucas De Marchi <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915: fix port checks for MST support on gen >= 11Lucas De Marchi2-11/+18
Both Ice Lake and Elkhart Lake (gen 11) support MST on all external connections except DDI A. Tiger Lake (gen 12) supports on all external connections. Move the check to happen inside intel_dp_mst_encoder_init() and add specific platform checks. v2: Replace != with == checks for ports on gen < 11 (Ville) Signed-off-by: Lucas De Marchi <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915: simplify setting of ddi_io_power_domainLucas De Marchi1-40/+3
Instead of the ever growing switch, just compute the ddi io power domain based on the port number. Signed-off-by: Lucas De Marchi <[email protected]> Reviewed-by: José Roberto de Souza <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915/display/icl: In port sync mode disable slaves first then masterManasi Navare1-6/+52
In the transcoder port sync mode, the slave transcoders mask their vblanks until master transcoder's vblank so while disabling them, make sure slaves are disabled first and then the masters. v5: * Dont pass dev priv to get_slave_crtc (Ville) v4: * Obtain slave state from master (Maarten) v3: * Rebase v2: * Use the intel_old_crtc_state_disables() helper Cc: Ville Syrjälä <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Manasi Navare <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915/display/icl: Disable transcoder port sync as part of crtc_disable() ↵Manasi Navare1-0/+22
sequence This clears the transcoder port sync bits of the TRANS_DDI_FUNC_CTL2 register during crtc_disable(). v3: * Rebase on maarten's patches v2: * Directly write the trans_port_sync reg value (Maarten) Cc: Ville Syrjälä <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Manasi Navare <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915/display/icl: Enable master-slaves in trans port syncManasi Navare3-5/+139
As per the display enable sequence, we need to follow the enable sequence for slaves first with DP_TP_CTL set to Idle and configure the transcoder port sync register to select the corersponding master, then follow the enable sequence for master leaving DP_TP_CTL to idle. At this point the transcoder port sync mode is configured and enabled and the Vblanks of both ports are synchronized so then set DP_TP_CTL for the slave and master to Normal and do post crtc enable updates. v11: * Rebase (Manasi) v10: * in trans sync mode, dont stop link train for tgl (Manasi) v9: Remove update_scanline_offset to rebase on Maarten's patch (Manasi) v8: * Rebase on Maarten's patches (Manasi) v7: * Use ffs(slaves) to get slave crtc (Ville) v6: * Modeset implies active_changed, remove one condition (Maarten) v5: * Fix checkpatch warning (Manasi) v4: * Reuse skl_commit_modeset_enables() hook (Maarten) * Obtain slave crtc and states from master (Maarten) v3: * Rebase on drm-tip (Manasi) v2: * Create a icl_update_crtcs hook (Maarten, Danvet) * This sequence only for CRTCs in trans port sync mode (Maarten) Cc: Daniel Vetter <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Signed-off-by: Manasi Navare <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915/display/icl: HW state readout for transcoder port sync configManasi Navare1-0/+61
After the state is committed, we readout the HW registers and compare the HW state with the SW state that we just committed. For Transcdoer port sync, we add master_transcoder and the salves bitmask to the crtc_state, hence we need to read those during the HW state readout to avoid pipe state mismatch. v11: * Move master trans init to get pipe_Config hooks (Ville) v10: * Initialize master_tarnscoder readout for all platforms (Ville) v9: * Initialize master_transcoder = INVALID at get config (Ville) v8: * Use master_select -1, address TRANS_EDP case (Ville) * Rename master_transcoder to _readout (Lucas) v7: * NDont read HW state for DSI v6: * Go through both parts of HW readout (Maarten) * Add a WARN if the same trans configured as master and slave (Ville, Maarten) v5: * Add return INVALID in defaut case (Maarten) v4: * Get power domains in master loop for get_config (Ville) v3: * Add TRANSCODER_D (Maarten) * v3 Reviewed-by: Maarten Lankhorst <[email protected]> v2: * Add Transcoder_D and MISSING_CASE (Maarten) Cc: Ville Syrjälä <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Manasi Navare <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915/display/icl: Enable TRANSCODER PORT SYNC for tiled displays across ↵Manasi Navare1-0/+33
separate ports In case of tiled displays where different tiles are displayed across different ports, we need to synchronize the transcoders involved. This patch implements the transcoder port sync feature for synchronizing one master transcoder with one or more slave transcoders. This is only enbaled in slave transcoder and the master transcoder is unaware that it is operating in this mode. This has been tested with tiled display connected to ICL. v7: * Rebase on Maarten's patches v6: * Use master_trans +1 and address missing trans_edp case (Ville) v5: * Add TRANSCODER_D case and MISSING_CASE (Maarten) v4: Rebase v3: * Check of DP_MST moved to atomic_check (Maarten) v2: * Do not use RMW, just write to the register in commit (Jani N) Cc: Daniel Vetter <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Manasi Navare <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/i915/display/icl: Save Master transcoder in slave's crtc_state for ↵Manasi Navare3-0/+126
Transcoder Port Sync In case of tiled displays when the two tiles are sent across two CRTCs over two separate DP SST connectors, we need a mechanism to synchronize the two CRTCs and their corresponding transcoders. So use the master-slave mode where there is one master corresponding to last horizontal and vertical tile that needs to be genlocked with all other slave tiles. This patch identifies saves the master transcoder in all the slave CRTC states. This is needed to select the master CRTC/transcoder while configuring transcoder port sync for the corresponding slaves. v6: Rebase (manasi) v5: * Address Ville's comments * Just pass crtc_state, no need to check GEN (Ville) v4: * Rebase v3: * Use master_tramscoder instead of master_crtc for valid HW state readouts (Ville) v2: * Move this to intel_mode_set_pipe_config(Jani N, Ville) * Use slave_bitmask to save associated slaves in master crtc state (Ville) Cc: Daniel Vetter <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Signed-off-by: Manasi Navare <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18filldir[64]: remove WARN_ON_ONCE() for bad directory entriesLinus Torvalds1-2/+2
This was always meant to be a temporary thing, just for testing and to see if it actually ever triggered. The only thing that reported it was syzbot doing disk image fuzzing, and then that warning is expected. So let's just remove it before -rc4, because the extra sanity testing should probably go to -stable, but we don't want the warning to do so. Reported-by: [email protected] Fixes: 8a23eb804ca4 ("Make filldir[64]() verify the directory entry filename is valid") Signed-off-by: Linus Torvalds <[email protected]>
2019-10-18Merge tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-clientLinus Torvalds2-13/+17
Pull ceph fixes from Ilya Dryomov: "A future-proofing decoding fix from Jeff intended for stable and a patch for a mostly benign race from Dongsheng" * tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client: rbd: cancel lock_dwork if the wait is interrupted ceph: just skip unrecognized info in ceph_reply_info_extra
2019-10-18Merge tag 'for-5.4/dm-fixes' of ↵Linus Torvalds3-45/+81
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM snapshot deadlock that can occur due to COW throttling preventing locks from being released. - Fix DM cache's GFP_NOWAIT allocation failure error paths by switching to GFP_NOIO. - Make __hash_find() static in the DM clone target. * tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix bugs when a GFP_NOWAIT allocation fails dm snapshot: rework COW throttling to fix deadlock dm snapshot: introduce account_start_copy() and account_end_copy() dm clone: Make __hash_find static
2019-10-18Merge tag 'iommu-fixes-v5.4-rc3' of ↵Linus Torvalds6-27/+70
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Fixes for page-table issues on Mali GPUs - Missing free in an error path for ARM-SMMU - PASID decoding in the AMD IOMMU Event log code - Another update for the locking fixes in the AMD IOMMU driver - Reduce the calls to platform_get_irq() in the IPMMU-VMSA and Rockchip IOMMUs to get rid of the warning message added to this function recently * tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Check PM_LEVEL_SIZE() condition in locked section iommu/amd: Fix incorrect PASID decoding from event log iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory iommu/rockchip: Don't use platform_get_irq to implicitly count irqs iommu/io-pgtable-arm: Support all Mali configurations iommu/io-pgtable-arm: Correct Mali attributes iommu/arm-smmu: Free context bitmap in the err path of arm_smmu_init_domain_context
2019-10-18Merge tag 'copy-struct-from-user-v5.4-rc4' of ↵Linus Torvalds1-9/+28
gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull usercopy test fixlets from Christian Brauner: "This contains two improvements for the copy_struct_from_user() tests: - a coding style change to get rid of the ugly "if ((ret |= test()))" pointed out when pulling the original patchset. - avoid a soft lockups when running the usercopy tests on machines with large page sizes by scanning only a 1024 byte region" * tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: usercopy: Avoid soft lockups in test_check_nonzero_user() lib: test_user_copy: style cleanup
2019-10-18drm/i915: Restore full symmetry in i915_driver_modeset_probe/removeJanusz Krzysztofik2-4/+6
Commit 2d6f6f359fd8 ("drm/i915: add i915_driver_modeset_remove()") claimed removal of asymmetry in probe() and remove() calls, however, it didn't take care of calling intel_irq_uninstall() on driver remove. That doesn't hurt as long as we still call it from intel_modeset_driver_remove() but in order to have full symmetry we should call it again from i915_driver_modeset_remove(). Note that it's safe to call intel_irq_uninstall() twice thanks to commit b318b82455bd ("drm/i915: Nuke drm_driver irq vfuncs"). We may only want to mention the case we are adding in a related FIXME comment provided by that commit. While being at it, update the name of function mentioned as calling it out of sequence as that name has been changed meanwhile by commit 78dae1ac35dd ("drm/i915: Propagate "_remove" function name suffix down"). Suggested-by: Michal Wajdeczko <[email protected]> Signed-off-by: Janusz Krzysztofik <[email protected]> Cc: Michal Wajdeczko <[email protected]> Reviewed-by: Michal Wajdeczko <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18drm/edid: Fix HDMI VIC handlingVille Syrjälä1-16/+21
Extract drm_mode_hdmi_vic() to correctly calculate the final HDMI VIC for us. Currently this is being done a bit differently between the AVI and HDMI infoframes. Let's get both to agree on this. We need to allow the case where a mode is both 3D and has a HDMI VIC. Currently we'll just refuse to generate the HDMI infoframe when we really should be setting HDMI VIC to 0 and instead enabling 3D stereo signalling. If the sink doesn't even support the HDMI infoframe we should not be picking the HDMI VIC in favor of the CEA VIC, because then we'll end up not sending either VIC in the end. Cc: Wayne Lin <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Uma Shankar <[email protected]>
2019-10-18drm/edid: Extract drm_mode_cea_vic()Ville Syrjälä1-23/+30
Extract the logic to compute the final CEA VIC to a small helper. We'll reorder it a bit to make future modifications more straightforward. No function changes. Cc: Wayne Lin <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Uma Shankar <[email protected]>
2019-10-18drm/edid: Make drm_get_cea_aspect_ratio() staticVille Syrjälä2-10/+1
drm_get_cea_aspect_ratio() is not used outside drm_edid.c. Make it static. Cc: Wayne Lin <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Uma Shankar <[email protected]>
2019-10-18drm/fourcc: Fix undefined left shift in DRM_FORMAT_BIG_ENDIAN macrosAdam Jackson1-1/+1
1<<31 is undefined because it's a signed int and C is terrible. Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Adam Jackson <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18net: usb: lan78xx: Connect PHY before registering MACAndrew Lunn1-6/+6
As soon as the netdev is registers, the kernel can start using the interface. If the driver connects the MAC to the PHY after the netdev is registered, there is a race condition where the interface can be opened without having the PHY connected. Change the order to close this race condition. Fixes: 92571a1aae40 ("lan78xx: Connect phy early") Reported-by: Daniel Wagner <[email protected]> Signed-off-by: Andrew Lunn <[email protected]> Tested-by: Daniel Wagner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18Merge branch 'vsock-virtio-make-the-credit-mechanism-more-robust'David S. Miller1-3/+14
Stefano Garzarella says: ==================== vsock/virtio: make the credit mechanism more robust This series makes the credit mechanism implemented in the virtio-vsock devices more robust. Patch 1 sends an update to the remote peer when the buf_alloc change. Patch 2 prevents a malicious peer (especially the guest) can consume all the memory of the other peer, discarding packets when the credit available is not respected. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-10-18vsock/virtio: discard packets if credit is not respectedStefano Garzarella1-3/+11
If the remote peer doesn't respect the credit information (buf_alloc, fwd_cnt), sending more data than it can send, we should drop the packets to prevent a malicious peer from using all of our memory. This is patch follows the VIRTIO spec: "VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has sufficient free buffer space for the payload" Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18vsock/virtio: send a credit update when buffer size is changedStefano Garzarella1-0/+3
When the user application set a new buffer size value, we should update the remote peer about this change, since it uses this information to calculate the credit available. Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18mlxsw: spectrum_trap: Push Ethernet header before reporting trapIdo Schimmel1-0/+1
devlink maintains packets and bytes statistics for each trap. Since eth_type_trans() was called to set the skb's protocol, the data pointer no longer points to the start of the packet and the bytes accounting is off by 14 bytes. Fix this by pushing the skb's data pointer to the start of the packet. Fixes: b5ce611fd96e ("mlxsw: spectrum: Add devlink-trap support") Reported-by: Alex Kushnarov <[email protected]> Tested-by: Alex Kushnarov <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18net: ensure correct skb->tstamp in various fragmentersEric Dumazet4-0/+12
Thomas found that some forwarded packets would be stuck in FQ packet scheduler because their skb->tstamp contained timestamps far in the future. We thought we addressed this point in commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") but there is still an issue when/if a packet needs to be fragmented. In order to meet EDT requirements, we have to make sure all fragments get the original skb->tstamp. Note that this original skb->tstamp should be zero in forwarding path, but might have a non zero value in output path if user decided so. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Thomas Bartschies <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18Merge tag 'mmc-v5.4-rc1' of ↵Linus Torvalds4-17/+23
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host: - sdhci-iproc: Prevent some spurious interrupts - renesas_sdhi/sh_mmcif: Avoid false warnings about IRQs not found MEMSTICK host: - jmb38x_ms: Fix an error handling path at ->probe()" * tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()' mmc: sdhci-iproc: fix spurious interrupts on Multiblock reads with bcm2711 mmc: sh_mmcif: Use platform_get_irq_optional() for optional interrupt mmc: renesas_sdhi: Do not use platform_get_irq() to count interrupts
2019-10-18Merge branch 'net-bcmgenet-restore-internal-EPHY-support'David S. Miller4-77/+79
Doug Berger says: ==================== net: bcmgenet: restore internal EPHY support I managed to get my hands on an old BCM97435SVMB board to do some testing with the latest kernel and uncovered a number of things that managed to get broken over the years (some by me ;). This commit set attempts to correct the errors I observed in my testing. The first commit applies to all internal PHYs to restore proper reporting of link status when a link comes up. The second commit restores the soft reset to the initialization of the older internal EPHYs used by 40nm Set-Top Box devices. The third corrects a bug I introduced when removing excessive soft resets by altering the initialization sequence in a way that keeps the GENETv3 MAC interface happy. Finally, I observed a number of issues when manually configuring the network interface of the older EPHYs that appear to be resolved by the fourth commit. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-10-18net: bcmgenet: reset 40nm EPHY on energy detectDoug Berger1-1/+8
The EPHY integrated into the 40nm Set-Top Box devices can falsely detect energy when connected to a disabled peer interface. When the peer interface is enabled the EPHY will detect and report the link as active, but on occasion may get into a state where it is not able to exchange data with the connected GENET MAC. This issue has not been observed when the link parameters are auto-negotiated; however, it has been observed with a manually configured link. It has been empirically determined that issuing a soft reset to the EPHY when energy is detected prevents it from getting into this bad state. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18net: bcmgenet: soft reset 40nm EPHYs before MAC initDoug Berger3-73/+69
It turns out that the "Workaround for putting the PHY in IDDQ mode" used by the internal EPHYs on 40nm Set-Top Box chips when powering down puts the interface to the GENET MAC in a state that can cause subsequent MAC resets to be incomplete. Rather than restore the forced soft reset when powering up internal PHYs, this commit moves the invocation of phy_init_hw earlier in the MAC initialization sequence to just before the MAC reset in the open and resume functions. This allows the interface to be stable and allows the MAC resets to be successful. The bcmgenet_mii_probe() function is split in two to accommodate this. The new function bcmgenet_mii_connect() handles the first half of the functionality before the MAC initialization, and the bcmgenet_mii_config() function is extended to provide the remaining PHY configuration following the MAC initialization. Fixes: 484bfa1507bf ("Revert "net: bcmgenet: Software reset EPHY after power on"") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18net: phy: bcm7xxx: define soft_reset for 40nm EPHYDoug Berger1-0/+1
The internal 40nm EPHYs use a "Workaround for putting the PHY in IDDQ mode." These PHYs require a soft reset to restore functionality after they are powered back up. This commit defines the soft_reset function to use genphy_soft_reset during phy_init_hw to accommodate this. Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18net: bcmgenet: don't set phydev->link from MACDoug Berger1-3/+1
When commit 28b2e0d2cd13 ("net: phy: remove parameter new_link from phy_mac_interrupt()") removed the new_link parameter it set the phydev->link state from the MAC before invoking phy_mac_interrupt(). However, once commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in genphy_read_status") was added this initialization prevents the proper determination of the connection parameters by the function genphy_read_status(). This commit removes that initialization to restore the proper functionality. Fixes: 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in genphy_read_status") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-10-18drm/i915: Correct the PCH type in irq postinstallVivek Kasireddy1-1/+4
JasperLake PCH (JSP) has DDI HPD pin mappings similar to TGP and not MCC. Also add the correct HPD pin mappings for the MCC PCH. Cc: Matt Roper <[email protected]> Signed-off-by: Vivek Kasireddy <[email protected]> Reviewed-by: Matt Roper <[email protected]> Signed-off-by: Matt Roper <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-18Merge tag 'sound-5.4-rc4' of ↵Linus Torvalds5-3/+45
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Just a few small fixes for the usual suspect, HD- and USB-audio: enablement of runtime PM for Nvidia due to the recent PCI changes, a fix for potential hangs with recent HD-audio platforms, and the rest device-specific quirks" * tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Force runtime PM on Nvidia HDMI codecs ALSA: hda/realtek - Enable headset mic on Asus MJ401TA ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers ALSA: hdac: clear link output stream mapping ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360