blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2019-11-06	ALSA: timer: Fix incorrectly assigned timer instance	Takashi Iwai	1	-3/+3
	The clean up commit 41672c0c24a6 ("ALSA: timer: Simplify error path in snd_timer_open()") unified the error handling code paths with the standard goto, but it introduced a subtle bug: the timer instance is stored in snd_timer_open() incorrectly even if it returns an error. This may eventually lead to UAF, as spotted by fuzzer. The culprit is the snd_timer_open() code checks the SNDRV_TIMER_IFLG_EXCLUSIVE flag with the common variable timeri. This variable is supposed to be the newly created instance, but we (ab-)used it for a temporary check before the actual creation of a timer instance. After that point, there is another check for the max number of instances, and it bails out if over the threshold. Before the refactoring above, it worked fine because the code returned directly from that point. After the refactoring, however, it jumps to the unified error path that stores the timeri variable in return -- even if it returns an error. Unfortunately this stored value is kept in the caller side (snd_timer_user_tselect()) in tu->timeri. This causes inconsistency later, as if the timer was successfully assigned. In this patch, we fix it by not re-using timeri variable but a temporary variable for testing the exclusive connection, so timeri remains NULL at that point. Fixes: 41672c0c24a6 ("ALSA: timer: Simplify error path in snd_timer_open()") Reported-and-tested-by: Tristan Madani <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2019-11-06	mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges	Johannes Weiner	1	-0/+9
	While upgrading from 4.16 to 5.2, we noticed these allocation errors in the log of the new kernel: SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0 node 0: slabs: 5, objs: 170, free: 0 slab_out_of_memory+1 ___slab_alloc+969 __slab_alloc+14 kmem_cache_alloc+346 inet_twsk_alloc+60 tcp_time_wait+46 tcp_fin+206 tcp_data_queue+2034 tcp_rcv_state_process+784 tcp_v6_do_rcv+405 __release_sock+118 tcp_close+385 inet_release+46 __sock_release+55 sock_close+17 __fput+170 task_work_run+127 exit_to_usermode_loop+191 do_syscall_64+212 entry_SYSCALL_64_after_hwframe+68 accompanied by an increase in machines going completely radio silent under memory pressure. One thing that changed since 4.16 is e699e2c6a654 ("net, mm: account sock objects to kmemcg"), which made these slab caches subject to cgroup memory accounting and control. The problem with that is that cgroups, unlike the page allocator, do not maintain dedicated atomic reserves. As a cgroup's usage hovers at its limit, atomic allocations - such as done during network rx - can fail consistently for extended periods of time. The kernel is not able to operate under these conditions. We don't want to revert the culprit patch, because it indeed tracks a potentially substantial amount of memory used by a cgroup. We also don't want to implement dedicated atomic reserves for cgroups. There is no point in keeping a fixed margin of unused bytes in the cgroup's memory budget to accomodate a consumer that is impossible to predict - we'd be wasting memory and get into configuration headaches, not unlike what we have going with min_free_kbytes. We do this for physical mem because we have to, but cgroups are an accounting game. Instead, account these privileged allocations to the cgroup, but let them bypass the configured limit if they have to. This way, we get the benefits of accounting the consumed memory and have it exert pressure on the rest of the cgroup, but like with the page allocator, we shift the burden of reclaimining on behalf of atomic allocations onto the regular allocations that can block. Link: http://lkml.kernel.org/r/[email protected] Fixes: e699e2c6a654 ("net, mm: account sock objects to kmemcg") Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> [4.18+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/memory_hotplug: fix updating the node span	David Hildenbrand	1	-0/+8
	We recently started updating the node span based on the zone span to avoid touching uninitialized memmaps. Currently, we will always detect the node span to start at 0, meaning a node can easily span too many pages. pgdat_is_empty() will still work correctly if all zones span no pages. We should skip over all zones without spanned pages and properly handle the first detected zone that spans pages. Unfortunately, in contrast to the zone span (/proc/zoneinfo), the node span cannot easily be inspected and tested. The node span gives no real guarantees when an architecture supports memory hotplug, meaning it can easily contain holes or span pages of different nodes. The node span is not really used after init on architectures that support memory hotplug. E.g., we use it in mm/memory_hotplug.c:try_offline_node() and in mm/kmemleak.c:kmemleak_scan(). These users seem to be fine. Link: http://lkml.kernel.org/r/[email protected] Fixes: 00d6c019b5bc ("mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()") Signed-off-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Dan Williams <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	scripts/gdb: fix debugging modules compiled with hot/cold partitioning	Ilya Leoshkevich	1	-1/+2
	gcc's -freorder-blocks-and-partition option makes it group frequently and infrequently used code in .text.hot and .text.unlikely sections respectively. At least when building modules on s390, this option is used by default. gdb assumes that all code is located in .text section, and that .text section is located at module load address. With such modules this is no longer the case: there is code in .text.hot and .text.unlikely, and either of them might precede .text. Fix by explicitly telling gdb the addresses of code sections. It might be tempting to do this for all sections, not only the ones in the white list. Unfortunately, gdb appears to have an issue, when telling it about e.g. loadable .note.gnu.build-id section causes it to think that non-loadable .note.Linux section is loaded at address 0, which in turn causes NULL pointers to be resolved to bogus symbols. So keep using the white list approach for the time being. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Jan Kiszka <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly	Roman Gushchin	2	-3/+3
	page_cgroup_ino() doesn't return a valid memcg pointer for non-compound slab pages, because it depends on PgHead AND PgSlab flags to be set to determine the memory cgroup from the kmem_cache. It's correct for compound pages, but not for generic small pages. Those don't have PgHead set, so it ends up returning zero. Fix this by replacing the condition to PageSlab() && !PageTail(). Before this patch: [root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/ \| grep slab 0x0000000000000080 38 0 _______S___________________________________ slab After this patch: [root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/ \| grep slab 0x0000000000000080 147 0 _______S___________________________________ slab Also, hwpoison_filter_task() uses output of page_cgroup_ino() in order to filter error injection events based on memcg. So if page_cgroup_ino() fails to return memcg pointer, we just fail to inject memory error. Considering that hwpoison filter is for testing, affected users are limited and the impact should be marginal. [[email protected]: changelog additions] Link: http://lkml.kernel.org/r/[email protected] Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	MAINTAINERS: update information for "MEMORY MANAGEMENT"	Song Liu	1	-0/+4
	I was trying to find the mm tree in MAINTAINERS by searching "Morton". Unfortunately, I didn't find one. And I didn't even locate the MEMORY MANAGEMENT section quickly, because Andrew's name was not listed there. Thanks to Johannes who helped me find the mm tree. Let save other's time searching around by adding: M: Andrew Morton <[email protected]> T: git git://github.com/hnaz/linux-mm.git [[email protected]: add ozlabs.org quilt trees] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Song Liu <[email protected]> Acked-by: Andrew Morton <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	dump_stack: avoid the livelock of the dump_lock	Kevin Hao	1	-1/+6
	In the current code, we use the atomic_cmpxchg() to serialize the output of the dump_stack(), but this implementation suffers the thundering herd problem. We have observed such kind of livelock on a Marvell cn96xx board(24 cpus) when heavily using the dump_stack() in a kprobe handler. Actually we can let the competitors to wait for the releasing of the lock before jumping to atomic_cmpxchg(). This will definitely mitigate the thundering herd problem. Thanks Linus for the suggestion. [[email protected]: fix comment] Link: http://lkml.kernel.org/r/[email protected] Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()") Signed-off-by: Kevin Hao <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	zswap: add Vitaly to the maintainers list	Vitaly Wool	1	-0/+1
	Per conversation with Dan, add myself to the zswap MAINTAINERS list. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Acked-by: Dan Streetman <[email protected]> Acked-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/page_alloc.c: ratelimit allocation failure warnings more aggressively	Johannes Weiner	1	-6/+1
	While investigating a bug related to higher atomic allocation failures, we noticed the failure warnings positively drowning the console, and in our case trigger lockup warnings because of a serial console too slow to handle all that output. But even if we had a faster console, it's unclear what additional information the current level of repetition provides. Allocation failures happen for three reasons: The machine is OOM, the VM is failing to handle reasonable requests, or somebody is making unreasonable requests (and didn't acknowledge their opportunism with __GFP_NOWARN). Having the memory dump, a callstack, and the ratelimit stats on skipped failure warnings should provide enough information to let users/admins/developers know whether something is wrong and point them in the right direction for debugging, bpftracing etc. Limit allocation failure warnings to one spew every ten seconds. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y	Ville Syrjälä	1	-3/+4
	I got some khugepaged spew on a 32bit x86: BUG: sleeping function called from invalid context at include/linux/mmu_notifier.h:346 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: khugepaged INFO: lockdep is turned off. CPU: 1 PID: 25 Comm: khugepaged Not tainted 5.4.0-rc5-elk+ #206 Hardware name: System manufacturer P5Q-EM/P5Q-EM, BIOS 2203 07/08/2009 Call Trace: dump_stack+0x66/0x8e ___might_sleep.cold.96+0x95/0xa6 __might_sleep+0x2e/0x80 collapse_huge_page.isra.51+0x5ac/0x1360 khugepaged+0x9a9/0x20f0 kthread+0xf5/0x110 ret_from_fork+0x2e/0x38 Looks like it's due to CONFIG_HIGHPTE=y pte_offset_map()->kmap_atomic() vs. mmu_notifier_invalidate_range_start(). Let's do the naive approach and just reorder the two operations. Link: http://lkml.kernel.org/r/[email protected] Fixes: 810e24e009cf71 ("mm/mmu_notifiers: annotate with might_sleep()") Signed-off-by: Ville Syrjl <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo	Michal Hocko	1	-3/+20
	pagetypeinfo_showfree_print is called by zone->lock held in irq mode. This is not really nice because it blocks both any interrupts on that cpu and the page allocator. On large machines this might even trigger the hard lockup detector. Considering the pagetypeinfo is a debugging tool we do not really need exact numbers here. The primary reason to look at the outuput is to see how pageblocks are spread among different migratetypes and low number of pages is much more interesting therefore putting a bound on the number of pages on the free_list sounds like a reasonable tradeoff. The new output will simply tell [...] Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648 instead of Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648 The limit has been chosen arbitrary and it is a subject of a future change should there be a need for that. While we are at it, also drop the zone lock after each free_list iteration which will help with the IRQ and page allocator responsiveness even further as the IRQ lock held time is always bound to those 100k pages. [[email protected]: tweak comment text, per David Hildenbrand] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Suggested-by: Andrew Morton <[email protected]> Reviewed-by: Waiman Long <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Song Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm, vmstat: hide /proc/pagetypeinfo from normal users	Michal Hocko	1	-1/+1
	/proc/pagetypeinfo is a debugging tool to examine internal page allocator state wrt to fragmentation. It is not very useful for any other use so normal users really do not need to read this file. Waiman Long has noticed that reading this file can have negative side effects because zone->lock is necessary for gathering data and that a) interferes with the page allocator and its users and b) can lead to hard lockups on large machines which have very long free_list. Reduce both issues by simply not exporting the file to regular users. Link: http://lkml.kernel.org/r/[email protected] Fixes: 467c996c1e19 ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo") Signed-off-by: Michal Hocko <[email protected]> Reported-by: Waiman Long <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Waiman Long <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Jann Horn <[email protected]> Cc: Song Liu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/mmu_notifiers: use the right return code for WARN_ON	Jason Gunthorpe	1	-1/+1
	The return code from the op callback is actually in _ret, while the WARN_ON was checking ret which causes it to misfire. Link: http://lkml.kernel.org/r/[email protected] Fixes: 8402ce61bec2 ("mm/mmu_notifiers: check if mmu notifier callbacks are allowed to fail") Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()	Shuning Zhang	1	-44/+90
	When the extent tree is modified, it should be protected by inode cluster lock and ip_alloc_sem. The extent tree is accessed and modified in the ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem. The following is a case. The function ocfs2_fiemap is accessing the extent tree, which is modified at the same time. kernel BUG at fs/ocfs2/extent_map.c:475! invalid opcode: 0000 [#1] SMP Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...] CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018 task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000 RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] Call Trace: ocfs2_fiemap+0x1e3/0x430 [ocfs2] do_vfs_ioctl+0x155/0x510 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x18/0xd8 Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] ---[ end trace c8aa0c8180e869dc ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This issue can be reproduced every week in a production environment. This issue is related to the usage mode. If others use ocfs2 in this mode, the kernel will panic frequently. [[email protected]: coding style fixes] [Fix new warning due to unused function by removing said function - Linus ] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shuning Zhang <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> Reviewed-by: Gang He <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm: thp: handle page cache THP correctly in PageTransCompoundMap	Yang Shi	3	-7/+23
	We have a usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit 127393fbe597 ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But the _mapcount < 0 check doesn't work for page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Yang Shi <[email protected]> Reported-by: Gang Deng <[email protected]> Tested-by: Gang Deng <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm, meminit: recalculate pcpu batch and high limits after init completes	Mel Gorman	1	-2/+8
	Deferred memory initialisation updates zone->managed_pages during the initialisation phase but before that finishes, the per-cpu page allocator (pcpu) calculates the number of pages allocated/freed in batches as well as the maximum number of pages allowed on a per-cpu list. As zone->managed_pages is not up to date yet, the pcpu initialisation calculates inappropriately low batch and high values. This increases zone lock contention quite severely in some cases with the degree of severity depending on how many CPUs share a local zone and the size of the zone. A private report indicated that kernel build times were excessive with extremely high system CPU usage. A perf profile indicated that a large chunk of time was lost on zone->lock contention. This patch recalculates the pcpu batch and high values after deferred initialisation completes for every populated zone in the system. It was tested on a 2-socket AMD EPYC 2 machine using a kernel compilation workload -- allmodconfig and all available CPUs. mmtests configuration: config-workload-kernbench-max Configuration was modified to build on a fresh XFS partition. kernbench 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%* Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%* Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%* Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%) Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%) Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%) 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Duration User 39766.24 49221.79 Duration System 44298.10 13361.67 Duration Elapsed 519.11 388.87 The patch reduces system CPU usage by 69.86% and total build time by 26.65%. The variance of system CPU usage is also much reduced. Before, this was the breakdown of batch and high values over all zones was: 256 batch: 1 256 batch: 63 512 batch: 7 256 high: 0 256 high: 378 512 high: 42 512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After the patch: 256 batch: 1 768 batch: 63 256 high: 0 768 high: 378 [[email protected]: fix merge/linkage snafu] Link: http://lkml.kernel.org/r/[email protected]: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Qian Cai <[email protected]> Cc: <[email protected]> [4.1+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/gup_benchmark: fix MAP_HUGETLB case	John Hubbard	1	-1/+1
	The MAP_HUGETLB ("-H" option) of gup_benchmark fails: $ sudo ./gup_benchmark -H mmap: Invalid argument This is because gup_benchmark.c is passing in a file descriptor to mmap(), but the fd came from opening up the /dev/zero file. This confuses the mmap syscall implementation, which thinks that, if the caller did not specify MAP_ANONYMOUS, then the file must be a huge page file. So it attempts to verify that the file really is a huge page file, as you can see here: ksys_mmap_pgoff() { if (!(flags & MAP_ANONYMOUS)) { retval = -EINVAL; if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file))) goto out_fput; /* THIS IS WHERE WE END UP */ else if (flags & MAP_HUGETLB) { ...proceed normally, /dev/zero is ok here... ...and of course is_file_hugepages() returns "false" for the /dev/zero file. The problem is that the user space program, gup_benchmark.c, really just wants anonymous memory here. The simplest way to get that is to pass MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this patch does. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm: memcontrol: fix NULL-ptr deref in percpu stats flush	Shakeel Butt	1	-6/+6
	__mem_cgroup_free() can be called on the failure path in mem_cgroup_alloc(). However memcg_flush_percpu_vmstats() and memcg_flush_percpu_vmevents() which are called from __mem_cgroup_free() access the fields of memcg which can potentially be null if called from failure path from mem_cgroup_alloc(). Indeed syzbot has reported the following crash: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 30393 Comm: syz-executor.1 Not tainted 5.4.0-rc2+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:memcg_flush_percpu_vmstats+0x4ae/0x930 mm/memcontrol.c:3436 Code: 05 41 89 c0 41 0f b6 04 24 41 38 c7 7c 08 84 c0 0f 85 5d 03 00 00 44 3b 05 33 d5 12 08 0f 83 e2 00 00 00 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 0f 85 91 03 00 00 48 8b 85 10 fe ff ff 48 8b b0 90 RSP: 0018:ffff888095c27980 EFLAGS: 00010206 RAX: 0000000000000012 RBX: ffff888095c27b28 RCX: ffffc90008192000 RDX: 0000000000040000 RSI: ffffffff8340fae7 RDI: 0000000000000007 RBP: ffff888095c27be0 R08: 0000000000000000 R09: ffffed1013f0da33 R10: ffffed1013f0da32 R11: ffff88809f86d197 R12: fffffbfff138b760 R13: dffffc0000000000 R14: 0000000000000090 R15: 0000000000000007 FS: 00007f5027170700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000710158 CR3: 00000000a7b18000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __mem_cgroup_free+0x1a/0x190 mm/memcontrol.c:5021 mem_cgroup_free mm/memcontrol.c:5033 [inline] mem_cgroup_css_alloc+0x3a1/0x1ae0 mm/memcontrol.c:5160 css_create kernel/cgroup/cgroup.c:5156 [inline] cgroup_apply_control_enable+0x44d/0xc40 kernel/cgroup/cgroup.c:3119 cgroup_mkdir+0x899/0x11b0 kernel/cgroup/cgroup.c:5401 kernfs_iop_mkdir+0x14d/0x1d0 fs/kernfs/dir.c:1124 vfs_mkdir+0x42e/0x670 fs/namei.c:3807 do_mkdirat+0x234/0x2a0 fs/namei.c:3830 __do_sys_mkdir fs/namei.c:3846 [inline] __se_sys_mkdir fs/namei.c:3844 [inline] __x64_sys_mkdir+0x5c/0x80 fs/namei.c:3844 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixing this by moving the flush to mem_cgroup_free as there is no need to flush anything if we see failure in mem_cgroup_alloc(). Link: http://lkml.kernel.org/r/[email protected] Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg") Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg") Signed-off-by: Shakeel Butt <[email protected]> Reported-by: [email protected] Reviewed-by: Roman Gushchin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	MAINTAINERS: update Cavium ThunderX2 maintainers	Jayachandran C	2	-1/+4
	jnair is no longer at caviumnetworks.com (or at marvell.com). This also means that Cavium ThunderX2 will now be maintained by Robert. This is probably a good time to map various email addresses used for my patches to my personal email ID, update .mailmap to do this. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jayachandran C <[email protected]> Acked-by: Robert Richter <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2019-11-06	Merge tag 'stm32-dt-for-v5.4-fixes-2' of ↵	Olof Johansson	2	-13/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into arm/fixes STM32 DT fixes for v5.4, round 2 Highlights: ----------- Fixes for STM32MP157: -Fix CAN RAM mapping -Change stmfx pinctrl definition for joystick and camera. Due to stmfx pinctrl fix done in v5.4-rc cycle, camera and joystick were no longer functional. * tag 'stm32-dt-for-v5.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32: ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1 ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1 ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Olof Johansson <[email protected]>
2019-11-06	ASoC: SOF: topology: Fix bytes control size checks	Dragos Tarcatu	1	-5/+6
	When using the example SOF amp widget topology, KASAN dumps this when the AMP bytes kcontrol gets loaded: [ 9.579548] BUG: KASAN: slab-out-of-bounds in sof_control_load+0x8cc/0xac0 [snd_sof] [ 9.588194] Write of size 40 at addr ffff8882314559dc by task systemd-udevd/2411 Fix that by rejecting the topology if the bytes data size > max_size Fixes: 311ce4fe7637d ("ASoC: SOF: Add support for loading topologies") Reviewed-by: Jaska Uimonen <[email protected]> Reviewed-by: Ranjani Sridharan <[email protected]> Signed-off-by: Dragos Tarcatu <[email protected]> Signed-off-by: Pierre-Louis Bossart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-06	ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1	Amelie Delaunay	1	-1/+0
	Pins used for joystick are all configured as input. "push-pull" is not a valid setting for an input pin. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06	ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1	Amelie Delaunay	1	-10/+2
	"push-pull" configuration is now fully handled by the gpiolib and the STMFX pinctrl driver. There is no longer need to declare a pinctrl group to only configure "push-pull" setting for the line. It is done directly by the gpiolib. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06	ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c	Christophe Roullier	1	-2/+2
	Split the 10Kbytes CAN message RAM to be able to use simultaneously FDCAN1 and FDCAN2 instances. First 5Kbytes are allocated to FDCAN1 and last 5Kbytes are used for FDCAN2. To do so, set the offset to 0x1400 in mram-cfg for FDCAN2. Fixes: d44d6e021301 ("ARM: dts: stm32: change CAN RAM mapping on stm32mp157c") Signed-off-by: Christophe Roullier <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06	ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157	Patrice Chotard	1	-4/+4
	Relax qspi pins slew-rate to minimize peak currents. Fixes: 844030057339 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board") Signed-off-by: Patrice Chotard <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-05	Documentation: TLS: Add missing counter description	Tariq Toukan	1	-0/+4
	Add TLS TX counter description for the handshake retransmitted packets that triggers the resync procedure then skip it, going into the regular transmit flow. Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	NFC: fdp: fix incorrect free object	Pan Bian	1	-1/+1
	The address of fw_vsc_cfg is on stack. Releasing it with devm_kfree() is incorrect, which may result in a system crash or other security impacts. The expected object to free is *fw_vsc_cfg. Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: prevent load/store tearing on sk->sk_stamp	Eric Dumazet	1	-2/+2
	Add a couple of READ_ONCE() and WRITE_ONCE() to prevent load-tearing and store-tearing in sock_read_timestamp() and sock_write_timestamp() This might prevent another KCSAN report. Fixes: 3a0ed3e96197 ("sock: Make sock->sk_stamp thread-safe") Signed-off-by: Eric Dumazet <[email protected]> Cc: Deepa Dinamani <[email protected]> Acked-by: Deepa Dinamani <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: qualcomm: rmnet: Fix potential UAF when unregistering	Sean Tranchetti	1	-2/+2
	During the exit/unregistration process of the RmNet driver, the function rmnet_unregister_real_device() is called to handle freeing the driver's internal state and removing the RX handler on the underlying physical device. However, the order of operations this function performs is wrong and can lead to a use after free of the rmnet_port structure. Before calling netdev_rx_handler_unregister(), this port structure is freed with kfree(). If packets are received on any RmNet devices before synchronize_net() completes, they will attempt to use this already-freed port structure when processing the packet. As such, before cleaning up any other internal state, the RX handler must be unregistered in order to guarantee that no further packets will arrive on the device. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net/tls: fix sk_msg trim on fallback to copy mode	Jakub Kicinski	2	-8/+21
	sk_msg_trim() tries to only update curr pointer if it falls into the trimmed region. The logic, however, does not take into the account pointer wrapping that sk_msg_iter_var_prev() does nor (as John points out) the fact that msg->sg is a ring buffer. This means that when the message was trimmed completely, the new curr pointer would have the value of MAX_MSG_FRAGS - 1, which is neither smaller than any other value, nor would it actually be correct. Special case the trimming to 0 length a little bit and rework the comparison between curr and end to take into account wrapping. This bug caused the TLS code to not copy all of the message, if zero copy filled in fewer sg entries than memcopy would need. Big thanks to Alexander Potapenko for the non-KMSAN reproducer. v2: - take into account that msg->sg is a ring buffer (John). Link: https://lore.kernel.org/netdev/[email protected]/ (v1) Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Reported-by: [email protected] Reported-by: [email protected] Co-developed-by: John Fastabend <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	mlx4_core: fix wrong comment about the reason of subtract one from the max_cqes	Dotan Barak	1	-2/+1
	The reason for the pre-allocation of one CQE is to enable resizing of the CQ. Fix comment accordingly. Signed-off-by: Dotan Barak <[email protected]> Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Vladimir Sokolovsky <[email protected]> Signed-off-by: Yuval Shaia <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: dsa: bcm_sf2: Fix driver removal	Florian Fainelli	1	-2/+2
	With the DSA core doing the call to dsa_port_disable() we do not need to do that within the driver itself. This could cause an use after free since past dsa_unregister_switch() we should not be accessing any dsa_switch internal structures. Fixes: 0394a63acfe2 ("net: dsa: enable and disable all ports") Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: sched: prevent duplicate flower rules from tcf_proto destroy race	John Hurley	2	-4/+83
	When a new filter is added to cls_api, the function tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to determine if the tcf_proto is duplicated in the chain's hashtable. It then creates a new entry or continues with an existing one. In cls_flower, this allows the function fl_ht_insert_unque to determine if a filter is a duplicate and reject appropriately, meaning that the duplicate will not be passed to drivers via the offload hooks. However, when a tcf_proto is destroyed it is removed from its chain before a hardware remove hook is hit. This can lead to a race whereby the driver has not received the remove message but duplicate flows can be accepted. This, in turn, can lead to the offload driver receiving incorrect duplicate flows and out of order add/delete messages. Prevent duplicates by utilising an approach suggested by Vlad Buslov. A hash table per block stores each unique chain/protocol/prio being destroyed. This entry is only removed when the full destroy (and hardware offload) has completed. If a new flow is being added with the same identiers as a tc_proto being detroyed, then the add request is replayed until the destroy is complete. Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution") Signed-off-by: John Hurley <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reported-by: Louis Peens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: hns3: Use the correct style for SPDX License Identifier	Nishad Kamdar	7	-7/+7
	This patch corrects the SPDX License Identifier style in header files related to Hisilicon network devices. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <[email protected]> Signed-off-by: Nishad Kamdar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	bonding: fix state transition issue in link monitoring	Jay Vosburgh	2	-24/+23
	Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic; one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition. The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but the new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state. Resolve this by combining the two variables into one. Reported-by: Aleksei Zakharov <[email protected]> Reported-by: Sha Zhang <[email protected]> Cc: Mahesh Bandewar <[email protected]> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Jay Vosburgh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller	8	-9/+35
	Daniel Borkmann says: ==================== pull-request: bpf 2019-11-02 The following pull-request contains BPF updates for your net tree. We've added 6 non-merge commits during the last 6 day(s) which contain a total of 8 files changed, 35 insertions(+), 9 deletions(-). The main changes are: 1) Fix ppc BPF JIT's tail call implementation by performing a second pass to gather a stable JIT context before opcode emission, from Eric Dumazet. 2) Fix build of BPF samples sys_perf_event_open() usage to compiled out unavailable test_attr__{enabled,open} checks. Also fix potential overflows in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel. 3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian archs like s390x and also improve the test coverage, from Ilya Leoshkevich. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-05	Merge branch 'nvme-5.4-rc7' of git://git.infradead.org/nvme into for-linus	Jens Axboe	3	-0/+11
	Pull NVMe fixes from Keith: "We have a few late nvme fixes for a couple device removal kernel crashes, and a compat fix for a new ioctl introduced during this merge window." * 'nvme-5.4-rc7' of git://git.infradead.org/nvme: nvme: change nvme_passthru_cmd64 to explicitly mark rsvd nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths nvme-rdma: fix a segmentation fault during module unload
2019-11-05	taprio: fix panic while hw offload sched list swap	Ivan Khoronzhuk	1	-2/+3
	Don't swap oper and admin schedules too early, it's not correct and causes crash. Steps to reproduce: 1) tc qdisc replace dev eth0 parent root handle 100 taprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 1@2 \ base-time $SOME_BASE_TIME \ sched-entry S 01 80000 \ sched-entry S 02 15000 \ sched-entry S 04 40000 \ flags 2 2) tc qdisc replace dev eth0 parent root handle 100 taprio \ base-time $SOME_BASE_TIME \ sched-entry S 01 90000 \ sched-entry S 02 20000 \ sched-entry S 04 40000 \ flags 2 3) tc qdisc replace dev eth0 parent root handle 100 taprio \ base-time $SOME_BASE_TIME \ sched-entry S 01 150000 \ sched-entry S 02 200000 \ sched-entry S 04 40000 \ flags 2 Do 2 3 2 .. steps more times if not happens and observe: [ 305.832319] Unable to handle kernel write to read-only memory at virtual address ffff0000087ce7f0 [ 305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted [ 305.919306] Hardware name: Texas Instruments AM654 Base Board (DT) [...] [ 306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80 [ 306.022422] Call trace: [ 306.024866] taprio_free_sched_cb+0x4c/0x98 [ 306.029040] rcu_process_callbacks+0x25c/0x410 [ 306.033476] __do_softirq+0x10c/0x208 [ 306.037132] irq_exit+0xb8/0xc8 [ 306.040267] __handle_domain_irq+0x64/0xb8 [ 306.044352] gic_handle_irq+0x7c/0x178 [ 306.048092] el1_irq+0xb0/0x128 [ 306.051227] arch_cpu_idle+0x10/0x18 [ 306.054795] do_idle+0x120/0x138 [ 306.058015] cpu_startup_entry+0x20/0x28 [ 306.061931] rest_init+0xcc/0xd8 [ 306.065154] start_kernel+0x3bc/0x3e4 [ 306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662) [ 306.074900] ---[ end trace 96c8e2284a9d9d6e ]--- [ 306.079507] Kernel panic - not syncing: Fatal exception in interrupt [ 306.085847] SMP: stopping secondary CPUs [ 306.089765] Kernel Offset: disabled Try to explain one of the possible crash cases: The "real" admin list is assigned when admin_sched is set to new_admin, it happens after "swap", that assigns to oper_sched NULL. Thus if call qdisc show it can crash. Farther, next second time, when sched list is updated, the admin_sched is not NULL and becomes the oper_sched, previous oper_sched was NULL so just skipped. But then admin_sched is assigned new_admin, but schedules to free previous assigned admin_sched (that already became oper_sched). Farther, next third time, when sched list is updated, while one more swap, oper_sched is not null, but it was happy to be freed already (while prev. admin update), so while try to free oper_sched the kernel panic happens at taprio_free_sched_cb(). So, move the "swap emulation" where it should be according to function comment from code. Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading") Signed-off-by: Ivan Khoronzhuk <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Tested-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	Merge tag 'linux-can-fixes-for-5.4-20191105' of ↵	David S. Miller	23	-138/+369
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2019-11-05 this is a pull request of 33 patches for net/master. In the first patch Wen Yang's patch adds a missing of_node_put() to CAN device infrastructure. Navid Emamdoost's patch for the gs_usb driver fixes a memory leak in the gs_can_open() error path. Johan Hovold provides two patches, one for the mcba_usb, the other for the usb_8dev driver. Both fix a use-after-free after USB-disconnect. Joakim Zhang's patch improves the flexcan driver, the ECC mechanism is now completely disabled instead of masking the interrupts. The next three patches all target the peak_usb driver. Stephane Grosjean's patch fixes a potential out-of-sync while decoding packets, Johan Hovold's patch fixes a slab info leak, Jeroen Hofstee's patch adds missing reporting of bus off recovery events. Followed by three patches for the c_can driver. Kurt Van Dijck's patch fixes detection of potential missing status IRQs, Jeroen Hofstee's patches add a chip reset on open and add missing reporting of bus off recovery events. Appana Durga Kedareswara rao's patch for the xilinx driver fixes the flags field initialization for axi CAN. The next seven patches target the rx-offload helper, they are by me and Jeroen Hofstee. The error handling in case of a queue overflow is fixed removing a memory leak. Further the error handling in case of queue overflow and skb OOM is cleaned up. The next two patches are by me and target the flexcan and ti_hecc driver. In case of a error during can_rx_offload_queue_sorted() the error counters in the drivers are incremented. Jeroen Hofstee provides 6 patches for the ti_hecc driver, which properly stop the device in ifdown, improve the rx-offload support (which hit mainline in v5.4-rc1), and add missing FIFO overflow and state change reporting. The following four patches target the j1939 protocol. Colin Ian King's patch fixes a memory leak in the j1939_sk_errqueue() handling. Three patches by Oleksij Rempel fix a memory leak on socket release and fix the EOMA packet in the transport protocol. Timo Schlüßler's patch fixes a potential race condition in the mcp251x driver on after suspend. The last patch is by Yegor Yefremov and updates the SPDX-License-Identifier to v3.0. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-06	nvme: change nvme_passthru_cmd64 to explicitly mark rsvd	Charles Machalow	1	-0/+1
	Changing nvme_passthru_cmd64 to add a field: rsvd2. This field is an explicit marker for the padding space added on certain platforms as a result of the enlargement of the result field from 32 bit to 64 bits in size, and fixes differences in struct size when using compat ioctl for 32-bit binaries on 64-bit architecture. Fixes: 65e68edce0db ("nvme: allow 64-bit results in passthru commands") Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Charles Machalow <[email protected]> [changelog] Signed-off-by: Keith Busch <[email protected]>
2019-11-05	ALSA: hda: hdmi - add Tigerlake support	Kai Vehmanen	1	-0/+13
	Add Tigerlake HDMI codec support. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205379 BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=112171 Cc: Pan Xiuli <[email protected]> Signed-off-by: Kai Vehmanen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2019-11-05	ASoC: max98373: replace gpio_request with devm_gpio_request	Yong Zhi	1	-2/+2
	Use devm_gpio_request() to automatic unroll when fails and avoid resource leaks at error paths. Signed-off-by: Yong Zhi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-05	ASoC: stm32: sai: add restriction on mmap support	Olivier Moysan	1	-1/+11
	Do not support mmap in S/PDIF mode. In S/PDIF mode the buffer has to be copied, to allow the channel status bits insertion. Signed-off-by: Olivier Moysan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-05	Merge tag 'for-linus-2019-11-05' of ↵	Linus Torvalds	2	-1/+36
	git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull clone3 stack argument update from Christian Brauner: "This changes clone3() to do basic stack validation and to set up the stack depending on whether or not it is growing up or down. With clone3() the expectation is now very simply that the .stack argument points to the lowest address of the stack and that .stack_size specifies the initial stack size. This is diferent from legacy clone() where the "stack" argument had to point to the lowest or highest address of the stack depending on the architecture. clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that changing clone3() to determine stack direction and doing basic validation is the right course of action. Note, this is a potentially user visible change. In the very unlikely case, that it breaks someone's use-case we will revert. (And then e.g. place the new behavior under an appropriate flag.) Note that passing an empty stack will continue working just as before. Breaking someone's use-case is very unlikely. Neither glibc nor musl currently expose a wrapper for clone3(). There is currently also no real motivation for anyone to use clone3() directly. First, because using clone{3}() with stacks requires some assembly (see glibc and musl). Second, because it does not provide features that legacy clone() doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. I did a codesearch on https://codesearch.debian.net, github, and gitlab and could not find any software currently relying directly on clone3(). I expect this to change once we land CLONE_CLEAR_SIGHAND which was a request coming from glibc at which point they'll likely start using it" * tag 'for-linus-2019-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: clone3: validate stack arguments
2019-11-05	nvmem: add Rockchip OTP driver	Finley Xiao	3	-0/+281
	Newer Rockchip socs like the px30 use a different one-time-programmable memory controller for things like cpu-id and leakage information, so add the necessary driver for it. Signed-off-by: Finley Xiao <[email protected]> [ported from vendor 4.4, converted to clock-bulk API and cleanups] Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-11-05	dt-bindings: nvmem: add binding for Rockchip OTP controller	Heiko Stuebner	1	-0/+25
	Newer Rockchip SoCs use a different IP for accessing special one- time-programmable memory, so add a binding for these controllers. Signed-off-by: Heiko Stuebner <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-11-05	nvmem: imx: scu: fix dependency in Kconfig	Srinivas Kandagatla	1	-0/+1
	Fix below error by adding HAVE_ARM_SMCCC dependency in Kconfig ERROR: "__arm_smccc_smc" [drivers/nvmem/nvmem-imx-ocotp-scu.ko] undefined! Reported-by: kbuild test robot <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-11-05	nvmem: sprd: Add Spreadtrum SoCs eFuse support	Freeman Liu	3	-0/+437
	The Spreadtrum eFuse controller is widely used to dump chip ID, configuration setting, function select and so on, as well as supporting one-time programming. Signed-off-by: Freeman Liu <[email protected]> Signed-off-by: Baolin Wang <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-11-05	dt-bindings: nvmem: Add Spreadtrum eFuse controller documentation	Freeman Liu	1	-0/+39
	This patch adds the binding documentation for Spreadtrum eFuse controller. Signed-off-by: Freeman Liu <[email protected]> Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-11-05	nvmem: imx-ocotp: reset error status on probe	Lucas Stach	1	-0/+4
	If software running before the OCOTP driver is loaded left the controller with the error status pending, the driver will never be able to complete the read timing setup. Reset the error status on probe to make sure the controller is in usable state. Signed-off-by: Lucas Stach <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>