aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-04ipc/mqueue.c: update/document memory barriersManfred Spraul1-14/+78
Update and document memory barriers for mqueue.c: - ewp->state is read without any locks, thus READ_ONCE is required. - add smp_aquire__after_ctrl_dep() after the READ_ONCE, we need acquire semantics if the value is STATE_READY. - use wake_q_add_safe() - document why __set_current_state() may be used: Reading task->state cannot happen before the wake_q_add() call, which happens while holding info->lock. Thus the spin_unlock() is the RELEASE, and the spin_lock() is the ACQUIRE. For completeness: there is also a 3 CPU scenario, if the to be woken up task is already on another wake_q. Then: - CPU1: spin_unlock() of the task that goes to sleep is the RELEASE - CPU2: the spin_lock() of the waker is the ACQUIRE - CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE - CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Waiman Long <[email protected]> Cc: <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04ipc/mqueue.c: remove duplicated codeDavidlohr Bueso1-13/+18
pipelined_send() and pipelined_receive() are identical, so merge them. [[email protected]: add changelog] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04smp_mb__{before,after}_atomic(): update DocumentationManfred Spraul1-6/+10
When adding the _{acquire|release|relaxed}() variants of some atomic operations, it was forgotten to update Documentation/memory_barrier.txt: smp_mb__{before,after}_atomic() is now intended for all RMW operations that do not imply a memory barrier. 1) smp_mb__before_atomic(); atomic_add(); 2) smp_mb__before_atomic(); atomic_xchg_relaxed(); 3) smp_mb__before_atomic(); atomic_fetch_add_relaxed(); Invalid would be: smp_mb__before_atomic(); atomic_set(); In addition, the patch splits the long sentence into multiple shorter sentences. Link: http://lkml.kernel.org/r/[email protected] Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations") Signed-off-by: Manfred Spraul <[email protected]> Acked-by: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone()David Hildenbrand3-29/+15
The callers are only interested in the actual zone, they don't care about boundaries. Return the zone instead to simplify. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memory_hotplug: cleanup __remove_pages()David Hildenbrand1-11/+6
Let's drop the basically unused section stuff and simplify. Also, let's use a shorter variant to calculate the number of pages to the next section boundary. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dan Williams <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memory_hotplug: drop local variables in shrink_zone_span()David Hildenbrand1-9/+6
Get rid of the unnecessary local variables. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Dan Williams <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memory_hotplug: don't check for "all holes" in shrink_zone_span()David Hildenbrand1-27/+7
If we have holes, the holes will automatically get detected and removed once we remove the next bigger/smaller section. The extra checks can go. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dan Williams <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfnDavid Hildenbrand1-2/+2
With shrink_pgdat_span() out of the way, we now always have a valid zone. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dan Williams <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()David Hildenbrand2-0/+5
Let's poison the pages similar to when adding new memory in sparse_add_section(). Also call remove_pfn_range_from_zone() from memunmap_pages(), so we can poison the memmap from there as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dan Williams <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/memmap_init: update variable name in memmap_init_zoneAneesh Kumar K.V1-4/+4
Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6. This series fixes the access of uninitialized memmaps when shrinking zones/nodes and when removing memory. Also, it contains all fixes for crashes that can be triggered when removing certain namespace using memunmap_pages() - ZONE_DEVICE, reported by Aneesh. We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be more involved (we don't have SECTION_IS_ONLINE as an indicator), and shrinking is only of limited use (set_zone_contiguous() cannot detect the ZONE_DEVICE as contiguous). We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of code to a minimum. Shrinking is especially necessary to keep zone->contiguous set where possible, especially, on memory unplug of DIMMs at zone boundaries. -------------------------------------------------------------------------- Zones are now properly shrunk when offlining memory blocks or when onlining failed. This allows to properly shrink zones on memory unplug even if the separate memory blocks of a DIMM were onlined to different zones or re-onlined to a different zone after offlining. Example: :/# cat /proc/zoneinfo Node 1, zone Movable spanned 0 present 0 managed 0 :/# echo "online_movable" > /sys/devices/system/memory/memory41/state :/# echo "online_movable" > /sys/devices/system/memory/memory43/state :/# cat /proc/zoneinfo Node 1, zone Movable spanned 98304 present 65536 managed 65536 :/# echo 0 > /sys/devices/system/memory/memory43/online :/# cat /proc/zoneinfo Node 1, zone Movable spanned 32768 present 32768 managed 32768 :/# echo 0 > /sys/devices/system/memory/memory41/online :/# cat /proc/zoneinfo Node 1, zone Movable spanned 0 present 0 managed 0 This patch (of 6): The third argument is actually number of pages. Change the variable name from size to nr_pages to indicate this better. No functional change in this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dan Williams <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm: factor out next_present_section_nr()David Hildenbrand3-19/+12
Let's move it to the header and use the shorter variant from mm/page_alloc.c (the original one will also check "__highest_present_section_nr + 1", which is not necessary). While at it, make the section_nr in next_pfn() const. In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we exceed __highest_present_section_nr, which doesn't make a difference in the caller as it is big enough (>= all sane end_pfn). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Baoquan He <[email protected]> Cc: Dan Williams <[email protected]> Cc: "Jin, Zhi" <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/page_alloc: fix and rework pfn handling in memmap_init_zone()David Hildenbrand1-3/+6
Let's update the pfn manually whenever we continue the loop. This makes the code easier to read but also less error prone (and we can directly fix one issue). When overlap_memmap_init() returns true, pfn is updated to "memblock_region_memory_end_pfn(r)". So it already points at the *next* pfn to process. Incrementing the pfn another time is wrong, we might leave one uninitialized. I spotted this by inspecting the code, so I have no idea if this is relevant in practise (with kernelcore=mirror). Link: http://lkml.kernel.org/r/[email protected] Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone") Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Baoquan He <[email protected]> Cc: Dan Williams <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Jin, Zhi" <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/page_alloc.c: initialize memmap of unavailable memory directlyDavid Hildenbrand2-17/+22
Let's make sure that all memory holes are actually marked PageReserved(), that page_to_pfn() produces reliable results, and that these pages are not detected as "mmap" pages due to the mapcount. E.g., booting a x86-64 QEMU guest with 4160 MB: [ 0.010585] Early memory node ranges [ 0.010586] node 0: [mem 0x0000000000001000-0x000000000009efff] [ 0.010588] node 0: [mem 0x0000000000100000-0x00000000bffdefff] [ 0.010589] node 0: [mem 0x0000000100000000-0x0000000143ffffff] max_pfn is 0x144000. Before this change: [root@localhost ~]# ./page-types -r -a 0x144000, flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000800 16384 64 ___________M_______________________________ mmap total 16384 64 After this change: [root@localhost ~]# ./page-types -r -a 0x144000, flags page-count MB symbolic-flags long-symbolic-flags 0x0000000100000000 16384 64 ___________________________r_______________ reserved total 16384 64 IOW, especially the unavailable physical memory ("memory hole") in the last section would not get properly marked PageReserved() and is indicated to be "mmap" memory. Drop the trace of that function from include/linux/mm.h - nobody else needs it, and rename it accordingly. Note: The fake zone/node might not be covered by the zone/node span. This is not an urgent issue (for now, we had the same node/zone due to the zeroing). We'll need a clean way to mark memory holes (e.g., using a page type PageHole() if possible or a fake ZONE_INVALID) and eventually stop marking these memory holes PageReserved(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Bob Picco <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Sistare <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04fs/proc/page.c: allow inspection of last section and fix end detectionDavid Hildenbrand1-3/+27
If max_pfn does not fall onto a section boundary, it is possible to inspect PFNs up to max_pfn, and PFNs above max_pfn, however, max_pfn itself can't be inspected. We can have a valid (and online) memmap at and above max_pfn if max_pfn is not aligned to a section boundary. The whole early section has a memmap and is marked online. Being able to inspect the state of these PFNs is valuable for debugging, especially because max_pfn can change on memory hotplug and expose these memmaps. Also, querying page flags via "./page-types -r -a 0x144001," (tools/vm/page-types.c) inside a x86-64 guest with 4160MB under QEMU results in an (almost) endless loop in user space, because the end is not detected properly when starting after max_pfn. Instead, let's allow to inspect all pages in the highest section and return 0 directly if we try to access pages above that section. While at it, check the count before adjusting it, to avoid masking user errors. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Bob Picco <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Steven Sistare <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04mm/page_alloc.c: fix uninitialized memmaps on a partially populated last sectionDavid Hildenbrand1-2/+12
Patch series "mm: fix max_pfn not falling on section boundary", v2. Playing with different memory sizes for a x86-64 guest, I discovered that some memmaps (highest section if max_mem does not fall on the section boundary) are marked as being valid and online, but contain garbage. We have to properly initialize these memmaps. Looking at /proc/kpageflags and friends, I found some more issues, partially related to this. This patch (of 3): If max_pfn is not aligned to a section boundary, we can easily run into BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a memory size that is not a multiple of 128MB (e.g., 4097MB, but also 4160MB). I was told that on real HW, we can easily have this scenario (esp., one of the main reasons sub-section hotadd of devmem was added). The issue is, that we have a valid memmap (pfn_valid()) for the whole section, and the whole section will be marked "online". pfn_to_online_page() will succeed, but the memmap contains garbage. E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m 4160M" - (see tools/vm/page-types.c): [ 200.476376] BUG: unable to handle page fault for address: fffffffffffffffe [ 200.477500] #PF: supervisor read access in kernel mode [ 200.478334] #PF: error_code(0x0000) - not-present page [ 200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0 [ 200.479557] Oops: 0000 [#4] SMP NOPTI [ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-20191209 #93 [ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4 [ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410 [ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f [ 200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202 [ 200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000 [ 200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246 [ 200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000 [ 200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001 [ 200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08 [ 200.487130] FS: 00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000 [ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0 [ 200.488897] Call Trace: [ 200.489115] kpageflags_read+0xe9/0x140 [ 200.489447] proc_reg_read+0x3c/0x60 [ 200.489755] vfs_read+0xc2/0x170 [ 200.490037] ksys_pread64+0x65/0xa0 [ 200.490352] do_syscall_64+0x5c/0xa0 [ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe But it can be triggered much easier via "cat /proc/kpageflags > /dev/null" after cold/hot plugging a DIMM to such a system: [root@localhost ~]# cat /proc/kpageflags > /dev/null [ 111.517275] BUG: unable to handle page fault for address: fffffffffffffffe [ 111.517907] #PF: supervisor read access in kernel mode [ 111.518333] #PF: error_code(0x0000) - not-present page [ 111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0 This patch fixes that by at least zero-ing out that memmap (so e.g., page_to_pfn() will not crash). Commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages") tried to fix a similar issue, but forgot to consider this special case. After this patch, there are still problems to solve. E.g., not all of these pages falling into a memory hole will actually get initialized later and set PageReserved - they are only zeroed out - but at least the immediate crashes are gone. A follow-up patch will take care of this. Link: http://lkml.kernel.org/r/[email protected] Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: David Hildenbrand <[email protected]> Tested-by: Daniel Jordan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Bob Picco <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: <[email protected]> [4.15+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-04ocfs2: fix oops when writing cloned fileGang He1-8/+6
Writing a cloned file triggers a kernel oops and the user-space command process is also killed by the system. The bug can be reproduced stably via: 1) create a file under ocfs2 file system directory. journalctl -b > aa.txt 2) create a cloned file for this file. reflink aa.txt bb.txt 3) write the cloned file with dd command. dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc The dd command is killed by the kernel, then you can see the oops message via dmesg command. [ 463.875404] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 463.875413] #PF: supervisor read access in kernel mode [ 463.875416] #PF: error_code(0x0000) - not-present page [ 463.875418] PGD 0 P4D 0 [ 463.875425] Oops: 0000 [#1] SMP PTI [ 463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G OE 5.3.16-2-default [ 463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2] [ 463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28 [ 463.875508] RSP: 0018:ffffa2cb409dfce8 EFLAGS: 00010202 [ 463.875512] RAX: ffff8b1ebdca8000 RBX: 0000000000000001 RCX: ffff8b1eb73a9df0 [ 463.875515] RDX: 0000000000056a01 RSI: 0000000000000000 RDI: 0000000000000000 [ 463.875517] RBP: 0000000000000001 R08: ffff8b1eb73a9de0 R09: 0000000000000000 [ 463.875520] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 463.875522] R13: ffff8b1eb922f048 R14: 0000000000000000 R15: ffff8b1eb922f048 [ 463.875526] FS: 00007f8f44d15540(0000) GS:ffff8b1ebeb00000(0000) knlGS:0000000000000000 [ 463.875529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 463.875532] CR2: 0000000000000028 CR3: 000000003c17a000 CR4: 00000000000006e0 [ 463.875546] Call Trace: [ 463.875596] ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2] [ 463.875648] ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2] [ 463.875672] new_sync_write+0x12d/0x1d0 [ 463.875688] vfs_write+0xad/0x1a0 [ 463.875697] ksys_write+0xa1/0xe0 [ 463.875710] do_syscall_64+0x60/0x1f0 [ 463.875743] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 463.875758] RIP: 0033:0x7f8f4482ed44 [ 463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 [ 463.875765] RSP: 002b:00007fff300a79d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 463.875769] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f4482ed44 [ 463.875771] RDX: 0000000000000200 RSI: 000055f771b5c000 RDI: 0000000000000001 [ 463.875774] RBP: 0000000000000200 R08: 00007f8f44af9c78 R09: 0000000000000003 [ 463.875776] R10: 000000000000089f R11: 0000000000000246 R12: 000055f771b5c000 [ 463.875779] R13: 0000000000000200 R14: 0000000000000000 R15: 000055f771b5c000 This regression problem was introduced by commit e74540b28556 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()"). Link: http://lkml.kernel.org/r/[email protected] Fixes: e74540b28556 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()"). Signed-off-by: Gang He <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-03aio: prevent potential eventfd recursion on pollJens Axboe1-2/+18
If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack. Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context. Cc: [email protected] # 4.19+ Reviewed-by: Jeff Moyer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: put the flag changing code in the same spotPavel Begunkov1-5/+4
Both iocb_flags() and kiocb_set_rw_flags() are inline and modify kiocb->ki_flags. Place them close, so they can be potentially better optimised. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: iterate req cache backwardsPavel Begunkov1-8/+4
Grab requests from cache-array from the end, so can get by only free_reqs. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: punt even fadvise() WILLNEED to async contextJens Axboe1-3/+10
Andres correctly points out that read-ahead can block, if it needs to read in meta data (or even just through the page cache page allocations). Play it safe for now and just ensure WILLNEED is also punted to async context. While in there, allow the file settings hints from non-blocking context. They don't need to start/do IO, and we can safely do them inline. Fixes: 4840e418c2fc ("io_uring: add IORING_OP_FADVISE") Reported-by: Andres Freund <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: fix sporadic double CQE entry for closeJens Axboe1-5/+8
We punt close to async for the final fput(), but we log the completion even before that even in that case. We rely on the request not having a files table assigned to detect what the final async close should do. However, if we punt the async queue to __io_queue_sqe(), we'll get ->files assigned and this makes io_close_finish() think it should both close the filp again (which does no harm) AND log a new CQE event for this request. This causes duplicate CQEs. Queue the request up for async manually so we don't grab files needlessly and trigger this condition. Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: remove extra ->file checkPavel Begunkov1-3/+0
It won't ever get into io_prep_rw() when req->file haven't been set in io_req_set_file(), hence remove the check. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: don't map read/write iovec potentially twiceJens Axboe1-3/+5
If we have a read/write that is deferred, we already setup the async IO context for that request, and mapped it. When we later try and execute the request and we get -EAGAIN, we don't want to attempt to re-map it. If we do, we end up with garbage in the iovec, which typically leads to an -EFAULT or -EINVAL completion. Cc: [email protected] # 5.5 Reported-by: Dan Melnic <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: use the proper helpers for io_send/recvJens Axboe1-2/+4
Don't use the recvmsg/sendmsg helpers, use the same helpers that the recv(2) and send(2) system calls use. Reported-by: 李通洲 <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-03io_uring: prevent potential eventfd recursion on pollJens Axboe1-7/+30
If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack. Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context. Cc: [email protected] # 5.1+ Signed-off-by: Jens Axboe <[email protected]>
2020-02-03eventfd: track eventfd_signal() recursion depthJens Axboe2-0/+29
eventfd use cases from aio and io_uring can deadlock due to circular or resursive calling, when eventfd_signal() tries to grab the waitqueue lock. On top of that, it's also possible to construct notification chains that are deep enough that we could blow the stack. Add a percpu counter that tracks the percpu recursion depth, warn if we exceed it. The counter is also exposed so that users of eventfd_signal() can do the right thing if it's non-zero in the context where it is called. Cc: [email protected] # 4.19+ Signed-off-by: Jens Axboe <[email protected]>
2020-02-03Merge branch 'netdevsim-fix-several-bugs-in-netdevsim-module'Jakub Kicinski6-91/+93
Taehee Yoo says: ===================== netdevsim: fix several bugs in netdevsim module This patchset fixes several bugs in netdevsim module. 1. The first patch fixes using uninitialized resources This patch fixes two similar problems, which is to use uninitialized resources. a) In the current code, {new/del}_device_store() use resource, they are initialized by __init(). But, these functions could be called before __init() is finished. So, accessing uninitialized data could occur and it eventually makes panic. b) In the current code, {new/del}_port_store() uses resource, they are initialized by new_device_store(). But thes functions could be called before new_device_store() is finished. 2. The second patch fixes another race condition. The main problem is a race condition in {new/del}_port() and devlink reload function. These functions would allocate and remove resources. So these functions should not be executed concurrently. 3. The third patch fixes a panic in nsim_dev_take_snapshot_write(). nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region. But these data could be removed by both reload routine and del_device_store(). And these functions could be executed concurrently. 4. The fourth patch fixes stack-out-of-bound in nsim_dev_debugfs_init(). nsim_dev_debugfs_init() provides only 16bytes for name pointer. But, there are some case the name length is over 16bytes. So, stack-out-of-bound occurs. 5. The fifth patch uses IS_ERR instead of IS_ERR_OR_NULL. debugfs_create_{dir/file} doesn't return NULL. So, IS_ERR() is more correct. 6. The sixth patch avoids kmalloc warning. When too large memory allocation is requested by user-space, kmalloc internally prints warning message. That warning message is not necessary. In order to avoid that, it adds __GFP_NOWARN. 7. The last patch removes an unused sdev.c file Change log: v2 -> v3: - Use smp_load_acquire() and smp_store_release() for flag variables. - Change variable names. - Fix deadlock in second patch. - Update lock variable comment. - Add new patch for fixing panic in snapshot_write(). - Include Reviewed-by tags. - Update some log messages and comment. v1 -> v2: - Splits a fixing race condition patch into two patches. - Fix incorrect Fixes tags. - Update comments - Fix use-after-free - Add a new patch, which removes an unused sdev.c file. - Remove a patch, which tries to avoid debugfs warning. ===================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: remove unused sdev codeTaehee Yoo1-69/+0
sdev.c code is merged into dev.c and is not used anymore. it would be removed. Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: use __GFP_NOWARN to avoid memalloc warningTaehee Yoo2-2/+2
vfnum buffer size and binary_len buffer size is received by user-space. So, this buffer size could be too large. If so, kmalloc will internally print a warning message. This warning message is actually not necessary for the netdevsim module. So, this patch adds __GFP_NOWARN. Test commands: modprobe netdevsim echo 1 > /sys/bus/netdevsim/new_device echo 1000000000 > /sys/devices/netdevsim1/sriov_numvfs Splat looks like: [ 357.847266][ T1000] WARNING: CPU: 0 PID: 1000 at mm/page_alloc.c:4738 __alloc_pages_nodemask+0x2f3/0x740 [ 357.850273][ T1000] Modules linked in: netdevsim veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrx [ 357.852989][ T1000] CPU: 0 PID: 1000 Comm: bash Tainted: G B 5.5.0-rc5+ #270 [ 357.854334][ T1000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 357.855703][ T1000] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740 [ 357.856669][ T1000] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0 [ 357.860272][ T1000] RSP: 0018:ffff8880b7f47bd8 EFLAGS: 00010246 [ 357.861009][ T1000] RAX: ffffed1016fe8f80 RBX: 1ffff11016fe8fae RCX: 0000000000000000 [ 357.861843][ T1000] RDX: 0000000000000000 RSI: 0000000000000017 RDI: 0000000000000000 [ 357.862661][ T1000] RBP: 0000000000040dc0 R08: 1ffff11016fe8f67 R09: dffffc0000000000 [ 357.863509][ T1000] R10: ffff8880b7f47d68 R11: fffffbfff2798180 R12: 1ffff11016fe8f80 [ 357.864355][ T1000] R13: 0000000000000017 R14: 0000000000000017 R15: ffff8880c2038d68 [ 357.865178][ T1000] FS: 00007fd9a5b8c740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000 [ 357.866248][ T1000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 357.867531][ T1000] CR2: 000055ce01ba8100 CR3: 00000000b7dbe005 CR4: 00000000000606f0 [ 357.868972][ T1000] Call Trace: [ 357.869423][ T1000] ? lock_contended+0xcd0/0xcd0 [ 357.870001][ T1000] ? __alloc_pages_slowpath+0x21d0/0x21d0 [ 357.870673][ T1000] ? _kstrtoull+0x76/0x160 [ 357.871148][ T1000] ? alloc_pages_current+0xc1/0x1a0 [ 357.871704][ T1000] kmalloc_order+0x22/0x80 [ 357.872184][ T1000] kmalloc_order_trace+0x1d/0x140 [ 357.872733][ T1000] __kmalloc+0x302/0x3a0 [ 357.873204][ T1000] nsim_bus_dev_numvfs_store+0x1ab/0x260 [netdevsim] [ 357.873919][ T1000] ? kernfs_get_active+0x12c/0x180 [ 357.874459][ T1000] ? new_device_store+0x450/0x450 [netdevsim] [ 357.875111][ T1000] ? kernfs_get_parent+0x70/0x70 [ 357.875632][ T1000] ? sysfs_file_ops+0x160/0x160 [ 357.876152][ T1000] kernfs_fop_write+0x276/0x410 [ 357.876680][ T1000] ? __sb_start_write+0x1ba/0x2e0 [ 357.877225][ T1000] vfs_write+0x197/0x4a0 [ 357.877671][ T1000] ksys_write+0x141/0x1d0 [ ... ] Reviewed-by: Jakub Kicinski <[email protected]> Fixes: 79579220566c ("netdevsim: add SR-IOV functionality") Fixes: 82c93a87bf8b ("netdevsim: implement couple of testing devlink health reporters") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfsTaehee Yoo3-14/+16
Debugfs APIs return valid pointer or error pointer. it doesn't return NULL. So, using IS_ERR is enough, not using IS_ERR_OR_NULL. Reviewed-by: Jakub Kicinski <[email protected]> Reported-by: kbuild test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()Taehee Yoo1-1/+1
When netdevsim dev is being created, a debugfs directory is created. The variable "dev_ddir_name" is 16bytes device name pointer and device name is "netdevsim<dev id>". The maximum dev id length is 10. So, 16bytes for device name isn't enough. Test commands: modprobe netdevsim echo "1000000000 0" > /sys/bus/netdevsim/new_device Splat looks like: [ 249.622710][ T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880 [ 249.623658][ T900] Write of size 1 at addr ffff88804c527988 by task bash/900 [ 249.624521][ T900] [ 249.624830][ T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322 [ 249.625691][ T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 249.626712][ T900] Call Trace: [ 249.627103][ T900] dump_stack+0x96/0xdb [ 249.627639][ T900] ? number+0x824/0x880 [ 249.628173][ T900] print_address_description.constprop.5+0x1be/0x360 [ 249.629022][ T900] ? number+0x824/0x880 [ 249.629569][ T900] ? number+0x824/0x880 [ 249.630105][ T900] __kasan_report+0x12a/0x170 [ 249.630717][ T900] ? number+0x824/0x880 [ 249.631201][ T900] kasan_report+0xe/0x20 [ 249.631723][ T900] number+0x824/0x880 [ 249.632235][ T900] ? put_dec+0xa0/0xa0 [ 249.632716][ T900] ? rcu_read_lock_sched_held+0x90/0xc0 [ 249.633392][ T900] vsnprintf+0x63c/0x10b0 [ 249.633983][ T900] ? pointer+0x5b0/0x5b0 [ 249.634543][ T900] ? mark_lock+0x11d/0xc40 [ 249.635200][ T900] sprintf+0x9b/0xd0 [ 249.635750][ T900] ? scnprintf+0xe0/0xe0 [ 249.636370][ T900] nsim_dev_probe+0x63c/0xbf0 [netdevsim] [ ... ] Reviewed-by: Jakub Kicinski <[email protected]> Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: fix panic in nsim_dev_take_snapshot_write()Taehee Yoo2-2/+12
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region. So, during this function, these data shouldn't be removed. But there is no protecting stuff in this function. There are two similar cases. 1. reload case reload could be called during nsim_dev_take_snapshot_write(). When reload is being executed, nsim_dev_reload_down() is called and it calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls devlink_region_destroy() to destroy nsim_dev->dummy_region. So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region() would be removed. At this point, snapshot_write() would access freed pointer. In order to fix this case, take_snapshot file will be removed before devlink_region_destroy(). The take_snapshot file will be re-created by ->reload_up(). 2. del_device_store case del_device_store() also could call nsim_dev_reload_destroy() during nsim_dev_take_snapshot_write(). If so, panic would occur. This problem is actually the same problem with the first case. So, this problem will be fixed by the first case's solution. Test commands: modprobe netdevsim while : do echo 1 > /sys/bus/netdevsim/new_device & echo 1 > /sys/bus/netdevsim/del_device & devlink dev reload netdevsim/netdevsim1 & echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot & done Splat looks like: [ 45.564513][ T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI [ 45.566131][ T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7] [ 45.566135][ T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322 [ 45.569020][ T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 45.569026][ T975] RIP: 0010:__mutex_lock+0x10a/0x14b0 [ 45.570518][ T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f [ 45.570522][ T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206 [ 45.572305][ T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 45.572308][ T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0 [ 45.576843][ T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000 [ 45.576847][ T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000 [ 45.578471][ T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168 [ 45.578475][ T975] FS: 00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000 [ 45.581492][ T975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.582942][ T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0 [ 45.584543][ T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 45.586633][ T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 45.589889][ T975] Call Trace: [ 45.591445][ T975] ? devlink_region_snapshot_create+0x55/0x4a0 [ 45.601250][ T975] ? mutex_lock_io_nested+0x1380/0x1380 [ 45.602817][ T975] ? mutex_lock_io_nested+0x1380/0x1380 [ 45.603875][ T975] ? mark_held_locks+0xa5/0xe0 [ 45.604769][ T975] ? _raw_spin_unlock_irqrestore+0x2d/0x50 [ 45.606147][ T975] ? __mutex_unlock_slowpath+0xd0/0x670 [ 45.607723][ T975] ? crng_backtrack_protect+0x80/0x80 [ 45.613530][ T975] ? wait_for_completion+0x390/0x390 [ 45.615152][ T975] ? devlink_region_snapshot_create+0x55/0x4a0 [ 45.616834][ T975] devlink_region_snapshot_create+0x55/0x4a0 [ ... ] Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: disable devlink reload when resources are being usedTaehee Yoo2-0/+21
devlink reload destroys resources and allocates resources again. So, when devices and ports resources are being used, devlink reload function should not be executed. In order to avoid this race, a new lock is added and new_port() and del_port() call devlink_reload_disable() and devlink_reload_enable(). Thread0 Thread1 {new/del}_port() {new/del}_port() devlink_reload_disable() devlink_reload_disable() devlink_reload_enable() //here devlink_reload_enable() Before Thread1's devlink_reload_enable(), the devlink is already allowed to execute reload because Thread0 allows it. devlink reload disable/enable variable type is bool. So the above case would exist. So, disable/enable should be executed atomically. In order to do that, a new lock is used. Test commands: modprobe netdevsim echo 1 > /sys/bus/netdevsim/new_device while : do echo 1 > /sys/devices/netdevsim1/new_port & echo 1 > /sys/devices/netdevsim1/del_port & devlink dev reload netdevsim/netdevsim1 & done Splat looks like: [ 23.342145][ T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) [ 23.342159][ T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0 [ 23.344182][ T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx [ 23.346485][ T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322 [ 23.347696][ T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 23.348893][ T932] RIP: 0010:mutex_destroy+0xc7/0xf0 [ 23.349505][ T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40 [ 23.351887][ T932] RSP: 0018:ffff88806208f810 EFLAGS: 00010286 [ 23.353963][ T932] RAX: dffffc0000000008 RBX: ffff888067f6f2c0 RCX: ffffffff942c4bd4 [ 23.355222][ T932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff96dac5b4 [ 23.356169][ T932] RBP: ffff888067f6f000 R08: fffffbfff2d235a5 R09: fffffbfff2d235a5 [ 23.357160][ T932] R10: 0000000000000001 R11: fffffbfff2d235a4 R12: ffff888067f6f208 [ 23.358288][ T932] R13: ffff88806208fa70 R14: ffff888067f6f000 R15: ffff888069ce3800 [ 23.359307][ T932] FS: 00007fe2a3876740(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000 [ 23.360473][ T932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.361319][ T932] CR2: 00005561357aa000 CR3: 000000005227a006 CR4: 00000000000606f0 [ 23.362323][ T932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 23.363417][ T932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 23.364414][ T932] Call Trace: [ 23.364828][ T932] nsim_dev_reload_destroy+0x77/0xb0 [netdevsim] [ 23.365655][ T932] nsim_dev_reload_down+0x84/0xb0 [netdevsim] [ 23.366433][ T932] devlink_reload+0xb1/0x350 [ 23.367010][ T932] genl_rcv_msg+0x580/0xe90 [ ...] [ 23.531729][ T1305] kernel BUG at lib/list_debug.c:53! [ 23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G W 5.5.0+ #322 [ 23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150 [ 23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44 [ 23.541068][ T1305] RSP: 0018:ffff888047c27b58 EFLAGS: 00010282 [ 23.542001][ T1305] RAX: 0000000000000054 RBX: ffff888067f6f318 RCX: 0000000000000000 [ 23.543051][ T1305] RDX: 0000000000000054 RSI: 0000000000000008 RDI: ffffed1008f84f61 [ 23.544072][ T1305] RBP: ffff88804aa0fca0 R08: ffffed100d940539 R09: ffffed100d940539 [ 23.545085][ T1305] R10: 0000000000000001 R11: ffffed100d940538 R12: ffff888047c27cb0 [ 23.546422][ T1305] R13: ffff88806208b840 R14: ffffffff981976c0 R15: ffff888067f6f2c0 [ 23.547406][ T1305] FS: 00007f76c0431740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 23.548527][ T1305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.549389][ T1305] CR2: 00007f5048f1a2f8 CR3: 000000004b310006 CR4: 00000000000606e0 [ 23.550636][ T1305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 23.551578][ T1305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 23.552597][ T1305] Call Trace: [ 23.553004][ T1305] mutex_remove_waiter+0x101/0x520 [ 23.553646][ T1305] __mutex_lock+0xac7/0x14b0 [ 23.554218][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim] [ 23.554908][ T1305] ? mutex_lock_io_nested+0x1380/0x1380 [ 23.555570][ T1305] ? _parse_integer+0xf0/0xf0 [ 23.556043][ T1305] ? kstrtouint+0x86/0x110 [ 23.556504][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim] [ 23.557133][ T1305] nsim_dev_port_del+0x4e/0x140 [netdevsim] [ 23.558024][ T1305] del_port_store+0xcc/0xf0 [netdevsim] [ ... ] Fixes: 75ba029f3c07 ("netdevsim: implement proper devlink reload") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03netdevsim: fix using uninitialized resourcesTaehee Yoo2-3/+41
When module is being initialized, __init() calls bus_register() and driver_register(). These functions internally create various resources and sysfs files. The sysfs files are used for basic operations(add/del device). /sys/bus/netdevsim/new_device /sys/bus/netdevsim/del_device These sysfs files use netdevsim resources, they are mostly allocated and initialized in ->probe() function, which is nsim_dev_probe(). But, sysfs files could be executed before ->probe() is finished. So, accessing uninitialized data would occur. Another problem is very similar. /sys/bus/netdevsim/new_device internally creates sysfs files. /sys/devices/netdevsim<id>/new_port /sys/devices/netdevsim<id>/del_port These sysfs files also use netdevsim resources, they are mostly allocated and initialized in creating device routine, which is nsim_bus_dev_new(). But they also could be executed before nsim_bus_dev_new() is finished. So, accessing uninitialized data would occur. To fix these problems, this patch adds flags, which means whether the operation is finished or not. The flag variable 'nsim_bus_enable' means whether netdevsim bus was initialized or not. This is protected by nsim_bus_dev_list_lock. The flag variable 'nsim_bus_dev->init' means whether nsim_bus_dev was initialized or not. This could be used in {new/del}_port_store() with no lock. Test commands: #SHELL1 modprobe netdevsim while : do echo "1 1" > /sys/bus/netdevsim/new_device echo "1 1" > /sys/bus/netdevsim/del_device done #SHELL2 while : do echo 1 > /sys/devices/netdevsim1/new_port echo 1 > /sys/devices/netdevsim1/del_port done Splat looks like: [ 47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I [ 47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f] [ 47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322 [ 47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0 [ 47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f [ 47.517163][ T1008] RSP: 0018:ffff888059b4fbb0 EFLAGS: 00010206 [ 47.517802][ T1008] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 47.518941][ T1008] RDX: 0000000000000021 RSI: ffffffff85926440 RDI: 0000000000000108 [ 47.519732][ T1008] RBP: ffff888059b4fd30 R08: ffffffffc073fad0 R09: 0000000000000000 [ 47.520729][ T1008] R10: ffff888059b4fd50 R11: ffff88804bb38040 R12: 0000000000000000 [ 47.521702][ T1008] R13: dffffc0000000000 R14: ffffffff871976c0 R15: 00000000000000a0 [ 47.522760][ T1008] FS: 00007fd4be05a740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 47.523877][ T1008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.524627][ T1008] CR2: 0000561c82b69cf0 CR3: 0000000065dd6004 CR4: 00000000000606e0 [ 47.527662][ T1008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.528604][ T1008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.529531][ T1008] Call Trace: [ 47.529874][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim] [ 47.530470][ T1008] ? mutex_lock_io_nested+0x1380/0x1380 [ 47.531018][ T1008] ? _kstrtoull+0x76/0x160 [ 47.531449][ T1008] ? _parse_integer+0xf0/0xf0 [ 47.531874][ T1008] ? kernfs_fop_write+0x1cf/0x410 [ 47.532330][ T1008] ? sysfs_file_ops+0x160/0x160 [ 47.532773][ T1008] ? kstrtouint+0x86/0x110 [ 47.533168][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim] [ 47.533721][ T1008] nsim_dev_port_add+0x50/0x150 [netdevsim] [ 47.534336][ T1008] ? sysfs_file_ops+0x160/0x160 [ 47.534858][ T1008] new_port_store+0x99/0xb0 [netdevsim] [ 47.535439][ T1008] ? del_port_store+0xb0/0xb0 [netdevsim] [ 47.536035][ T1008] ? sysfs_file_ops+0x112/0x160 [ 47.536544][ T1008] ? sysfs_kf_write+0x3b/0x180 [ 47.537029][ T1008] kernfs_fop_write+0x276/0x410 [ 47.537548][ T1008] ? __sb_start_write+0x215/0x2e0 [ 47.538110][ T1008] vfs_write+0x197/0x4a0 [ ... ] Fixes: f9d9db47d3ba ("netdevsim: add bus attributes to add new and delete devices") Fixes: 794b2c05ca1c ("netdevsim: extend device attrs to support port addition and deletion") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03Merge branch 'bnxt_en-Bug-fixes'Jakub Kicinski1-13/+24
Michael Chan says: ===================== bnxt_en: Bug fixes 3 patches that fix some issues in the firmware reset logic, starting with a small patch to refactor the code that re-enables SRIOV. The last patch fixes a TC queue mapping issue. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03bnxt_en: Fix TC queue mapping.Michael Chan1-1/+1
The driver currently only calls netdev_set_tc_queue when the number of TCs is greater than 1. Instead, the comparison should be greater than or equal to 1. Even with 1 TC, we need to set the queue mapping. This bug can cause warnings when the number of TCs is changed back to 1. Fixes: 7809592d3e2e ("bnxt_en: Enable MSIX early in bnxt_init_one().") Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03bnxt_en: Fix logic that disables Bus Master during firmware reset.Vasundhara Volam1-4/+7
The current logic that calls pci_disable_device() in __bnxt_close_nic() during firmware reset is flawed. If firmware is still alive, we're disabling the device too early, causing some firmware commands to not reach the firmware. Fix it by moving the logic to bnxt_reset_close(). If firmware is in fatal condition, we call pci_disable_device() before we free any of the rings to prevent DMA corruption of the freed rings. If firmware is still alive, we call pci_disable_device() after the last firmware message has been sent. Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.") Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset.Michael Chan1-2/+5
bnxt_ulp_start() needs to be called before SRIOV is re-enabled after firmware reset. Re-enabling SRIOV may consume all the resources and may cause the RDMA driver to fail to get MSIX and other resources. Fix it by calling bnxt_ulp_start() first before calling bnxt_reenable_sriov(). We re-arrange the logic so that we call bnxt_ulp_start() and bnxt_reenable_sriov() in proper sequence in bnxt_fw_reset_task() and bnxt_open(). The former is the normal coordinated firmware reset sequence and the latter is firmware reset while the function is down. This new logic is now more straight forward and will now fix both scenarios. Fixes: f3a6d206c25a ("bnxt_en: Call bnxt_ulp_stop()/bnxt_ulp_start() during error recovery.") Reported-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.Michael Chan1-7/+12
Put the current logic in bnxt_open() to re-enable SRIOV after detecting firmware reset into a new function bnxt_reenable_sriov(). This call needs to be invoked in the firmware reset path also in the next patch. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03net: stmmac: Delete txtimer in suspend()Nicolin Chen1-0/+4
When running v5.5 with a rootfs on NFS, memory abort may happen in the system resume stage: Unable to handle kernel paging request at virtual address dead00000000012a [dead00000000012a] address between user and kernel address ranges pc : run_timer_softirq+0x334/0x3d8 lr : run_timer_softirq+0x244/0x3d8 x1 : ffff800011cafe80 x0 : dead000000000122 Call trace: run_timer_softirq+0x334/0x3d8 efi_header_end+0x114/0x234 irq_exit+0xd0/0xd8 __handle_domain_irq+0x60/0xb0 gic_handle_irq+0x58/0xa8 el1_irq+0xb8/0x180 arch_cpu_idle+0x10/0x18 do_idle+0x1d8/0x2b0 cpu_startup_entry+0x24/0x40 secondary_start_kernel+0x1b4/0x208 Code: f9000693 a9400660 f9000020 b4000040 (f9000401) ---[ end trace bb83ceeb4c482071 ]--- Kernel panic - not syncing: Fatal exception in interrupt SMP: stopping secondary CPUs SMP: failed to stop secondary CPUs 2-3 Kernel Offset: disabled CPU features: 0x00002,2300aa30 Memory Limit: none ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- It's found that stmmac_xmit() and stmmac_resume() sometimes might run concurrently, possibly resulting in a race condition between mod_timer() and setup_timer(), being called by stmmac_xmit() and stmmac_resume() respectively. Since the resume() runs setup_timer() every time, it'd be safer to have del_timer_sync() in the suspend() as the counterpart. Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-03Merge branch 'for-5.6' of ↵Linus Torvalds2-7/+6
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: "Separate out variables that can be decrypted into their own page anytime encryption can be enabled and fix __percpu annotations in asm-generic for sparse" * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: Separate decrypted varaibles anytime encryption can be enabled percpu: fix __percpu annotation in asm-generic
2020-02-03Merge branch 'stable/for-linus-5.6' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft Pull ibft update from Konrad Rzeszutek Wilk: "Adhere to the iBFT spec and extend the structure to handle more than two NICs" * 'stable/for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft: iscsi_ibft: Don't limits Targets and NICs to two
2020-02-03Merge tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds4-7/+9
Pull VFIO updates from Alex Williamson: - Fix nvlink error path (Alexey Kardashevskiy) - Update nvlink and spapr to use mmgrab() (Julia Lawall) - Update static declaration (Ben Dooks) - Annotate __iomem to fix sparse warnings (Ben Dooks) * tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio: vfio: platform: fix __iomem in vfio_platform_amdxgbe.c vfio/mdev: make create attribute static vfio/spapr_tce: use mmgrab vfio: vfio_pci_nvlink2: use mmgrab vfio/spapr/nvlink2: Skip unpinning pages on error exit
2020-02-03Merge tag 'clk-for-linus' of ↵Linus Torvalds146-1806/+15018
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "There are a few changes to the core framework this time around, in addition to the normal collection of driver updates to support new SoCs, fix incorrect data, and convert various drivers to clk_hw based APIs. In the core, we allow clk_ops::init() to return an error code now so that we can fail clk registration if the callback does something like fail to allocate memory. We also add a new "terminate" clk_op so that things done in clk_ops::init() can be undone, e.g. free memory. We also spit out a warning now when critical clks fail to enable and we support changing clk rates and enable/disable state through debugfs when developers compile the kernel themselves. On the driver front, we get support for what seems like a lot of Qualcomm and NXP SoCs given that those vendors dominate the diffstat. There are a couple new drivers for Xilinx and Amlogic SoCs too. The updates are all small things like fixing the way glitch free muxes switch parents, avoiding div-by-zero problems, or fixing data like parent names. See the updates section below for more details. Finally, the "basic" clk types have been converted to support specifying parents with clk_hw pointers. This work includes an overhaul of the fixed-rate clk type to be more modern by using clk_hw APIs. Core: - Let clk_ops::init() return an error code - Add a clk_ops::terminate() callback to undo clk_ops::init() - Warn about critical clks that fail to enable or prepare - Support dangerous debugfs actions on clks with dead code New Drivers: - Support for Xilinx Versal platform clks - Display clk controller on qcom sc7180 - Video clk controller on qcom sc7180 - Graphics clk controller on qcom sc7180 - CPU PLLs for qcom msm8916 - Move qcom msm8974 gfx3d clk to RPM control - Display port clk support on qcom sdm845 SoCs - Global clk controller on qcom ipq6018 - Add a driver for BCLK of Freescale SAI cores - Add cam, vpe and sgx clock support for TI dra7 - Add aess clock support for TI omap5 - Enable clks for CPUfreq on Allwinner A64 SoCs - Add Amlogic meson8b DDR clock controller - Add input clocks to Amlogic meson8b controllers - Add SPIBSC (SPI FLASH) clock on Renesas RZ/A2 - i.MX8MP clk driver support Updates: - Convert gpio, fixed-factor, mux, gate, divider basic clks to hw based APIs - Detect more PRMCU variants in ux500 driver - Adjust the composite clk type to new way of describing clk parents - Fixes for clk controllers on qcom msm8998 SoCs - Fix gmac main clock for TI dra7 - Move TI dra7-atl clock header to correct location - Fix hidden node name dependency on TI clkctrl clocks - Fix Amlogic meson8b mali clock update using the glitch free mux - Fix Amlogic pll driver division by zero at init - Prepare for split of Renesas R-Car H3 ES1.x and ES2.0+ config symbols - Switch more i.MX clk drivers to clk_hw based APIs - Disable non-functional divider between pll4_audio_div and pll4_post_div on imx6q - Fix watchdog2 clock name typo in imx7ulp clock driver - Set CLK_GET_RATE_NOCACHE flag for DRAM related clocks on i.MX8M SoCs - Suppress bind attrs for i.MX8M clock driver - Add a big comment in imx8qxp-lpcg driver to tell why devm_platform_ioremap_resource() shouldn't be used for the driver - A correction on i.MX8MN usb1_ctrl parent clock setting" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (140 commits) dt/bindings: clk: fsl,plldig: Drop 'bindings' from schema id clk: ls1028a: Fix warning on clamp() usage clk: qoriq: add ls1088a hwaccel clocks support clk: ls1028a: Add clock driver for Display output interface dt/bindings: clk: Add YAML schemas for LS1028A Display Clock bindings clk: fsl-sai: new driver dt-bindings: clock: document the fsl-sai driver clk: composite: add _register_composite_pdata() variants clk: qcom: rpmh: Sort OF match table dt-bindings: fix warnings in validation of qcom,gcc.yaml dt-binding: fix compilation error of the example in qcom,gcc.yaml clk: zynqmp: Add support for clock with CLK_DIVIDER_POWER_OF_TWO flag clk: zynqmp: Fix divider calculation clk: zynqmp: Add support for get max divider clk: zynqmp: Warn user if clock user are more than allowed clk: zynqmp: Extend driver for versal dt-bindings: clock: Add bindings for versal clock driver clk: ti: clkctrl: Fix hidden dependency to node name clk: ti: add clkctrl data dra7 sgx clk: ti: omap5: Add missing AESS clock ...
2020-02-03Merge branch 'for-linus' of ↵Linus Torvalds14-144/+544
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a driver for SGI IOC3 PS/2 controller - updates to driver for FocalTech FT5x06 series touch screen controllers - other assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics-rmi4 - switch to reduced reporting mode dt-bindings: touchscreen: Convert Goodix touchscreen to json-schema dt-bindings: touchscreen: Add touchscreen schema Input: add IOC3 serio driver Input: axp20x-pek - enable wakeup for all AXP variants Input: axp20x-pek - respect userspace wakeup configuration Input: ads7846 - use new `delay` structure for SPI transfer delays Input: edt-ft5x06 - use pm core to enable/disable the wake irq Input: edt-ft5x06 - make wakeup-source switchable Input: edt-ft5x06 - document wakeup-source capability Input: edt-ft5x06 - alphabetical include reorder Input: edt-ft5x06 - work around first register access error Input: apbps2 - add __iomem to register struct Input: axp20x-pek - make device attributes static Input: elants_i2c - check Remark ID when attempting firmware update
2020-02-03parisc: Regenerate parisc defconfigsHelge Deller8-1168/+43
Regenerate the 32- and 64-bit defconfigs and drop the outdated specific machine defconfigs for the 712, A500, B160, C3000 and C8000 workstations. Signed-off-by: Helge Deller <[email protected]>
2020-02-03dt/bindings: clk: fsl,plldig: Drop 'bindings' from schema idStephen Boyd1-1/+1
Having 'bindings' in here causes a warning when checking the schema. Documentation/devicetree/bindings/clock/fsl,plldig.yaml: $id: relative path/filename doesn't match actual path or filename expected: http://devicetree.org/schemas/clock/fsl,plldig.yaml# Remove it. Cc: Rob Herring <[email protected]> Cc: Wen He <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Acked-by: Rob Herring <[email protected]>
2020-02-03clk: ls1028a: Fix warning on clamp() usageStephen Boyd1-2/+2
These constants are used in clamp() with the value being clamped an unsigned long. Make them unsigned long defines so that clamp() doesn't complain about comparing different types. In file included from include/linux/list.h:9, from include/linux/kobject.h:19, from include/linux/of.h:17, from include/linux/clk-provider.h:9, from drivers/clk/clk-plldig.c:8: drivers/clk/clk-plldig.c: In function 'plldig_determine_rate': include/linux/kernel.h:835:29: warning: comparison of distinct pointer types lacks a cast 835 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | Reported-by: Stephen Rothwell <[email protected]> Cc: Wen He <[email protected]> Fixes: d37010a3c162 ("clk: ls1028a: Add clock driver for Display output interface") Signed-off-by: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-02-03Merge tag 'rxrpc-fixes-20200203' of ↵Jakub Kicinski10-69/+83
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== RxRPC fixes Here are a number of fixes for AF_RXRPC: (1) Fix a potential use after free in rxrpc_put_local() where it was accessing the object just put to get tracing information. (2) Fix insufficient notifications being generated by the function that queues data packets on a call. This occasionally causes recvmsg() to stall indefinitely. (3) Fix a number of packet-transmitting work functions to hold an active count on the local endpoint so that the UDP socket doesn't get destroyed whilst they're calling kernel_sendmsg() on it. (4) Fix a NULL pointer deref that stemmed from a call's connection pointer being cleared when the call was disconnected. Changes: v2: Removed a couple of BUG() statements that got added. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-02-04nvme-pci: remove nvmeq->tagsChristoph Hellwig1-15/+8
There is no real need to have a pointer to the tagset in struct nvme_queue, as we only need it in a single place, and that place can derive the used tagset from the device and qid trivially. This fixes a problem with stale pointer exposure when tagsets are reset, and also shrinks the nvme_queue structure. It also matches what most other transports have done since day 1. Reported-by: Edmund Nadolski <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>