aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-07-26tools/kvm_stat: use variables instead of hard paths in help outputLin Ma1-3/+3
Using variables instead of hard paths makes the requirements information more accurate. Signed-off-by: Lin Ma <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26Merge tag 'kvm-s390-master-4.13-1' of ↵Paolo Bonzini1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: fixup missing srcu lock We need to hold the srcu lock when accessing memory slots during migration Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26KVM: nVMX: Fix loss of L2's NMI blocking stateWanpeng Li1-0/+2
Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1: Before NMI IRET test Sending NMI to self NMI isr running stack 0x461000 Sending nested NMI to self After nested NMI to self Nested NMI isr running rip=40038e After iret After NMI to self FAIL: NMI Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS) tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough: - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the L2 can generate #PF which can be intercepted by L0. - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 guest and injects the #PF into L1. At this point the vmcs02 has nmi_known_unmasked=true. - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field of vmcs12 (and fixes the shadow page table) before resuming L2 guest. - L1 executes VMRESUME to resume L2, causing a vmexit to L0 - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the interruptibility-state field of vmcs02, but nmi_known_unmasked is still true. - L2 immediately exits to L0 with another page fault, because L0 still has not updated the NGVA->HPA page tables. However, nmi_known_unmasked is true so vmx_recover_nmi_blocking does not do anything. The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12. Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26KVM: nVMX: Fix posted intr delivery when vcpu is in guest modeWincy Van1-11/+11
The PI vector for L0 and L1 must be different. If dest vcpu0 is in guest mode while vcpu1 is delivering a non-nested PI to vcpu0, there wont't be any vmexit so that the non-nested interrupt will be delayed. Signed-off-by: Wincy Van <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26x86: irq: Define a global vector for nested posted interruptsWincy Van7-1/+29
We are using the same vector for nested/non-nested posted interrupts delivery, this may cause interrupts latency in L1 since we can't kick the L2 vcpu out of vmx-nonroot mode. This patch introduces a new vector which is only for nested posted interrupts to solve the problems above. Signed-off-by: Wincy Van <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26KVM: x86: do mask out upper bits of PAE CR3Paolo Bonzini1-2/+2
This reverts the change of commit f85c758dbee54cc3612a6e873ef7cecdb66ebee5, as the behavior it modified was intended. The VM is running in 32-bit PAE mode, and Table 4-7 of the Intel manual says: Table 4-7. Use of CR3 with PAE Paging Bit Position(s) Contents 4:0 Ignored 31:5 Physical address of the 32-Byte aligned page-directory-pointer table used for linear-address translation 63:32 Ignored (these bits exist only on processors supporting the Intel-64 architecture) To placate the static checker, write the mask explicitly as an unsigned long constant instead of using a 32-bit unsigned constant. Cc: Dan Carpenter <[email protected]> Fixes: f85c758dbee54cc3612a6e873ef7cecdb66ebee5 Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26KVM: make pid available for uevents without debugfsClaudio Imbrenda2-23/+13
Simplify and improve the code so that the PID is always available in the uevent even when debugfs is not available. This adds a userspace_pid field to struct kvm, as per Radim's suggestion, so that the PID can be retrieved on destruction too. Acked-by: Janosch Frank <[email protected]> Fixes: 286de8f6ac9202 ("KVM: trigger uevents when creating or destroying a VM") Signed-off-by: Claudio Imbrenda <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26nvme: validate admin queue before unquiesceScott Bauer1-1/+2
With a misbehaving controller it's possible we'll never enter the live state and create an admin queue. When we fail out of reset work it's possible we failed out early enough without setting up the admin queue. We tear down queues after a failed reset, but needed to do some more sanitization. Fixes 443bd90f2cca: "nvme: host: unquiesce queue in nvme_kill_queues()" [ 189.650995] nvme nvme1: pci function 0000:0b:00.0 [ 317.680055] nvme nvme0: Device not ready; aborting reset [ 317.680183] nvme nvme0: Removing after probe failure status: -19 [ 317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 317.681397] general protection fault: 0000 [#1] SMP KASAN [ 317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5 [ 317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016 [ 317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme] [ 317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000 [ 317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0 [ 317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282 [ 317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3 [ 317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000 [ 317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000 [ 317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0 [ 317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0 [ 317.684371] FS: 0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000 [ 317.684519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0 [ 317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 317.685018] Call Trace: [ 317.685084] nvme_kill_queues+0x4d/0x170 [nvme_core] [ 317.685185] nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme] [ 317.685289] process_one_work+0x771/0x1170 [ 317.685372] worker_thread+0xde/0x11e0 [ 317.685452] ? pci_mmcfg_check_reserved+0x110/0x110 [ 317.685550] kthread+0x2d3/0x3d0 [ 317.685617] ? process_one_work+0x1170/0x1170 [ 317.685704] ? kthread_create_on_node+0xc0/0xc0 [ 317.685785] ret_from_fork+0x25/0x30 [ 317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3 [ 317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40 [ 317.685908] ---[ end trace a3f8704150b1e8b4 ]--- Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2017-07-26xfs: fix multi-AG deadlock in xfs_bunmapiChristoph Hellwig1-0/+12
Just like in the allocator we must avoid touching multiple AGs out of order when freeing blocks, as freeing still locks the AGF and can cause the same AB-BA deadlocks as in the allocation path. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Nikolay Borisov <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-07-26arm64: sysreg: Fix unprotected macro argmuent in write_sysregDave Martin1-2/+2
write_sysreg() may misparse the value argument because it is used without parentheses to protect it. This patch adds the ( ) in order to avoid any surprises. Acked-by: Mark Rutland <[email protected]> Signed-off-by: Dave Martin <[email protected]> [will: same change to write_sysreg_s] Signed-off-by: Will Deacon <[email protected]>
2017-07-26perf: qcom_l2: fix column exclusion checkNeil Leeder1-0/+2
The check for column exclusion did not verify that the event being checked was an L2 event, and not a software event. Software events should not be checked for column exclusion. This resulted in a group with both software and L2 events sometimes incorrectly rejecting the L2 event for column exclusion and not counting it. Add a check for PMU type before applying column exclusion logic. Fixes: 21bdbb7102edeaeb ("perf: add qcom l2 cache perf events driver") Acked-by: Mark Rutland <[email protected]> Signed-off-by: Neil Leeder <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2017-07-26powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchainMichael Ellerman1-12/+13
In commit efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions for modules"), we added an ld version check early in the powerpc top-level Makefile. Because the Makefile runs before the kernel config is setup, the checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So we end up configuring ld for 32-bit big endian. That would be OK, except that for historical (or perhaps no) reason, we use 'override LD' to add the endian flags to the LD variable itself, rather than the normal approach of adding them to LDFLAGS. The end result is that when we check the ld version we run it as: $(CROSS_COMPILE)ld -EB -m elf32ppc --version This often works, unless you are using a 64-bit only and/or little endian only, toolchain. In which case you see something like: $ make defconfig powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim /bin/sh: 1: [: -ge: unexpected operator The proper fix is to stop using 'override LD', but that will require a fair bit of testing. Instead we can fix it for now just by reordering the Makefile to do the version check earlier. Fixes: efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions for modules") Signed-off-by: Michael Ellerman <[email protected]>
2017-07-26powerpc/pseries: Fix of_node_put() underflow during reconfig removeLaurent Vivier1-1/+0
As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put() underflow during DLPAR remove"), the call to of_node_put() must be removed from pSeries_reconfig_remove_node(). dlpar_detach_node() and pSeries_reconfig_remove_node() both call of_detach_node(), and thus the node should not be released in both cases. Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes") Cc: [email protected] # v3.15+ Signed-off-by: Laurent Vivier <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-26powerpc/mm/radix: Workaround prefetch issue with KVMBenjamin Herrenschmidt6-22/+154
There's a somewhat architectural issue with Radix MMU and KVM. When coming out of a guest with AIL (Alternate Interrupt Location, ie, MMU enabled), we start executing hypervisor code with the PID register still containing whatever the guest has been using. The problem is that the CPU can (and will) then start prefetching or speculatively load from whatever host context has that same PID (if any), thus bringing translations for that context into the TLB, which Linux doesn't know about. This can cause stale translations and subsequent crashes. Fixing this in a way that is neither racy nor a huge performance impact is difficult. We could just make the host invalidations always use broadcast forms but that would hurt single threaded programs for example. We chose to fix it instead by partitioning the PID space between guest and host. This is possible because today Linux only use 19 out of the 20 bits of PID space, so existing guests will work if we make the host use the top half of the 20 bits space. We additionally add support for a property to indicate to Linux the size of the PID register which will be useful if we eventually have processors with a larger PID space available. There is still an issue with malicious guests purposefully setting the PID register to a value in the hosts PID range. Hopefully future HW can prevent that, but in the meantime, we handle it with a pair of kludges: - On the way out of a guest, before we clear the current VCPU in the PACA, we check the PID and if it's outside of the permitted range we flush the TLB for that PID. - When context switching, if the mm is "new" on that CPU (the corresponding bit was set for the first time in the mm cpumask), we check if any sibling thread is in KVM (has a non-NULL VCPU pointer in the PACA). If that is the case, we also flush the PID for that CPU (core). This second part is needed to handle the case where a process is migrated (or starts a new pthread) on a sibling thread of the CPU coming out of KVM, as there's a window where stale translations can exist before we detect it and flush them out. A future optimization could be added by keeping track of whether the PID has ever been used and avoid doing that for completely fresh PIDs. We could similarily mark PIDs that have been the subject of a global invalidation as "fresh". But for now this will do. Signed-off-by: Benjamin Herrenschmidt <[email protected]> [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop unneeded include of kvm_book3s_asm.h] Signed-off-by: Michael Ellerman <[email protected]>
2017-07-25Merge tag 'scsi-fixes' of ↵Linus Torvalds3-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small fixes. The transfer size fixes are actually correcting some performance drops on the hpsa and smartpqi cards. The cards actually have an internal cache for request speed up but bypass it for transfers > 1MB. Since 4.3 the efficiency of our merges has rendered the cache mostly unused, so limit transfers to under 1MB to recover the cache boost" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sg: fix static checker warning in sg_is_valid_dxfer scsi: smartpqi: limit transfer length to 1MB scsi: hpsa: limit transfer length to 1MB
2017-07-25Merge tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuidLinus Torvalds5-27/+13
Pull uuid fixes from Christoph Hellwig: - add a missing "!" in the uuid tests - remove the last remaining user of the uuid_be type, and then the type and its helpers * tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuid: uuid: remove uuid_be thunderbolt: use uuid_t instead of uuid_be uuid: fix incorrect uuid_equal conversion in test_uuid_test
2017-07-25Merge tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds8-81/+180
Pull dma mapping fixes from Christoph Hellwig: "split the global dma coherent pool from the per-device pool. This fixes a regression in the earlier 4.13 pull requests where the global pool would override a per-device CMA pool (Vladimir Murzin)" * tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping: ARM: NOMMU: Wire-up default DMA interface dma-coherent: introduce interface for default DMA pool
2017-07-25MD: fix warnning for UP caseShaohua Li1-1/+1
spin_is_locked always returns 0 for UP case, so ignores it Reported-by: Joshua Kinard <[email protected]> Signed-off-by: Shaohua Li <[email protected]>
2017-07-25parisc: Extend disabled preemption in copy_user_pageJohn David Anglin1-1/+1
It's always bothered me that we only disable preemption in copy_user_page around the call to flush_dcache_page_asm. This patch extends this to after the copy. Signed-off-by: John David Anglin <[email protected]> Cc: [email protected] # 4.9+ Signed-off-by: Helge Deller <[email protected]>
2017-07-25parisc: Prevent TLB speculation on flushed pages on CPUs that only support ↵John David Anglin1-20/+14
equivalent aliases Helge noticed that we flush the TLB page in flush_cache_page but not in flush_cache_range or flush_cache_mm. For a long time, we have had random segmentation faults building packages on machines with PA8800/8900 processors. These machines only support equivalent aliases. We don't see these faults on machines that don't require strict coherency. So, it appears TLB speculation sometimes leads to cache corruption on machines that require coherency. This patch adds TLB flushes to flush_cache_range and flush_cache_mm when coherency is required. We only flush the TLB in flush_cache_page when coherency is required. The patch also optimizes flush_cache_range. It turns out we always have the right context to use flush_user_dcache_range_asm and flush_user_icache_range_asm. The patch has been tested for some time on rp3440, rp3410 and A500-44. It's been boot tested on c8000. No random segmentation faults were observed during testing. Signed-off-by: John David Anglin <[email protected]> Cc: [email protected] # 4.9+ Signed-off-by: Helge Deller <[email protected]>
2017-07-25Merge branch 'stable/for-jens-4.13' of ↵Jens Axboe1-9/+12
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus Pull xen-blkfront fixes from Konrad for 4.13.
2017-07-25ALSA: hda - Add mute led support for HP ProBook 440 G4Kai-Heng Feng1-0/+1
Mic mute led does not work on HP ProBook 440 G4. We can use CXT_FIXUP_MUTE_LED_GPIO fixup to support it. BugLink: https://bugs.launchpad.net/bugs/1705586 Signed-off-by: Kai-Heng Feng <[email protected]> Cc: <[email protected]> # v4.12+ Signed-off-by: Takashi Iwai <[email protected]>
2017-07-25drm/amd/powerplay: fix AVFS voltage offset for Vega10Eric Huang1-9/+3
Signed-off-by: Eric Huang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-07-25drm/amdgpu/gfx9: simplify and fix GRBM index selectionNicolai Hähnle1-11/+13
Copy the approach taken by gfx8, which simplifies the code, and set the instance index properly. The latter is required for debugging, e.g. for reading wave status by UMR. Signed-off-by: Nicolai Hähnle <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-07-25drm/amdgpu: Fix blocking in RCU critical section(v2)Alex Xie1-3/+7
In RCU read-side critical sections, blocking or sleeping is prohibited. v2: Unlock RCU for the code path where result==NULL. (David Zhou) Update subject Tested-by and reported by: Dave Airlie <[email protected]> [ 141.965723] ============================= [ 141.965724] WARNING: suspicious RCU usage [ 141.965726] 4.12.0-rc7 #221 Not tainted [ 141.965727] ----------------------------- [ 141.965728] /home/airlied/devel/kernel/linux-2.6/include/linux/rcupdate.h:531 Illegal context switch in RCU read-side critical section! [ 141.965730] other info that might help us debug this: [ 141.965731] rcu_scheduler_active = 2, debug_locks = 0 [ 141.965732] 1 lock held by amdgpu_cs:0/1332: [ 141.965733] #0: (rcu_read_lock){......}, at: [<ffffffffa01a0d07>] amdgpu_bo_list_get+0x0/0x109 [amdgpu] [ 141.965774] stack backtrace: [ 141.965776] CPU: 6 PID: 1332 Comm: amdgpu_cs:0 Not tainted 4.12.0-rc7 #221 [ 141.965777] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 2603 06/26/2015 [ 141.965778] Call Trace: [ 141.965782] dump_stack+0x68/0x92 [ 141.965785] lockdep_rcu_suspicious+0xf7/0x100 [ 141.965788] ___might_sleep+0x56/0x1fc [ 141.965790] __might_sleep+0x68/0x6f [ 141.965793] __mutex_lock+0x4e/0x7b5 [ 141.965817] ? amdgpu_bo_list_get+0xa4/0x109 [amdgpu] [ 141.965820] ? lock_acquire+0x125/0x1b9 [ 141.965844] ? amdgpu_bo_list_set+0x464/0x464 [amdgpu] [ 141.965846] mutex_lock_nested+0x16/0x18 [ 141.965848] ? mutex_lock_nested+0x16/0x18 [ 141.965872] amdgpu_bo_list_get+0xa4/0x109 [amdgpu] [ 141.965895] amdgpu_cs_ioctl+0x4a0/0x17dd [amdgpu] [ 141.965898] ? radix_tree_node_alloc.constprop.11+0x77/0xab [ 141.965916] drm_ioctl+0x264/0x393 [drm] [ 141.965939] ? amdgpu_cs_find_mapping+0x83/0x83 [amdgpu] [ 141.965942] ? trace_hardirqs_on_caller+0x16a/0x186 Signed-off-by: Alex Xie <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-07-25nbd: clear disconnected on reconnectJosef Bacik1-0/+2
If our device loses its connection for longer than the dead timeout we will set NBD_DISCONNECTED in order to quickly fail any pending IO's that flood in after the IO's that were waiting during the dead timer. However if we re-connect at some point in the future we'll still see this DISCONNECTED flag set if we then lose our connection again after that, which means we won't get notifications for our newly lost connections. Fix this by just clearing the DISCONNECTED flag on reconnect in order to make sure everything works as it's supposed to. Reported-by: Dan Melnic <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-25parisc: Suspend lockup detectors before system haltHelge Deller1-0/+2
Some machines can't power off the machine, so disable the lockup detectors to avoid this watchdog BUG to show up every few seconds: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1] Signed-off-by: Helge Deller <[email protected]> Cc: [email protected] # 4.9+
2017-07-25parisc: Show DIMM slot number which holds broken memory moduleHelge Deller1-5/+14
The Page Deallocation Table (PDT) holds the physical addresses of all broken memory addresses. With the physical address we now are able to show which DIMM slot (e.g. 1a, 3c) actually holds the broken memory module so that users are able to replace it. Signed-off-by: Helge Deller <[email protected]>
2017-07-25dm zoned: remove test for impossible REQ_OP_FLUSH conditionsMikulas Patocka1-2/+2
The value REQ_OP_FLUSH is only used by the block code for request-based devices. Remove the tests for REQ_OP_FLUSH from the bio-based dm-zoned-target. Signed-off-by: Mikulas Patocka <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25dm raid: bump target versionHeinz Mauelshagen2-1/+2
Bumo dm-raid target version to 1.12.1 to reflect that commit cc27b0c78c ("md: fix deadlock between mddev_suspend() and md_write_start()") is available. This version change allows userspace to detect that MD fix is available. Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25dm raid: avoid mddev->suspended accessHeinz Mauelshagen1-5/+7
Use runtime flag to ensure that an mddev gets suspended/resumed just once. Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25dm raid: fix activation check in validate_raid_redundancy()Heinz Mauelshagen1-5/+5
During growing reshapes (i.e. stripes being added to a raid set), the new stripe images are not in-sync and not part of the raid set until the reshape is started. LVM2 has to request multiple table reloads involving superblock updates in order to reflect proper size of SubLVs in the cluster. Before a stripe adding reshape starts, validate_raid_redundancy() fails as a result of that because it checks the total number of devices against the number of rebuild ones rather than the actual ones in the raid set (as retrieved from the superblock) thus resulting in failed raid4/5/6/10 redundancy checks. E.g. convert 3 stripes -> 7 stripes raid5 (which only allows for maximum 1 device to fail) requesting +4 delta disks causing 4 devices to rebuild during reshaping thus failing activation. To fix this, move validate_raid_redundancy() to get access to the current raid_set members. Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25dm raid: remove WARN_ON() in raid10_md_layout_to_format()Heinz Mauelshagen1-2/+3
Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25parisc: Add function to return DIMM slot of physical addressHelge Deller2-1/+40
Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a physical address. This is needed to show users which DIMM module needs replacement in case a broken DIMM was encountered. Signed-off-by: Helge Deller <[email protected]>
2017-07-25parisc: Fix crash when calling PDC_PAT_MEM PDT firmware functionHelge Deller2-3/+12
Commit c9c2877d08d9 ("parisc: Add Page Deallocation Table (PDT) support") introduced the pdc_pat_mem_read_pd_pdt() firmware helper function, which crashed the system because it trashed the stack if the pdc_pat_mem_read_pd_retinfo struct was located on the stack (and which is in size less than the required 32 64-bit values). Fix it by using the pdc_result struct instead when calling firmware and copy the return values back into the result struct when finished sucessfully. While debugging this code I noticed that the pdc_type wasn't set correctly either, so let's fix that too. Fixes: c9c2877d08d9 ("parisc: Add Page Deallocation Table (PDT) support") Signed-off-by: Helge Deller <[email protected]>
2017-07-25nvme-pci: fix HMB size calculationChristoph Hellwig1-3/+3
It's possible the preferred HMB size may not be a multiple of the chunk_size. This patch moves len to function scope and uses that in the for loop increment so the last iteration doesn't cause the total size to exceed the allocated HMB size. Based on an earlier patch from Keith Busch. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Dan Carpenter <[email protected]> Reviewed-by: Keith Busch <[email protected]> Fixes: 87ad72a59a38 ("nvme-pci: implement host memory buffer support")
2017-07-25nvme-fc: revise TRADDR parsingJames Smart3-97/+125
The FC-NVME spec hasn't locked down on the format string for TRADDR. Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>" where the wwn's are hex values but not prefixed by 0x. Most implementations so far expect a string format of "nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport uses the match_u64 parser which requires a leading 0x prefix to set the base properly. If it's not there, a match will either fail or return a base 10 value. The resolution in T11 is pushing out. Therefore, to fix things now and to cover any eventuality and any implementations already in the field, this patch adds support for both formats. The change consists of replacing the token matching routine with a routine that validates the fixed string format, and then builds a local copy of the hex name with a 0x prefix before calling the system parser. Note: the same parser routine exists in both the initiator and target transports. Given this is about the only "shared" item, we chose to replicate rather than create an interdendency on some shared code. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2017-07-25nvme-fc: address target disconnect race conditions in fcp io submitJames Smart1-8/+11
There are cases where threads are in the process of submitting new io when the LLDD calls in to remove the remote port. In some cases, the next io actually goes to the LLDD, who knows the remoteport isn't present and rejects it. To properly recovery/restart these i/o's we don't want to hard fail them, we want to treat them as temporary resource errors in which a delayed retry will work. Add a couple more checks on remoteport connectivity and commonize the busy response handling when it's seen. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2017-07-25nvme: fabrics commands should use the fctype field for data directionJon Derrick1-1/+1
Fabrics commands with opcode 0x7F use the fctype field to indicate data direction. Signed-off-by: Jon Derrick <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Fixes: eb793e2c ("nvme.h: add NVMe over Fabrics definitions")
2017-07-25nvme: also provide a UUID in the WWID sysfs attributeJohannes Thumshirn1-0/+3
The WWID sysfs attribute can provide multiple means of a World Wide ID for a NVMe device. It can either be a NGUID, a EUI-64 or a concatenation of VID, Serial Number, Model and the Namespace ID in this order of preference. If the target also sends us a UUID use the UUID for identification and give it the highest priority. This eases generation of /dev/disk/by-* symlinks. Signed-off-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2017-07-25Merge tag 'jfs-4.13' of git://github.com/kleikamp/linux-shaggyLinus Torvalds3-12/+20
Pull JFS fixes from David Kleikamp. * tag 'jfs-4.13' of git://github.com/kleikamp/linux-shaggy: jfs: preserve i_mode if __jfs_set_acl() fails jfs: Don't clear SGID when inheriting ACLs jfs: atomically read inode size
2017-07-25Merge branch 'for-linus' of ↵Linus Torvalds4-8/+16
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - regression fix (missing IRQs) for devices that require 'always poll' quirk, from Dmitry Torokhov - new device ID addition to Ortek driver, from Benjamin Tissoires * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: ortek: add one more buggy device HID: usbhid: fix "always poll" quirk
2017-07-25Merge branch 'for-linus' of ↵Linus Torvalds3-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Three bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: set change and reference bit on lazy key enablement s390: chp: handle CRW_ERC_INIT for channel-path status change s390/perf: fix problem state detection
2017-07-25xfs: check that dir block entries don't off the end of the bufferDarrick J. Wong1-0/+4
When we're checking the entries in a directory buffer, make sure that the entry length doesn't push us off the end of the buffer. Found via xfs/388 writing ones to the length fields. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2017-07-25xen/blkfront: always allocate grants first from per-queue persistent grantsDongli Zhang1-8/+11
This patch partially reverts 3df0e50 ("xen/blkfront: pseudo support for multi hardware queues/rings"). The xen-blkfront queue/ring might hang due to grants allocation failure in the situation when gnttab_free_head is almost empty while many persistent grants are reserved for this queue/ring. As persistent grants management was per-queue since 73716df ("xen/blkfront: make persistent grants pool per-queue"), we should always allocate from persistent grants first. Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2017-07-25xen-blkfront: fix mq start/stop raceJunxiao Bi1-1/+1
When ring buf full, hw queue will be stopped. While blkif interrupt consume request and make free space in ring buf, hw queue will be started again. But since start queue is protected by spin lock while stop not, that will cause a race. interrupt: process: blkif_interrupt() blkif_queue_rq() kick_pending_request_queues_locked() blk_mq_start_stopped_hw_queues() clear_bit(BLK_MQ_S_STOPPED, &hctx->state) blk_mq_stop_hw_queue(hctx) blk_mq_run_hw_queue(hctx, async) If ring buf is made empty in this case, interrupt will never come, then the hw queue will be stopped forever, all processes waiting for the pending io in the queue will hung. Signed-off-by: Junxiao Bi <[email protected]> Reviewed-by: Ankur Arora <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2017-07-25dm bufio: fix error code in dm_bufio_write_dirty_buffers()Dan Carpenter1-2/+1
We should be returning normal negative error codes here. The "a" variables comes from &c->async_write_error which is a blk_status_t converted to a regular error code. In the current code, the blk_status_t gets propogated back to pool_create() and eventually results in an Oops. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25dm integrity: test for corrupted disk format during table loadMikulas Patocka1-0/+5
If the dm-integrity superblock was corrupted in such a way that the journal_sections field was zero, the integrity target would deadlock because it would wait forever for free space in the journal. Detect this situation and refuse to activate the device. Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Fixes: 7eada909bfd7 ("dm: add integrity target") Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25dm integrity: WARN_ON if variables representing journal usage get out of syncMikulas Patocka1-0/+2
If this WARN_ON triggers it speaks to programmer error, and likely implies corruption, but no released kernel should trigger it. This WARN_ON serves to assist DM integrity developers as changes are made/tested in the future. BUG_ON is excessive for catching programmer error, if a user or developer would like warnings to trigger a panic, they can enable that via /proc/sys/kernel/panic_on_warn Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2017-07-25virtio-net: fix module unloadingAndrew Jones1-1/+1
Unregister the driver before removing multi-instance hotplug callbacks. This order avoids the warning issued from __cpuhp_remove_state_cpuslocked when the number of remaining instances isn't yet zero. Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine") Cc: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Andrew Jones <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>