aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-11-25block/sd: Fix device-imposed transfer length limitsMartin K. Petersen5-37/+51
Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests") had the unfortunate side-effect of removing an implicit clamp to BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer code. This caused problems for some SMR drives. Debugging this issue revealed a few problems with the existing infrastructure since the block layer didn't know how to deal with device-imposed limits, only limits set by the I/O controller. - Introduce a new queue limit, max_dev_sectors, which is used by the ULD to signal the maximum sectors for a REQ_TYPE_FS request. - Ensure that max_dev_sectors is correctly stacked and taken into account when overriding max_sectors through sysfs. - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer values for later processing. - In sd_revalidate() set the queue's max_dev_sectors based on the MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value is not reported, fall back to a cap based on the CDB TRANSFER LENGTH field size. - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits VPD--if reported and sane--to signal the preferred device transfer size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS. - blk_limits_max_hw_sectors() is no longer used and can be removed. Signed-off-by: Martin K. Petersen <[email protected]> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581 Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: [email protected] Tested-by: Arzeets <[email protected]> Tested-by: David Eisner <[email protected]> Tested-by: Mario Kicherer <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2015-11-25scsi_debug: fix prevent_allow+verify regressionsDouglas Gilbert1-4/+5
Ruediger Meier observed a regression with the PREVENT ALLOW MEDIUM REMOVAL command in lk 3.19: http://www.spinics.net/lists/util-linux-ng/msg11448.html Inspection indicated the same regression with VERIFY(10). The patch is against lk 3.19.3 and also works with lk 4.3.0 . With this patch both commands are accepted and do nothing. ChangeLog: - fix the lk 3.19 regression so that the PREVENT ALLOW MEDIUM REMOVAL command is supported once again - same fix for VERIFY(10) Signed-off-by: Douglas Gilbert <[email protected]> Reviewed-by: Hannes Reinicke <[email protected]> Reviewed-by: Ewan Milne <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2015-11-25MAINTAINERS: Add myself as co-maintainer of the SCSI subsystem.Martin K. Petersen1-1/+3
Signed-off-by: Martin K. Petersen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2015-11-25sd: Make discard granularity match logical block size when LBPRZ=1Martin K. Petersen1-5/+18
A device may report an OPTIMAL UNMAP GRANULARITY and UNMAP GRANULARITY ALIGNMENT in the Block Limits VPD. These parameters describe the device's internal provisioning allocation units. By default the block layer will round and align any discard requests based on these limits. If a device reports LBPRZ=1 to guarantee zeroes after discard, however, it is imperative that the block layer does not leave out any parts of the requested block range. Otherwise the device can not do the required zeroing of any partial allocation units and this can lead to data corruption. Since the dm thinp personality relies on the block layer's current behavior and is unable to deal with partial discard blocks we work around the problem by setting the granularity to match the logical block size when LBPRZ is enabled. Signed-off-by: Martin K. Petersen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2015-11-25Merge branch 'for-linus' of ↵Linus Torvalds2-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes for sendfile lockups caught by Dmitry + a fix for ancient sysvfs symlink breakage" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Avoid softlockups with sendfile(2) vfs: Make sendfile(2) killable even better fix sysvfs symlinks
2015-11-25Merge tag 'omap-for-v4.4/fixes-rc2' of ↵Arnd Bergmann9-67/+106
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge "Fixes for omaps for v4.4-rc cycle" from Tony Lindgren: - A series of audio changes for dra7 that missed the merge window but turned out to be necessary to fix a boot time imprecise external abort error and to getaudio working - Fix l4 related boot time errors for dm81xx - Use lockless cldm/pwrdm api in omap4_boot_secondary - Remove t410 custom abort handler that is no longer needed and may hide other critical errors - Mark cpuidle tracepoints as _rcuidle - Fix module alias for omap-ocp2scp * tag 'omap-for-v4.4/fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP4+: SMP: use lockless clkdm/pwrdm api in omap4_boot_secondary arm: omap2+: add missing HWMOD_NO_IDLEST in 81xx hwmod data ARM: OMAP2+: remove custom abort handler for t410 ARM: OMAP: DRA7: hwmod: Add data for McASP3 ARM: OMAP2+: hwmod: Add hwmod flag for HWMOD_OPT_CLKS_NEEDED ARM: dts: dra7: Fix McASP3 node regarding to clocks bus: omap-ocp2scp: Fix module alias ARM: OMAP2+: PM: Denote the cpuidle tracepoints as _rcuidle()
2015-11-25Merge tag 'keystone-fixes-for-4.4' of ↵Arnd Bergmann3-23/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into fixes Merge "Few Keystone fixes for 4.4-rcx" from Santosh Shilimkar: - Fix the optional PDSP firmware loading - Fix linking RAM setup for QMs - Fix crash with clk_ignore_unused * tag 'keystone-fixes-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone: ARM: dts: keystone: k2l: fix kernel crash when clk_ignore_unused is not in bootargs soc: ti: knav_qmss_queue: Fix linking RAM setup for queue managers soc: ti: use request_firmware_direct() as acc firmware is optional
2015-11-25Merge tag 'v4.4-rc2' into fixesArnd Bergmann334-5497/+4149
Linux 4.4-rc2 is backmerged from the keystone fixes.
2015-11-25Merge tag 'imx-fixes-4.4' of ↵Arnd Bergmann2-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Merge "The i.MX fixes for 4.4" from Shawn Guo: - Add missing .irq_set_type for i.MX GPC irq_chip. It fixes an issue that device IRQ type setting doesn't match the one specified in device tree, since stacked IRQ domain is adopted in GPC driver. - Fix the wrong spi-num-chipselects settings for Vybrid DSPI devices. - Fix a merge error in Vybrid dts regarding to ADC device property fsl,adck-max-frequency * tag 'imx-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: vfxxx: Fix dspi[01] spi-num-chipselects. ARM: imx: add platform irq type setting in gpc ARM: dts: vfxxx: Fix erroneous property in esdhc0 node
2015-11-25intel_pstate: Fix "performance" mode behavior with HWP enabledAlexandra Yates1-0/+2
If hardware-driven P-state selection (HWP) is enabled, the "performance" mode of intel_pstate should only allow the processor to use the highest-performance P-state available. That is not the case currently, so make it actually happen. Acked-by: Srinivas Pandruvada <[email protected]> Signed-off-by: Alexandra Yates <[email protected]> [ rjw: Subject and changelog ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2015-11-25x86 smpboot: Re-enable init_udelay=0 by default on modern CPUsLen Brown1-4/+5
commit f1ccd249319e allowed the cmdline "cpu_init_udelay=" to work with all values, including the default of 10000. But in setting the default of 10000, it over-rode the code that sets the delay 0 on modern processors. Also, tidy up use of INT/UINT. Fixes: f1ccd249319e "x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior" Reported-by: Shane <[email protected]> Signed-off-by: Len Brown <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <[email protected]>
2015-11-25clk: mmp: add linux/clk.h includesArnd Bergmann3-0/+3
The common clk implementation for MMP broke without anyone noticing when we stopped including linux/clk.h from the clk-provider header. This did not show up in the defconfig builds because those use the legacy MMP clk drivers, and it did not show up in my randconfig tests either because I was testing with my mmp multiplatform series applied, which at some point gained the fixup. This fixes the three broken files. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 61ae76563ec3 ("clk: Remove clk.h from clk-provider.h") Signed-off-by: Stephen Boyd <[email protected]>
2015-11-25nfs4: resend LAYOUTGET when there is a race that changes the seqidJeff Layton1-25/+31
pnfs_layout_process will check the returned layout stateid against what the kernel has in-core. If it turns out that the stateid we received is older, then we should resend the LAYOUTGET instead of falling back to MDS I/O. Signed-off-by: Jeff Layton <[email protected]> Cc: [email protected] # 3.18+ Signed-off-by: Trond Myklebust <[email protected]>
2015-11-25nfs: if we have no valid attrs, then don't declare the attribute cache validJeff Layton1-1/+5
If we pass in an empty nfs_fattr struct to nfs_update_inode, it will (correctly) not update any of the attributes, but it then clears the NFS_INO_INVALID_ATTR flag, which indicates that the attributes are up to date. Don't clear the flag if the fattr struct has no valid attrs to apply. Reviewed-by: Steve French <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
2015-11-25nfs: ensure that attrcache is revalidated after a SETATTRJeff Layton1-1/+4
If we get no post-op attributes back from a SETATTR operation, then no attributes will of course be updated during the call to nfs_update_inode. We know however that the attributes are invalid at that point, since we just changed some of them. At the very least, the ctime will be bogus. If we get no post-op attributes back on the call, mark the attrcache invalid to reflect that fact. Reviewed-by: Steve French <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-11-25ARM/PCI: Move align_resource function pointer to pci_host_bridge structureGabriele Paoloni3-10/+20
Commit b3a72384fe29 ("ARM/PCI: Replace pci_sys_data->align_resource with global function pointer") introduced an ARM-specific align_resource() function pointer. This is not portable to other arches and doesn't work for platforms with two different PCIe host bridge controllers. Move the function pointer to the pci_host_bridge structure so each host bridge driver can specify its own align_resource() function. Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]>
2015-11-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds3-16/+24
Pull more block layer fixes from Jens Axboe: "I wasn't going to send off a new pull before next week, but the blk flush fix from Jan from the other day introduced a regression. It's rare enough not to have hit during testing, since it requires both a device that rejects the first flush, and bad timing while it does that. But since someone did hit it, let's get the revert into 4.4-rc3 so we don't have a released rc with that known issue. Apart from that revert, three other fixes: - From Christoph, a fix for a missing unmap in NVMe request preparation. - An NVMe fix from Nishanth that fixes data corruption on powerpc. - Also from Christoph, fix a list_del() attempt on blk-mq that didn't have a matching list_add() at timer start" * 'for-linus' of git://git.kernel.dk/linux-block: Revert "blk-flush: Queue through IO scheduler when flush not required" block: fix blk_abort_request for blk-mq drivers nvme: add missing unmaps in nvme_queue_rq NVMe: default to 4k device page size
2015-11-25ARM: OMAP4+: SMP: use lockless clkdm/pwrdm api in omap4_boot_secondaryGrygorii Strashko1-3/+3
OMAP CPU hotplug uses cpu1's clocks and power domains for CPU1 wake up from low power states (or turn on CPU1). This part of code is also part of system suspend (disable_nonboot_cpus()). >From other side, cpu1's clocks and power domains are used by CPUIdle. All above functionality is mutually exclusive and, therefore, lockless clkdm/pwrdm api can be used in omap4_boot_secondary(). This fixes below back-trace on -RT which is triggered by pwrdm_lock/unlock(): BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 1, irqs_disabled(): 0, pid: 118, name: sh 9 locks held by sh/118: #0: (sb_writers#4){.+.+.+}, at: [<c0144a6c>] vfs_write+0x13c/0x164 #1: (&of->mutex){+.+.+.}, at: [<c01b4c70>] kernfs_fop_write+0x48/0x19c #2: (s_active#24){.+.+.+}, at: [<c01b4c78>] kernfs_fop_write+0x50/0x19c #3: (device_hotplug_lock){+.+.+.}, at: [<c03cbff0>] lock_device_hotplug_sysfs+0xc/0x4c #4: (&dev->mutex){......}, at: [<c03cd284>] device_online+0x14/0x88 #5: (cpu_add_remove_lock){+.+.+.}, at: [<c003af90>] cpu_up+0x50/0x1a0 #6: (cpu_hotplug.lock){++++++}, at: [<c003ae48>] cpu_hotplug_begin+0x0/0xc4 #7: (cpu_hotplug.lock#2){+.+.+.}, at: [<c003aec0>] cpu_hotplug_begin+0x78/0xc4 #8: (boot_lock){+.+...}, at: [<c002b254>] omap4_boot_secondary+0x1c/0x178 Preemption disabled at:[< (null)>] (null) CPU: 0 PID: 118 Comm: sh Not tainted 4.1.12-rt11-01998-gb4a62c3-dirty #137 Hardware name: Generic DRA74X (Flattened Device Tree) [<c0017574>] (unwind_backtrace) from [<c0013be8>] (show_stack+0x10/0x14) [<c0013be8>] (show_stack) from [<c05a8670>] (dump_stack+0x80/0x94) [<c05a8670>] (dump_stack) from [<c05ad158>] (rt_spin_lock+0x24/0x54) [<c05ad158>] (rt_spin_lock) from [<c0030dac>] (clkdm_wakeup+0x10/0x2c) [<c0030dac>] (clkdm_wakeup) from [<c002b2c0>] (omap4_boot_secondary+0x88/0x178) [<c002b2c0>] (omap4_boot_secondary) from [<c0015d00>] (__cpu_up+0xc4/0x164) [<c0015d00>] (__cpu_up) from [<c003b09c>] (cpu_up+0x15c/0x1a0) [<c003b09c>] (cpu_up) from [<c03cd2d4>] (device_online+0x64/0x88) [<c03cd2d4>] (device_online) from [<c03cd360>] (online_store+0x68/0x74) [<c03cd360>] (online_store) from [<c01b4ce0>] (kernfs_fop_write+0xb8/0x19c) [<c01b4ce0>] (kernfs_fop_write) from [<c0144124>] (__vfs_write+0x20/0xd8) [<c0144124>] (__vfs_write) from [<c01449c0>] (vfs_write+0x90/0x164) [<c01449c0>] (vfs_write) from [<c01451e4>] (SyS_write+0x44/0x9c) [<c01451e4>] (SyS_write) from [<c0010240>] (ret_fast_syscall+0x0/0x54) CPU1: smp_ops.cpu_die() returned, trying to resuscitate Cc: Tero Kristo <[email protected]> Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Tony Lindgren <[email protected]>
2015-11-25Merge branch '81xx' into omap-for-v4.4/fixesTony Lindgren1286-50158/+30523
2015-11-25arm: omap2+: add missing HWMOD_NO_IDLEST in 81xx hwmod dataNeil Armstrong1-0/+3
Add missing HWMOD_NO_IDLEST hwmod flag for entries not having omap4 clkctrl values. The emac0 hwmod flag fixes the davinci_emac driver probe since the return of pm_resume() call is now checked. This solves the following boot errors : [ 0.121429] omap_hwmod: l4_ls: _wait_target_ready failed: -16 [ 0.121441] omap_hwmod: l4_ls: cannot be enabled for reset (3) [ 0.124342] omap_hwmod: l4_hs: _wait_target_ready failed: -16 [ 0.124352] omap_hwmod: l4_hs: cannot be enabled for reset (3) [ 1.967228] omap_hwmod: emac0: _wait_target_ready failed: -16 Cc: Brian Hutchinson <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> Signed-off-by: Tony Lindgren <[email protected]>
2015-11-25Merge remote-tracking branches 'asoc/fix/rt5677', 'asoc/fix/st', ↵Mark Brown7-61/+86
'asoc/fix/sun4i-codec', 'asoc/fix/topology', 'asoc/fix/wm8960' and 'asoc/fix/wm8962' into asoc-linus
2015-11-25Merge remote-tracking branches 'asoc/fix/nau8825', 'asoc/fix/ops', ↵Mark Brown8-14/+54
'asoc/fix/rcar', 'asoc/fix/rl6231', 'asoc/fix/rockchip' and 'asoc/fix/rt5670' into asoc-linus
2015-11-25Merge remote-tracking branches 'asoc/fix/davinci', 'asoc/fix/es8328', ↵Mark Brown6-12/+24
'asoc/fix/fsl', 'asoc/fix/fsl-sai' and 'asoc/fix/intel' into asoc-linus
2015-11-25Merge remote-tracking branch 'asoc/fix/rt5645' into asoc-linusMark Brown1-4/+57
2015-11-25Merge remote-tracking branch 'asoc/fix/dapm' into asoc-linusMark Brown4-3/+12
2015-11-25Merge remote-tracking branch 'asoc/fix/arizona' into asoc-linusMark Brown1-11/+5
2015-11-25bpf: fix clearing on persistent program array mapsDaniel Borkmann4-17/+33
Currently, when having map file descriptors pointing to program arrays, there's still the issue that we unconditionally flush program array contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens when such a file descriptor is released and is independent of the map's refcount. Having this flush independent of the refcount is for a reason: there can be arbitrary complex dependency chains among tail calls, also circular ones (direct or indirect, nesting limit determined during runtime), and we need to make sure that the map drops all references to eBPF programs it holds, so that the map's refcount can eventually drop to zero and initiate its freeing. Btw, a walk of the whole dependency graph would not be possible for various reasons, one being complexity and another one inconsistency, i.e. new programs can be added to parts of the graph at any time, so there's no guaranteed consistent state for the time of such a walk. Now, the program array pinning itself works, but the issue is that each derived file descriptor on close would nevertheless call unconditionally into bpf_fd_array_map_clear(). Instead, keep track of users and postpone this flush until the last reference to a user is dropped. As this only concerns a subset of references (f.e. a prog array could hold a program that itself has reference on the prog array holding it, etc), we need to track them separately. Short analysis on the refcounting: on map creation time usercnt will be one, so there's no change in behaviour for bpf_map_release(), if unpinned. If we already fail in map_create(), we are immediately freed, and no file descriptor has been made public yet. In bpf_obj_pin_user(), we need to probe for a possible map in bpf_fd_probe_obj() already with a usercnt reference, so before we drop the reference on the fd with fdput(). Therefore, if actual pinning fails, we need to drop that reference again in bpf_any_put(), otherwise we keep holding it. When last reference drops on the inode, the bpf_any_put() in bpf_evict_inode() will take care of dropping the usercnt again. In the bpf_obj_get_user() case, the bpf_any_get() will grab a reference on the usercnt, still at a time when we have the reference on the path. Should we later on fail to grab a new file descriptor, bpf_any_put() will drop it, otherwise we hold it until bpf_map_release() time. Joint work with Alexei. Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-25Revert "blk-flush: Queue through IO scheduler when flush not required"Jens Axboe1-1/+1
This reverts commit 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Jan writes: -- Thanks for report! After some investigation I found out we allocate elevator specific data in __get_request() only for non-flush requests. And this is actually required since the flush machinery uses the space in struct request for something else. Doh. So my patch is just wrong and not easy to fix since at the time __get_request() is called we are not sure whether the flush machinery will be used in the end. Jens, please revert 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Thanks! I'm somewhat surprised that you can reliably hit the race where flushing gets disabled for the device just while the request is in flight. But I guess during boot it makes some sense. -- So let's just revert it, we can fix the queue run manually after the fact. This race is rare enough that it didn't trigger in testing, it requires the specific disable-while-in-flight scenario to trigger.
2015-11-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds21-111/+171
Pull KVM fixes from Paolo Bonzini: "Bug fixes for all architectures. Nothing really stands out" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: nVMX: remove incorrect vpid check in nested invvpid emulation arm64: kvm: report original PAR_EL1 upon panic arm64: kvm: avoid %p in __kvm_hyp_panic KVM: arm/arm64: vgic: Trust the LR state for HW IRQs KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active KVM: arm/arm64: Fix preemptible timer active state crazyness arm64: KVM: Add workaround for Cortex-A57 erratum 834220 arm64: KVM: Fix AArch32 to AArch64 register mapping ARM/arm64: KVM: test properly for a PTE's uncachedness KVM: s390: fix wrong lookup of VCPUs by array index KVM: s390: avoid memory overwrites on emergency signal injection KVM: Provide function for VCPU lookup by id KVM: s390: fix pfmf intercept handler KVM: s390: enable SIMD only when no VCPUs were created KVM: x86: request interrupt window when IRQ chip is split KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection KVM: x86: fix interrupt window handling in split IRQ chip case MIPS: KVM: Uninit VCPU in vcpu_create error path MIPS: KVM: Fix CACHE immediate offset sign extension ...
2015-11-25isdn: Partially revert debug format string usage clean upChristoph Biedl4-6/+6
Commit 35a4a57 ("isdn: clean up debug format string usage") introduced a safeguard to avoid accidential format string interpolation of data when calling debugl1 or HiSax_putstatus. This did however not take into account VHiSax_putstatus (called by HiSax_putstatus) does *not* call vsprintf if the head parameter is NULL - the format string is treated as plain text then instead. As a result, the string "%s" is processed literally, and the actual information is lost. This affects the isdnlog userspace program which stopped logging information since that commit. So revert the HiSax_putstatus invocations to the previous state. Fixes: 35a4a5733b0a ("isdn: clean up debug format string usage") Cc: Kees Cook <[email protected]> Cc: Karsten Keil <[email protected]> Signed-off-by: Christoph Biedl <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-25drm/radeon: make some dpm errors debug onlyAlex Deucher2-3/+3
"Could not force DPM to low", etc. is usually harmless and just confuses users. Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2015-11-25arm64: efi: correctly map runtime regionsMark Rutland1-7/+2
The kernel may use a page granularity of 4K, 16K, or 64K depending on configuration. When mapping EFI runtime regions, we use memrange_efi_to_native to round the physical base address of a region down to a kernel page boundary, and round the size up to a kernel page boundary, adding the residue left over from rounding down the physical base address. We do not round down the virtual base address. In __create_mapping we account for the offset of the virtual base from a granule boundary, adding the residue to the size before rounding the base down to said granule boundary. Thus we account for the residue twice, and when the residue is non-zero will cause __create_mapping to map an additional page at the end of the region. Depending on the memory map, this page may be in a region we are not intended/permitted to map, or may clash with a different region that we wish to map. In typical cases, mapping the next item in the memory map will overwrite the erroneously created entry, as we sort the memory map in the stub. As __create_mapping can cope with base addresses which are not page aligned, we can instead rely on it to map the region appropriately, and simplify efi_virtmap_init by removing the unnecessary code. Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Cc: Leif Lindholm <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2015-11-25arm64: mm: fix fault_info table xFSC decodingMark Rutland1-14/+14
We are missing descriptions for some valid xFSC values in the fault info table (e.g. "TLB conflict abort"), and have erroneous descriptions for reserved values (e.g. "asynchronous external abort", "debug event"). This patch adds the missing xFSC values, and removes erroneous decoding of values reserved by the architecture, as described in ARM DDI 0487A.h. At the same time, fixed the unbalanced brackets for the synchronous parity error strings in the table. Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2015-11-25arm64: fix building without CONFIG_UID16Arnd Bergmann2-2/+2
As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16 disabled currently fails because the system call table still needs to reference the individual function entry points that are provided by kernel/sys_ni.c in this case, and the declarations are hidden inside of #ifdef CONFIG_UID16: arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function) __SYSCALL(__NR_lchown, sys_lchown16) I believe this problem only exists on ARM64, because older architectures tend to not need declarations when their system call table is built in assembly code, while newer architectures tend to not need UID16 support. ARM64 only uses these system calls for compatibility with 32-bit ARM binaries. This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is set unconditionally on ARM64 with CONFIG_COMPAT, so we see the declarations whenever we need them, but otherwise the behavior is unchanged. Fixes: af1839eb4bd4 ("Kconfig: clean up the long arch list for the UID16 config option") Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: [email protected] Signed-off-by: Catalin Marinas <[email protected]>
2015-11-25drm/i915: Don't override output type for DDI HDMITakashi Iwai1-1/+2
Currently a DDI port may register the DP hotplug handler even though it's used with HDMI, and the DP HPD handler overrides the encoder type forcibly to DP. This caused the inconsistency on a machine connected with a HDMI monitor; upon a hotplug event, the DDI port is suddenly switched to be handled as a DP although the same monitor is kept connected, and this leads to the erroneous blank output. This patch papers over the bug by excluding the previous HDMI encoder type from this override. This should be fixed more fundamentally, e.g. by moving the encoder type reset from the HPD or by having individual encoder objects for HDMI and DP. But since the bug has been present for a long time (3.17), it's better to have a quick-n-dirty fix for now, and keep working on a cleaner fix. Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=955190 Fixes: 0e32b39ceed6 ('drm/i915: add DP 1.2 MST support (v0.7)') Cc: <[email protected]> # v3.17+ Signed-off-by: Takashi Iwai <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Jani Nikula <[email protected]>
2015-11-25drm/i915: Don't compare has_drrs strictly in pipe configTakashi Iwai1-1/+0
The commit [cfb23ed622d0: drm/i915: Allow fuzzy matching in pipe_config_compare, v2] relaxed the way to compare the pipe configurations, but one new comparison sneaked in there: it added the strict has_drrs value check. This causes a regression on many machines, typically HP laptops with a docking port, where the kernel spews warnings and eventually fails to set the mode properly like: [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in has_drrs (expected 1, found 0) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 79 at drivers/gpu/drm/i915/intel_display.c:12700 intel_modeset_check_state+0x5aa/0x870 [i915]() pipe state doesn't match! .... This patch just removes the check again for fixing the regression. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104041 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92456 Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=956397 Fixes: cfb23ed622d0 ('drm/i915: Allow fuzzy matching in pipe_config_compare, v2') Cc: <[email protected]> # v4.3+ Reported-and-tested-by: Max Lin <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Jani Nikula <[email protected]>
2015-11-25ARM: orion5x: Fix legacy get_irqnr_and_baseNicolas Pitre1-1/+1
Commit 5be9fc23cd ("ARM: orion5x: fix legacy orion5x IRQ numbers") shifted IRQ numbers by one but didn't update the get_irqnr_and_base macro accordingly. This macro is involved when CONFIG_MULTI_IRQ_HANDLER is not defined. [jac: 5d6bed2a9c went in to v4.2, but was backported to v3.18] Signed-off-by: Nicolas Pitre <[email protected]> Fixes: 5be9fc23cd ("ARM: orion5x: fix legacy orion5x IRQ numbers") Cc: <[email protected]> # v3.18+ Signed-off-by: Jason Cooper <[email protected]>
2015-11-25ARM: dove: Fix legacy get_irqnr_and_baseNicolas Pitre1-2/+2
Commit 5d6bed2a9c ("ARM: dove: fix legacy dove IRQ numbers") shifted IRQ numbers by one but didn't update the get_irqnr_and_base macro accordingly. This macro is involved when CONFIG_MULTI_IRQ_HANDLER is not defined. [jac: 5d6bed2a9c went in to v4.2, but was backported to v3.18] Signed-off-by: Nicolas Pitre <[email protected]> Fixes: 5d6bed2a9c ("ARM: dove: fix legacy dove IRQ numbers") Cc: <[email protected]> # v3.18+ Signed-off-by: Jason Cooper <[email protected]>
2015-11-25KVM: nVMX: remove incorrect vpid check in nested invvpid emulationHaozhong Zhang1-5/+0
This patch removes the vpid check when emulating nested invvpid instruction of type all-contexts invalidation. The existing code is incorrect because: (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate Translations Based on VPID", invvpid instruction does not check vpid in the invvpid descriptor when its type is all-contexts invalidation. (2) According to the same document, invvpid of type all-contexts invalidation does not require there is an active VMCS, so/and get_vmcs12() in the existing code may result in a NULL-pointer dereference. In practice, it can crash both KVM itself and L1 hypervisors that use invvpid (e.g. Xen). Signed-off-by: Haozhong Zhang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2015-11-25btrfs: fix balance range usage filters in 4.4-rcHolger Hoffstätte1-2/+2
There's a regression in 4.4-rc since commit bc3094673f22 (btrfs: extend balance filter usage to take minimum and maximum) in that existing (non-ranged) balance with -dusage=x no longer works; all chunks are skipped. After staring at the code for a while and wondering why a non-ranged balance would even need min and max thresholds (..which then were not set correctly, leading to the bug) I realized that the only problem was the fact that the filter functions were named wrong, thanks to patching copypasta. Simply renaming both functions lets the existing btrfs-progs call balance with -dusage=x and now the non-ranged filter function is invoked, properly using only a single chunk limit. Signed-off-by: Holger Hoffstätte <[email protected]> Fixes: bc3094673f22 ("btrfs: extend balance filter usage to take minimum and maximum") Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25btrfs: qgroup: account shared subtree during snapshot deleteMark Fasheh2-7/+42
Commit 0ed4792 ('btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.') removed our qgroup accounting during btrfs_drop_snapshot(). Predictably, this results in qgroup numbers going bad shortly after a snapshot is removed. Fix this by adding a dirty extent record when we encounter extents during our shared subtree walk. This effectively restores the functionality we had with the original shared subtree walking code in 1152651 (btrfs: qgroup: account shared subtrees during snapshot delete). The idea with the original patch (and this one) is that shared subtrees can get skipped during drop_snapshot. The shared subtree walk then allows us a chance to visit those extents and add them to the qgroup work for later processing. This ultimately makes the accounting for drop snapshot work. The new qgroup code nicely handles all the other extents during the tree walk via the ref dec/inc functions so we don't have to add actions beyond what we had originally. Signed-off-by: Mark Fasheh <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25Btrfs: use btrfs_get_fs_root in resolve_indirect_refJosef Bacik1-1/+1
The backref code will look up the fs_root we're trying to resolve our indirect refs for, unfortunately we use btrfs_read_fs_root_no_name, which returns -ENOENT if the ref is 0. This isn't helpful for the qgroup stuff with snapshot delete as it won't be able to search down the snapshot we are deleting, which will cause us to miss roots. So use btrfs_get_fs_root and send false for check_ref so we can always get the root we're looking for. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Mark Fasheh <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25btrfs: qgroup: fix quota disable during rescanJustin Maggard1-1/+2
There's a race condition that leads to a NULL pointer dereference if you disable quotas while a quota rescan is running. To fix this, we just need to wait for the quota rescan worker to actually exit before tearing down the quota structures. Signed-off-by: Justin Maggard <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25Btrfs: fix race between cleaner kthread and space cache writeoutFilipe Manana1-13/+16
When a block group becomes unused and the cleaner kthread is currently running, we can end up getting the current transaction aborted with error -ENOENT when we try to commit the transaction, leading to the following trace: [59779.258768] WARNING: CPU: 3 PID: 5990 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]() [59779.272594] BTRFS: Transaction aborted (error -2) (...) [59779.291137] Call Trace: [59779.291621] [<ffffffff812566f4>] dump_stack+0x4e/0x79 [59779.292543] [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8 [59779.293435] [<ffffffffa04cb81f>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [59779.295000] [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50 [59779.296138] [<ffffffffa04c2721>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs] [59779.297663] [<ffffffffa04cb81f>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [59779.299141] [<ffffffffa0549b0d>] commit_cowonly_roots+0x1de/0x261 [btrfs] [59779.300359] [<ffffffffa04dd5b6>] btrfs_commit_transaction+0x4c4/0x99c [btrfs] [59779.301805] [<ffffffffa04b5df4>] btrfs_sync_fs+0x145/0x1ad [btrfs] [59779.302893] [<ffffffff81196634>] sync_filesystem+0x7f/0x93 (...) [59779.318186] ---[ end trace 577e2daff90da33a ]--- The following diagram illustrates a sequence of steps leading to this problem: CPU 1 CPU 2 <at transaction N> adds bg A to list fs_info->unused_bgs adds bg B to list fs_info->unused_bgs <transaction kthread commits transaction N and wakes up the cleaner kthread> cleaner kthread delete_unused_bgs() sees bg A in list fs_info->unused_bgs btrfs_start_transaction() <transaction N + 1 starts> deletes bg A update_block_group(bg C) --> adds bg C to list fs_info->unused_bgs deletes bg B sees bg C in the list fs_info->unused_bgs btrfs_remove_chunk(bg C) btrfs_remove_block_group(bg C) --> checks if the block group is in a dirty list, and because it isn't now, it does nothing --> the block group item is deleted from the extent tree --> adds bg C to list transaction->dirty_bgs some task calls btrfs_commit_transaction(t N + 1) commit_cowonly_roots() btrfs_write_dirty_block_groups() --> sees bg C in cur_trans->dirty_bgs --> calls write_one_cache_group() which returns -ENOENT because it did not find the block group item in the extent tree --> transaction aborte with -ENOENT because write_one_cache_group() returned that error So fix this by adding a block group to the list of dirty block groups before adding it to the list of unused block groups. This happened on a stress test using fsstress plus concurrent calls to fallocate 20G and truncate (releasing part of the space allocated with fallocate). Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25Btrfs: fix scrub preventing unused block groups from being deletedFilipe Manana3-1/+24
Currently scrub can race with the cleaner kthread when the later attempts to delete an unused block group, and the result is preventing the cleaner kthread from ever deleting later the block group - unless the block group becomes used and unused again. The following diagram illustrates that race: CPU 1 CPU 2 cleaner kthread btrfs_delete_unused_bgs() gets block group X from fs_info->unused_bgs and removes it from that list scrub_enumerate_chunks() searches device tree using its commit root finds device extent for block group X gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group()) sets bg X to RO sees the block group is already RO and therefore doesn't delete it nor adds it back to unused list So fix this by making scrub add the block group again to the list of unused block groups if the block group is still unused when it finished scrubbing it and it hasn't been removed already. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25Btrfs: fix race between scrub and block group deletionFilipe Manana1-4/+16
Scrub can race with the cleaner kthread deleting block groups that are unused (and with relocation too) leading to a failure with error -EINVAL that gets returned to user space. The following diagram illustrates how it happens: CPU 1 CPU 2 cleaner kthread btrfs_delete_unused_bgs() gets block group X from fs_info->unused_bgs sets block group to RO btrfs_remove_chunk(bg X) deletes device extents scrub_enumerate_chunks() searches device tree using its commit root finds device extent for block group X gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group()) sets bg X to RO (again) btrfs_remove_block_group(bg X) deletes block group from fs_info->block_group_cache_tree removes extent map from fs_info->mapping_tree scrub_chunk(offset X) searches fs_info->mapping_tree for extent map starting at offset X --> doesn't find any such extent map --> returns -EINVAL and scrub errors out to userspace with -EINVAL Fix this by dealing with an extent map lookup failure as an indicator of block group deletion. Issue reproduced with fstest btrfs/071. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25btrfs: fix rcu warning during device replaceDavid Sterba1-4/+2
The test btrfs/011 triggers a rcu warning Reviewed-by: Anand Jain <[email protected]> =============================== [ INFO: suspicious RCU usage. ] 4.4.0-rc1-default+ #286 Tainted: G W ------------------------------- fs/btrfs/volumes.c:1977 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 4 locks held by btrfs/28786: 0: (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+...}, at: [<ffffffffa00bc785>] btrfs_dev_replace_finishing+0x45/0xa00 [btrfs] 1: (uuid_mutex){+.+.+.}, at: [<ffffffffa00bc84f>] btrfs_dev_replace_finishing+0x10f/0xa00 [btrfs] 2: (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa00bc868>] btrfs_dev_replace_finishing+0x128/0xa00 [btrfs] 3: (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa00bc87d>] btrfs_dev_replace_finishing+0x13d/0xa00 [btrfs] stack backtrace: CPU: 0 PID: 28786 Comm: btrfs Tainted: G W 4.4.0-rc1-default+ #286 Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS ASNBCPT1.86C.0031.B00.1006301607 06/30/2010 0000000000000001 ffff8800a07dfb48 ffffffff8141d47b 0000000000000001 0000000000000001 0000000000000000 ffff8801464a4f00 ffff8800a07dfb78 ffffffff810cd883 ffff880146eb9400 ffff8800a3698600 ffff8800a33fe220 Call Trace: [<ffffffff8141d47b>] dump_stack+0x4f/0x74 [<ffffffff810cd883>] lockdep_rcu_suspicious+0x103/0x140 [<ffffffffa0071261>] btrfs_rm_dev_replace_remove_srcdev+0x111/0x130 [btrfs] [<ffffffff810d354d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81449536>] ? __percpu_counter_sum+0x66/0x80 [<ffffffffa00bcc15>] btrfs_dev_replace_finishing+0x4d5/0xa00 [btrfs] [<ffffffffa00bc96e>] ? btrfs_dev_replace_finishing+0x22e/0xa00 [btrfs] [<ffffffffa00a8795>] ? btrfs_scrub_dev+0x415/0x6d0 [btrfs] [<ffffffffa003ea69>] ? btrfs_start_transaction+0x9/0x20 [btrfs] [<ffffffffa00bda79>] btrfs_dev_replace_start+0x339/0x590 [btrfs] [<ffffffff81196aa5>] ? __might_fault+0x95/0xa0 [<ffffffffa0078638>] btrfs_ioctl_dev_replace+0x118/0x160 [btrfs] [<ffffffff811409c6>] ? stack_trace_call+0x46/0x70 [<ffffffffa007c914>] ? btrfs_ioctl+0x24/0x1770 [btrfs] [<ffffffffa007ce43>] btrfs_ioctl+0x553/0x1770 [btrfs] [<ffffffff811409c6>] ? stack_trace_call+0x46/0x70 [<ffffffff811d6eb1>] ? do_vfs_ioctl+0x21/0x5a0 [<ffffffff811d6f1c>] do_vfs_ioctl+0x8c/0x5a0 [<ffffffff811e3336>] ? __fget_light+0x86/0xb0 [<ffffffff811e3369>] ? __fdget+0x9/0x20 [<ffffffff811d7451>] ? SyS_ioctl+0x21/0x80 [<ffffffff811d7483>] SyS_ioctl+0x53/0x80 [<ffffffff81b1efd7>] entry_SYSCALL_64_fastpath+0x12/0x6f This is because of unprotected use of rcu_dereference in btrfs_scratch_superblocks. We can't add rcu locks around the whole function because we read the superblock. The fix will use the rcu string buffer directly without the rcu locking. Thi is safe as the device will not go away in the meantime. We're holding the device list mutexes. Restructuring the code to narrow down the rcu section turned out to be impossible, we need to call filp_open (through update_dev_time) on the buffer and this could call kmalloc/__might_sleep. We could call kstrdup with GFP_ATOMIC but it's not absolutely necessary. Fixes: 12b1c2637b6e (Btrfs: enhance btrfs_scratch_superblock to scratch all superblocks) Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25btrfs: Continue replace when set_block_ro failedZhaolei1-2/+18
xfstests/011 failed in node with small_size filesystem. Can be reproduced by following script: DEV_LIST="/dev/vdd /dev/vde" DEV_REPLACE="/dev/vdf" do_test() { local mkfs_opt="$1" local size="$2" dmesg -c >/dev/null umount $SCRATCH_MNT &>/dev/null echo mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}" mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1 mount "${DEV_LIST[0]}" $SCRATCH_MNT echo -n "Writing big files" dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1 for ((i = 1; i <= size; i++)); do echo -n . /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1 done echo echo Start replace btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || { dmesg return 1 } return 0 } # Set size to value near fs size # for example, 1897 can trigger this bug in 2.6G device. # ./do_test "-d raid1 -m raid1" 1897 System will report replace fail with following warning in dmesg: [ 134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started [ 135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28 [ 135.543505] ------------[ cut here ]------------ [ 135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440() [ 135.545276] Modules linked in: [ 135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256 [ 135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [ 135.547798] ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000 [ 135.548774] ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4 [ 135.549774] ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70 [ 135.550758] Call Trace: [ 135.551086] [<ffffffff817fe7b5>] dump_stack+0x44/0x55 [ 135.551737] [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0 [ 135.552487] [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20 [ 135.553211] [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440 [ 135.554051] [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0 [ 135.554722] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0 [ 135.555506] [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0 [ 135.556304] [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580 [ 135.557009] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0 [ 135.557855] [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70 [ 135.558669] [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90 [ 135.559374] [<ffffffff81202124>] SyS_ioctl+0x74/0x80 [ 135.559987] [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f [ 135.560842] ---[ end trace 2a5c1fc3205abbdd ]--- Reason: When big data writen to fs, the whole free space will be allocated for data chunk. And operation as scrub need to set_block_ro(), and when there is only one metadata chunk in system(or other metadata chunks are all full), the function will try to allocate a new chunk, and failed because no space in device. Fix: When set_block_ro failed for metadata chunk, it is not a problem because scrub_lock paused commit_trancaction in same time, and metadata are always cowed, so the on-the-fly writepages will not write data into same place with scrub/replace. Let replace continue in this case is no problem. Tested by above script, and xfstests/011, plus 100 times xfstests/070. Changelog v1->v2: 1: Add detail comments in source and commit-message. 2: Add dmesg detail into commit-message. 3: Limit return value of -ENOSPC to be passed. All suggested by: Filipe Manana <[email protected]> Suggested-by: Filipe Manana <[email protected]> Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25btrfs: fix clashing number of the enhanced balance usage filterDavid Sterba1-1/+1
I've accidentally picked an already used number for the enhanced usage filter represented by BTRFS_BALANCE_ARGS_USAGE_RANGE, clashing with BTRFS_BALANCE_ARGS_CONVERT. Introduced during the development phase, no backward compatibility issues. Reported-by: Holger Hoffstätte <[email protected]> Reported-by: Dan Carpenter <[email protected]> Fixes: bc3094673f22 ("btrfs: extend balance filter usage to take minimum and maximum") Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-11-25Btrfs: fix the number of transaction units needed to remove a block groupFilipe Manana3-5/+38
We were using only 1 transaction unit when attempting to delete an unused block group but in reality we need 3 + N units, where N corresponds to the number of stripes. We were accounting only for the addition of the orphan item (for the block group's free space cache inode) but we were not accounting that we need to delete one block group item from the extent tree, one free space item from the tree of tree roots and N device extent items from the device tree. While one unit is not enough, it worked most of the time because for each single unit we are too pessimistic and assume an entire tree path, with the highest possible heigth (8), needs to be COWed with eventual node splits at every possible level in the tree, so there was usually enough reserved space for removing all the items and adding the orphan item. However after adding the orphan item, writepages() can by called by the VM subsystem against the btree inode when we are under memory pressure, which causes writeback to start for the nodes we COWed before, this forces the operation to remove the free space item to COW again some (or all of) the same nodes (in the tree of tree roots). Even without writepages() being called, we could fail with ENOSPC because these items are located in multiple trees and one of them might have a higher heigth and require node/leaf splits at many levels, exhausting all the reserved space before removing all the items and adding the orphan. In the kernel 4.0 release, commit 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group"), we attempted to fix a BUG_ON due to ENOSPC when trying to add the orphan item by making the cleaner kthread reserve one transaction unit before attempting to remove the block group, but this was not enough. We had a couple user reports still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on a 4.2-rc6 kernel for example: http://www.spinics.net/lists/linux-btrfs/msg46070.html So fix this by reserving all the necessary units of metadata. Reported-by: Stefan Priebe <[email protected]> Fixes: 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group") Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>