Age | Commit message | Author | Files | Lines |
|
The atomic connector DPMS helper implements the connector DPMS operation
using atomic commit, removing the need for DPMS helper operations on
CRTCs and encoders.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
This removes the legacy mode config code. The CRTC and encoder prepare
and commit operations are not used anymore, remove them.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
This removes the legacy plane update code. Wire up the default atomic
check and atomic commit mode config helpers as needed by the plane
update atomic helpers.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
When using atomic updates the CRTC .enable() and .disable() helper
operations are preferred over the (then legacy) .prepare() and .commit()
operations. Implement .enable() and rework .disable() to not depend on
DPMS, easing DPMS removal later on.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
When using atomic updates the encoder .enable() and .disable() helper
operations are preferred over the (then legacy) .prepare() and .commit()
operations. Implement .enable() and .disable() and rework .prepare(),
.commit() and .dpms() as wrappers around .enable() and .disable(),
easing their future removal.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
When using atomic updates the encoder .enable() and .disable() helper
operations are preferred over the (then legacy) .prepare() and .commit()
operations. Implement .enable() and .disable() and rework .prepare(),
.commit() and .dpms() as wrappers around .enable() and .disable(),
easing their future removal.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The LVDS encoder doesn't support DPMS states, replace the DPMS operation
by enable/disable to avoid propagating DPMS states down to the encoder
code.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The plane source and destination size and positions are stored in the
plane state, and a private copy is kept in the rcar_du_plane objects.
Remove the private copy as it just duplicates the state.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
Hook up the default .reset(), .atomic_duplicate_state() and
.atomic_free_state() helpers to ensure that state objects are properly
created and destroyed, and call drm_mode_config_reset() at init time to
create the initial state objects.
Framebuffer reference count also gets maintained automatically by the
transitional helpers except for the legacy page flip operation. Maintain
it explicitly there.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
Use the new CRTC atomic transitional helpers drm_helper_crtc_mode_set()
and drm_helper_crtc_mode_set_base() to implement the CRTC .mode_set and
.mode_set_base operations. This delegates primary plane configuration to
the plane .atomic_update and .atomic_disable operations, removing
duplicate code from the CRTC implementation.
There is now no code path available to the driver in which to drop the
reference to the CRTC acquired in the .prepare() operation if an error
then occurs. The driver thus now leaks a reference if an error occurs
during mode set. So be it, this will be fixed in a further step of the
atomic update transition.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
Implement the CRTC .atomic_begin() and .atomic_flush() operations and the
plane .atomic_check() and .atomic_update() operations, and use the
transitional atomic helpers to implement the plane update and disable
operations on top of the new atomic operations.
The plane setup code can't be moved out of the CRTC start function
completely yet, as the atomic code paths are not taken every time the
CRTC needs to be started. This results in some code duplication that
will be fixed after switching to atomic updates completely.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The hardware plane allocator loops over all planes to find free
candidates. However, instead of looping over the number of hardware
planes, it loops over the number of software planes, which happens to be
larger by one unit. This has no effect in practice as the extra plane is
always cleared in the mask of free planes, but it should still be fixed
for correctness.
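
As an illustration only, the corrected loop bound could look like the sketch below. The macro names, the plane counts and the function name are hypothetical and are not taken from the driver; the point is that the search is bounded by the number of hardware planes, so a bit beyond that range is never examined.

```c
#include <stdint.h>

/* Hypothetical sizes: one more software plane than hardware planes. */
#define NUM_HW_PLANES 8
#define NUM_SW_PLANES (NUM_HW_PLANES + 1)

/*
 * Return the index of the first free hardware plane in the 'free' bitmask,
 * or -1 if none is free. The loop bound is the number of hardware planes,
 * so bits at or above NUM_HW_PLANES are never considered.
 */
int find_free_hw_plane(uint32_t free)
{
	for (unsigned int i = 0; i < NUM_HW_PLANES; i++) {
		if (free & (UINT32_C(1) << i))
			return (int)i;
	}
	return -1;
}
```

With this bound a mask that only has the extra software-plane bit set yields no candidate, regardless of whether that bit was cleared beforehand.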
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
Explicitly create the CRTC primary plane instead of relying on the core
helpers to do so. This simplifies the plane logic by merging the KMS and
software planes.
Reject plane API operations on the primary planes for now, as that code
will anyway be refactored when implementing support for atomic updates.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
Let's avoid magic constants. Besides increasing code readability, it will
also ensure that no location will be forgotten when raising the maximum
number of groups, CRTCs or LVDS encoders.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
fbdev emulation requires at least one connector, and will fail to
initialize if no connector has been successfully instantiated. Disable
it in that case and print an informational message instead of failing
probe with a confusing fbdev emulation error message.
It could be argued that probe should fail when no connector is present,
but the DU could still be useful in that case with the to-be-implemented
memory write-back support.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The DRM core vblank handling mechanism requires drivers to forcefully
turn vblank reporting off when disabling the CRTC, and to restore the
vblank reporting status when enabling the CRTC.
Implement this using the drm_crtc_vblank_on/off helpers. When disabling
vblank we must first wait for page flips to complete, so implement page
flip completion wait as well.
Finally, drm_crtc_vblank_off() must be called at startup to synchronize
the state of the vblank core code with the hardware, which is initially
disabled. This is performed at CRTC creation time, requiring vertical
blanking to be initialized before creating CRTCs.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
Turning a CRTC off will prevent a queued page flip from ever completing,
potentially confusing userspace. Wait for queued page flips to complete
before turning the CRTC off to avoid this.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The next commit will need functions to be reordered to avoid forward
declarations. Do it separately to help review.
This only moves functions without any change to the code.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The drm_connector encoder field points to the encoder driving the
connector. No such association exists at init time, as all pipelines are
disabled. Don't set the field.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
The function is meant to restore the fbdev mode in the lastclose
handler, not to be called at init time. Remove the call.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
All encoders and CRTCs start disabled, re-disabling them is a no-op.
Signed-off-by: Laurent Pinchart <[email protected]>
|
|
This reverts commit 9c58e8dbd3bfe7197323c88a784617afeffa9f87.
This doesn't seem to fully fix the problem. Kbuild, who knows.
Signed-off-by: Dave Airlie <[email protected]>
|
|
Otherwise Kconfig gets confused and somehow ends up creating a 2nd drm
submenu. I couldn't find i915 because of this any more at first.
Cc: Andy Yan <[email protected]>
Cc: Russell King <[email protected]>
Cc: Philipp Zabel <[email protected]>
Cc: "Yann E. MORIN" <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
git://git.pengutronix.de/git/pza/linux into drm-fixes
imx-drm fixes for mode fixup, dw_hdmi/imx, and parallel-display
- A clock fix for too large pixel clocks depending on the
DI clock flag simplification patch
- Pruning of unsupported modes and a missing end of array element
for dw_hdmi-imx
- LVDS modeset fix for mode fixup
- Fix parallel-display deferred probing if drm_panel is used
* tag 'imx-drm-fixes-2015-02-24' of git://git.pengutronix.de/git/pza/linux:
DRM: i.MX: parallel display: Support probe deferral for finding DRM panel
drm/imx: imx-ldb: enable DI clock in encoder_mode_set
drm/imx: dw_hdmi-imx: add end of array element to current control array
drm/imx: dw_hdmi-imx: add mode_valid callback prune unsupported modes
gpu: ipu-v3: do not divide by zero if the pixel clock is too large
|
|
eCryptfs can't be aware of what to expect after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.
One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.
This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.
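
A minimal userspace sketch of such a whitelist is shown below. The function name is hypothetical, and returning -ENOTTY for rejected commands is an assumption of this sketch rather than something stated above; the five command constants come from <linux/fs.h>.

```c
#include <errno.h>
#include <linux/fs.h>

/*
 * Return 0 if 'cmd' is one of the five whitelisted commands, or -ENOTTY
 * otherwise. The error code is an assumption made for this sketch.
 */
int lower_ioctl_allowed(unsigned int cmd)
{
	switch (cmd) {
	case FITRIM:
	case FS_IOC_GETFLAGS:
	case FS_IOC_SETFLAGS:
	case FS_IOC_GETVERSION:
	case FS_IOC_SETVERSION:
		return 0;
	default:
		return -ENOTTY;
	}
}
```

Any command not listed, such as a filesystem-specific clone ioctl, is rejected before it can reach the lower filesystem.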
https://bugzilla.kernel.org/show_bug.cgi?id=93691
https://launchpad.net/bugs/1305335
Signed-off-by: Tyler Hicks <[email protected]>
Cc: Rocko <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: [email protected] # v2.6.36+: c43f7b8 eCryptfs: Handle ioctl calls with unlocked and compat functions
|
|
This patch integrates Cyber Cortex AV boards with the existing
ftdi_jtag_quirk in order to use serial port 0 with JTAG which is
required by the manufacturers' software.
Steps: 2
[ftdi_sio_ids.h]
1. Defined the device PID
[ftdi_sio.c]
2. Added a macro declaration to the ids array, in order to enable the
jtag quirk for the device.
Signed-off-by: Max Mansfield <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
|
|
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as the raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, the driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in the virtio_net driver).
Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive a change just to handle a corner case like this. Therefore
allowing UFO only for packets from SOCK_DGRAM sockets seems to be the
best option.
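
The resulting decision can be pictured with the simplified userspace predicate below. The structure and function names are hypothetical, and the real transmit path checks more conditions than shown; the sketch only illustrates that the socket type becomes part of the test, so an over-MTU UDP datagram from a raw socket no longer takes the UFO path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical, simplified view of the inputs to the decision. */
struct tx_ctx {
	int sock_type;       /* SOCK_DGRAM, SOCK_RAW, ... */
	int protocol;        /* IPPROTO_UDP, ... */
	size_t length;       /* datagram length */
	size_t mtu;          /* MTU of the outgoing path */
	bool dev_has_ufo;    /* device advertises UFO */
};

/* UFO is considered only for over-MTU UDP datagrams from SOCK_DGRAM sockets. */
bool may_use_ufo(const struct tx_ctx *c)
{
	return c->length > c->mtu &&
	       c->protocol == IPPROTO_UDP &&
	       c->dev_has_ufo &&
	       c->sock_type == SOCK_DGRAM;
}
```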
Signed-off-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Ben Hutchings says:
====================
Fixes for sh_eth #4 v2
I'm continuing review and testing of Ethernet support on the R-Car H2
chip, with help from a colleague. This series fixes a few more issues.
These are not tested on any of the other supported chips.
v2: Add note that the revert is not a pure revert.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
My previous fix to clear padding of short frames used skb->len as the
DMA length, assuming that skb_padto() extended skb->len to include the
padding. That isn't the case; we need to use skb_put_padto() instead.
(This wasn't immediately obvious because software padding isn't
actually needed on the R-Car H2. We could make it conditional on
which chip is being driven, but it's probably not worth the effort.)
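
The difference between the two helpers can be modelled in userspace as below. The frame structure and the two function names are hypothetical stand-ins; they only mimic the documented behaviour that skb_padto() zeroes the padding without changing the length, while skb_put_padto() also extends the length.

```c
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60   /* minimum Ethernet frame length without FCS */

/* Hypothetical stand-in for a socket buffer. */
struct frame {
	unsigned char data[128];
	size_t len;            /* bytes that will be handed to DMA */
};

/* Zero the padding area but leave len untouched (models skb_padto()). */
void frame_padto(struct frame *f, size_t min_len)
{
	if (f->len < min_len)
		memset(f->data + f->len, 0, min_len - f->len);
}

/* Zero the padding area and extend len to cover it (models skb_put_padto()). */
void frame_put_padto(struct frame *f, size_t min_len)
{
	if (f->len < min_len) {
		memset(f->data + f->len, 0, min_len - f->len);
		f->len = min_len;
	}
}
```

Using len as the DMA length after frame_padto() would still transmit the short frame, which is the mistake described above.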
Reported-by: "Violeta Menéndez González" <[email protected]>
Fixes: 612a17a54b50 ("sh_eth: Fix padding of short frames on TX")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit fd9af07c3404ac9ecbd0d859563360f51ce1ffde.
The hardware manual states that the frame error and multicast bits are
copied to bits 9:0 of RD0, not bits 25:16. I've tested that this is
true for RFS1 (CRC error), RFS3 (frame too short), RFS4 (frame too
long) and RFS8 (multicast).
Also adjust a comment to agree with this.
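
For illustration, extracting the status field from bits 9:0 could be sketched as follows. The macro and function names are hypothetical, and the mapping of RFSn to bit n-1 is an assumption of this sketch.

```c
#include <stdint.h>

#define RD0_RFS_MASK  UINT32_C(0x3ff)             /* bits 9:0 */
#define RD0_RFS(n)    (UINT32_C(1) << ((n) - 1))  /* assumed: RFSn is bit n-1 */

/* Return only the frame status bits held in bits 9:0 of RD0. */
uint32_t rd0_status_bits(uint32_t rd0)
{
	return rd0 & RD0_RFS_MASK;
}
```

A value that only has bits set in the 25:16 range yields no status bits with this mask.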
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In case of RX ring underrun (RDE), we attempt to reset the software
descriptor pointers (dirty_rx and cur_rx) to match where the hardware
will read the next descriptor from, as that might not be the first
dirty descriptor. This relies on reading RDFAR, but that register
doesn't exist on all supported chips - specifically, not on the R-Car
chips. This will result in unpredictable behaviour on those chips
after an RDE.
Make this pointer reset conditional and assume that it isn't needed on
the R-Car chips. This fix also assumes that RDFAR is never exposed at
offset 0 in the memory map - this is currently true, and a subsequent
commit will fix the ambiguity between offset 0 and no-offset in the
register offset maps.
Fixes: 79fba9f51755 ("net: sh_eth: fix the rxdesc pointer when rx ...")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When submitting a DMA descriptor, the active bit must be written last.
When reading a completed DMA descriptor, the active bit must be read
first.
Add memory barriers to ensure that this ordering is maintained.
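
The ordering rule can be illustrated with a userspace analogy based on C11 atomics. This is not how a driver talks to DMA hardware (the kernel uses its own barrier primitives); the descriptor layout, bit value and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_ACTIVE UINT32_C(0x80000000)   /* hypothetical "owned by hardware" bit */

/* Hypothetical descriptor layout for this sketch. */
struct desc {
	uint32_t addr;
	uint32_t len;
	_Atomic uint32_t status;
};

/* Producer: fill in the payload fields, then publish the active bit last. */
void desc_submit(struct desc *d, uint32_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	atomic_store_explicit(&d->status, DESC_ACTIVE, memory_order_release);
}

/*
 * Consumer: read the active bit first and read the remaining fields only
 * if the bit is clear. Returns 1 and stores the length if the descriptor
 * is complete, 0 if it is still active.
 */
int desc_complete(struct desc *d, uint32_t *len)
{
	uint32_t status = atomic_load_explicit(&d->status, memory_order_acquire);

	if (status & DESC_ACTIVE)
		return 0;
	*len = d->len;
	return 1;
}
```

The release store keeps the field writes from being reordered after the active bit, and the acquire load keeps the field reads from being reordered before it.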
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
work-fixes
Pull samsung thermal fixes from Lukasz Majewski:
"Changes:
- Exynos7 power down detection mode fix
- Fix for cpufreq cooling device regression
- Updating MAINTAINER's entry for Samsung Exynos Thermal"
Signed-off-by: Eduardo Valentin <[email protected]>
|
|
While one must hold RCU-sched (aka. preempt_disable) for find_symbol(),
one must equally hold it over the use of the object returned.
The moment you release the RCU-sched read lock, the object can be dead
and gone.
[[email protected]: change subject line to be aligned with other patches]
Cc: Seth Jennings <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
In nfs_client_return_marked_delegations() and nfs_delegation_reap_unclaimed()
we want to optimise the loop traversal by skipping delegations that are
already in the process of being returned.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
This patch ensures that the superblock doesn't go ahead and disappear
underneath us while the state manager thread is returning delegations.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Ensure that nfs_inode_set_delegation() doesn't inadvertently detach a
delegation that is already in the process of being returned.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
|
|
After 566fcec60 the client uses the "current stateid" from the
nfs4_state structure to close a file. This could potentially contain a
delegation stateid, which is disallowed by the protocol and causes
servers to return NFS4ERR_BAD_STATEID. This patch restores the
(correct) behavior of sending the open stateid to close a file.
Reported-by: Olga Kornievskaia <[email protected]>
Fixes: 566fcec60 (NFSv4: Fix an atomicity problem in CLOSE)
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Enable the disabled interrupt on unsuccessful operation.
Found by Coccinelle.
Signed-off-by: Tapasweni Pathak <[email protected]>
Acked-by: Julia Lawall <[email protected]>
Reviewed-by: James Hogan <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Currently the guest exit trace event saves the VCPU pointer to the
structure, and the guest PC is retrieved by dereferencing it when the
event is printed rather than directly from the trace record. This isn't
safe as the printing may occur long afterwards, after the PC has changed
and potentially after the VCPU has been freed. Usually this results in
the same (wrong) PC being printed for multiple trace events. It also
isn't portable as userland has no way to access the VCPU data structure
when interpreting the trace record itself.
Let's save the actual PC in the structure so that the correct value is
accessible later.
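
The idea can be sketched outside the tracing infrastructure as follows. The structures and the function name are hypothetical; the record simply copies the PC value when the event is recorded instead of keeping a pointer to be dereferenced at print time.

```c
#include <stdint.h>

/* Hypothetical, minimal VCPU with only the field we care about. */
struct vcpu {
	uint64_t pc;
};

/* The record carries the PC value itself, not a pointer to the VCPU. */
struct exit_record {
	uint64_t pc;
	unsigned int reason;
};

/* Snapshot the PC at the time the event is recorded. */
struct exit_record record_exit(const struct vcpu *v, unsigned int reason)
{
	struct exit_record r = { .pc = v->pc, .reason = reason };

	return r;
}
```

Later changes to the VCPU, or freeing it, no longer affect what is printed for the record.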
Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v3.10+
Acked-by: Steven Rostedt <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Two GPIO fixes:
- Fix a translation problem in of_get_named_gpiod_flags()
- Fix a long standing container_of() mistake in the TPS65912 driver"
* tag 'gpio-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: tps65912: fix wrong container_of arguments
gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one chip per node
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal management fixes from Eduardo Valentin:
"Specifics:
- Several fixes in tmon tool.
- Fixes in intel int340x for _ART and _TRT tables.
- Add id for Avoton SoC into powerclamp driver.
- Fixes in RCAR thermal driver to remove race conditions and fix fail
path
- Fixes in TI thermal driver: removal of unnecessary code and build
fix if !CONFIG_PM_SLEEP
- Cleanups in exynos thermal driver
- Add stubs for include/linux/thermal.h. Now drivers using thermal
calls but that also work without CONFIG_THERMAL will be able to
compile for systems that don't care about thermal.
Note: I am sending this pull on Rui's behalf while he fixes issues in
his Linux box"
* 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: int340x_thermal: Ignore missing _ART, _TRT tables
thermal/intel_powerclamp: add id for Avoton SoC
tools/thermal: tmon: silence 'set but not used' warnings
tools/thermal: tmon: use pkg-config to determine library dependencies
tools/thermal: tmon: support cross-compiling
tools/thermal: tmon: add .gitignore
tools/thermal: tmon: fixup tui windowing calculations
tools/thermal: tmon: tui: don't hard-code dialog window size assumptions
tools/thermal: tmon: add min/max macros
tools/thermal: tmon: add --target-temp parameter
thermal: exynos: Clean-up code to use oneline entry for exynos compatible table
thermal: rcar: Make error and remove paths symmetrical with init
thermal: rcar: Fix race condition between init and interrupt
thermal: Introduce dummy functions when thermal is not defined
ti-soc-thermal: Delete an unnecessary check before the function call "cpufreq_cooling_unregister"
thermal: ti-soc-thermal: bandgap: Fix build warning if !CONFIG_PM_SLEEP
|
|
There's one more case where we can't issue a rename operation for a
directory as soon as we process it. We used to delay directory renames
only if they have some ancestor directory with a higher inode number
that got renamed too, but there's another case where we need to delay
the rename too - when a directory A is renamed to the old name of a
directory B but that directory B has its rename delayed because it
has now (in the send root) an ancestor with a higher inode number that
was renamed. If we don't delay the directory rename in this case, the
receiving end of the send stream will attempt to rename A to the old
name of B before B got renamed to its new name, which results in a
"directory not empty" error. So fix this by delaying directory renames
for this case too.
Steps to reproduce:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/a
$ mkdir /mnt/b
$ mkdir /mnt/c
$ touch /mnt/a/file
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ mv /mnt/c /mnt/x
$ mv /mnt/a /mnt/x/y
$ mv /mnt/b /mnt/a
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send /mnt/snap1 -f /tmp/1.send
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt2
$ btrfs receive /mnt2 -f /tmp/1.send
$ btrfs receive /mnt2 -f /tmp/2.send
ERROR: rename b -> a failed. Directory not empty
A test case for xfstests follows soon.
Reported-by: Ames Cornish <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
A block-local variable stores the error code, but btrfs_get_blocks_direct()
may not return it in the end as there's a ret defined in the function scope.
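
The pattern is easy to reproduce in isolation. The sketch below uses hypothetical function names and a helper that always fails; it only shows how a block-local ret hides the function-scope one.

```c
#include <errno.h>

/* Hypothetical helper that always fails. */
int do_step(void)
{
	return -EIO;
}

/* Buggy shape: the inner 'ret' shadows the outer one, so the error is lost. */
int map_blocks_buggy(void)
{
	int ret = 0;

	{
		int ret = do_step();   /* block-local copy */

		if (ret)
			goto out;
	}
out:
	return ret;                    /* always the outer 'ret', i.e. 0 */
}

/* Fixed shape: a single 'ret' in function scope carries the error out. */
int map_blocks_fixed(void)
{
	int ret = 0;

	ret = do_step();
	if (ret)
		goto out;
out:
	return ret;
}
```

Building with -Wshadow makes the compiler point at the inner declaration.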
CC: <[email protected]> # 3.6+
Fixes: d187663ef24c ("Btrfs: lock extents as we map them in DIO")
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
The return value from btrfs_lookup_xattr() can be a pointer encoding an
error, therefore deal with it. This fixes commit 5f5bc6b1e2d5
("Btrfs: make xattr replace operations atomic").
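
For readers unfamiliar with pointer-encoded errors, the sketch below re-creates the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers in userspace. The lookup and caller functions are hypothetical and only show the check that has to happen before the returned pointer is used.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's pointer-encoded error helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup: NULL means "not found", ERR_PTR() means failure. */
const char *lookup_attr(int simulate_failure)
{
	static const char value[] = "value";

	return simulate_failure ? ERR_PTR(-ENOMEM) : value;
}

/* Caller that checks for an encoded error before using the result. */
long use_attr(int simulate_failure)
{
	const char *attr = lookup_attr(simulate_failure);

	if (IS_ERR(attr))
		return PTR_ERR(attr);
	return attr ? 0 : -ENODATA;
}
```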
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
The end_slot variable actually matches the number of pointers in the
node and not the last slot (which is 'nritems - 1'). Therefore in order
to check that the current slot in the for loop doesn't match the last
one, the correct logic is to check if 'i' is less than 'end_slot - 1'
and not 'end_slot - 2'.
Fix this and set end_slot to be 'nritems - 1', as it's less confusing
since the variable name implies it's inclusive rather than exclusive.
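
A small sketch of the inclusive convention, with a hypothetical function name, is given below. With end_slot set to 'nritems - 1', the test for "not the last slot" becomes a plain 'i < end_slot'.

```c
/*
 * Count how many slots in [start_slot, nritems) have a following slot,
 * using an inclusive end_slot.
 */
int slots_with_successor(int start_slot, int nritems)
{
	int end_slot = nritems - 1;   /* inclusive: index of the last slot */
	int count = 0;

	for (int i = start_slot; i <= end_slot; i++) {
		if (i < end_slot)     /* slot i + 1 exists */
			count++;
	}
	return count;
}
```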
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
When punching a file hole, if we end up only zeroing parts of a page,
because the start offset isn't a multiple of the sector size or the
start offset and length fall within the same page, we were not updating
the inode item. This prevented an fsync from doing anything, if no other
file changes happened in the current transaction, because the fields
in btrfs_inode used to check if the inode needs to be fsync'ed weren't
updated.
This issue is easy to reproduce and the following excerpt from the
xfstest case I made shows how to trigger it:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create our test file.
$XFS_IO_PROG -f -c "pwrite -S 0x22 -b 16K 0 16K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Fsync the file, this makes btrfs update some btrfs inode specific fields
# that are used to track if the inode needs to be written/updated to the fsync
# log or not. After this fsync, the new values for those fields indicate that
# a subsequent fsync does not need to touch the fsync log.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Force a commit of the current transaction. After this point, any operation
# that modifies the data or metadata of our file, should update those fields in
# the btrfs inode with values that make the next fsync operation write to the
# fsync log.
sync
# Punch a hole in our file. This small range affects only 1 page.
# This made the btrfs hole punching implementation write only some zeroes in
# one page, but it did not update the btrfs inode fields used to determine if
# the next fsync needs to write to the fsync log.
$XFS_IO_PROG -c "fpunch 8000 4K" $SCRATCH_MNT/foo
# Another variation of the previously mentioned case.
$XFS_IO_PROG -c "fpunch 15000 100" $SCRATCH_MNT/foo
# Now fsync the file. This was a no-operation because the previous hole punch
# operation didn't update the inode's fields mentioned before, so they remained
# with the values they had after the first fsync - that is, they indicate that
# it is not needed to write to fsync log.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
echo "File content before:"
od -t x1 $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Enable writes and mount the fs. This makes the fsync log replay code run.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# Because the last fsync didn't do anything, here the file content matched what
# it was after the first fsync, before the holes were punched, and not what it
# was after the holes were punched.
echo "File content after:"
od -t x1 $SCRATCH_MNT/foo
This issue has been around since 2012, when the punch hole implementation
was added, commit 2aaa66558172 ("Btrfs: add hole punching").
A test case for xfstests follows soon.
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
Our gluster boxes were hitting a problem where they'd run out of space when
updating the block group cache and therefore wouldn't be able to update the free
space inode. This is a problem because this is how we invalidate the cache and
protect ourselves from errors further down the stack, so if this fails we have
to abort the transaction so we make sure we don't end up with stale free space
cache. Thanks,
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
We can have multiple fsync operations against the same file during the
same transaction and they can collect the same ordered extents while they
don't complete (still accessible from the inode's ordered tree). If this
happens, those ordered extents will never get their reference counts
decremented to 0, leading to memory leaks and inode leaks (an iput for an
ordered extent's inode is scheduled only when the ordered extent's refcount
drops to 0). The following sequence diagram explains this race:
        CPU 1                                   CPU 2
btrfs_sync_file()
                                        btrfs_sync_file()
  mutex_lock(inode->i_mutex)
  btrfs_log_inode()
    btrfs_get_logged_extents()
      --> collects ordered extent X
      --> increments ordered
          extent X's refcount
    btrfs_submit_logged_extents()
  mutex_unlock(inode->i_mutex)
                                          mutex_lock(inode->i_mutex)
  btrfs_sync_log()
    btrfs_wait_logged_extents()
      --> list_del_init(&ordered->log_list)
                                          btrfs_log_inode()
                                            btrfs_get_logged_extents()
                                              --> Adds ordered extent X
                                                  to logged_list because
                                                  at this point:
                                                  list_empty(&ordered->log_list)
                                                  && test_bit(BTRFS_ORDERED_LOGGED,
                                                              &ordered->flags) == 0
                                              --> Increments ordered extent
                                                  X's refcount
      --> check if ordered extent's io is
          finished or not, start it if
          necessary and wait for it to finish
      --> sets bit BTRFS_ORDERED_LOGGED
          on ordered extent X's flags
          and adds it to trans->ordered
  btrfs_sync_log() finishes
                                            btrfs_submit_logged_extents()
                                          btrfs_log_inode() finishes
                                          mutex_unlock(inode->i_mutex)
btrfs_sync_file() finishes
                                          btrfs_sync_log()
                                            btrfs_wait_logged_extents()
                                              --> Sees ordered extent X has the
                                                  bit BTRFS_ORDERED_LOGGED set in
                                                  its flags
                                              --> X's refcount is untouched
                                          btrfs_sync_log() finishes
                                        btrfs_sync_file() finishes
                                        btrfs_commit_transaction()
                                          --> called by transaction kthread for e.g.
                                          btrfs_wait_pending_ordered()
                                            --> waits for ordered extent X to
                                                complete
                                            --> decrements ordered extent X's
                                                refcount by 1 only, corresponding
                                                to the increment done by the fsync
                                                task ran by CPU 1
In the scenario of the above diagram, after the transaction commit,
the ordered extent will remain with a refcount of 1 forever, leaking
the ordered extent structure and preventing the i_count of its inode
from ever decreasing to 0, since the delayed iput is scheduled only
when the ordered extent's refcount drops to 0, preventing the inode
from ever being evicted by the VFS.
Fix this by using the flag BTRFS_ORDERED_LOGGED differently. Use it to
mean that an ordered extent is already being processed by an fsync call,
which will attach it to the current transaction, preventing it from being
collected by subsequent fsync operations against the same inode.
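
One way to picture the new meaning of the flag is the userspace sketch below. The structure, bit value and function name are hypothetical, and the use of an atomic test-and-set is an assumption of this sketch: whichever fsync sets the bit first takes the reference, and later callers skip the extent.

```c
#include <stdatomic.h>

#define ORDERED_LOGGED 0x1u   /* hypothetical bit value for this sketch */

/* Hypothetical, minimal model of an ordered extent. */
struct ordered_extent {
	atomic_uint flags;
	atomic_int refs;
};

/*
 * Try to collect the extent for logging. Only the first caller that sets
 * the LOGGED bit takes a reference; later callers skip the extent.
 * Returns 1 if collected, 0 if another fsync already owns it.
 */
int collect_for_log(struct ordered_extent *oe)
{
	unsigned int old = atomic_fetch_or(&oe->flags, ORDERED_LOGGED);

	if (old & ORDERED_LOGGED)
		return 0;
	atomic_fetch_add(&oe->refs, 1);
	return 1;
}
```

In this model the second fsync never increments the refcount, so the single decrement at transaction commit is enough to balance it.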
This race was introduced with the following change (added in 3.19 and
backported to stable 3.18 and 3.17):
Btrfs: make sure logged extents complete in the current transaction V3
commit 50d9aa99bd35c77200e0e3dd7a72274f8304701f
I ran into this issue while running xfstests/generic/113 in a loop, which
failed about 1 out of 10 runs with the following warning in dmesg:
[ 2612.440038] WARNING: CPU: 4 PID: 22057 at fs/btrfs/disk-io.c:3558 free_fs_root+0x36/0x133 [btrfs]()
[ 2612.442810] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop processor parport_pc parport psmouse thermal_sys i2c_piix4 serio_raw pcspkr evdev microcode button i2c_core ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom virtio_scsi ata_generic virtio_pci ata_piix virtio_ring libata virtio floppy e1000 scsi_mod [last unloaded: btrfs]
[ 2612.452711] CPU: 4 PID: 22057 Comm: umount Tainted: G W 3.19.0-rc5-btrfs-next-4+ #1
[ 2612.454921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2612.457709] 0000000000000009 ffff8801342c3c78 ffffffff8142425e ffff88023ec8f2d8
[ 2612.459829] 0000000000000000 ffff8801342c3cb8 ffffffff81045308 ffff880046460000
[ 2612.461564] ffffffffa036da56 ffff88003d07b000 ffff880046460000 ffff880046460068
[ 2612.463163] Call Trace:
[ 2612.463719] [<ffffffff8142425e>] dump_stack+0x4c/0x65
[ 2612.464789] [<ffffffff81045308>] warn_slowpath_common+0xa1/0xbb
[ 2612.466026] [<ffffffffa036da56>] ? free_fs_root+0x36/0x133 [btrfs]
[ 2612.467247] [<ffffffff810453c5>] warn_slowpath_null+0x1a/0x1c
[ 2612.468416] [<ffffffffa036da56>] free_fs_root+0x36/0x133 [btrfs]
[ 2612.469625] [<ffffffffa036f2a7>] btrfs_drop_and_free_fs_root+0x93/0x9b [btrfs]
[ 2612.471251] [<ffffffffa036f353>] btrfs_free_fs_roots+0xa4/0xd6 [btrfs]
[ 2612.472536] [<ffffffff8142612e>] ? wait_for_completion+0x24/0x26
[ 2612.473742] [<ffffffffa0370bbc>] close_ctree+0x1f3/0x33c [btrfs]
[ 2612.475477] [<ffffffff81059d1d>] ? destroy_workqueue+0x148/0x1ba
[ 2612.476695] [<ffffffffa034e3da>] btrfs_put_super+0x19/0x1b [btrfs]
[ 2612.477911] [<ffffffff81153e53>] generic_shutdown_super+0x73/0xef
[ 2612.479106] [<ffffffff811540e2>] kill_anon_super+0x13/0x1e
[ 2612.480226] [<ffffffffa034e1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
[ 2612.481471] [<ffffffff81154307>] deactivate_locked_super+0x3b/0x50
[ 2612.482686] [<ffffffff811547a7>] deactivate_super+0x3f/0x43
[ 2612.483791] [<ffffffff8116b3ed>] cleanup_mnt+0x59/0x78
[ 2612.484842] [<ffffffff8116b44c>] __cleanup_mnt+0x12/0x14
[ 2612.485900] [<ffffffff8105d019>] task_work_run+0x8f/0xbc
[ 2612.486960] [<ffffffff810028d8>] do_notify_resume+0x5a/0x6b
[ 2612.488083] [<ffffffff81236e5b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2612.489333] [<ffffffff8142a17f>] int_signal+0x12/0x17
[ 2612.490353] ---[ end trace 54a960a6bdcb8d93 ]---
[ 2612.557253] VFS: Busy inodes after unmount of sdb. Self-destruct in 5 seconds. Have a nice day...
Kmemleak confirmed the ordered extent leak (and btrfs inode specific
structures such as delayed nodes):
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff880154290db0 (size 576):
comm "btrfsck", pid 21980, jiffies 4295542503 (age 1273.412s)
hex dump (first 32 bytes):
01 40 00 00 01 00 00 00 b0 1d f1 4e 01 88 ff ff .@.........N....
00 00 00 00 00 00 00 00 c8 0d 29 54 01 88 ff ff ..........)T....
backtrace:
[<ffffffff8141d74d>] kmemleak_update_trace+0x4c/0x6a
[<ffffffff8122f2c0>] radix_tree_node_alloc+0x6d/0x83
[<ffffffff8122fb26>] __radix_tree_create+0x109/0x190
[<ffffffff8122fbdd>] radix_tree_insert+0x30/0xac
[<ffffffffa03b9bde>] btrfs_get_or_create_delayed_node+0x130/0x187 [btrfs]
[<ffffffffa03bb82d>] btrfs_delayed_delete_inode_ref+0x32/0xac [btrfs]
[<ffffffffa0379dae>] __btrfs_unlink_inode+0xee/0x288 [btrfs]
[<ffffffffa037c715>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
[<ffffffffa037c797>] btrfs_unlink+0x60/0x9b [btrfs]
[<ffffffff8115d7f0>] vfs_unlink+0x9c/0xed
[<ffffffff8115f5de>] do_unlinkat+0x12c/0x1fa
[<ffffffff811601a7>] SyS_unlinkat+0x29/0x2b
[<ffffffff81429e92>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88014ef11db0 (size 576):
comm "rm", pid 22009, jiffies 4295542593 (age 1273.052s)
hex dump (first 32 bytes):
02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 c8 1d f1 4e 01 88 ff ff ...........N....
backtrace:
[<ffffffff8141d74d>] kmemleak_update_trace+0x4c/0x6a
[<ffffffff8122f2c0>] radix_tree_node_alloc+0x6d/0x83
[<ffffffff8122fb26>] __radix_tree_create+0x109/0x190
[<ffffffff8122fbdd>] radix_tree_insert+0x30/0xac
[<ffffffffa03b9bde>] btrfs_get_or_create_delayed_node+0x130/0x187 [btrfs]
[<ffffffffa03bb82d>] btrfs_delayed_delete_inode_ref+0x32/0xac [btrfs]
[<ffffffffa0379dae>] __btrfs_unlink_inode+0xee/0x288 [btrfs]
[<ffffffffa037c715>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
[<ffffffffa037c797>] btrfs_unlink+0x60/0x9b [btrfs]
[<ffffffff8115d7f0>] vfs_unlink+0x9c/0xed
[<ffffffff8115f5de>] do_unlinkat+0x12c/0x1fa
[<ffffffff811601a7>] SyS_unlinkat+0x29/0x2b
[<ffffffff81429e92>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8800336feda8 (size 584):
comm "aio-stress", pid 22031, jiffies 4295543006 (age 1271.400s)
hex dump (first 32 bytes):
00 40 3e 00 00 00 00 00 00 00 8f 42 00 00 00 00 .@>........B....
00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 ................
backtrace:
[<ffffffff8114eb34>] create_object+0x172/0x29a
[<ffffffff8141d790>] kmemleak_alloc+0x25/0x41
[<ffffffff81141ae6>] kmemleak_alloc_recursive.constprop.52+0x16/0x18
[<ffffffff81145288>] kmem_cache_alloc+0xf7/0x198
[<ffffffffa0389243>] __btrfs_add_ordered_extent+0x43/0x309 [btrfs]
[<ffffffffa038968b>] btrfs_add_ordered_extent_dio+0x12/0x14 [btrfs]
[<ffffffffa03810e2>] btrfs_get_blocks_direct+0x3ef/0x571 [btrfs]
[<ffffffff81181349>] do_blockdev_direct_IO+0x62a/0xb47
[<ffffffff8118189a>] __blockdev_direct_IO+0x34/0x36
[<ffffffffa03776e5>] btrfs_direct_IO+0x16a/0x1e8 [btrfs]
[<ffffffff81100373>] generic_file_direct_write+0xb8/0x12d
[<ffffffffa038615c>] btrfs_file_write_iter+0x24b/0x42f [btrfs]
[<ffffffff8118bb0d>] aio_run_iocb+0x2b7/0x32e
[<ffffffff8118c99a>] do_io_submit+0x26e/0x2ff
[<ffffffff8118ca3b>] SyS_io_submit+0x10/0x12
[<ffffffff81429e92>] system_call_fastpath+0x12/0x17
CC: <[email protected]> # 3.19, 3.18 and 3.17
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|