Age | Commit message (Collapse) | Author | Files | Lines |
|
The ceph_fault() function takes the con mutex, so we should avoid
dropping it before calling it. This fixes a potential race with
another thread calling ceph_con_close(), or _open(), or similar (we
don't reverify con->state after retaking the lock).
Add annotation so that lockdep realizes we will drop the mutex before
returning.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
We drop the con mutex when delivering a message. When we retake the
lock, we need to verify we are still in the OPEN state before
preparing to read the next tag, or else we risk stepping on a
connection that has been closed.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
Revoke all mon_client messages when we shut down the old connection.
This is mostly moot since we are re-using the same ceph_connection,
but it is cleaner.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
If the connect() call immediately fails such that sock == NULL, we
still need con_close_socket() to reset our socket state to CLOSED.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
* shiny new inktank.com email addresses
* add include/linux/crush directory (previous oversight)
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
There are many (normal) conditions that can lead to us getting
unexpected replies, include cluster topology changes, osd failures,
and timeouts. There's no need to spam the console about it.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
Signed-off-by: Sage Weil <[email protected]>
|
|
Rename flags with CON_FLAG prefix, move the definitions into the c file,
and (better) document their meaning.
Signed-off-by: Sage Weil <[email protected]>
|
|
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits. All of the con->state checks are
now under the protection of the con mutex, so this is safe. It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.
This appears to hold up well to stress testing both with and without socket
failure injection on the server side.
Signed-off-by: Sage Weil <[email protected]>
|
|
If we are CLOSED, the socket is closed and we won't get these.
Signed-off-by: Sage Weil <[email protected]>
|
|
It is simpler to do this immediately, since we already hold the con mutex.
It also avoids the need to deal with a not-quite-CLOSED socket in con_work.
Signed-off-by: Sage Weil <[email protected]>
|
|
If the state is CLOSED or OPENING, we shouldn't have a socket.
Signed-off-by: Sage Weil <[email protected]>
|
|
Take the con mutex before checking whether the connection is closed to
avoid racing with someone else closing it.
Signed-off-by: Sage Weil <[email protected]>
|
|
Avoid dropping and retaking con->mutex in the ceph_con_send() case by
leaving locking up to the caller.
Signed-off-by: Sage Weil <[email protected]>
|
|
If we fault on a lossy connection, we should still close the socket
immediately, and do so under the con mutex.
We should also take the con mutex before printing out the state bits in
the debug output.
Signed-off-by: Sage Weil <[email protected]>
|
|
rbd_req_sync_unwatch() only ever uses rbd_dev->header_name as the
value of its "object_name" parameter, and that value is available
within the function already. So get rid of the parameter.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
rbd_req_sync_notify_ack() only ever uses rbd_dev->header_name as the
value of its "object_name" parameter, and that value is available
within the function already. So get rid of the parameter.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
rbd_req_sync_notify() only ever uses rbd_dev->header_name as the
value of its "object_name" parameter, and that value is available
within the function already. So get rid of the parameter.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
rbd_req_sync_watch() is only called in one place, and in that place
it passes rbd_dev->header_name as the value of the "object_name"
parameter. This value is available within the function already.
Having the extra parameter leaves the impression the object name
could take on different values, but it does not.
So get rid of the parameter. We can always add it back again if
we find we want to watch some other object in the future.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
Both rbd_register_snap_dev() and __rbd_remove_snap_dev() have
rbd_dev parameters that are unused. Remove them.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
The function rbd_header_from_disk() is only called in one spot, and
it passes GFP_KERNEL as its value for the gfp_flags parameter.
Just drop that parameter and substitute GFP_KERNEL everywhere within
that function it had been used. (If we find we need the parameter
again in the future it's easy enough to add back again.)
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
The "snapc" parameter to in rbd_req_sync_read() is not used, so
get rid of it.
Reported-by: Josh Durgin <[email protected]>
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
The "id" field of an rbd device structure represents the unique
client-local device id mapped to the underlying rbd image. Each rbd
image will have another id--the image id--and each snapshot has its
own id as well. The simple name "id" no longer conveys the
information one might like to have.
Rename the device "id" field in struct rbd_dev to be "dev_id" to
make it a little more obvious what we're dealing with without having
to think more about context.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
If an rbd image header is read and it doesn't begin with the
expected magic information, a warning is displayed. This is
a fairly simple test, but it could be extended at some point.
Fix the comparison so it actually looks at the "text" field
rather than the front of the structure.
In any case, encapsulate the validity test in its own function.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
There are two structures in which a count of snapshots are
maintained:
struct ceph_snap_context {
...
u32 num_snaps;
...
}
and
struct ceph_snap_realm {
...
u32 num_prior_parent_snaps; /* had prior to parent_since */
...
u32 num_snaps;
...
}
These fields never take on negative values (e.g., to hold special
meaning), and so are really inherently unsigned. Furthermore they
take their value from over-the-wire or on-disk formatted 32-bit
values.
So change their definition to have type u32, and change some spots
elsewhere in the code to account for this change.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
There was a dout() call in rbd_do_request() that was reporting
the reporting the offset as the length and vice versa. While
fixing that I did a quick scan of other dout() calls and fixed
a couple of other minor things.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
This just replaces a while loop with list_for_each_entry_safe()
in __rbd_remove_all_snaps().
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
In commit c666601a there was inadvertently added an extra
initialization of rbd_dev->header_rwsem. This gets rid of the
duplicate.
Reported-by: Guangliang Zhao <[email protected]>
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
The snap_seq field in an rbd_image_header structure held the value
from the rbd image header when it was last refreshed. We now
maintain this value in the snapc->seq field. So get rid of the
other one.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
In rbd_header_add_snap() there is code to set snapc->seq to the
just-added snapshot id. This is the only remnant left of the
use of that field for recording which snapshot an rbd_dev was
associated with. That functionality is no longer supported,
so get rid of that final bit of code.
Doing so means we never actually set snapc->seq any more. On the
server, the snapshot context's sequence value represents the highest
snapshot id ever issued for a particular rbd image. So we'll make
it have that meaning here as well. To do so, set this value
whenever the rbd header is (re-)read. That way it will always be
consistent with the rest of the snapshot context we maintain.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
In rbd_header_set_snap(), there is logic to make the snap context's
seq field get set to a particular snapshot id, or 0 if there is no
snapshot for the rbd image.
This seems to be an artifact of how the current snapshot id for an
rbd_dev was recorded before the rbd_dev->snap_id field began to be
used for that purpose.
There's no need to update the value of snapc->seq here any more, so
stop doing it. Tidy up a few local variables in that function
while we're at it.
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
In what appears to be an artifact of a different way of encoding
whether an rbd image maps a snapshot, __rbd_refresh_header() has
code that arranges to update the seq value in an rbd image's
snapshot context to point to the first entry in its snapshot
array if that's where it was pointing initially.
We now use rbd_dev->snap_id to record the snapshot id--using the
special value CEPH_NOSNAP to indicate the rbd_dev is not mapping a
snapshot at all.
There is therefore no need to check for this case, nor to update the
seq value, in __rbd_refresh_header(). Just preserve the seq value
that rbd_read_header() provides (which, at the moment, is nothing).
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
|
|
Previously the original header version was sent. Now, we update it
when the header changes.
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
This prevents a race between requests with a given snap context and
header updates that free it. The osd client was already expecting the
snap context to be reference counted, since it get()s it in
ceph_osdc_build_request and put()s it when the request completes.
Also remove the second down_read()/up_read() on header_rwsem in
rbd_do_request, which wasn't actually preventing this race or
protecting any other data.
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
The image may have been resized.
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
If an image was mapped to a snapshot, the size of the head version
would be shown. Protect capacity with header_rwsem, since it may
change.
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
Snapshots cannot be resized, and the new capacity of head should not
be reflected by the snapshot.
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
When a snapshot is deleted, the OSD will return ENOENT when reading
from it. This is normally interpreted as a hole by rbd, which will
return zeroes. To minimize the time in which this can happen, stop
requests early when we are notified that our snapshot no longer
exists.
[[email protected]: updated __rbd_init_snaps_header() logic]
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
This is a trivial fix for the debug output, as it is inconsistent
with the function name so may confuse people when debugging.
[[email protected]: switched to use __func__]
Signed-off-by: Jiaju Zhang <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
We re-run the loop but we don't re-set the attrs pointer back to NULL.
Signed-off-by: Alan Cox <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
We exponentially back off when we encounter connection errors. If several
errors accumulate, we will eventually wait ages before even trying to
reconnect.
Fix this by resetting the backoff counter after a successful negotiation/
connection with the remote node. Fixes ceph issue #2802.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
Take the con mutex while we are initiating a ceph open. This is necessary
because the may have previously been in use and then closed, which could
result in a racing workqueue running con_work().
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
When we detect a mds session reset, close the old ceph_connection before
reopening it. This ensures we clean up the old socket properly and keep
the ceph_connection state correct.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
|
|
Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path. The problem
is that a sequence like:
- start reading message
- initialize bio_iter
- read half a message
- messenger fault, reconnect
- restart reading message
- ** bio_iter now non-NULL, not reinitialized **
- read past end of bio, crash
Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
|
|
The linger op registration (i.e., watch) modifies the object state. As
such, the OSD will reply with success if it has already applied without
doing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.
To fix this, always resubmit the linger op with a new tid. We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered. Then the second loop will treat this just like a normal
case of re-registering.
This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
|
|
Hold the mutex while twiddling all of the state bits to avoid possible
races. While we're here, make not of why we cannot close the socket
directly.
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
|
|
We need to set error_msg to something useful before calling ceph_fault();
do so here for try_{read,write}(). This is more informative than
libceph: osd0 192.168.106.220:6801 (null)
Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
|
|
The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.
Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.
Signed-off-by: caleb miles <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Reviewed-by: Yehuda Sadeh <[email protected]>
|
|
The code attempts to maintain a "user format" and a "sensor format",
but in this case it looks like a typo is passing the user format down
to the sensor.
This was preventing display of video at anything other than 640x480.
Signed-off-by: Daniel Drake <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
The current formulation of the bw_params loop uses the counter j as an
index for the first dimension of the bw_params array which is later
incremented by the variable i. It is evaluated correctly only, because j
is initialized to 0 at the beginning of the loop. I think that
explicitly using the index 0 better reflects the intent of the
expression.
Signed-off-by: Hans-Frieder Vogt <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|