aboutsummaryrefslogtreecommitdiff
path: root/net/ceph
AgeCommit message (Collapse)AuthorFilesLines
2012-07-30libceph: protect ceph_con_open() with mutexSage Weil1-0/+2
Take the con mutex while we are initiating a ceph open. This is necessary because the may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-07-30libceph: (re)initialize bio_iter on start of message receiveSage Weil1-5/+6
Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like: - start reading message - initialize bio_iter - read half a message - messenger fault, reconnect - restart reading message - ** bio_iter now non-NULL, not reinitialized ** - read past end of bio, crash Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]>
2012-07-30libceph: resubmit linger ops when pg mapping changesSage Weil1-5/+21
The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies. To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering. This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]>
2012-07-30libceph: fix mutex coverage for ceph_con_closeSage Weil1-1/+7
Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make not of why we cannot close the socket directly. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]>
2012-07-30libceph: report socket read/write error messageSage Weil1-2/+6
We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than libceph: osd0 192.168.106.220:6801 (null) Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]>
2012-07-30libceph: support crush tunablesSage Weil2-6/+46
The server side recently added support for tuning some magic crush variables. Decode these variables if they are present, or use the default values if they are not present. Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5. Signed-off-by: caleb miles <[email protected]> Reviewed-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]>
2012-07-30libceph: move feature bits to separate headerSage Weil1-2/+3
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]>
2012-07-30libceph: prevent the race of incoming work during teardownGuanjun He2-0/+7
Add an atomic variable 'stopping' as flag in struct ceph_messenger, set this flag to 1 in function ceph_destroy_client(), and add the condition code in function ceph_data_ready() to test the flag value, if true(1), just return. Signed-off-by: Guanjun He <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-30libceph: fix messenger retrySage Weil1-6/+6
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <[email protected]>
2012-07-30libceph: initialize rb, list nodes in ceph_osd_requestSage Weil1-0/+3
These don't strictly need to be initialized based on how they are used, but it is good practice to do so. Reported-by: Alex Elder <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2012-07-30libceph: initialize msgpool message typesSage Weil2-6/+8
Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages. Reported-by: Alex Elder <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2012-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-10/+4
Pull networking changes from David S Miller: 1) Remove the ipv4 routing cache. Now lookups go directly into the FIB trie and use prebuilt routes cached there. No more garbage collection, no more rDOS attacks on the routing cache. Instead we now get predictable and consistent performance, no matter what the pattern of traffic we service. This has been almost 2 years in the making. Special thanks to Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who have helped along the way. I'm sure that with a change of this magnitude there will be some kind of fallout, but such things ought the be simple to fix at this point. Luckily I'm not European so I'll be around all of August to fix things :-) The major stages of this work here are each fronted by a forced merge commit whose commit message contains a top-level description of the motivations and implementation issues. 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on input. 3) TCP SYN/ACK performance tweaks from Eric Dumazet. 4) Add namespace support for netfilter L4 conntrack helpers, from Gao Feng. 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from Yuval Mintz. 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet. 7) Support for connection tracker helpers in userspace, from Pablo Neira Ayuso. 8) Allow userspace driven TX load balancing functions in TEAM driver, from Jiri Pirko. 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with embedded gotos. 10) TCP Small Queues, essentially minimize the amount of TCP data queued up in the packet scheduler layer. Whereas the existing BQL (Byte Queue Limits) limits the pkt_sched --> netdevice queuing levels, this controls the TCP --> pkt_sched queueing levels. From Eric Dumazet. 11) Reduce the number of get_page/put_page ops done on SKB fragments, from Alexander Duyck. 12) Implement protection against blind resets in TCP (RFC 5961), from Eric Dumazet. 13) Support the client side of TCP Fast Open, basically the ability to send data in the SYN exchange, from Yuchung Cheng. Basically, the sender queues up data with a sendmsg() call using MSG_FASTOPEN, then they do the connect() which emits the queued up fastopen data. 14) Avoid all the problems we get into in TCP when timers or PMTU events hit a locked socket. The TCP Small Queues changes added a tcp_release_cb() that allows us to queue work up to the release_sock() caller, and that's what we use here too. From Eric Dumazet. 15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits) genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP r8169: revert "add byte queue limit support". ipv4: Change rt->rt_iif encoding. net: Make skb->skb_iif always track skb->dev ipv4: Prepare for change of rt->rt_iif encoding. ipv4: Remove all RTCF_DIRECTSRC handliing. ipv4: Really ignore ICMP address requests/replies. decnet: Don't set RTCF_DIRECTSRC. net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. ipv4: Remove redundant assignment rds: set correct msg_namelen openvswitch: potential NULL deref in sample() tcp: dont drop MTU reduction indications bnx2x: Add new 57840 device IDs tcp: avoid oops in tcp_metrics and reset tcpm_stamp niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value niu: Fix to check for dma mapping errors. net: Fix references to out-of-scope variables in put_cmsg_compat() net: ethernet: davinci_emac: add pm_runtime support net: ethernet: davinci_emac: Remove unnecessary #include ...
2012-07-17libceph: fix messenger retrySage Weil1-6/+6
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <[email protected]>
2012-07-10net: Fix non-kernel-doc comments with kernel-doc start markerBen Hutchings1-10/+4
Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-07-05libceph: allow sock transition from CONNECTING to CLOSEDSage Weil1-12/+13
It is possible to close a socket that is in the OPENING state. For example, it can happen if ceph_con_close() is called on the con before the TCP connection is established. con_work() will come around and shut down the socket. Signed-off-by: Sage Weil <[email protected]>
2012-07-05libceph: initialize mon_client con only onceSage Weil1-4/+3
Do not re-initialize the con on every connection attempt. When we ceph_con_close, there may still be work queued on the socket (e.g., to close it), and re-initializing will clobber the work_struct state. Signed-off-by: Sage Weil <[email protected]>
2012-07-05libceph: set peer name on con_open, not initSage Weil3-11/+15
The peer name may change on each open attempt, even when the connection is reused. Signed-off-by: Sage Weil <[email protected]>
2012-07-05libceph: add some fine ASCII artAlex Elder1-1/+41
Sage liked the state diagram I put in my commit description so I'm putting it in with the code. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: small changes to messenger.cAlex Elder1-32/+31
This patch gathers a few small changes in "net/ceph/messenger.c": out_msg_pos_next() - small logic change that mostly affects indentation write_partial_msg_pages(). - use a local variable trail_off to represent the offset into a message of the trail portion of the data (if present) - once we are in the trail portion we will always be there, so we don't always need to check against our data position - avoid computing len twice after we've reached the trail - get rid of the variable tmpcrc, which is not needed - trail_off and trail_len never change so mark them const - update some comments read_partial_message_bio() - bio_iovec_idx() will never return an error, so don't bother checking for it Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: distinguish two phases of connect sequenceAlex Elder1-24/+28
Currently a ceph connection enters a "CONNECTING" state when it begins the process of (re-)connecting with its peer. Once the two ends have successfully exchanged their banner and addresses, an additional NEGOTIATING bit is set in the ceph connection's state to indicate the connection information exhange has begun. The CONNECTING bit/state continues to be set during this phase. Rather than have the CONNECTING state continue while the NEGOTIATING bit is set, interpret these two phases as distinct states. In other words, when NEGOTIATING is set, clear CONNECTING. That way only one of them will be active at a time. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: separate banner and connect writesAlex Elder1-9/+11
There are two phases in the process of linking together the two ends of a ceph connection. The first involves exchanging a banner and IP addresses, and if that is successful a second phase exchanges some detail about each side's connection capabilities. When initiating a connection, the client side now queues to send its information for both phases of this process at the same time. This is probably a bit more efficient, but it is slightly messier from a layering perspective in the code. So rearrange things so that the client doesn't send the connection information until it has received and processed the response in the initial banner phase (in process_banner()). Move the code (in the (con->sock == NULL) case in try_write()) that prepares for writing the connection information, delaying doing that until the banner exchange has completed. Move the code that begins the transition to this second "NEGOTIATING" phase out of process_banner() and into its caller, so preparing to write the connection information and preparing to read the response are adjacent to each other. Finally, preparing to write the connection information now requires the output kvec to be reset in all cases, so move that into the prepare_write_connect() and delete it from all callers. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: define and use an explicit CONNECTED stateAlex Elder1-2/+7
There is no state explicitly defined when a ceph connection is fully operational. So define one. It's set when the connection sequence completes successfully, and is cleared when the connection gets closed. Be a little more careful when examining the old state when a socket disconnect event is reported. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: clear NEGOTIATING when doneAlex Elder1-3/+5
A connection state's NEGOTIATING bit gets set while in CONNECTING state after we have successfully exchanged a ceph banner and IP addresses with the connection's peer (the server). But that bit is not cleared again--at least not until another connection attempt is initiated. Instead, clear it as soon as the connection is fully established. Also, clear it when a socket connection gets prematurely closed in the midst of establishing a ceph connection (in case we had reached the point where it was set). Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: clear CONNECTING in ceph_con_close()Alex Elder1-1/+2
A connection that is closed will no longer be connecting. So clear the CONNECTING state bit in ceph_con_close(). Similarly, if the socket has been closed we no longer are in connecting state (a new connect sequence will need to be initiated). Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: don't touch con state in con_close_socket()Alex Elder1-1/+7
In con_close_socket(), a connection's SOCK_CLOSED flag gets set and then cleared while its shutdown method is called and its reference gets dropped. Previously, that flag got set only if it had not already been set, so setting it in con_close_socket() might have prevented additional processing being done on a socket being shut down. We no longer set SOCK_CLOSED in the socket event routine conditionally, so setting that bit here no longer provides whatever benefit it might have provided before. A race condition could still leave the SOCK_CLOSED bit set even after we've issued the call to con_close_socket(), so we still clear that bit after shutting the socket down. Add a comment explaining the reason for this. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: just set SOCK_CLOSED when state changesAlex Elder1-2/+2
When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED connection flag bit is set, and if it had not been previously set queue_con() is called to ensure con_work() will get a chance to handle the changed state. con_work() atomically checks--and if set, clears--the SOCK_CLOSED bit if it was set. This means that even if the bit were set repeatedly, the related processing in con_work() only gets called once per transition of the bit from 0 to 1. What's important then is that we ensure con_work() gets called *at least* once when a socket close event occurs, not that it gets called *exactly* once. The work queue mechanism already takes care of queueing work only if it is not already queued, so there's no need for us to call queue_con() conditionally. So this patch just makes it so the SOCK_CLOSED flag gets set unconditionally in ceph_sock_state_change(). Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: don't change socket state on sock eventAlex Elder1-9/+9
Currently the socket state change event handler records an error message on a connection to distinguish a close while connecting from a close while a connection was already established. Changing connection information during handling of a socket event is not very clean, so instead move this assignment inside con_work(), where it can be done during normal connection-level processing (and under protection of the connection mutex as well). Move the handling of a socket closed event up to the top of the processing loop in con_work(); there's no point in handling backoff etc. if we have a newly-closed socket to take care of. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: SOCK_CLOSED is a flag, not a stateAlex Elder1-2/+2
The following commit changed it so SOCK_CLOSED bit was stored in a connection's new "flags" field rather than its "state" field. libceph: start separating connection flags from state commit 928443cd That bit is used in con_close_socket() to protect against setting an error message more than once in the socket event handler function. Unfortunately, the field being operated on in that function was not updated to be "flags" as it should have been. This fixes that error. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: don't use bio_iter as a flagAlex Elder1-5/+1
Recently a bug was fixed in which the bio_iter field in a ceph message was not being properly re-initialized when a message got re-transmitted: commit 43643528cce60ca184fe8197efa8e8da7c89a037 Author: Yan, Zheng <[email protected]> rbd: Clear ceph_msg->bio_iter for retransmitted message We are now only initializing the bio_iter field when we are about to start to write message data (in prepare_write_message_data()), rather than every time we are attempting to write any portion of the message data (in write_partial_msg_pages()). This means we no longer need to use the msg->bio_iter field as a flag. So just don't do that any more. Trust prepare_write_message_data() to ensure msg->bio_iter is properly initialized, every time we are about to begin writing (or re-writing) a message's bio data. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: move init of bio_iterAlex Elder1-5/+4
If a message has a non-null bio pointer, its bio_iter field is initialized in write_partial_msg_pages() if this has not been done already. This is really a one-time setup operation for sending a message's (bio) data, so move that initialization code into prepare_write_message_data() which serves that purpose. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: move init_bio_*() functions upAlex Elder1-25/+25
Move init_bio_iter() and iter_bio_next() up in their source file so the'll be defined before they're needed. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: don't mark footer complete before it isAlex Elder1-1/+3
This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE flag before the CRC for the data portion (recorded in the footer) has been completely computed. Hold off setting the complete flag until we've decided it's ready to send. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: encapsulate advancing msg pageAlex Elder1-24/+34
In write_partial_msg_pages(), once all the data from a page has been sent we advance to the next one. Put the code that takes care of this into its own function. While modifying write_partial_msg_pages(), make its local variable "in_trail" be Boolean, and use the local variable "msg" (which is just the connection's current out_msg pointer) consistently. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-07-05libceph: encapsulate out message data setupAlex Elder1-14/+23
Move the code that prepares to write the data portion of a message into its own function. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-06-22libceph: drop ceph_con_get/put helpers and nref memberSage Weil1-27/+1
These are no longer used. Every ceph_connection instance is embedded in another structure, and refcounts manipulated via the get/put ops. Signed-off-by: Sage Weil <[email protected]>
2012-06-22libceph: use con get/put methodsSage Weil1-8/+8
The ceph_con_get/put() helpers manipulate the embedded con ref count, which isn't used now that ceph_connections are embedded in other structures. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-20libceph: flush msgr queue during mon_client shutdownSage Weil2-7/+8
We need to flush the msgr workqueue during mon_client shutdown to ensure that any work affecting our embedded ceph_connection is finished so that we can be safely destroyed. Previously, we were flushing the work queue after osd_client shutdown and before mon_client shutdown to ensure that any osd connection refs to authorizers are flushed. Remove the redundant flush, and document in the comment that the mon_client flush is needed to cover that case as well. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> (cherry picked from commit f3dea7edd3d449fe7a6d402c1ce56a294b985261)
2012-06-20rbd: Clear ceph_msg->bio_iter for retransmitted messageYan, Zheng1-0/+4
The bug can cause NULL pointer dereference in write_partial_msg_pages Signed-off-by: Zheng Yan <[email protected]> Reviewed-by: Alex Elder <[email protected]> (cherry picked from commit 43643528cce60ca184fe8197efa8e8da7c89a037)
2012-06-20libceph: use con get/put ops from osd_clientSage Weil1-4/+4
There were a few direct calls to ceph_con_{get,put}() instead of the con ops from osd_client.c. This is a bug since those ops aren't defined to be ceph_con_get/put. This breaks refcounting on the ceph_osd structs that contain the ceph_connections, and could lead to all manner of strangeness. The purpose of the ->get and ->put methods in a ceph connection are to allow the connection to indicate it has a reference to something external to the messaging system, *not* to indicate something external has a reference to the connection. [[email protected]: added that last sentence] Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]> (cherry picked from commit 0d47766f14211a73eaf54cab234db134ece79f49)
2012-06-20libceph: osd_client: don't drop reply reference too earlyAlex Elder1-2/+2
In ceph_osdc_release_request(), a reference to the r_reply message is dropped. But just after that, that same message is revoked if it was in use to receive an incoming reply. Reorder these so we are sure we hold a reference until we're actually done with the message. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]> (cherry picked from commit ab8cb34a4b2f60281a4b18b1f1ad23bc2313d91b)
2012-06-19libceph: fix NULL dereference in reset_connection()Dan Carpenter1-1/+1
We dereference "con->in_msg" on the line after it was set to NULL. Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-15Merge tag 'v3.5-rc1'Sage Weil9-36/+37
Linux 3.5-rc1 Conflicts: net/ceph/messenger.c
2012-06-15libceph: flush msgr queue during mon_client shutdownSage Weil2-7/+8
We need to flush the msgr workqueue during mon_client shutdown to ensure that any work affecting our embedded ceph_connection is finished so that we can be safely destroyed. Previously, we were flushing the work queue after osd_client shutdown and before mon_client shutdown to ensure that any osd connection refs to authorizers are flushed. Remove the redundant flush, and document in the comment that the mon_client flush is needed to cover that case as well. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-15libceph: transition socket state prior to actual connectSage Weil1-2/+1
Once we call ->connect(), we are racing against the actual connection, and a subsequent transition from CONNECTING -> CONNECTED. Set the state to CONNECTING before that, under the protection of the mutex, to avoid the race. This was introduced in 928443cd9644e7cfd46f687dbeffda2d1a357ff9, with the original socket state code. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-07libceph: fix overflow in osdmap_apply_incremental()Xi Wang1-0/+4
On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)' and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad). It would also overflow the subsequent kmalloc() size, leading to out-of-bounds write. Signed-off-by: Xi Wang <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-07libceph: fix overflow in osdmap_decode()Xi Wang1-0/+3
On 32-bit systems, a large `n' would overflow `n * sizeof(u32)' and bypass the check ceph_decode_need(p, end, n * sizeof(u32), bad). It would also overflow the subsequent kmalloc() size, leading to out-of-bounds write. Signed-off-by: Xi Wang <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-07libceph: fix overflow in __decode_pool_names()Xi Wang1-6/+7
`len' is read from network and thus needs validation. Otherwise a large `len' would cause out-of-bounds access via the memcpy() call. In addition, len = 0xffffffff would overflow the kmalloc() size, leading to out-of-bounds write. This patch adds a check of `len' via ceph_decode_need(). Also use kstrndup rather than kmalloc/memcpy. [[email protected]: added -ENOMEM return for null kstrndup() result] Signed-off-by: Xi Wang <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-07rbd: Clear ceph_msg->bio_iter for retransmitted messageYan, Zheng1-0/+4
The bug can cause NULL pointer dereference in write_partial_msg_pages Signed-off-by: Zheng Yan <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2012-06-06libceph: make ceph_con_revoke_message() a msg opAlex Elder2-11/+20
ceph_con_revoke_message() is passed both a message and a ceph connection. A ceph_msg allocated for incoming messages on a connection always has a pointer to that connection, so there's no need to provide the connection when revoking such a message. Note that the existing logic does not preclude the message supplied being a null/bogus message pointer. The only user of this interface is the OSD client, and the only value an osd client passes is a request's r_reply field. That is always non-null (except briefly in an error path in ceph_osdc_alloc_request(), and that drops the only reference so the request won't ever have a reply to revoke). So we can safely assume the passed-in message is non-null, but add a BUG_ON() to make it very obvious we are imposing this restriction. Rename the function ceph_msg_revoke_incoming() to reflect that it is really an operation on an incoming message. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-06-06libceph: make ceph_con_revoke() a msg operationAlex Elder3-7/+12
ceph_con_revoke() is passed both a message and a ceph connection. Now that any message associated with a connection holds a pointer to that connection, there's no need to provide the connection when revoking a message. This has the added benefit of precluding the possibility of the providing the wrong connection pointer. If the message's connection pointer is null, it is not being tracked by any connection, so revoking it is a no-op. This is supported as a convenience for upper layers, so they can revoke a message that is not actually "in flight." Rename the function ceph_msg_revoke() to reflect that it is really an operation on a message, not a connection. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>