aboutsummaryrefslogtreecommitdiff
path: root/include/linux/ceph
AgeCommit message (Collapse)AuthorFilesLines
2016-05-26libceph: rename ceph_oloc_oid_to_pg()Ilya Dryomov1-5/+4
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise that returned is raw PG and return -ENOENT instead of -EIO if the pool doesn't exist. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: fix ceph_eversion encodingIlya Dryomov1-1/+1
eversion_t is version+epoch in userspace and is encoded in that order. ceph_eversion is defined as epoch+version in rados.h, yet we memcpy it in __send_request(). Reoder ceph_eversion fields. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: DEFINE_RB_FUNCS macroIlya Dryomov1-0/+57
Given struct foo { u64 id; struct rb_node bar_node; }; generate insert_bar(), erase_bar() and lookup_bar() functions with DEFINE_RB_FUNCS(bar, struct foo, id, bar_node) The key is assumed to be an integer (u64, int, etc), compared with < and >. nodefld has to be initialized with RB_CLEAR_NODE(). Start using it for MDS, MON and OSD requests and OSD sessions. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: nuke unused fields and functionsIlya Dryomov3-13/+2
Either unused or useless: osdmap->mkfs_epoch osd->o_marked_for_keepalive monc->num_generic_requests osdc->map_waiters osdc->last_requested_map osdc->timeout_tid osd_req_op_cls_response_data() osdmap_apply_incremental() @msgr arg Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: variable-sized ceph_object_idIlya Dryomov1-25/+37
Currently ceph_object_id can hold object names of up to 100 (CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases, expect one - long rbd image names: - a format 1 header is named "<imgname>.rbd" - an object that points to a format 2 header is named "rbd_id.<imgname>" We operate on these potentially long-named objects during rbd map, and, for format 1 images, during header refresh. (A format 2 header name is a small system-generated string.) Lift this 100 character limit by making ceph_object_id be able to point to an externally-allocated string. Apart from being able to work with almost arbitrarily-long named objects, this allows us to reduce the size of ceph_object_id from >100 bytes to 64 bytes. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: move message allocation out of ceph_osdc_alloc_request()Ilya Dryomov1-0/+1
The size of ->r_request and ->r_reply messages depends on the size of the object name (ceph_object_id), while the size of ceph_osd_request is fixed. Move message allocation into a separate function that would have to be called after ceph_object_id and ceph_object_locator (which is also going to become variable in size with RADOS namespaces) have been filled in: req = ceph_osdc_alloc_request(...); <fill in req->r_base_oid> <fill in req->r_base_oloc> ceph_osdc_alloc_messages(req); Signed-off-by: Ilya Dryomov <[email protected]>
2016-04-25libceph: make authorizer destruction independent of ceph_auth_clientIlya Dryomov2-6/+5
Starting the kernel client with cephx disabled and then enabling cephx and restarting userspace daemons can result in a crash: [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000 [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130 [262671.584334] PGD 0 [262671.635847] Oops: 0000 [#1] SMP [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014 [262672.268937] Workqueue: ceph-msgr con_work [libceph] [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000 [262672.428330] RIP: 0010:[<ffffffff811cd04a>] [<ffffffff811cd04a>] kfree+0x5a/0x130 [262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286 [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018 [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012 [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000 [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40 [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840 [262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000 [262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0 [262673.417556] Stack: [262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04 [262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00 [262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800 [262673.805230] Call Trace: [262673.859116] [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph] [262673.968705] [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph] [262674.078852] [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph] [262674.134249] [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph] [262674.189124] [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph] [262674.243749] [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph] [262674.297485] [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph] [262674.350813] [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph] [262674.403312] [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph] [262674.454712] [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690 [262674.505096] [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph] [262674.555104] [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0 [262674.604072] [<ffffffff810901ea>] worker_thread+0x11a/0x470 [262674.652187] [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310 [262674.699022] [<ffffffff810957a2>] kthread+0xd2/0xf0 [262674.744494] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 [262674.789543] [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70 [262674.834094] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 What happens is the following: (1) new MON session is established (2) old "none" ac is destroyed (3) new "cephx" ac is constructed ... (4) old OSD session (w/ "none" authorizer) is put ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer) osd->o_auth.authorizer in the "none" case is just a bare pointer into ac, which contains a single static copy for all services. By the time we get to (4), "none" ac, freed in (2), is long gone. On top of that, a new vtable installed in (3) points us at ceph_x_destroy_authorizer(), so we end up trying to destroy a "none" authorizer with a "cephx" destructor operating on invalid memory! To fix this, decouple authorizer destruction from ac and do away with a single static "none" authorizer by making a copy for each OSD or MDS session. Authorizers themselves are independent of ac and so there is no reason for destroy_authorizer() to be an ac op. Make it an op on the authorizer itself by turning ceph_authorizer into a real struct. Fixes: http://tracker.ceph.com/issues/15447 Reported-by: Alan Zhang <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov1-2/+2
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-03-25ceph: fix security xattr deadlockYan, Zheng1-1/+2
When security is enabled, security module can call filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by MDS' dispatch thread, while handling MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr need to send a new request to MDS and waits for the reply. This makes MDS' dispatch sleep, nobody handles later MDS replies. The fix is make sure lookup/atomic_open reply include xattrs and corresponding caps. So getxattr can be handled by cached xattrs. This requires some modification to both MDS and request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply) Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return error when called by MDS' dispatch thread. Signed-off-by: Yan, Zheng <[email protected]>
2016-03-25libceph: add helper that duplicates last extent operationYan, Zheng1-0/+2
This helper duplicates last extent operation in OSD request, then adjusts the new extent operation's offset and length. The helper is for scatterd page writeback, which adds nonconsecutive dirty pages to single OSD request. Signed-off-by: Yan, Zheng <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-25libceph: enable large, variable-sized OSD requestsIlya Dryomov1-2/+4
Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-25libceph: move r_reply_op_{len,result} into struct ceph_osd_req_opYan, Zheng1-2/+3
This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: Yan, Zheng <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-25libceph: rename ceph_osd_req_op::payload_len to indata_lenIlya Dryomov1-1/+1
Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-25libceph: monc hunt rate is 3s with backoff up to 30sIlya Dryomov2-0/+6
Unless we are in the process of setting up a client (i.e. connecting to the monitor cluster for the first time), apply a backoff: every time we want to reopen a session, increase our timeout by a multiple (currently 2); when we complete the connection, reduce that multipler by 50%. Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2. Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-25libceph: monc ping rate is 10sIlya Dryomov1-2/+3
Split ping interval and ping timeout: ping interval is 10s; keepalive timeout is 30s. Make monc_ping_timeout a constant while at it - it's not actually exported as a mount option (and the rest of tick-related settings won't be either), so it's got no place in ceph_options. Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-25libceph: revamp subs code, switch to SUBSCRIBE2 protocolIlya Dryomov3-10/+24
It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling in before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: Ilya Dryomov <[email protected]>
2016-03-04ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 supportYan, Zheng1-0/+1
Add support for the format change of MClientReply/MclientCaps. Also add code that denies access to inodes with pool_ns layouts. Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2016-02-04libceph: MOSDOpReply v7 encodingIlya Dryomov1-1/+4
Empty request_redirect_t (struct ceph_request_redirect in the kernel client) is now encoded with a bool. NEW_OSDOPREPLY_ENCODING feature bit overlaps with already supported CRUSH_TUNABLES5. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2016-02-04libceph: advertise support for TUNABLES5Ilya Dryomov1-1/+12
Add TUNABLES5 feature (chooseleaf_stable tunable) to a set of features supported by default. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2016-01-21libceph: fix ceph_msg_revoke()Ilya Dryomov1-1/+1
There are a number of problems with revoking a "was sending" message: (1) We never make any attempt to revoke data - only kvecs contibute to con->out_skip. However, once the header (envelope) is written to the socket, our peer learns data_len and sets itself to expect at least data_len bytes to follow front or front+middle. If ceph_msg_revoke() is called while the messenger is sending message's data portion, anything we send after that call is counted by the OSD towards the now revoked message's data portion. The effects vary, the most common one is the eventual hang - higher layers get stuck waiting for the reply to the message that was sent out after ceph_msg_revoke() returned and treated by the OSD as a bunch of data bytes. This is what Matt ran into. (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs is wrong. If ceph_msg_revoke() is called before the tag is sent out or while the messenger is sending the header, we will get a connection reset, either due to a bad tag (0 is not a valid tag) or a bad header CRC, which kind of defeats the purpose of revoke. Currently the kernel client refuses to work with header CRCs disabled, but that will likely change in the future, making this even worse. (3) con->out_skip is not reset on connection reset, leading to one or more spurious connection resets if we happen to get a real one between con->out_skip is set in ceph_msg_revoke() and before it's cleared in write_partial_skip(). Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never zero the tag or the header, i.e. send out tag+header regardless of when ceph_msg_revoke() is called. That way the header is always correct, no unnecessary resets are induced and revoke stands ready for disabled CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new "message out temp" and copy the header into it before sending. Cc: [email protected] # 4.0+ Reported-by: Matt Conner <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Tested-by: Matt Conner <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2016-01-21ceph: ceph_frag_contains_value can be booleanYaowei Bai1-1/+1
This patch makes ceph_frag_contains_value return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <[email protected]> Signed-off-by: Yan, Zheng <[email protected]>
2016-01-21ceph: remove unused functions in ceph_frag.hYaowei Bai1-35/+0
These functions were introduced in commit 3d14c5d2b ("ceph: factor out libceph from Ceph file system"). Howover, there's no user of these functions since then, so remove them for simplicity. Signed-off-by: Yaowei Bai <[email protected]> Signed-off-by: Yan, Zheng <[email protected]>
2015-11-02libceph: add nocephx_sign_messages optionIlya Dryomov1-1/+2
Support for message signing was merged into 3.19, along with nocephx_require_signatures option. But, all that option does is allow the kernel client to talk to clusters that don't support MSG_AUTH feature bit. That's pretty useless, given that it's been supported since bobtail. Meanwhile, if one disables message signing on the server side with "cephx sign messages = false", it becomes impossible to use the kernel client since it expects messages to be signed if MSG_AUTH was negotiated. Add nocephx_sign_messages option to support this use case. Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: stop duplicating client fields in messengerIlya Dryomov2-10/+2
supported_features and required_features serve no purpose at all, while nocrc and tcp_nodelay belong to ceph_options::flags. Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: msg signing callouts don't need con argumentIlya Dryomov1-3/+2
We can use msg->con instead - at the point we sign an outgoing message or check the signature on the incoming one, msg->con is always set. We wouldn't know how to sign a message without an associated session (i.e. msg->con == NULL) and being able to sign a message using an explicitly provided authorizer is of no use. Signed-off-by: Ilya Dryomov <[email protected]>
2015-09-17libceph: advertise support for keepalive2Ilya Dryomov1-0/+1
We are the client, but advertise keepalive2 anyway - for consistency, if nothing else. In the future the server might want to know whether its clients support keepalive2. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Yan, Zheng <[email protected]>
2015-09-17libceph: don't access invalid memory in keepalive2 pathIlya Dryomov1-1/+3
This struct ceph_timespec ceph_ts; ... con_out_kvec_add(con, sizeof(ceph_ts), &ceph_ts); wraps ceph_ts into a kvec and adds it to con->out_kvec array, yet ceph_ts becomes invalid on return from prepare_write_keepalive(). As a result, we send out bogus keepalive2 stamps. Fix this by encoding into a ceph_timespec member, similar to how acks are read and written. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Yan, Zheng <[email protected]>
2015-09-08libceph: use keepalive2 to verify the mon session is aliveYan, Zheng3-1/+9
Signed-off-by: Yan, Zheng <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-07-09libceph: enable ceph in a non-default network namespaceIlya Dryomov1-0/+3
Grab a reference on a network namespace of the 'rbd map' (in case of rbd) or 'mount' (in case of ceph) process and use that to open sockets instead of always using init_net and bailing if network namespace is anything but init_net. Be careful to not share struct ceph_client instances between different namespaces and don't add any code in the !CONFIG_NET_NS case. This is based on a patch from Hong Zhiguo <[email protected]>. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2015-06-25ceph: pre-allocate data structure that tracks caps flushingYan, Zheng1-0/+1
Signed-off-by: Yan, Zheng <[email protected]>
2015-06-25libceph: store timeouts in jiffies, verify user inputIlya Dryomov1-6/+11
There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2015-06-25libceph: nuke time_sub()Ilya Dryomov1-9/+0
Unused since ceph got merged into mainline I guess. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2015-06-25libceph: allow setting osd_req_op's flagsYan, Zheng1-1/+1
Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2015-04-22libceph: announce support for straw2 bucketsIlya Dryomov1-1/+15
Sync up feature bits and enable CEPH_FEATURE_CRUSH_V4. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-22ceph: rename snapshot supportYan, Zheng1-0/+1
Signed-off-by: Yan, Zheng <[email protected]>
2015-04-20libceph: simplify our debugfs attr macroIlya Dryomov1-7/+1
No need to do single_open()'s job ourselves. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-20libceph: expose client options through debugfsIlya Dryomov1-0/+1
Add a client_options attribute for showing libceph options. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-20libceph, ceph: split ceph_show_options()Ilya Dryomov1-0/+1
Split ceph_show_options() into two pieces and move the piece responsible for printing client (libceph) options into net/ceph. This way people adding a libceph option wouldn't have to remember to update code in fs/ceph. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-20libceph: osdmap.h: Add missing format newlinesJoe Perches1-3/+2
To avoid possible interleaving, add missing '\n' to formats. Convert pr_warning to pr_warn while there. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-02-19libceph: tcp_nodelay supportChaitanya Huilgol2-2/+5
TCP_NODELAY socket option set on connection sockets, disables Nagle’s algorithm and improves latency characteristics. tcp_nodelay(default)/notcp_nodelay option flags provided to enable/disable setting the socket option. Signed-off-by: Chaitanya Huilgol <[email protected]> [[email protected]: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments] Signed-off-by: Ilya Dryomov <[email protected]>
2015-02-19ceph: handle SESSION_FORCE_RO messageYan, Zheng1-0/+1
mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <[email protected]>
2015-02-19libceph: nuke pool op infrastructureIlya Dryomov2-44/+1
On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <[email protected]> wrote: > On Mon, 22 Dec 2014, Ilya Dryomov wrote: >> Actually, pool op stuff has been unused for over two years - looks like >> it was added for rbd create_snap and that got ripped out in 2012. It's >> unlikely we'd ever need to manage pools or snaps from the kernel client >> so I think it makes sense to nuke it. Sage? > > Yep! Signed-off-by: Ilya Dryomov <[email protected]>
2015-01-08libceph: fix sparse endianness warningsIlya Dryomov1-2/+2
The only real issue is the one in auth_x.c and it came with 3.19-rc1 merge. Signed-off-by: Ilya Dryomov <[email protected]>
2014-12-17libceph: fixup includes in pagelist.hIlya Dryomov1-1/+3
pagelist.h needs to include linux/types.h and asm/byteorder.h and not rely on other headers pulling yet another set of headers. Signed-off-by: Ilya Dryomov <[email protected]>
2014-12-17ceph: use getattr request to fetch inline dataYan, Zheng1-0/+2
Add a new parameter 'locked_page' to ceph_do_getattr(). If inline data in getattr reply will be copied to the page. Signed-off-by: Yan, Zheng <[email protected]>
2014-12-17ceph: add inline data to pagecacheYan, Zheng1-0/+1
Request reply and cap message can contain inline data. add inline data to the page cache if there is Fc cap. Signed-off-by: Yan, Zheng <[email protected]>
2014-12-17libceph: specify position of extent operationYan, Zheng1-1/+2
allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]>
2014-12-17libceph: add SETXATTR/CMPXATTR osd operations supportYan, Zheng1-0/+10
Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]>
2014-12-17libceph: require cephx message signature by defaultYan, Zheng1-0/+1
Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]>
2014-12-17libceph: update ceph_msg_header structureJohn Spray1-1/+2
2 bytes of what was reserved space is now used by userspace for the compat_version field. Signed-off-by: John Spray <[email protected]> Reviewed-by: Sage Weil <[email protected]>