aboutsummaryrefslogtreecommitdiff
path: root/include/linux/ceph/osdmap.h
AgeCommit message (Collapse)AuthorFilesLines
2017-07-07libceph: osd_state is 32 bits wide in luminousIlya Dryomov1-2/+2
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: pg_upmap[_items] infrastructureIlya Dryomov1-1/+9
pg_temp and pg_upmap encodings are the same (PG -> array of osds), except for the incremental remove: it's an empty mapping in new_pg_temp for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't looked at as an example for pg_upmap encoding.) Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap. __decode_pg_temp() stores into pg_temp union member, but since pg_upmap union member is identical, reading through pg_upmap later is OK. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: respect RADOS_BACKOFF backoffsIlya Dryomov1-0/+1
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: avoid unnecessary pi lookups in calc_target()Ilya Dryomov1-2/+8
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: resend on PG splits if OSD has RESEND_ON_SPLITIlya Dryomov1-0/+2
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: MOSDOp v8 encoding (actual spgid + full hash)Ilya Dryomov1-1/+3
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: introduce ceph_spg, ceph_pg_to_primary_shard()Ilya Dryomov1-0/+10
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: Ilya Dryomov <[email protected]>
2017-02-20rbd: kill obj_request->object_name and rbd_segment_name_cacheIlya Dryomov1-7/+0
Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Jason Dillaman <[email protected]>
2017-02-20crush: merge working data and scratchIlya Dryomov1-2/+1
Much like Arlo Guthrie, I decided that one big pile is better than two little piles. Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba. Signed-off-by: Ilya Dryomov <[email protected]>
2017-02-20crush: remove mutable part of CRUSH mapIlya Dryomov1-0/+1
Then add it to the working state. It would be very nice if we didn't have to take a lock to calculate a crush placement. By moving the permutation array into the working data, we can treat the CRUSH map as immutable. Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a. Signed-off-by: Ilya Dryomov <[email protected]>
2017-02-20libceph: use BUG() instead of BUG_ON(1)Arnd Bergmann1-1/+1
I ran into this compile warning, which is the result of BUG_ON(1) not always leading to the compiler treating the code path as unreachable: include/linux/ceph/osdmap.h: In function 'ceph_can_shift_osds': include/linux/ceph/osdmap.h:62:1: error: control reaches end of non-void function [-Werror=return-type] Using BUG() here avoids the warning. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2016-07-28libceph: rados pool namespace supportYan, Zheng1-5/+5
Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <[email protected]> [[email protected]: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <[email protected]>
2016-07-28libceph: add an ONSTACK initializer for oidsIlya Dryomov1-0/+5
An on-stack oid in ceph_ioctl_get_dataloc() is not initialized, resulting in a WARN and a NULL pointer dereference later on. We will have more of these on-stack in the future, so fix it with a convenience macro. Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id") Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-30libceph: change ceph_osdmap_flag() to take osdcIlya Dryomov1-5/+0
For the benefit of every single caller, take osdc instead of map. Also, now that osdc->osdmap can't ever be NULL, drop the check. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26ceph: make logical calculation functions return boolZhang Zhuoyu1-3/+3
This patch makes serverl logical caculation functions return bool to improve readability due to these particular functions only using 0/1 as their return value. No functional change. Signed-off-by: Zhang Zhuoyu <[email protected]>
2016-05-26libceph: handle_one_map()Ilya Dryomov1-0/+2
Separate osdmap handling from decoding and iterating over a bag of maps in a fresh MOSDMap message. This sets up the scene for the updated OSD client. Of particular importance here is the addition of pi->was_full, which can be used to answer "did this pool go full -> not-full in this map?". This is the key bit for supporting pool quotas. We won't be able to downgrade map_sem for much longer, so drop downgrade_write(). Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: allocate dummy osdmap in ceph_osdc_init()Ilya Dryomov1-0/+1
This leads to a simpler osdmap handling code, particularly when dealing with pi->was_full, which is introduced in a later commit. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: introduce ceph_osd_request_target, calc_target()Ilya Dryomov1-0/+34
Introduce ceph_osd_request_target, containing all mapping-related fields of ceph_osd_request and calc_target() for calculating mappings and populating it. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: pi->min_size, pi->last_force_request_resendIlya Dryomov1-3/+6
Add and decode pi->min_size and pi->last_force_request_resend. These are going to be used by calc_target(). Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: make pgid_cmp() globalIlya Dryomov1-0/+2
calc_target() code is going to need to know how to compare PGs. Take lhs and rhs pgid by const * while at it. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: rename ceph_calc_pg_primary()Ilya Dryomov1-2/+2
Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to emphasise that it returns acting primary. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: ceph_osds, ceph_pg_to_up_acting_osds()Ilya Dryomov1-3/+18
Knowning just acting set isn't enough, we need to be able to record up set as well to detect interval changes. This means returning (up[], up_len, up_primary, acting[], acting_len, acting_primary) and passing it around. Introduce and switch to ceph_osds to help with that. Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return both up and acting sets from it. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: rename ceph_oloc_oid_to_pg()Ilya Dryomov1-5/+4
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise that returned is raw PG and return -ENOENT instead of -EIO if the pool doesn't exist. Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: nuke unused fields and functionsIlya Dryomov1-4/+2
Either unused or useless: osdmap->mkfs_epoch osd->o_marked_for_keepalive monc->num_generic_requests osdc->map_waiters osdc->last_requested_map osdc->timeout_tid osd_req_op_cls_response_data() osdmap_apply_incremental() @msgr arg Signed-off-by: Ilya Dryomov <[email protected]>
2016-05-26libceph: variable-sized ceph_object_idIlya Dryomov1-25/+37
Currently ceph_object_id can hold object names of up to 100 (CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases, expect one - long rbd image names: - a format 1 header is named "<imgname>.rbd" - an object that points to a format 2 header is named "rbd_id.<imgname>" We operate on these potentially long-named objects during rbd map, and, for format 1 images, during header refresh. (A format 2 header name is a small system-generated string.) Lift this 100 character limit by making ceph_object_id be able to point to an externally-allocated string. Apart from being able to work with almost arbitrarily-long named objects, this allows us to reduce the size of ceph_object_id from >100 bytes to 64 bytes. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-20libceph: osdmap.h: Add missing format newlinesJoe Perches1-3/+2
To avoid possible interleaving, add missing '\n' to formats. Convert pr_warning to pr_warn while there. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2014-04-04libceph: return primary from ceph_calc_pg_acting()Ilya Dryomov1-1/+1
In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: switch ceph_calc_pg_acting() to new helpersIlya Dryomov1-1/+1
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(), raw_to_up_osds() and apply_temps(). Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: ceph_can_shift_osds(pool) and pool type definesIlya Dryomov1-0/+12
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type defines. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: ceph_osd_{exists,is_up,is_down}(osd) definitionsIlya Dryomov1-1/+13
Sync up with ceph.git definitions. Bring in ceph_osd_is_down(). Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: primary_affinity infrastructureIlya Dryomov1-0/+3
Add primary_affinity infrastructure. primary_affinity values are stored in an max_osd-sized array, hanging off ceph_osdmap, similar to a osd_weight array. Introduce {get,set}_primary_affinity() helpers, primarily to return CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to abstract out osd_primary_affinity array allocation and initialization. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: primary_temp infrastructureIlya Dryomov1-0/+5
Add primary_temp mappings infrastructure. struct ceph_pg_mapping is overloaded, primary_temp mappings are stored in an rb-tree, rooted at ceph_osdmap, in a manner similar to pg_temp mappings. Dump primary_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap, one 'primary_temp <pgid> <osd>' per line, e.g: primary_temp 2.6 4 Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: generalize ceph_pg_mappingIlya Dryomov1-2/+7
In preparation for adding support for primary_temp mappings, generalize struct ceph_pg_mapping so it can hold mappings other than pg_temp. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-04libceph: split osdmap allocation and decode stepsIlya Dryomov1-1/+1
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2014-04-03libceph: a per-osdc crush scratch bufferIlya Dryomov1-0/+3
With the addition of erasure coding support in the future, scratch variable-length array in crush_do_rule_ary() is going to grow to at least 200 bytes on average, on top of another 128 bytes consumed by rawosd/osd arrays in the call chain. Replace it with a buffer inside struct osdmap and a mutex. This shouldn't result in any contention, because all osd requests were already serialized by request_mutex at that point; the only unlocked caller was ceph_ioctl_get_dataloc(). Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-01-27libceph: follow {read,write}_tier fields on osd request submissionIlya Dryomov1-0/+2
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and write_tier for write and read+write ops (aka basic tiering support). {read,write}_tier are part of pg_pool_t since v9. This commit bumps our pg_pool_t decode compat version from v7 to v9, all new fields except for {read,write}_tier are ignored. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-01-27libceph: add ceph_pg_pool_by_id()Ilya Dryomov1-0/+3
"Lookup pool info by ID" function is hidden in osdmap.c. Expose it to the rest of libceph. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-01-27libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()Ilya Dryomov1-2/+5
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-01-27libceph: introduce and start using oid abstractionIlya Dryomov1-0/+36
In preparation for tiering support, which would require having two (base and target) object names for each osd request and also copying those names around, introduce struct ceph_object_id (oid) and a couple helpers to facilitate those copies and encapsulate the fact that object name is not necessarily a NUL-terminated string. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-01-27libceph: move ceph_file_layout helpers to ceph_fs.hIlya Dryomov1-27/+0
Move ceph_file_layout helper macros and inline functions to ceph_fs.h. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-01-27libceph: start using oloc abstractionIlya Dryomov1-2/+1
Instead of relying on pool fields in ceph_file_layout (for mapping) and ceph_pg (for enconding), start using ceph_object_locator (oloc) abstraction. Note that userspace oloc currently consists of pool, key, nspace and hash fields, while this one contains only a pool. This is OK, because at this point we only send (i.e. encode) olocs and never have to receive (i.e. decode) them. This makes keeping a copy of ceph_file_layout in every osd request unnecessary, so ceph_osd_request::r_file_layout field is nuked. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2013-05-01libceph: define ceph_decode_pgid() only onceAlex Elder1-0/+24
There are two basically identical definitions of __decode_pgid() in libceph, one in "net/ceph/osdmap.c" and the other in "net/ceph/osd_client.c". Get rid of both, and instead define a single inline version in "include/linux/ceph/osdmap.h". Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Josh Durgin <[email protected]>
2013-05-01libceph: rename ceph_calc_object_layout()Alex Elder1-4/+2
The purpose of ceph_calc_object_layout() is to fill in the pool number and seed for a ceph_pg structure provided, based on a given osd map and target object id. Currently that function takes a file layout parameter, but the only thing used out of that is its pool number. Change the function so it takes a pool number rather than the full file layout structure. Only update the ceph_pg if the pool is found in the osd map. Get rid of few useless lines of code from the function while there. Since the function now very clearly just fills in the ceph_pg structure it's provided, rename it ceph_calc_ceph_pg(). Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Josh Durgin <[email protected]>
2013-02-26libceph: add support for HASHPSPOOL pool flagSage Weil1-0/+2
The legacy behavior adds the pgid seed and pool together as the input for CRUSH. That is problematic because each pool's PGs end up mapping to the same OSDs: 1.5 == 2.4 == 3.3 == ... Instead, if the HASHPSPOOL flag is set, we has the ps and pool together and feed that into CRUSH. This ensures that two adjacent pools will map to an independent pseudorandom set of OSDs. Advertise our support for this via a protocol feature flag. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2013-02-26libceph: calculate placement based on the internal data typesSage Weil1-1/+1
Instead of using the old ceph_object_layout struct, update our internal ceph_calc_object_layout method to use the ceph_pg type. This allows us to pass the full 32-bit precision of the pgid.seed to the callers. It also allows some callers to avoid reaching into the request structures for the struct ceph_object_layout fields. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2013-02-26ceph: update support for PGID64, PGPOOL3, OSDENC protocol featuresSage Weil1-3/+13
Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features. These have been present in ceph.git since v0.42, Feb 2012. Require these features to simplify support; nobody is running older userspace. Note that the new request and reply encoding is still not in place, so the new code is not yet functional. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2013-02-26libceph: decode into cpu-native ceph_pg typeSage Weil1-3/+8
Always decode data into our cpu-native ceph_pg type that has the correct field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the legacy protocol. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2013-02-26libceph: rename ceph_pg -> ceph_pg_v1Sage Weil1-3/+4
Rename the old version this type to distinguish it from the new version. Signed-off-by: Sage Weil <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2013-01-30Merge branch 'testing' of github.com:ceph/ceph-client into v3.8-rc5-testingAlex Elder1-1/+1
2013-01-17libceph: pass length to ceph_calc_file_object_mapping()Alex Elder1-1/+1
ceph_calc_file_object_mapping() takes (among other things) a "file" offset and length, and based on the layout, determines the object number ("bno") backing the affected portion of the file's data and the offset into that object where the desired range begins. It also computes the size that should be used for the request--either the amount requested or something less if that would exceed the end of the object. This patch changes the input length parameter in this function so it is used only for input. That is, the argument will be passed by value rather than by address, so the value provided won't get updated by the function. The value would only get updated if the length would surpass the current object, and in that case the value it got updated to would be exactly that returned in *oxlen. Only one of the two callers is affected by this change. Update ceph_calc_raw_layout() so it records any updated value. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Josh Durgin <[email protected]>