aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-07-07f2fs: do not set LOST_PINO for newly created dirSheng Yong1-1/+2
Since directories will be written back with checkpoint and fsync a directory will always write CP, there is no need to set LOST_PINO after creating a directory. Signed-off-by: Sheng Yong <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07f2fs: skip ->writepages for {mete,node}_inode during recoveryChao Yu3-6/+13
Skip ->writepages in prior to ->writepage for {meta,node}_inode during recovery, hence unneeded loop in ->writepages can be avoided. Moreover, check SBI_POR_DOING earlier while writebacking pages. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07f2fs: introduce __check_sit_bitmapChao Yu1-0/+25
After we introduce discard thread, discard command can be issued concurrently with data allocating, this patch adds new function to heck sit bitmap to ensure that userdata was invalid in which on-going discard command covered. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07f2fs: stop gc/discard thread in prior during umountChao Yu3-9/+18
This patch resolves kernel panic for xfstests/081, caused by recent f2fs_bug_on f2fs: add f2fs_bug_on in __remove_discard_cmd For fixing, we will stop gc/discard thread in prior in ->kill_sb in order to avoid referring and releasing race among them. Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07f2fs: introduce reserved_blocks in sysfsChao Yu4-6/+33
In this patch, we add a new sysfs interface, with it, we can control number of reserved blocks in system which could not be used by user, it enable f2fs to let user to configure for adjusting over-provision ratio dynamically instead of changing it by mkfs. So we can expect it will help to reserve more free space for relieving GC in both filesystem and flash device. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07f2fs: avoid redundant f2fs_flush after remountYunlong Song1-0/+2
create_flush_cmd_control will create redundant issue_flush_thread after each remount with flush_merge option. Signed-off-by: Yunlong Song <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07f2fs: report # of free inodes more preciselyJaegeuk Kim1-3/+11
If the partition is small, we don't need to report total # of inodes including hidden free nodes. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-07Merge branch 'mailbox-for-next' of ↵Linus Torvalds5-6/+195
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - Minor improvement : avoid requiring unnecessary startup/shutdown callback that many drivers seem to not need - New controller driver for Qualcomm's APCS IPC * 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: Introduce Qualcomm APCS IPC driver dt-bindings: mailbox: Introduce Qualcomm APCS global binding mailbox: Make startup and shutdown ops optional
2017-07-07Merge tag 'libnvdimm-for-4.13' of ↵Linus Torvalds47-460/+1504
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "libnvdimm updates for the latest ACPI and UEFI specifications. This pull request also includes new 'struct dax_operations' enabling to undo the abuse of copy_user_nocache() for copy operations to pmem. The dax work originally missed 4.12 to address concerns raised by Al. Summary: - Introduce the _flushcache() family of memory copy helpers and use them for persistent memory write operations on x86. The _flushcache() semantic indicates that the cache is either bypassed for the copy operation (movnt) or any lines dirtied by the copy operation are written back (clwb, clflushopt, or clflush). - Extend dax_operations with ->copy_from_iter() and ->flush() operations. These operations and other infrastructure updates allow all persistent memory specific dax functionality to be pushed into libnvdimm and the pmem driver directly. It also allows dax-specific sysfs attributes to be linked to a host device, for example: /sys/block/pmem0/dax/write_cache - Add support for the new NVDIMM platform/firmware mechanisms introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace label format, extensions to the address-range-scrub command set, new error injection commands, and a new BTT (block-translation-table) layout. These updates support inter-OS and pre-OS compatibility. - Fix a longstanding memory corruption bug in nfit_test. - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2) capable. - Miscellaneous fixes and small updates across libnvdimm and the nfit driver. Acknowledgements that came after the branch was pushed: commit 6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani <[email protected]>" * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits) libnvdimm, namespace: record 'lbasize' for pmem namespaces acpi/nfit: Issue Start ARS to retrieve existing records libnvdimm: New ACPI 6.2 DSM functions acpi, nfit: Show bus_dsm_mask in sysfs libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru. acpi, nfit: Enable DSM pass thru for root functions. libnvdimm: passthru functions clear to send libnvdimm, btt: convert some info messages to warn/err libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime libnvdimm: fix the clear-error check in nsio_rw_bytes libnvdimm, btt: fix btt_rw_page not returning errors acpi, nfit: quiet invalid block-aperture-region warnings libnvdimm, btt: BTT updates for UEFI 2.7 format acpi, nfit: constify *_attribute_group libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region libnvdimm, pmem, dax: export a cache control attribute dax: convert to bitmask for flags dax: remove default copy_from_iter fallback libnvdimm, nfit: enable support for volatile ranges libnvdimm, pmem: fix persistence warning ...
2017-07-07xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLENDarrick J. Wong6-8/+8
XFS has a maximum symlink target length of 1024 bytes; this is a holdover from the Irix days. Unfortunately, the constant establishing this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN, which is 4096. The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means that xfs_repair doesn't complain about oversized symlinks. Since this is an on-disk format constraint, put the define in the XFS namespace and move everything over to use the new name. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2017-07-07libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUSIlya Dryomov1-0/+6
All four SERVER_LUMINOUS feature bits are implemented, switch it on! NEW_OSDOP_ENCODING doesn't mean much for the client (it signifies support for MOSDOp v6) but needs to be enabled in order to get the latest (currently v25) pg_pool_t. Signed-off-by: Ilya Dryomov <[email protected]> Acked-by: Sage Weil <[email protected]>
2017-07-07libceph: osd_state is 32 bits wide in luminousIlya Dryomov3-12/+23
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07crush: remove an obsolete commentIlya Dryomov1-5/+0
Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07crush: crush_init_workspace starts with struct crush_workIlya Dryomov1-1/+1
It is not just a pointer to crush_work, it is the whole structure. That is not a problem since it only contains a pointer. But it will be a problem if new data members are added to crush_work. Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()Ilya Dryomov3-3/+208
If there is no crush_choose_arg_map for a given pool, a NULL pointer is passed to preserve existing crush_do_rule() behavior. Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b, dbe36e08be00c6519a8c89718dd47b0219c20516. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07crush: implement weight and id overrides for straw2Ilya Dryomov4-24/+119
bucket_straw2_choose needs to use weights that may be different from weight_items. For instance to compensate for an uneven distribution caused by a low number of values. Or to fix the probability biais introduced by conditional probabilities (see http://tracker.ceph.com/issues/15653 for more information). We introduce a weight_set for each straw2 bucket to set the desired weight for a given item at a given position. The weight of a given item when picking the first replica (first position) may be different from the weight the second replica (second position). For instance the weight matrix for a given bucket containing items 3, 7 and 13 could be as follows: position 0 position 1 item 3 0x10000 0x100000 item 7 0x40000 0x10000 item 13 0x40000 0x10000 When crush_do_rule picks the first of two replicas (position 0), item 7, 3 are four times more likely to be choosen by bucket_straw2_choose than item 13. When choosing the second replica (position 1), item 3 is ten times more likely to be choosen than item 7, 13. By default the weight_set of each bucket exactly matches the content of item_weights for each position to ensure backward compatibility. bucket_straw2_choose compares items by using their id. The same ids are also used to index buckets and they must be unique. For each item in a bucket an array of ids can be provided for placement purposes and they are used instead of the ids. If no replacement ids are provided, the legacy behavior is preserved. Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: apply_upmap()Ilya Dryomov1-2/+95
Previously, pg_to_raw_osds() didn't filter for existent OSDs because raw_to_up_osds() would filter for "up" ("up" is predicated on "exists") and raw_to_up_osds() was called directly after pg_to_raw_osds(). Now, with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds() output can affect apply_upmap(). Introduce remove_nonexistent_osds() to deal with that. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: compute actual pgid in ceph_pg_to_up_acting_osds()Ilya Dryomov1-6/+6
Move raw_pg_to_pg() call out of get_temp_osds() and into ceph_pg_to_up_acting_osds(), for upcoming apply_upmap(). Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: pg_upmap[_items] infrastructureIlya Dryomov3-4/+164
pg_temp and pg_upmap encodings are the same (PG -> array of osds), except for the incremental remove: it's an empty mapping in new_pg_temp for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't looked at as an example for pg_upmap encoding.) Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap. __decode_pg_temp() stores into pg_temp union member, but since pg_upmap union member is identical, reading through pg_upmap later is OK. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: ceph_decode_skip_* helpersIlya Dryomov2-22/+63
Some of these won't be as efficient as they could be (e.g. ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32) once instead of advancing by sizeof(u32) len times), but that's fine and not worth a bunch of extra macro code. Replace skip_name_map() with ceph_decode_skip_map as an example. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: kill __{insert,lookup,remove}_pg_mapping()Ilya Dryomov1-72/+15
Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping(). Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: introduce and switch to decode_pg_mapping()Ilya Dryomov1-67/+83
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: don't pass pgid by valueIlya Dryomov1-10/+10
Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping counterparts: take const struct ceph_pg *. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: respect RADOS_BACKOFF backoffsIlya Dryomov8-0/+737
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: make DEFINE_RB_* helpers more generalIlya Dryomov1-12/+37
Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id, compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare() respectively. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: avoid unnecessary pi lookups in calc_target()Ilya Dryomov3-30/+42
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: use target pi for calc_target() calculationsIlya Dryomov1-1/+8
For luminous and beyond we are encoding the actual spgid, which requires operating with the correct pg_num, i.e. that of the target pool. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: always populate t->target_{oid,oloc} in calc_target()Ilya Dryomov1-11/+4
need_check_tiering logic doesn't make a whole lot of sense. Drop it and apply tiering unconditionally on every calc_target() call instead. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: make sure need_resend targets reflect latest mapIlya Dryomov3-9/+27
Otherwise we may miss events like PG splits, pool deletions, etc when we get multiple incremental maps at once. Because check_pool_dne() can now be fed an unlinked request, finish_request() needed to be taught to handle unlinked requests. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: delete from need_resend_linger before check_linger_pool_dne()Ilya Dryomov1-0/+1
When processing a map update consisting of multiple incrementals, we may end up running check_linger_pool_dne() on a lingering request that was previously added to need_resend_linger list. If it is concluded that the target pool doesn't exist, the request is killed off while still on need_resend_linger list, which leads to a crash on a NULL lreq->osd in kick_requests(): libceph: linger_id 18446462598732840961 pool does not exist BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: ceph_osdc_handle_map+0x4ae/0x870 Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: resend on PG splits if OSD has RESEND_ON_SPLITIlya Dryomov3-11/+19
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: drop need_resend from calc_target()Ilya Dryomov1-7/+11
Replace it with more fine-grained bools to separate updating ceph_osd_request_target fields and the decision to resend. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: MOSDOp v8 encoding (actual spgid + full hash)Ilya Dryomov3-20/+154
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: ceph_connection_operations::reencode_message() methodIlya Dryomov2-2/+7
Give upper layers a chance to reencode the message after the connection is negotiated and ->peer_features is set. OSD client will use this to support both luminous and pre-luminous OSDs (in a single cluster): the former need MOSDOp v8; the latter will continue to be sent MOSDOp v4. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: encode_{pgid,oloc}() helpersIlya Dryomov1-23/+27
Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: introduce ceph_spg, ceph_pg_to_primary_shard()Ilya Dryomov5-4/+60
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: new pi->last_force_request_resendIlya Dryomov1-0/+37
The old (v15) pi->last_force_request_resend has been repurposed to make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do obey pi->last_force_request_resend resend on splits. See ceph.git commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend ops on split"). Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: fold [l]req->last_force_resend into ceph_osd_request_targetIlya Dryomov2-13/+12
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: support SERVER_JEWEL feature bitsIlya Dryomov2-1/+9
Only MON_STATEFUL_SUB, really. MON_ROUTE_OSDMAP and OSDSUBOP_NO_SNAPCONTEXT are irrelevant. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: advertise support for OSD_POOLRESENDIlya Dryomov1-0/+1
The code has been in place since commit 63244fa123a7 ("libceph: introduce ceph_osd_request_target, calc_target()"), and, with the ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now in working order. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: handle non-empty dest in ceph_{oloc,oid}_copy()Ilya Dryomov1-4/+6
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: new features macrosIlya Dryomov1-75/+167
Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07libceph: remove ceph_sanitize_features() workaroundIlya Dryomov2-23/+1
Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a. Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: update ceph_dentry_info::lease_session when necessaryYan, Zheng1-2/+7
Current code does not update ceph_dentry_info::lease_session once it is set. If auth mds of corresponding dentry changes, dentry lease keeps in an invalid state. Signed-off-by: "Yan, Zheng" <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: new mount option that specifies fscache uniquifierYan, Zheng3-21/+113
Current ceph uses FSID as primary index key of fscache data. This allows ceph to retain cached data across remount. But this causes problem (kernel opps, fscache does not support sharing data) when a filesystem get mounted several times (with fscache enabled, with different mount options). The fix is adding a new mount option, which specifies uniquifier for fscache. Signed-off-by: "Yan, Zheng" <[email protected]> Acked-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: avoid accessing freeing inode in ceph_check_delayed_caps()Yan, Zheng1-2/+9
Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: avoid invalid memory dereference in the middle of umountYan, Zheng2-4/+6
extra_mon_dispatch() and debugfs' foo_show functions dereference fsc->mdsc. we should clean up fsc->client->extra_mon_dispatch and debugfs before destroying fsc->mds. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: getattr before read on ceph.* xattrsYan, Zheng1-0/+3
Previously we were returning values for quota, layout xattrs without any kind of update -- the user just got whatever happened to be in our cache. Clearly this extra round trip has a cost, but reads of these xattrs are fairly rare, happening on admin intervention rather than in normal operation. Link: http://tracker.ceph.com/issues/17939 Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: don't re-send interrupted flock requestYan, Zheng1-1/+24
Don't re-send interrupted flock request in cases of mds failover and receiving request forward. Because corresponding 'lock intr' request may have been finished, it won't get re-sent. Link: http://tracker.ceph.com/issues/20170 Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2017-07-07ceph: cleanup writepage_nounlock()Yan, Zheng1-6/+6
Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>