Age | Commit message | Author | Files | Lines
2016-09-19 | staging: lustre: lnet: Enable setting per NI peer_credits | Doug Oucharek | 1 | -27/+19
The code to allow peer_credits to be set per NI was originally "left inactive" because there were concerns about peer_credits interfering with the ability of IB nodes to connect to each other when their peer_credits are not the same (peer_credits controls the queue depth for IB). With LU-3322, the values no longer have to match, so it is now safe to enable this code and allow peer_credits to be set per NI. This patch enables the existing code for setting per-NI peer_credits. Second, this patch fixes a long-standing bug: the conf data was not used to set variables in the lnet_ni structure until after lnd_startup() was called, which meant LND drivers ignored the tunable values set in struct lnet_ni. Now we set the struct lnet_ni data fields from the conf data before calling lnd_startup().
Signed-off-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8507
Reviewed-on: http://review.whamcloud.com/21948
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
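The ordering fix in this commit can be illustrated with a small userspace sketch. It is a hypothetical model, not the LNet code: `struct ni_conf`, `struct ni_sketch`, `lnd_startup_stub()` and `ni_startup()` are invented names, and it assumes that a zero conf value means "keep the default". It shows only that the conf values are copied into the NI before the LND's startup hook reads them.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-NI tunables supplied as conf data. */
struct ni_conf {
	int peer_credits;	/* 0 = not set, keep the default */
	int peer_timeout;	/* 0 = not set, keep the default */
};

/* Hypothetical, much-reduced stand-in for struct lnet_ni. */
struct ni_sketch {
	int peer_credits;
	int peer_timeout;
	int lnd_saw_credits;	/* value the LND read when it started */
};

/* Hypothetical LND startup hook: it sizes its queues from the ni fields. */
static int lnd_startup_stub(struct ni_sketch *ni)
{
	ni->lnd_saw_credits = ni->peer_credits;
	return 0;
}

static int ni_startup(struct ni_sketch *ni, const struct ni_conf *conf)
{
	/* Copy the configured tunables into the ni first ... */
	if (conf) {
		if (conf->peer_credits > 0)
			ni->peer_credits = conf->peer_credits;
		if (conf->peer_timeout > 0)
			ni->peer_timeout = conf->peer_timeout;
	}
	/* ... so that the LND sees them when it starts up. */
	return lnd_startup_stub(ni);
}
```

Calling `lnd_startup_stub()` before the copy reproduces the bug: the LND would size itself from the defaults and never see the configured value.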
2016-09-19 | staging: lustre: lnet: Ensure routing is turned on first time | Doug Oucharek | 1 | -5/+5
In lnet_rtrpools_enable(), routing was mistakenly not turned on when the rtrpools were allocated for the first time. This patch fixes that routine so that routing is turned on after the rtrpools are allocated.
Signed-off-by: Doug Oucharek <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8501
Reviewed-on: http://review.whamcloud.com/21934
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
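A minimal sketch of the corrected control flow. It assumes a simplified state with two booleans, and `struct router_state`, `rtrpools_alloc()` and `rtrpools_enable()` are hypothetical names rather than the real LNet functions. The point is that the first-time allocation path falls through to the code that turns routing on instead of returning early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified router state; not the real LNet structures. */
struct router_state {
	bool pools_allocated;
	bool routing_enabled;
};

/* Hypothetical allocator for the router buffer pools. */
static int rtrpools_alloc(struct router_state *rs)
{
	rs->pools_allocated = true;
	return 0;
}

static int rtrpools_enable(struct router_state *rs)
{
	int rc;

	if (rs->routing_enabled)
		return 0;		/* nothing to do */

	if (!rs->pools_allocated) {
		rc = rtrpools_alloc(rs);
		if (rc != 0)
			return rc;
		/* Fall through: the first-time path must turn routing on too. */
	}

	rs->routing_enabled = true;
	return 0;
}
```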
2016-09-19 | staging: lustre: lnet: check if ni is in current net namespace | Sebastien Buisson | 3 | -0/+27
Add a new 'ni_net_ns' field to struct lnet_ni to hold a reference to the net namespace in which the ni was created. In LNetDist(), check whether the ni was created in the same net namespace as the current task's. If not, assign it an order above 0xffff0000 so that this ni is not preferred.
Signed-off-by: Sebastien Buisson <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7845
Reviewed-on: http://review.whamcloud.com/21884
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
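The namespace penalty can be sketched as follows. This is a hypothetical model, not LNetDist() itself: namespaces are represented by opaque pointers, `struct ni_ns_sketch` and `ni_order()` are invented names, and the base order is assumed to be below 0x10000 so that adding the penalty cannot wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opaque handle for a network namespace. */
typedef const void *netns_handle;

struct ni_ns_sketch {
	netns_handle net_ns;	/* namespace the ni was created in */
};

/*
 * Return the ordering key for an ni; lower is preferred. An ni from another
 * namespace is pushed above 0xffff0000 so that it is only picked as a last
 * resort. Assumes base_order < 0x10000 so the addition cannot wrap.
 */
static uint32_t ni_order(const struct ni_ns_sketch *ni, uint32_t base_order,
			 netns_handle current_ns)
{
	uint32_t order = base_order;

	if (ni->net_ns != current_ns)
		order += 0xffff0000u;
	return order;
}
```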
2016-09-19 | staging: lustre: lnet: potential deadlock in lnet | Quentin Bouget | 1 | -11/+13
Fix a potential deadlock in LNetMDAttach().
Signed-off-by: Quentin Bouget <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8249
Reviewed-on: http://review.whamcloud.com/20676
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Henri Doreau <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: lmv: fix parent FID for migration | wang di | 1 | -5/+25
If the directory being migrated is under a striped directory, the correct stripe FID must be set for its parent.
Signed-off-by: wang di <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6263
Reviewed-on: http://review.whamcloud.com/13817
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Reviewed-by: Lai Siyao <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: mdc: cl_default_mds_easize not refreshed | Ned Bass | 3 | -14/+67
The client_obd::cl_default_mds_easize field should track the largest observed EA size advertised by the MDT, subject to a reasonable upper bound. The MDC uses cl_default_mds_easize to calculate the initial size of request buffers. The default value should be small enough to avoid wasted memory and excessive use of vmalloc(), yet large enough to accommodate the common use case. In the current code, the default value is only updated if client_obd::cl_max_mds_easize is strictly less than mdt_body::mbo_max_mdsize. This condition is almost never met, because client_obd::cl_max_mds_easize is computed at client mount time based on the number of OSTs in the filesystem, so the MDT will never observe and advertise an EA size larger than that. As a result, client_obd::cl_default_mds_easize indefinitely retains its initial value, which is computed at client mount time based on the filesystem's default stripe width. Any getattr() requests for widely striped files will consequently allocate a request buffer that is too small, forcing reallocations on both the client and the server side. To avoid this, update client_obd::cl_default_mds_easize independently of the value of client_obd::cl_max_mds_easize. In addition, this patch includes these changes:
- Add comments to the client_obd structure to clarify what the cl_{default,max}_mds_{cookie,ea}size values mean.
- Prevent mdc_get_info() from storing uninitialized data in client_obd::cl_max_mds_cookiesize.
- Use 4096 as an upper bound for the default values. The former bound of PAGE_CACHE_SIZE is too large on 64k-page platforms (e.g. PPC), so it fails to prevent the vmalloc() spinlock contention described in LU-3338. The new value was chosen to be large enough to accommodate common use cases while staying well below the 16k threshold at which allocations start using vmalloc().
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Kyle Blatter <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5549
Reviewed-on: http://review.whamcloud.com/11614
Reviewed-by: Lai Siyao <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
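The update rule described in this commit can be sketched as below. This is an illustration, not the mdc code. `struct easize_sketch` and `easize_update()` are hypothetical names. The "never shrink" policy is inferred from "track the largest observed EA size" and should be treated as an assumption. The 4096 bound comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound for the default EA size, as described in the commit message. */
#define DEFAULT_EASIZE_BOUND 4096u

/* Hypothetical, reduced stand-in for the relevant client_obd fields. */
struct easize_sketch {
	uint32_t max_mds_easize;	/* computed at mount time from the OST count */
	uint32_t default_mds_easize;	/* initial request-buffer size */
};

/* Called with the EA size advertised by the MDT in a reply. */
static void easize_update(struct easize_sketch *cli, uint32_t advertised)
{
	uint32_t def = advertised < DEFAULT_EASIZE_BOUND ?
		       advertised : DEFAULT_EASIZE_BOUND;

	if (cli->max_mds_easize < advertised)
		cli->max_mds_easize = advertised;

	/*
	 * Track the largest (bounded) size seen, independently of the check
	 * above; the bug was that this update only happened inside it.
	 */
	if (cli->default_mds_easize < def)
		cli->default_mds_easize = def;
}
```

With a mount-time maximum of 65536, an advertised size of 1024 now raises the default from 256 to 1024. A later advertisement of 10000 is capped at 4096.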
2016-09-19 | staging: lustre: llite: make default_easize writeable in /sysfs | Ned Bass | 5 | -2/+124
Allow default_easize to be tuned via sysfs. A system administrator might want this if rare accesses to widely striped files drive up the value on a filesystem where narrowly striped files are the more common case. In practice, however, this is wanted primarily to facilitate a test case for LU-5549.
- Plumb the necessary interfaces through the LMV and MDC layers to expose write access to this value to higher layers.
- Add block comments to modified functions.
Signed-off-by: Ned Bass <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5549
Reviewed-on: http://review.whamcloud.com/13112
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Lai Siyao <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: mdt: add indexing option to default dir stripe | wang di | 5 | -4/+63
Add an indexing option to the default dir stripe EA. If the MDT finds that the client sent the create request to the wrong MDT because of the default stripe EA, it returns -EREMOTE; the client then retrieves the default stripe EA through the xattr cache and re-creates the object. This also merges the patch for LU-6341, which resolves the following problem: use ll_dir_getstripe() to get the default stripe EA in ll_new_node(), because ll_getxattr_common() requires admin rights to retrieve the default LMV EA (because of the trusted. xattr prefix), which could make mkdir fail for a normal user. If the parent does not have a default stripe EA, the child created by mkdir should always be on the same MDT as the parent. Otherwise the MDT returns -EREMOTE, and the client refreshes the default stripe index and re-creates the object.
Signed-off-by: wang di <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5523
Reviewed-on: http://review.whamcloud.com/13360
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6341
Reviewed-on: http://review.whamcloud.com/13990
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Lai Siyao <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
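The client-side retry on -EREMOTE can be sketched as a single refresh-and-retry. The sketch assumes one retry is enough. The server and the xattr-cache refresh are simulated, and `struct dir_sketch`, `mdt_create()`, `refresh_default_index()` and `client_mkdir()` are hypothetical names, not Lustre functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: only MDT `owner_mdt` may create the new directory. */
struct dir_sketch {
	int owner_mdt;		/* MDT the server says the child belongs on */
	int cached_index;	/* client's cached default stripe index */
	int attempts;		/* create RPCs sent, for illustration */
};

/* Hypothetical server-side create: reject requests sent to the wrong MDT. */
static int mdt_create(struct dir_sketch *d, int mdt)
{
	d->attempts++;
	return mdt == d->owner_mdt ? 0 : -EREMOTE;
}

/* Hypothetical refresh of the default stripe EA from the xattr cache. */
static void refresh_default_index(struct dir_sketch *d)
{
	d->cached_index = d->owner_mdt;
}

static int client_mkdir(struct dir_sketch *d)
{
	int rc = mdt_create(d, d->cached_index);

	if (rc == -EREMOTE) {
		/* Stale default stripe EA: refresh it and re-create once. */
		refresh_default_index(d);
		rc = mdt_create(d, d->cached_index);
	}
	return rc;
}
```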
2016-09-19 | staging: lustre: ptlrpc: prevent request timeout grow due to recovery | Mikhail Pershin | 2 | -20/+22
This patch fixes the growing request timeouts seen on the client after the server-side patch for LU-5079 landed. While that commit is itself correct, it revealed another issue: if a request is processed for a long time on the server, the client's adaptive timeouts (AT) adapt to that after receiving the reply, and new requests get a bigger timeout. Another problem is that the server's AT history is corrupted by the processing time of recovery requests, which is not pure service time but also includes time spent waiting for clients to recover. This patch prevents the AT stats from being updated by early replies on the client and by the processing time of recovery requests on the server. ptlrpc_at_recv_early_reply() still updates the current request's timeout as requested by the server, but does not include it in the AT stats; the real reply will bring that data from the server anyway.
Signed-off-by: Mikhail Pershin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6084
Reviewed-on: http://review.whamcloud.com/13520
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Jian Yu <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
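The two rules in this commit can be sketched in a simplified model. Real Lustre AT keeps a binned history of service times. Here a hypothetical "maximum seen so far" estimate stands in for it, and all names (`struct at_sketch`, `on_early_reply()`, `on_request_done()`) are invented. The sketch shows only which events may feed the estimate: early replies and recovery requests must not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-simplified adaptive-timeout estimate (max seen). */
struct at_sketch {
	unsigned int estimate;
};

struct req_sketch {
	unsigned int timeout;	/* current deadline for this request */
};

static void at_add_sample(struct at_sketch *at, unsigned int service_time)
{
	if (service_time > at->estimate)
		at->estimate = service_time;
}

/* Client: an early reply extends this request only; AT history is untouched. */
static void on_early_reply(struct req_sketch *req, unsigned int new_timeout)
{
	req->timeout = new_timeout;
}

/*
 * Server: recovery requests include time spent waiting for other clients,
 * so they are not counted as service time.
 */
static void on_request_done(struct at_sketch *at, unsigned int service_time,
			    bool during_recovery)
{
	if (!during_recovery)
		at_add_sample(at, service_time);
}
```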
2016-09-19 | staging: lustre: obd: use proper flags for call_usermodehelper | James Simmons | 1 | -1/+1
When a parameter is permanently changed on the MGS, the MGS sends a changelog packet to the nodes affected by the change. Once the nodes receive the change, they call the userland utility lctl to change their local value. When calling a userland application from the kernel, you specify a flag to control the interaction with the application. Originally the flag was set to 0, which is UMH_NO_WAIT, meaning lctl was called asynchronously. In older kernels this was fine, since UMH_NO_WAIT and UMH_WAIT_PROC had nearly the same logic. This changed in newer kernels, which broke updating our parameters. In addition, UMH_NO_WAIT doesn't report back an error if something goes wrong with lctl. The fix is to set the flag to UMH_WAIT_PROC so that kernel space waits until lctl has finished and we get a proper error code if something does go wrong with lctl.
Signed-off-by: James Simmons <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6063
Reviewed-on: http://review.whamcloud.com/13677
Reviewed-by: Bob Glossman <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
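In the kernel, the change amounts to passing UMH_WAIT_PROC as the last argument of call_usermodehelper(path, argv, envp, wait). That API cannot run outside the kernel, so the sketch below is a userspace analogue built on POSIX fork(), execvp() and waitpid(), not the Lustre code. It shows why waiting matters: only a caller that waits for the helper to exit can see its exit status. `run_helper_wait()` is a hypothetical name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Userspace analogue of UMH_WAIT_PROC: start a helper and wait for it to
 * exit, so that its exit status can be reported to the caller. Returns the
 * exit status, or -1 if the helper could not be run or did not exit normally.
 */
static int run_helper_wait(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);		/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A fire-and-forget variant that returns right after fork() would resemble UMH_NO_WAIT: the caller continues immediately and never learns whether the helper failed.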
2016-09-19 | staging: lustre: ptlrpc: remove unnecessary EXPORT_SYMBOL | frank zago | 13 | -100/+8
A lot of symbols don't need to be exported at all because they are only used in the module they belong to.
Signed-off-by: frank zago <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/12510
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: llite: lock the inode to be migrated | wang di | 1 | -2/+7
Because the inode and its connected dentries will be cleared out of the cache after migration, the inode needs to be locked during the migration.
Signed-off-by: wang di <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4712
Reviewed-on: http://review.whamcloud.com/9689
Reviewed-by: Lai Siyao <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: obdclass: remove unnecessary EXPORT_SYMBOL | frank zago | 10 | -25/+0
A lot of symbols don't need to be exported at all because they are only used in the module they belong to.
Signed-off-by: frank zago <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13323
Reviewed-by: Jian Yu <[email protected]>
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: misc: remove unnecessary EXPORT_SYMBOL | frank zago | 4 | -6/+0
A lot of symbols don't need to be exported at all because they are only used in the module they belong to.
Signed-off-by: frank zago <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13321
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: grant: quiet message on grant waiting timeout | Johann Lombardi | 1 | -21/+40
Use at_max in osc_enter_cache() to bound how long we wait for grant space before switching to synchronous I/O. Do not print a message on the console when the timeout is hit, since such a long wait can be legitimate on a flaky network (e.g. when a BRW is resent multiple times).
Signed-off-by: Johann Lombardi <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5521
Reviewed-on: http://review.whamcloud.com/12146
Reviewed-by: Niu Yawei <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: lmv: Do not revalidate stripes with master lock | wang di | 8 | -91/+31
Do not revalidate slave stripes while holding the master lock; otherwise, if revalidation of the slaves blocks, the master lock cannot be released in time. Remove some unnecessary merging in ll_revalidate_slave(); the attributes are now stored in each stripe and merged only when required.
Signed-off-by: wang di <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6088
Reviewed-on: http://review.whamcloud.com/13432
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Lai Siyao <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: client: Fix mkdir -i 1 from DNE2 client to DNE1 server | Artem Blagodarenko | 1 | -8/+12
Since DNE phase 2 was added to the client, it sends create requests to slave MDTs. A DNE1-only server doesn't expect a client to send requests to a slave MDT; it expects only cross-MDT requests from the master MDT. Thus, if a DNE2 client runs "mkdir -i 1" against a DNE1 server, an LBUG occurs. This patch adds a check of the OBD_CONNECT_DIR_STRIPE connection flag on the client side. If striped directories are not supported by the server, the create request is sent to the master MDT.
Signed-off-by: Artem Blagodarenko <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6071
Xyratex-bug-id: MRP-2319
Reviewed-on: http://review.whamcloud.com/13189
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: wang di <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
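The client-side decision can be sketched as a flag test. The bit used here is a placeholder, not the real value of OBD_CONNECT_DIR_STRIPE. `pick_create_mdt()` is a hypothetical helper. MDT indexes are plain integers, and the master MDT is assumed to be known to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit; NOT the real value of OBD_CONNECT_DIR_STRIPE. */
#define SKETCH_CONNECT_DIR_STRIPE (1ULL << 0)

/*
 * Choose the MDT to send a create request to. A server that did not
 * negotiate striped-directory support only expects creates on the master
 * MDT, so fall back to it.
 */
static int pick_create_mdt(uint64_t connect_flags, int requested_mdt,
			   int master_mdt)
{
	if (!(connect_flags & SKETCH_CONNECT_DIR_STRIPE))
		return master_mdt;
	return requested_mdt;
}
```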
2016-09-19 | staging: lustre: clio: pass fid for OST setattr | Bobi Jam | 4 | -2/+10
Store the inode's FID in cl_setattr_ost(); the OSC packs this information on the wire (via lustre_set_wire_obdo()) so that the OST can use it. NOTE: currently lu_fid::f_ver and obdo::o_parent_ver are not used on the OFD device; we use obdo::o_stripe_idx as filter_fid::ff_parent::f_ver and save it to the device.
Signed-off-by: Bobi Jam <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1154
Reviewed-on: http://review.whamcloud.com/12902
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: clio: rename coo_attr_set to coo_attr_update | Bobi Jam | 10 | -32/+34
coo_attr_set() is used to update an object's attributes, but its name is confusing: people intuitively assume it is used to pass the object's attributes down to the server side.
Signed-off-by: Bobi Jam <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1154
Reviewed-on: http://review.whamcloud.com/12888
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: llite: pack suppgid to MDS correctly | Fan Yong | 2 | -2/+35
ll_lookup_it() may trigger an IT_OPEN RPC to open a file by name. At that time, however, the client does not know the target file's GID, so it cannot pack the necessary supplementary group ID in the RPC. Without the supplementary group ID, the RPC may fail the open permission check on the MDS. In that case, the MDS should return the target file's GID; if the current thread on the client is in the right group (according to the file's GID), the client retries the IT_OPEN RPC with the right supplementary group ID. This patch also helps when, due to a race, someone else changes the file's GID after the current RPC was sent to the MDS with the original GID as the suppgid.
Signed-off-by: Fan Yong <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5423
Reviewed-on: http://review.whamcloud.com/12476
Reviewed-by: Lai Siyao <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: remove lustre/include/linux/ | John L. Hammond | 12 | -77/+36
Merge the contents of lustre/include/linux/lvfs.h into lustre/include/lvfs.h. Merge lustre/include/linux/lustre_user.h into lustre/include/lustre/lustre_user.h. Move lustre_compat25.h and lustre_patchless_compat.h from lustre/include/linux/ to lustre/include/ and rename lustre_compat25.h to lustre_compat.h.
Signed-off-by: John L. Hammond <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/13271
Reviewed-by: Bob Glossman <[email protected]>
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: libcfs: check mask returned by cpumask_of_node | Liang Zhen | 1 | -3/+14
cpumask_of_node() can return NULL if the NUMA node is unavailable; in that case cfs_node_to_cpumask() tries to copy from NULL and causes a kernel panic.
Signed-off-by: Liang Zhen <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5751
Reviewed-on: http://review.whamcloud.com/13207
Reviewed-by: Li Wei <[email protected]>
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
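The defensive pattern can be sketched in userspace: check the lookup result before copying from it. The names `struct mask_sketch`, `mask_of_node()` and `node_to_mask()` are hypothetical. Producing an empty mask when the lookup returns NULL is an assumption made for illustration, not necessarily what libcfs does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cpumask. */
struct mask_sketch {
	unsigned long bits;
};

/* Hypothetical lookup that, like cpumask_of_node(), may return NULL. */
static const struct mask_sketch *mask_of_node(const struct mask_sketch *tbl,
					      int nnodes, int node)
{
	if (node < 0 || node >= nnodes)
		return NULL;
	return &tbl[node];
}

/* Copy the node's mask, or produce an empty mask if there is none. */
static void node_to_mask(const struct mask_sketch *tbl, int nnodes, int node,
			 struct mask_sketch *out)
{
	const struct mask_sketch *m = mask_of_node(tbl, nnodes, node);

	if (m)
		*out = *m;
	else
		memset(out, 0, sizeof(*out));	/* never dereference NULL */
}
```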
2016-09-19 | staging: lustre: obd: change type of cl_conn_count to size_t | Dmitry Eremin | 2 | -2/+2
Change type of cl_conn_count to size_t.
Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/13125
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: llite: unlock inode size in ll_lov_setstripe_ea_info() | John L. Hammond | 1 | -7/+6
In ll_lov_setstripe_ea_info() release the inode size lock on all appropriate exit paths.
Signed-off-by: John L. Hammond <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6059
Reviewed-on: http://review.whamcloud.com/13167
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Lai Siyao <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: lprocfs: cleanup stats locking code | Andreas Dilger | 1 | -58/+74
Add comment blocks on lprocfs_stats_lock() and lprocfs_stats_unlock(). Move common NOPERCPU code out of the switch() statements to reduce code size and complexity, since it doesn't depend on the opc at all. Replace switch() in lprocfs_stats_unlock() with a simple if/else, since the lock opc was already checked in lprocfs_stats_lock(). Add an enum for the lprocfs_stats_lock() operations to make it clear what the valid values are and allow compiler checking.
Signed-off-by: Andreas Dilger <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5946
Reviewed-on: http://review.whamcloud.com/12872
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
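The restructuring described in this commit can be sketched as follows. The enum names and `struct stats_sketch` are hypothetical, and a counter stands in for the real spinlock. The sketch assumes that a NOPERCPU stats object has exactly one slot, so the slot index is 0 and the slot count is 1. It shows the NOPERCPU case hoisted above the switch and an enum that lets the compiler check the operation values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two lock operations. */
enum stats_lock_op {
	STATS_GET_NUM_CPU,	/* caller will walk every per-CPU slot */
	STATS_GET_SMP_ID,	/* caller wants the current CPU's slot */
};

/* Hypothetical, reduced stats object. */
struct stats_sketch {
	bool nopercpu;		/* a single shared slot protected by a lock */
	int lock_held;		/* stand-in for a spinlock */
	int num_cpu;
	int this_cpu;
};

/* Returns the number of slots (GET_NUM_CPU) or the slot index (GET_SMP_ID). */
static int stats_lock(struct stats_sketch *s, enum stats_lock_op opc)
{
	/* Common NOPERCPU handling, hoisted out of the switch. */
	if (s->nopercpu) {
		s->lock_held++;
		return opc == STATS_GET_NUM_CPU ? 1 : 0;
	}

	switch (opc) {
	case STATS_GET_SMP_ID:
		return s->this_cpu;
	case STATS_GET_NUM_CPU:
		return s->num_cpu;
	}
	return -1;		/* unreachable for valid enum values */
}

/* The operation was already validated in stats_lock(), so no switch here. */
static void stats_unlock(struct stats_sketch *s)
{
	if (s->nopercpu)
		s->lock_held--;
	/* else: per-CPU stats need no lock in this sketch */
}
```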
2016-09-19 | staging: lustre: osc: change cl_extent_tax and *grants to unsigned | Dmitry Eremin | 5 | -28/+30
Change the types to match their usage and remove warnings.
Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12386
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: osc: osc_object_ast_clear() LBUG | Bobi Jam | 1 | -1/+0
An OSC object can be destroyed while AGL locks are still waiting to be granted, so remove the assertion in osc_object_ast_clear() that all of its DLM locks have been granted.
Signed-off-by: Bobi Jam <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6042
Reviewed-on: http://review.whamcloud.com/13163
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Niu Yawei <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: mgc: add nid iteration | Alexander Boyko | 1 | -5/+11
mgc_apply_recover_logs() uses only the first NID from an entry; this can be a problem for a cluster in which a single node has several network addresses.
Signed-off-by: Alexander Boyko <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5950
Xyratex-bug-id: MRP-2255
Reviewed-on: http://review.whamcloud.com/12829
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Mike Pershin <[email protected]>
Reviewed-by: Ann Koehler <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: ptlrpc: fix race between connect vs resend | Alexander Boyko | 2 | -156/+179
Buggy code in ptlrpc_connect_interpret():
    finish:
        rc = ptlrpc_import_recovery_state_machine(imp);
        ...
        Set import connection flags
When the import reaches the FULL state, ptlrpc_import_recovery_state_machine() wakes up all waiters on the import and all delayed requests, which are then resent. A request could therefore be sent without the updated flags, with AT disabled. If such a request is still in progress on the server, the server drops the new instance and may send an early reply for it. This early reply confuses the client, because it is waiting for the real reply (there is no AT for this request). The client tries to access a buffer outside the reply and gets an EPROTO error. The same bug existed for the initial connect too: the import became FULL before the import connection flags were set.
Signed-off-by: Alexander Boyko <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5528
Xyratex-bug-id: MRP-2034
Reviewed-on: http://review.whamcloud.com/11723
Reviewed-by: Li Wei <[email protected]>
Reviewed-by: Alexander Boyko <[email protected]>
Reviewed-by: Liang Zhen <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: lov: flatten struct lov_stripe_md | John L. Hammond | 2 | -24/+10
Flatten out the lsm_wire struct from the middle of struct lov_stripe_md and remove the member name macros.
Signed-off-by: John L. Hammond <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/12581
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: ldlm: move LDLM_GID_ANY to lustre_dlm.h | Jinshan Xiong | 2 | -2/+5
lustre_idl.h only includes wire data; lustre_dlm.h is the right place for LDLM_GID_ANY.
Signed-off-by: Jinshan Xiong <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6028
Reviewed-on: http://review.whamcloud.com/13074
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: ptlrpc: fix comparison between signed and unsigned | Dmitry Eremin | 5 | -102/+101
Change the return types and size arguments of lustre_msg_hdr_size(), lustre_msg_buf{len,count}() and req_capsule_*_size() to __u32. Change the type of req_format->rf_idx and req_format->rf_fields.nr to size_t. Also return zero instead of -EINVAL for an incorrect message magic. This is more robust: a few of these returns follow LASSERTF(0, "...") and are never reached, and in the remaining places a zero size is returned instead of a huge number produced by implicit unsigned conversion.
Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12475
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
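The "huge number after implicit unsigned conversion" can be shown concretely. The two functions below are hypothetical stand-ins, not lustre_msg_buflen() itself. When -EINVAL is returned through a `uint32_t`, it becomes 2^32 - EINVAL, a value that looks like an enormous but valid buffer length. Returning 0 gives callers a safe value instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Old style (hypothetical): an error code squeezed through an unsigned return. */
static uint32_t buflen_old(int magic_ok)
{
	return magic_ok ? 128 : (uint32_t)-EINVAL;
}

/* New style: a bad magic yields a zero length, which callers handle safely. */
static uint32_t buflen_new(int magic_ok)
{
	return magic_ok ? 128 : 0;
}
```

With EINVAL equal to 22, as on Linux, `buflen_old(0)` returns 4294967274.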
2016-09-19 | staging: lustre: clio: add coo_getstripe interface | Bobi Jam | 8 | -47/+105
Use cl_object_operations::coo_getstripe() to handle LL_IOC_LOV_GETSTRIPE ops.
Signed-off-by: Bobi Jam <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5823
Reviewed-on: http://review.whamcloud.com/12452
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: obdclass: change cl_fault_io->ft_nob to size_t | Dmitry Eremin | 5 | -9/+9
Change the type to match its usage.
Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12380
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: obd: change brw_page->count to unsigned | Dmitry Eremin | 3 | -6/+8
The page count is unsigned, so change the type to match its usage.
Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12378
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: ldlm: Recalculate interval in ldlm_pool_recalc() | Nathaniel Clark | 1 | -16/+22
Instead of rechecking a static value, recalculate the interval to see whether the pool stats need to be updated. Add a newline so the message prints instead of triggering a warning about a missing newline.
Signed-off-by: Nathaniel Clark <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4536
Reviewed-on: http://review.whamcloud.com/12547
Reviewed-by: Lai Siyao <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Jian Yu <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
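A minimal sketch of recalculating the interval from the clock on each call. It assumes times in whole seconds and a fixed recalculation period. `recalc_due()` and `next_recalc_in()` are hypothetical helpers, not the ldlm functions. The floor of 1 second is an assumption that prevents a late caller from spinning.

```c
#include <assert.h>

/* Is a recalculation due, judged from the current time? */
static int recalc_due(long now, long last_recalc, long period)
{
	return now - last_recalc >= period;
}

/*
 * Seconds until the pool should be recalculated again, recomputed from the
 * current time rather than a value cached earlier. Never returns less than
 * 1, so a late caller does not spin.
 */
static long next_recalc_in(long now, long last_recalc, long period)
{
	long remaining = last_recalc + period - now;

	return remaining > 0 ? remaining : 1;
}
```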
2016-09-19 | staging: lustre: ptlrpc: Suppress error message when imp_sec is freed | Amir Shehata | 1 | -2/+17
There is a race condition on client reconnect when the import is being destroyed: some outstanding client-bound requests are still being processed after imp_sec has already been freed. Suppress the error message in import_sec_validate_get() in that case.
Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3353
Reviewed-on: http://review.whamcloud.com/10200
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19 | staging: lustre: obdclass: eliminate NULL error return | Bob Glossman | 1 | -6/+8
Always return an ERR_PTR() on errors, never return a NULL, in lu_object_find_slice(). Also clean up callers who no longer need special case handling of NULL returns.
Signed-off-by: Bob Glossman <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5858
Reviewed-on: http://review.whamcloud.com/12554
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
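The convention adopted here can be illustrated with userspace re-creations of the kernel's `<linux/err.h>` helpers. The re-creations are assumed to follow the kernel convention, where MAX_ERRNO is 4095 and error codes sit in the top 4095 pointer values. `find_object()` is a hypothetical lookup, not lu_object_find_slice().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creations of the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct obj_sketch {
	int id;
};

static struct obj_sketch objects[] = { { 1 }, { 2 } };

/* Hypothetical lookup: failures are always ERR_PTR(), never NULL. */
static struct obj_sketch *find_object(int id)
{
	if (id < 1 || id > 2)
		return ERR_PTR(-ENOENT);
	return &objects[id - 1];
}
```

Callers then need a single check, `if (IS_ERR(o)) return PTR_ERR(o);`, instead of an extra NULL case.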
2016-09-19 | staging: lustre: obdclass: change loop indexes to unsigned | Dmitry Eremin | 2 | -8/+8
Clean up warnings about comparisons between signed and unsigned values.
Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12387
Reviewed-by: Bob Glossman <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19staging: lustre: fiemap: set FIEMAP_EXTENT_LAST correctlyBobi Jam1-2/+10
Once we have collected as many extents as the user requested, check one extent further to decide whether we have reached the last extent of the file. Signed-off-by: Bobi Jam <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5933 Reviewed-on: http://review.whamcloud.com/12781 Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
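The "check one further" rule can be sketched as follows. When the user's buffer fills up, the mapper must look past it: only if no further extent exists is the last returned extent flagged FIEMAP_EXTENT_LAST. The flag value matches <linux/fiemap.h>, but map_extents(), struct extent, and the in-memory extent list are simplifying assumptions for illustration, not the Lustre implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIEMAP_EXTENT_LAST 0x00000001	/* same value as <linux/fiemap.h> */

struct extent { uint64_t start, len; uint32_t flags; };	/* hypothetical */

/*
 * Hypothetical mapper: copy up to 'max' extents from the file's full
 * extent list 'all' ('nr_all' entries) into 'out'. Filling the buffer is
 * not enough to claim LAST; we check whether one more extent exists
 * beyond what was returned.
 */
static size_t map_extents(const struct extent *all, size_t nr_all,
			  struct extent *out, size_t max)
{
	size_t n = 0;

	while (n < max && n < nr_all) {
		out[n] = all[n];
		out[n].flags &= ~FIEMAP_EXTENT_LAST;
		n++;
	}
	/* Look one extent further: flag LAST only if nothing follows. */
	if (n > 0 && n >= nr_all)
		out[n - 1].flags |= FIEMAP_EXTENT_LAST;
	return n;
}
```

With three extents and room for two, the second extent is returned without the flag. With room for three, the third extent carries it.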
2016-09-19staging: lustre: ldlm: evict clients returning errors on ASTsAlexey Lyashkov1-1/+4
To test the behavior of clients that return errors on ASTs, a failure can be induced by setting OBD_FAIL_LDLM_BL_CALLBACK_NET. Also handle the new case where cfs_fail_err is set, so that cfs_fail_err can be sent back in a reply. Signed-off-by: Alexey Lyashkov <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5581 Xyratex-bug-id: MRP-2041 Reviewed-on: http://review.whamcloud.com/11752 Reviewed-by: James Simmons <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19staging: lustre: mdc: Proper accessing struct lov_user_mdYoshifumi Uemura1-1/+1
In mdc_setattr_pack(), access the members of struct lov_user_md in little-endian byte order. Signed-off-by: Yoshifumi Uemura <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5889 Reviewed-on: http://review.whamcloud.com/12683 Reviewed-by: Dmitry Eremin <[email protected]> Reviewed-by: James Simmons <[email protected]> Reviewed-by: Yang Sheng <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
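Reading fields "in little-endian byte order" means decoding the stored bytes as least-significant-first, regardless of the host CPU. Kernel code would normally apply le32_to_cpu()/le16_to_cpu() to the struct fields. The byte-wise helpers below are a portable userspace sketch of the same conversion, and the example buffer is hypothetical rather than a real lov_user_md layout.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value: byte 0 is least significant.
 * Gives the same result on little- and big-endian hosts. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a 16-bit little-endian value. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}
```

On a big-endian host, reading such a field directly as a native integer would yield a byte-swapped value. Explicit conversion avoids that.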
2016-09-19staging: lustre: obdclass: lu_htable_order() return type to longDmitry Eremin1-4/+4
Change the return type of lu_htable_order() to long to match how it is used. Signed-off-by: Dmitry Eremin <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577 Reviewed-on: http://review.whamcloud.com/12385 Reviewed-by: Bob Glossman <[email protected]> Reviewed-by: John L. Hammond <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19staging: lustre: llite: fix dup flags namesBob Glossman1-2/+2
The name 'xattr' is used for two different ll_flags bits. Change the names to be distinct, reflecting the names of the bits used in the LL_SBI_xbitnamex #defines. Signed-off-by: Bob Glossman <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5586 Reviewed-on: http://review.whamcloud.com/12892 Reviewed-by: Minh Diep <[email protected]> Reviewed-by: Jian Yu <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
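A bit-to-name table is typically indexed by bit number, so a duplicated name makes a printed flag list ambiguous: a reader cannot tell which bit "xattr" refers to. The sketch below shows such a table with a uniqueness check. Every name in it is hypothetical and illustrative, not the actual llite flag list.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table: entry i names flag bit i. Each name mirrors its own
 * bit's #define instead of two bits sharing "xattr". */
static const char *const sbi_flag_names[] = {
	"nolck", "checksum", "flock", "user_xattr", "xattr_cache",
};

#define NR_FLAG_NAMES (sizeof(sbi_flag_names) / sizeof(sbi_flag_names[0]))

/* Return 1 if every name in the table is distinct, 0 otherwise. */
static int flag_names_unique(void)
{
	for (size_t i = 0; i < NR_FLAG_NAMES; i++)
		for (size_t j = i + 1; j < NR_FLAG_NAMES; j++)
			if (strcmp(sbi_flag_names[i], sbi_flag_names[j]) == 0)
				return 0;
	return 1;
}
```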
2016-09-19staging: lustre: llog: prevent out-of-bound indexfrank zago1-0/+4
llog_process_thread() can be called from llog_cat_process_cb() with an index that is already out of bounds, leading to the following crash:

  LustreError: 3773:0:(llog.c:310:llog_process_thread()) ASSERTION(index <= last_index + 1 ) failed:
  LustreError: 3773:0:(llog.c:310:llog_process_thread()) LBUG
  #0 [ffff8801144bf900] machine_kexec at ffffffff81038f3b
  #1 [ffff8801144bf960] crash_kexec at ffffffff810c5d82
  #2 [ffff8801144bfa30] panic at ffffffff8152798a
  #3 [ffff8801144bfab0] lbug_with_loc at ffffffffa02f8eeb [libcfs]
  #4 [ffff8801144bfad0] llog_process_thread at ffffffffa0413fff [obdclass]
  #5 [ffff8801144bfb80] llog_process_or_fork at ffffffffa041585f [obdclass]
  #6 [ffff8801144bfbd0] llog_cat_process_cb at ffffffffa0418612 [obdclass]
  #7 [ffff8801144bfc30] llog_process_thread at ffffffffa0413c22 [obdclass]
  #8 [ffff8801144bfce0] llog_process_or_fork at ffffffffa041585f [obdclass]
  #9 [ffff8801144bfd30] llog_cat_process_or_fork at ffffffffa0416b9d [obdclass]

If the index is too large, simply return success. Signed-off-by: frank zago <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5635 Reviewed-on: http://review.whamcloud.com/12161 Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Patrick Farrell <[email protected]> Reviewed-by: John L. Hammond <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
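The fix amounts to checking the start index before the invariant that used to crash: an index past last_index + 1 means there is nothing left to process, so the function returns success early. This is a hypothetical userspace sketch. process_records() and its integer "records" stand in for llog processing and are not the obdclass code.

```c
#include <assert.h>

/*
 * Hypothetical sketch: sum records[index..last_index]. A start index
 * already beyond last_index + 1 is treated as "nothing to do" and
 * reported as success, instead of tripping the assertion (an LBUG in
 * the kernel).
 */
static int process_records(const int *recs, int last_index, int index,
			   int *sum)
{
	*sum = 0;
	if (index > last_index + 1)
		return 0;	/* out of bounds: simply return success */

	/* The original invariant can no longer fire. */
	assert(index <= last_index + 1);

	for (; index <= last_index; index++)
		*sum += recs[index];	/* stand-in for handling one record */
	return 0;
}
```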
2016-09-19staging: lustre: ptlrpc: quiet errors on initial connectionAndreas Dilger2-24/+30
A client or MDS may try to connect to a target (OST or peer MDT) before that target has finished setup. Rather than spamming the console logs during initial connection, only print a console error message if there are repeated failures trying to connect to the target, which may indicate an error on that node. Signed-off-by: Andreas Dilger <[email protected]> Signed-off-by: Bobi Jam <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3456 Reviewed-on: http://review.whamcloud.com/10057 Reviewed-by: Bobi Jam <[email protected]> Reviewed-by: Bob Glossman <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
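One way to express "only complain after repeated failures" is to count consecutive connection failures and report only past a threshold, resetting the count on success. This is a minimal sketch under stated assumptions: the threshold CONNECT_QUIET_ATTEMPTS, struct conn_state, and connect_result() are hypothetical, not the ptlrpc implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: consecutive failures tolerated quietly while
 * the target may still be finishing setup. */
#define CONNECT_QUIET_ATTEMPTS 3

struct conn_state {
	unsigned int failures;	/* consecutive failed connect attempts */
};

/* Record one connect attempt. Returns true when a console error should be
 * printed, i.e. only once failures repeat past the threshold. */
static bool connect_result(struct conn_state *cs, bool ok)
{
	if (ok) {
		cs->failures = 0;	/* success resets the streak */
		return false;
	}
	cs->failures++;
	return cs->failures > CONNECT_QUIET_ATTEMPTS;
}
```

A target that comes up after a couple of retries never produces console noise. A node that keeps failing is still reported.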
2016-09-19staging: lustre: ldlm: revert the changes for lock canceling policyJinshan Xiong1-6/+0
The LRU lock policy changes were introduced by commit bfae5a4e, where I tried to revise the policy for picking locks to cancel. However, this caused two problems, as mentioned in LU-5727. The first problem is that a lock could be picked for canceling only if the number of LRU locks was over the preset LRU limit AND the lock was aged; the second problem is that mdc_cancel_weight() tends not to cancel OPEN locks, so open locks can be kept forever and eventually exhaust memory on the MDT side. Commit 7b2d26b0 ("revert changes to ldlm_cancel_aged_policy") fixed the first problem. This patch reverts the rest of the LRU policy changes. Signed-off-by: Jinshan Xiong <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5727 Reviewed-on: http://review.whamcloud.com/12733 Reviewed-by: Niu Yawei <[email protected]> Reviewed-by: Bobi Jam <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19staging: lustre: recovery: don't replay closed openNiu Yawei1-1/+5
To avoid scanning the replay open list every time in ptlrpc_free_committed(), the fix for LU-2613 (4322e0f9) changed ptlrpc_free_committed() to skip the open list unless the import generation has changed. That introduced a race that could cause a closed open to be replayed:

1. The application calls ll_close_inode_openhandle() -> mdc_close() to close the file; rq_replay is cleared, but the open request is still on the imp_committed_list;
2. Before md_clear_open_replay_data() is called for the close, the client starts replay, and that closed open is mistakenly replayed;
3. The open replay interpret callback (mdc_replay_open) could race with mdc_clear_open_replay_data() at the end.

This patch fixes ptlrpc_free_committed() to make sure the open list is scanned on recovery, preventing the closed open request from being replayed. Signed-off-by: Niu Yawei <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5507 Reviewed-on: http://review.whamcloud.com/12667 Reviewed-by: Lai Siyao <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
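The decision described in this commit can be reduced to one predicate: scan the committed open list if the import generation changed (the LU-2613 optimisation) or if recovery/replay is in progress (the added condition), so closed opens are pruned before they can be replayed. This is a hypothetical sketch of that logic only. struct imp_state and must_scan_open_list() are illustrative names, not ptlrpc's actual structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of import state relevant to the decision. */
struct imp_state {
	unsigned int generation;		/* current import generation */
	unsigned int last_scan_generation;	/* generation at last scan */
	bool replaying;				/* recovery/replay in progress */
};

/* Return true if the committed open list must be scanned. Skipping the
 * scan is only safe when neither condition holds; otherwise a closed open
 * left on the list could be replayed. */
static bool must_scan_open_list(struct imp_state *imp)
{
	if (imp->replaying || imp->generation != imp->last_scan_generation) {
		imp->last_scan_generation = imp->generation;
		return true;
	}
	return false;
}
```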
2016-09-19staging: lustre: changelog: Proper record remappingHenri Doreau1-22/+44
Fix changelog_remap_rec() to correctly remap records emitted with jobid_var=disabled, i.e. records delivered by new servers but without a jobid field. Signed-off-by: Henri Doreau <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5862 Reviewed-on: http://review.whamcloud.com/12574 Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Robert Read <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-09-19staging: lustre: llite: remove ll_objects_destroy()John L. Hammond11-239/+6
Remove ll_objects_destroy(). This function is not needed for interoperability with servers of version 2.4 or higher. Remove the now-unused function lov_destroy() and its supporting functions. Remove the lsm_destroy method of struct lsm_operations. Remove the unused struct lov_stripe_md, MD export, and capa parameters from obd_destroy() and its implementations. Signed-off-by: John L. Hammond <[email protected]> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814 Reviewed-on: http://review.whamcloud.com/12618 Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Lai Siyao <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>