Age | Commit message | Author | Files | Lines
2013-09-10iser-target: introduce fast memory registration mode (FRWR)Vu Pham2-13/+371
This model was introduced in 00f7ec36c "RDMA/core: Add memory management extensions support" and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the isert device, ib_isert tests whether the HCA supports FRWR. If it does, it sets the flag and assigns function pointers that handle fast registration and deregistration of the appropriate resources (fast_reg descriptors). When a new connection comes in, ib_isert checks the FRWR flag and creates the FRWR resources; if that fails, it switches back to the old model of using the global DMA key and turns off FRWR support. Registration is done by posting IB_WR_FAST_REG_MR to the QP, and invalidation by posting IB_WR_LOCAL_INV. Signed-off-by: Vu Pham <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
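
As an illustration of the registration/invalidation flow named above, here is a minimal sketch against the 3.x-era IB verbs API; the function names and the descriptor handling are hypothetical simplifications, not the actual ib_isert code:

#include <rdma/ib_verbs.h>

/* Sketch: register an SG-backed buffer through a fast_reg descriptor. */
static int sketch_post_fast_reg(struct ib_qp *qp, struct ib_mr *mr,
                                struct ib_fast_reg_page_list *frpl,
                                unsigned int page_list_len, u64 iova,
                                unsigned int length)
{
        struct ib_send_wr fr_wr, *bad_wr;

        memset(&fr_wr, 0, sizeof(fr_wr));
        fr_wr.opcode = IB_WR_FAST_REG_MR;
        fr_wr.wr.fast_reg.iova_start = iova;
        fr_wr.wr.fast_reg.page_list = frpl;
        fr_wr.wr.fast_reg.page_list_len = page_list_len;
        fr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
        fr_wr.wr.fast_reg.length = length;
        fr_wr.wr.fast_reg.rkey = mr->rkey;
        fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE |
                                         IB_ACCESS_REMOTE_READ |
                                         IB_ACCESS_REMOTE_WRITE;

        return ib_post_send(qp, &fr_wr, &bad_wr);
}

/* Sketch: tear the registration down again with a local invalidate. */
static int sketch_post_local_inv(struct ib_qp *qp, u32 rkey)
{
        struct ib_send_wr inv_wr, *bad_wr;

        memset(&inv_wr, 0, sizeof(inv_wr));
        inv_wr.opcode = IB_WR_LOCAL_INV;
        inv_wr.ex.invalidate_rkey = rkey;

        return ib_post_send(qp, &inv_wr, &bad_wr);
}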
2013-09-10iser-target: generalize rdma memory registration and cleanupVu Pham2-4/+23
The current driver uses the global DMA key to register the memory pointed to by the sg list provided by the target core. As a preparation step for adding more methods, such as fast path memory registration, make the reg/unreg calls function pointers. Signed-off-by: Vu Pham <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
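
Conceptually the change boils down to an ops-style indirection like the sketch below; the struct and field names here are illustrative stand-ins, not the actual ib_isert definitions:

struct isert_cmd;
struct isert_conn;

/* Sketch: per-device registration methods, so a fast-registration
 * implementation can later be plugged in alongside the global-key one. */
struct isert_reg_ops_sketch {
        int     (*reg_rdma_mem)(struct isert_conn *conn, struct isert_cmd *cmd);
        void    (*unreg_rdma_mem)(struct isert_cmd *cmd, struct isert_conn *conn);
};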
2013-09-10iser-target: move rdma wr processing to a shared functionVu Pham2-152/+114
isert_put_datain() and isert_get_dataout() share a lot of code in RDMA wr processing; move this common code to a shared function. Use isert_unmap_cmd to clean up on RDMA_READ completion. Remove a duplicate field in the isert_cmd and isert_rdma_wr structs. Change misc debug messages to track isert_cmd. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Vu Pham <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Enable global EXTENDED_COPY setup/releaseNicholas Bellinger1-0/+6
Add calls to target_xcopy_setup_pt() + target_xcopy_release_pt() to target_core_init_configfs() and target_core_exit_configfs() respectively. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Add Third Party Copy (3PC) bit in INQUIRY responseNicholas Bellinger5-0/+28
This patch adds the Third Party Copy (3PC) bit to signal support for EXTENDED_COPY within standard inquiry response data. Also add emulate_3pc device attribute in configfs (enabled by default) to allow the exposure of this bit to be disabled, if necessary. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
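
For reference, the 3PC flag sits in bit 3 of byte 5 of the standard INQUIRY data; below is a minimal sketch of gating it on the new attribute (the helper name is hypothetical, the field name follows the commit text):

#include <target/target_core_base.h>

/* Sketch: advertise third-party copy support when emulate_3pc is enabled. */
static void sketch_set_3pc_bit(struct se_device *dev, unsigned char *buf)
{
        if (dev->dev_attrib.emulate_3pc)
                buf[5] |= 0x08; /* 3PC: EXTENDED_COPY supported */
}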
2013-09-10target: Enable EXTENDED_COPY setup in spc_parse_cdbNicholas Bellinger1-2/+8
Set up the se_cmd->execute_cmd() pointers for EXTENDED_COPY and RECEIVE_COPY_RESULTS handling within spc_parse_cdb(). Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Add support for EXTENDED_COPY copy offload emulationNicholas Bellinger4-1/+1147
This patch adds support for EXTENDED_COPY emulation from SPC-3, which enables full copy offload target support within both a single virtual backend device, and across multiple virtual backend devices. It also functions independent of target fabric, and supports copy offload across multiple target fabric ports. This implementation supports both EXTENDED_COPY PUSH and PULL models of operation, so the actual CDB may be received on either the source or destination logical unit. For Target Descriptors, it currently supports the NAA IEEE Registered Extended designator (type 0xe4), which allows the reference of target ports to occur independent of fabric type using EVPD 0x83 WWNs. For Segment Descriptors, it currently supports copy from block to block (0x02) mode. It also honors any present SCSI reservations of the destination target port. Note that only Supports No List Identifier (SNLID=1) mode is supported. Also included is basic RECEIVE_COPY_RESULTS with service action type OPERATING PARAMETERS (0x03) required for SNLID=1 operation.
v3 changes:
  - Fix incorrect return type in target_do_receive_copy_results() (Fengguang)
v2 changes:
  - Use target_alloc_sgl() instead of transport_generic_get_mem()
  - Convert debug output to use pr_debug()
  - Convert target_xcopy_parse_target_descriptors() NAA IEEE WWN dump to use 0x%16phN format specification
  - Drop unnecessary xcopy_pt_cmd->xpt_passthrough_wsem, and associated usage in xcopy_pt_write_pending() and target_xcopy_issue_pt_cmd()
  - Add check for unsupported EXTENDED_COPY(LID4) service action bits in target_do_xcopy()
Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
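
To make the block-to-block (0x02) segment handling concrete, here is a heavily simplified sketch of a PUSH/PULL-agnostic segment copy; every helper name below is a hypothetical placeholder, not part of the patch:

#include <linux/slab.h>
#include <target/target_core_base.h>

/* Hypothetical helpers standing in for internally dispatched READ/WRITE I/O. */
int sketch_read_blocks(struct se_device *dev, sector_t lba, u32 nolb, void *buf);
int sketch_write_blocks(struct se_device *dev, sector_t lba, u32 nolb, void *buf);

/* Sketch: copy one block-to-block segment between two backend devices. */
static int sketch_copy_segment(struct se_device *src, struct se_device *dst,
                               sector_t src_lba, sector_t dst_lba, u32 nolb)
{
        u32 len = nolb * src->dev_attrib.block_size;
        void *buf;
        int ret;

        buf = kzalloc(len, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        ret = sketch_read_blocks(src, src_lba, nolb, buf);      /* read from the source LUN */
        if (!ret)
                ret = sketch_write_blocks(dst, dst_lba, nolb, buf); /* write to the destination LUN */

        kfree(buf);
        return ret;
}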
2013-09-10target: Avoid non-existent tg_pt_gp_mem in target_alua_state_checkNicholas Bellinger1-0/+3
This patch adds a check for a non-existent port->sep_alua_tg_pt_gp_mem within target_alua_state_check(), which is not present for internally dispatched EXTENDED_COPY WRITE I/O to the destination target port. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Add global device list for EXTENDED_COPYNicholas Bellinger2-0/+14
EXTENDED_COPY needs to be able to search a global list of devices based on NAA WWN device identifiers, so add a simple g_device_list protected by g_device_mutex. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Make helpers non static for EXTENDED_COPY command setupNicholas Bellinger2-2/+6
Both target_alloc_sgl() and transport_generic_map_mem_to_cmd() are required by EXTENDED_COPY logic when setting up internally dispatched command descriptors, so go ahead and make both of these non static. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Make spc_parse_naa_6h_vendor_specific non staticNicholas Bellinger1-2/+2
This patch makes spc_parse_naa_6h_vendor_specific() available to other target code, which is required by EXTENDED_COPY when comparing the received NAA WWN device identifier for locating the associated se_device backend. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Make target_core_subsystem defined as non staticNicholas Bellinger1-1/+1
This patch makes the top-level target_core_subsystem array available to other target code, which is required by EXTENDED_COPY to pin the backend se_device using configfs_depend_item(), in order to ensure it can't be removed for the duration of a EXTENDED_COPY operation. Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Zach Brown <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target/tcm_qla2xxx: Add/use target_reverse_dma_direction() in target_core_fabric.hNicholas Bellinger2-28/+29
Reversing the dma_data_direction for pci_map_sg() friends is useful for other drivers, so move it from tcm_qla2xxx into inline code within target_core_fabric.h. Also drop the internal usage of the equivalent in tcm_qla2xxx fabric code. Reported-by: Christoph Hellwig <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Giridhar Malavali <[email protected]> Cc: Chad Dupuis <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
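
The helper itself is a small inline in target_core_fabric.h; the sketch below reflects the behaviour described above and should be treated as approximate rather than a verbatim copy:

#include <linux/dma-direction.h>
#include <target/target_core_base.h>

static inline enum dma_data_direction
sketch_reverse_dma_direction(struct se_cmd *se_cmd)
{
        if (se_cmd->se_cmd_flags & SCF_BIDI)
                return DMA_BIDIRECTIONAL;

        switch (se_cmd->data_direction) {
        case DMA_TO_DEVICE:
                return DMA_FROM_DEVICE;         /* fabric writes become device reads */
        case DMA_FROM_DEVICE:
                return DMA_TO_DEVICE;           /* and vice versa */
        case DMA_NONE:
        default:
                return se_cmd->data_direction;
        }
}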
2013-09-10target: Release COMPARE_AND_WRITE mutex in generic failure pathNicholas Bellinger2-0/+14
This patch adds an extra check for SCF_COMPARE_AND_WRITE within transport_generic_request_failure() to invoke the callback for compare_and_write_callback() or compare_and_write_done(), in order to release se_dev->caw_mutex from the generic failure path. It also adds two checks within compare_and_write_callback() to avoid processing when transport_generic_request_failure() occurs early enough that cmd->t_data_sg or cmd->t_bidi_data_sg have not been set up yet, nor se_dev->caw_mutex obtained from within sbc_compare_and_write().
v4 changes:
  - Add explicit check for cmd->transport_complete_callback in transport_generic_request_failure() to handle the case where sbc_compare_and_write() clears the callback pointer (Dan Carpenter)
Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Add compare_and_write_post() completion callback fall throughNicholas Bellinger2-10/+12
This patch changes target_complete_ok_work() to fall through after calling the se_cmd->transport_complete_callback() -> compare_and_write_post() callback, by keying off the existence of SCF_COMPARE_AND_WRITE_POST. This is necessary because once SCF_COMPARE_AND_WRITE_POST has been set by compare_and_write_post(), the SCSI response needs to be sent via TFO->queue_status(). Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-09-10target: Add support for COMPARE_AND_WRITE emulationNicholas Bellinger3-1/+198
This patch adds support for COMPARE_AND_WRITE emulation on a per block basis. This logic is used as an atomic test-and-set primitive, currently used by VMware ESX VAAI for performing array-side locking of individual VMFS extent ownership. This includes the COMPARE_AND_WRITE CDB parsing within sbc_parse_cdb(), and does the majority of the work within compare_and_write_callback() to perform the verify instance user data comparison, and the subsequent write instance user data I/O submission upon a successful comparison. The synchronization is enforced by se_device->caw_sem, which is obtained before the initial READ I/O submission in sbc_compare_and_write(). The semaphore is then released upon MISCOMPARE in compare_and_write_callback(), or upon WRITE instance user-data completion in compare_and_write_post(). The implementation currently assumes a single logical block (NoLB=1).
v4 changes:
  - Explicitly clear cmd->transport_complete_callback for two failure cases in sbc_compare_and_write() in order to avoid double unlock of ->caw_sem in compare_and_write_callback() (Dan Carpenter)
v3 changes:
  - Convert se_device->caw_mutex to ->caw_sem
v2 changes:
  - Set SCF_COMPARE_AND_WRITE and cmd->execute_cmd() to sbc_compare_and_write() during setup in sbc_parse_cdb()
  - Use sbc_compare_and_write() for initial READ submission with DMA_FROM_DEVICE
  - Reset cmd->execute_cmd() to sbc_execute_rw() for write instance user-data in compare_and_write_callback()
  - Drop SCF_BIDI command flag usage
  - Set TRANSPORT_PROCESSING + transport_state flags before write instance submission, and convert to __target_execute_cmd()
  - Prevent sbc_get_size() from being called twice to generate incorrect size in sbc_parse_cdb()
  - Enforce se_device->caw_mutex synchronization between initial READ I/O submission, and final WRITE I/O completion.
Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin Petersen <[email protected]> Cc: Chris Mason <[email protected]> Cc: James Bottomley <[email protected]> Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
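
A condensed sketch of the verify-then-write decision described above, with the scatterlist walk reduced to a memcmp over linear buffers; the helper names are hypothetical, and only the caw_sem handling mirrors the commit text:

#include <linux/semaphore.h>
#include <target/target_core_base.h>

/* Hypothetical stand-in for re-submitting the WRITE instance user data. */
sense_reason_t sketch_submit_write_instance(struct se_cmd *cmd);

static sense_reason_t sketch_compare_and_write(struct se_cmd *cmd,
                                               unsigned char *read_buf,
                                               unsigned char *verify_buf,
                                               unsigned int len)
{
        struct se_device *dev = cmd->se_dev;

        if (memcmp(read_buf, verify_buf, len)) {
                /* MISCOMPARE: release the semaphore taken before the READ. */
                up(&dev->caw_sem);
                return TCM_MISCOMPARE_VERIFY;
        }

        /*
         * Data matched: submit the write instance; caw_sem is released
         * from the post callback once the WRITE completes.
         */
        return sketch_submit_write_instance(cmd);
}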
2013-09-10[SCSI] lpfc 8.3.42: Fix random errors using first burstJames Smart1-2/+7
Signed-off-by: James Smart <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2013-09-10super: fix for destroy lrusGlauber Costa3-2/+5
This patch adds the missing call to list_lru_destroy (spotted by Li Zhong) and moves the deletion to after the shrinker is unregistered, as correctly spotted by Dave. Signed-off-by: Glauber Costa <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10list_lru: dynamically adjust node arraysGlauber Costa5-17/+37
We currently use a compile-time constant to size the node array for the list_lru structure. Due to this, we don't need to allocate any memory at initialization time. But as a consequence, the structures that contain embedded list_lru lists can become way too big (the superblock for instance contains two of them). This patch aims at ameliorating this situation by dynamically allocating the node arrays with the firmware provided nr_node_ids. Signed-off-by: Glauber Costa <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
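
The shape of the change is that list_lru_init() now sizes the node array at runtime; below is a hedged sketch of what that looks like, close to but not necessarily identical to the patch:

#include <linux/list_lru.h>
#include <linux/nodemask.h>
#include <linux/slab.h>

int sketch_list_lru_init(struct list_lru *lru)
{
        int i;

        /* One list_lru_node per possible node, sized from nr_node_ids. */
        lru->node = kzalloc(nr_node_ids * sizeof(*lru->node), GFP_KERNEL);
        if (!lru->node)
                return -ENOMEM;

        nodes_clear(lru->active_nodes);
        for (i = 0; i < nr_node_ids; i++) {
                spin_lock_init(&lru->node[i].lock);
                INIT_LIST_HEAD(&lru->node[i].list);
                lru->node[i].nr_items = 0;
        }
        return 0;
}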
2013-09-10shrinker: Kill old ->shrink API.Dave Chinner3-45/+15
There are no more users of this API, so kill it dead, dead, dead and quietly bury the corpse in a shallow, unmarked grave in a dark forest deep in the hills... [[email protected]: added flowers to the grave] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: Greg Thelen <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10shrinker: convert remaining shrinkers to count/scan APIDave Chinner2-21/+45
Convert the remaining couple of random shrinkers in the tree to the new API. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Chuck Lever <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10staging/lustre/libcfs: cleanup linux-mem.hPeng Tao1-38/+0
remove shrinker related wrappers. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Andreas Dilger <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10staging/lustre/ptlrpc: convert to new shrinker APIPeng Tao1-34/+42
Convert sptlrpc encode pool shrinker to use scan/count API. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Andreas Dilger <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10staging/lustre/obdclass: convert lu_object shrinker to count/scan APIPeng Tao1-46/+52
convert lu_object shrinker to new count/scan API. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Andreas Dilger <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10staging/lustre/ldlm: convert to shrinkers to count/scan APIPeng Tao1-69/+79
convert ldlm shrinker to new count/scan API. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Andreas Dilger <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10hugepage: convert huge zero page shrinker to new shrinker APIGlauber Costa1-6/+11
It consists of: * returning long instead of int * separating count from scan * returning the number of freed entities in scan Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: Greg Thelen <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Dave Chinner <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10i915: bail out earlier when shrinker cannot acquire mutexGlauber Costa1-2/+2
The main shrinker driver will keep trying for a while to free objects if the returned value from the shrink scan procedure is 0. That means "no objects now", but a retry could very well succeed. But what we should say here is a different thing: that it is impossible to shrink, and we had better bail out soon. We find this behavior more appropriate for the case where the lock cannot be taken. Especially given the hammer behavior of the i915: if another thread is already shrinking, we are likely not to be able to shrink anything anyway when we finally acquire the mutex. Signed-off-by: Glauber Costa <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10drivers: convert shrinkers to new count/scan APIDave Chinner9-136/+236
Convert the driver shrinkers to the new API. Most changes are compile tested only because I either don't have the hardware or it's staging stuff. FWIW, the md and android code is pretty good, but the rest of it makes me want to claw my eyes out. The amount of broken code I just encountered is mind boggling. I've added comments explaining what is broken, but I fear that some of the code would be best dealt with by being dragged behind the bike shed, buried in mud up to its neck and then run over repeatedly with a blunt lawn mower. Special mention goes to the zcache/zcache2 drivers. They can't co-exist in the build at the same time, they are under different menu options in menuconfig, they only show up when you've got the right set of mm subsystem options configured and so even compile testing is an exercise in pulling teeth. And that doesn't even take into account the horrible, broken code... [[email protected]: fixes for i915, android lowmem, zcache, bcache] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: John Stultz <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10fs: convert fs shrinkers to new scan/count APIDave Chinner14-96/+158
Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [[email protected]: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Artem Bityutskiy <[email protected]> Acked-by: Jan Kara <[email protected]> Acked-by: Steven Whitehouse <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10xfs: fix dquot isolation hangDave Chinner1-4/+6
The new LRU list isolation code in xfs_qm_dquot_isolate() isn't completely up to date. Firstly, it needs conversion to return enum lru_status values, not raw numbers. Secondly - most importantly - it fails to unlock the dquot and relock the LRU in the LRU_RETRY path. This leads to deadlocks in xfstests generic/232. Fix them. Signed-off-by: Dave Chinner <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10xfs-convert-dquot-cache-lru-to-list_lru-fixAndrew Morton1-3/+3
fix warnings Cc: Dave Chinner <[email protected]> Cc: Glauber Costa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10xfs: convert dquot cache lru to list_lruDave Chinner3-144/+144
Convert the XFS dquot lru to use the list_lru construct and convert the shrinker to being node aware. [[email protected]: edited for conflicts + warning fixes] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10xfs: rework buffer dispose list trackingDave Chinner2-58/+79
In converting the buffer lru lists to use the generic code, the locking for marking the buffers as on the dispose list was lost. This results in confusion in LRU buffer tracking and accounting, resulting in reference counts being mucked up and the filesystem being unmountable. To fix this, introduce an internal buffer spinlock to protect the state field that holds the dispose list information. Because there is now locking needed around xfs_buf_lru_add/del, and they are used in exactly one place each two lines apart, get rid of the wrappers and code the logic directly in place. Further, the LRU emptying code used on unmount is less than optimal. Convert it to use a dispose list as per a normal shrinker walk, and repeat the walk that fills the dispose list until the LRU is empty. This avoids needing to drop and regain the LRU lock for every item being freed, and allows the same logic as the shrinker isolate call to be used. Simpler, easier to understand. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10xfs-convert-buftarg-lru-to-generic-code-fixAndrew Morton1-3/+3
fix warnings Cc: Dave Chinner <[email protected]> Cc: Glauber Costa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10xfs: convert buftarg LRU to generic codeDave Chinner2-93/+82
Convert the buftarg LRU to use the new generic LRU list and take advantage of the functionality it supplies to make the buffer cache shrinker node aware. Signed-off-by: Glauber Costa <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10fs: convert inode and dentry shrinking to be node awareDave Chinner6-21/+33
Now that the shrinker is passing a node in the scan control structure, we can pass this to the generic LRU list code to isolate reclaim to the lists on matching nodes. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10vmscan: per-node deferred workGlauber Costa2-103/+152
The list_lru infrastructure already keeps per-node LRU lists in its node-specific list_lru_node arrays and provides us with a per-node API, and the shrinkers are properly equipped with node information. This means that we can now focus our shrinking effort on a single node, but the work that is deferred from one run to another is kept global at nr_in_batch. Work can be deferred, for instance, during direct reclaim under a GFP_NOFS allocation; in that situation, all the filesystem shrinkers will be prevented from running and will accumulate in nr_in_batch the amount of work they should have done but could not. This creates an impedance problem, where upon node pressure, deferred work will accumulate and end up being flushed in other nodes. The problem we describe is particularly harmful in big machines, where many nodes can accumulate at the same time, all adding to the global counter nr_in_batch. As we accumulate more and more, we start to ask the caches to flush even bigger numbers. The result is that the caches are depleted and do not stabilize. To achieve stable steady-state behavior, we need to tackle it differently. In this patch we keep the deferred count per node, in the new array nr_deferred[] (the name is also a bit more descriptive), and never accumulate it to other nodes. Signed-off-by: Glauber Costa <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
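
A rough sketch of the bookkeeping this describes: one deferred-work counter per node, allocated when the shrinker is registered and drained only for the node currently being shrunk. Names and structure follow the commit text, but the code is approximate rather than the actual vmscan.c diff:

#include <linux/shrinker.h>
#include <linux/nodemask.h>
#include <linux/slab.h>

/* Sketch: allocate one deferred-work counter per node at registration time. */
static int sketch_alloc_nr_deferred(struct shrinker *shrinker)
{
        shrinker->nr_deferred = kzalloc(sizeof(atomic_long_t) * nr_node_ids,
                                        GFP_KERNEL);
        return shrinker->nr_deferred ? 0 : -ENOMEM;
}

/* Sketch: when shrinking node 'nid', pick up only that node's deferred work. */
static unsigned long sketch_take_deferred(struct shrinker *shrinker, int nid)
{
        return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
}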
2013-09-10shrinker: add node awarenessDave Chinner5-3/+17
Pass the node of the current zone being reclaimed to shrink_slab(), allowing the shrinker control nodemask to be set appropriately for node aware shrinkers. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
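
In shrinker terms this means the shrink_control passed to a callback carries the node under reclaim, which a NUMA-aware shrinker can use directly together with the per-node list_lru API from the entries further down this log. A small hedged sketch, with my_lru standing in for some hypothetical per-node LRU-backed cache:

#include <linux/shrinker.h>
#include <linux/list_lru.h>

static struct list_lru my_lru;  /* hypothetical cache, kept on a per-node LRU */

/* Sketch: report only the objects that live on the node being reclaimed. */
static unsigned long sketch_count_objects(struct shrinker *shrink,
                                          struct shrink_control *sc)
{
        return list_lru_count_node(&my_lru, sc->nid);
}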
2013-09-10list_lru: remove special case function list_lru_dispose_all.Glauber Costa3-79/+29
The list_lru implementation has one function, list_lru_dispose_all, with only one user (the dentry code). At first, such a function appears to make sense because we are really not interested in the result of isolating each dentry separately - all of them are going away anyway. However, its implementation is buggy in the following way: When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries marking them with DCACHE_SHRINK_LIST. However, this is done without the nlru->lock taken. The immediate result of that is that someone else may add or remove the dentry from the LRU at the same time. When list_lru_del happens in that scenario we will see an element that is not yet marked with DCACHE_SHRINK_LIST (even though it will be in the future) and obviously remove it from an lru where the element no longer is. Since list_lru_dispose_all will in effect count down nlru's nr_items and list_lru_del will do the same, this will lead to an imbalance. The solution for this would not be so simple: we can obviously just keep the lru_lock taken, but then we have no guarantees that we will be able to acquire the dentry lock (dentry->d_lock). To properly solve this, we need a communication mechanism between the lru and dentry code, so they can coordinate this with each other. Such a mechanism already exists in the form of the list_lru_walk_cb callback. So it is possible to construct a dcache-side prune function that does the right thing only by calling list_lru_walk in a loop until no more dentries are available. With only one user, plus the fact that a sane solution for the problem would involve bouncing between dcache and list_lru anyway, I see little justification to keep the special case list_lru_dispose_all in tree. Signed-off-by: Glauber Costa <[email protected]> Cc: Michal Hocko <[email protected]> Acked-by: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10list_lru: per-node APIGlauber Costa2-33/+43
This patch adapts the list_lru API to accept an optional node argument, to be used by NUMA aware shrinking functions. Code that does not care about the NUMA placement of objects can still call into the very same functions as before. They will simply iterate over all nodes. Signed-off-by: Glauber Costa <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
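
With the per-node entry points (list_lru_count_node()/list_lru_walk_node(), both taking an explicit nid), the node-agnostic helpers just iterate the nodes that currently hold items. A hedged sketch of that pattern, not a quote from the header:

#include <linux/list_lru.h>
#include <linux/nodemask.h>

/* Sketch: a node-agnostic count simply sums over every active node. */
static unsigned long sketch_list_lru_count(struct list_lru *lru)
{
        unsigned long count = 0;
        int nid;

        for_each_node_mask(nid, lru->active_nodes)
                count += list_lru_count_node(lru, nid);
        return count;
}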
2013-09-10list_lru: fix broken LRU_RETRY behaviourDave Chinner1-17/+12
The LRU_RETRY code assumes that the list traversal status is preserved after we have dropped and regained the list lock. Unfortunately, this is not a valid assumption, and that can lead to racing traversals isolating objects that the other traversal expects to be the next item on the list. This is causing problems with the inode cache shrinker isolation, with races resulting in an inode on a dispose list being "isolated" because a racing traversal still thinks it is on the LRU. The inode is then never reclaimed and that causes hangs if a subsequent lookup on that inode occurs. Fix it by always restarting the list walk on a LRU_RETRY return from the isolate callback. Avoid the possibility of livelocks the current code was trying to avoid by always decrementing the nr_to_walk counter on retries so that even if we keep hitting the same item on the list we'll eventually stop trying to walk and exit out of the situation causing the problem. Reported-by: Michal Hocko <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Cc: Glauber Costa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10list_lru: per-node list infrastructureDave Chinner2-40/+129
Now that we have an LRU list API, we can start to enhance the implementation. This splits the single LRU list into per-node lists and locks to enhance scalability. Items are placed on lists according to the node the memory belongs to. To make scanning the lists efficient, also track whether the per-node lists have entries in them in an active nodemask. Note: We use a fixed-size array for the node LRU; this struct can be very big if MAX_NUMNODES is big. If this becomes a problem it is fixable by turning this into a pointer and dynamically allocating it to nr_node_ids. This quantity is firmware-provided, and would still provide room for all nodes at the cost of a pointer lookup and an extra allocation. Such a dynamic allocation, however, may very well fail. [[email protected]: fix warnings, added note about node lru] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: Greg Thelen <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
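
For orientation, the per-node layout introduced here looks roughly like the sketch below (close to, but not quoted from, the header; the follow-up entry further up this log later turns the fixed array into a dynamically sized one):

#include <linux/cache.h>
#include <linux/list.h>
#include <linux/nodemask.h>
#include <linux/spinlock.h>

/* Sketch: one LRU (list + lock + count) per memory node. */
struct sketch_list_lru_node {
        spinlock_t              lock;
        struct list_head        list;
        long                    nr_items;       /* items on this node's list */
} ____cacheline_aligned_in_smp;

struct sketch_list_lru {
        struct sketch_list_lru_node     node[MAX_NUMNODES];
        /* nodes that currently hold items, so empty nodes can be skipped */
        nodemask_t                      active_nodes;
};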
2013-09-10dcache: convert to use new lru list infrastructureDave Chinner3-106/+90
[[email protected]: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10inode: move inode to a different list inside lockGlauber Costa1-1/+1
When removing an element from the lru, this is currently done after the lock is released. This is a clear mistake, although we are not sure whether the bugs we are seeing are related to it. All list manipulations are done inside the lock, and so should this one be. Signed-off-by: Glauber Costa <[email protected]> Tested-by: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10inode: convert inode lru list to generic lru list code.Dave Chinner3-116/+77
[[email protected]: adapted for new LRU return codes] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10list: add a new LRU list typeDave Chinner3-1/+233
Several subsystems use the same construct for LRU lists - a list head, a spin lock and an item count. They also use exactly the same code for adding and removing items from the LRU. Create a generic type for these LRU lists. This is the beginning of generic, node aware LRUs for shrinkers to work with. [[email protected]: enum defined constants for lru. Suggested by gthelen, don't relock over retry] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: Greg Thelen <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
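
A small hedged sketch of how a cache might sit on such a generic LRU: objects carry a list_head, and a walk hands each item to an isolate callback that returns an enum lru_status. The cache structure and helper names below are hypothetical; only the list_lru calls and return codes follow the API described above:

#include <linux/kernel.h>
#include <linux/list_lru.h>
#include <linux/slab.h>

struct my_obj {
        struct list_head        lru;    /* linkage owned by the list_lru */
        /* ... cache payload ... */
};

static struct list_lru my_lru;          /* set up once with list_lru_init(&my_lru) */

/* Isolate callback: runs with the LRU lock held for each walked item. */
static enum lru_status my_isolate(struct list_head *item, spinlock_t *lock,
                                  void *cb_arg)
{
        struct my_obj *obj = container_of(item, struct my_obj, lru);
        struct list_head *dispose = cb_arg;

        /* Move the object to a private dispose list; the LRU drops its count. */
        list_move(&obj->lru, dispose);
        return LRU_REMOVED;
}

static unsigned long my_prune(unsigned long nr_to_walk)
{
        LIST_HEAD(dispose);
        unsigned long removed;
        struct my_obj *obj, *next;

        removed = list_lru_walk(&my_lru, my_isolate, &dispose, nr_to_walk);

        /* Actually free whatever was isolated onto the dispose list. */
        list_for_each_entry_safe(obj, next, &dispose, lru)
                kfree(obj);
        return removed;
}

Items would get onto and off the LRU from the cache's own add and free paths via list_lru_add()/list_lru_del().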
2013-09-10shrinker: convert superblock shrinkers to new APIDave Chinner8-48/+70
Convert superblock shrinker to use the new count/scan API, and propagate the API changes through to the filesystem callouts. The filesystem callouts already use a count/scan API, so it's just changing counters to longs to match the VM API. This requires the dentry and inode shrinker callouts to be converted to the count/scan API. This is mainly a mechanical change. [[email protected]: use mult_frac for fractional proportions, build fixes] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10mm: new shrinker APIDave Chinner2-29/+69
The current shrinker callout API uses a single shrinker call for multiple functions. To determine the function, a special magical value is passed in a parameter to change the behaviour. This complicates the implementation and return value specification for the different behaviours. Separate the two different behaviours into separate operations, one to return a count of freeable objects in the cache, and another to scan a certain number of objects in the cache for freeing. In defining these new operations, ensure the return values and resultant behaviours are clearly defined and documented. Modify shrink_slab() to use the new API and implement the callouts for all the existing shrinkers. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
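
In practice a shrinker converted to this API provides the two callbacks shown in the hedged sketch below; the cache being shrunk is hypothetical, but the callback signatures, SHRINK_STOP and register_shrinker() usage follow the count/scan API as described:

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/shrinker.h>

static LIST_HEAD(my_cache_list);
static DEFINE_MUTEX(my_cache_lock);
static unsigned long my_cache_nr_objects;       /* maintained by the cache */

/* Count of freeable objects; 0 simply means "nothing to do right now". */
static unsigned long my_cache_count(struct shrinker *shrink,
                                    struct shrink_control *sc)
{
        return my_cache_nr_objects;
}

/* Scan up to sc->nr_to_scan objects and report how many were actually freed. */
static unsigned long my_cache_scan(struct shrinker *shrink,
                                   struct shrink_control *sc)
{
        unsigned long freed = 0;

        if (!mutex_trylock(&my_cache_lock))
                return SHRINK_STOP;     /* cannot make progress, stop retrying */

        while (freed < sc->nr_to_scan && !list_empty(&my_cache_list)) {
                list_del_init(my_cache_list.next);      /* drop one cached object */
                my_cache_nr_objects--;
                freed++;
        }
        mutex_unlock(&my_cache_lock);

        return freed;
}

static struct shrinker my_cache_shrinker = {
        .count_objects  = my_cache_count,
        .scan_objects   = my_cache_scan,
        .seeks          = DEFAULT_SEEKS,
};
/* registered once at init time with register_shrinker(&my_cache_shrinker) */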
2013-09-10dcache: remove dentries from LRU before putting on dispose listDave Chinner1-21/+78
One of the big problems with modifying the way the dcache shrinker and LRU implementation works is that the LRU is abused in several ways. One of these is shrink_dentry_list(). Basically, we can move a dentry off the LRU onto a different list without doing any accounting changes, and then use dentry_lru_prune() to remove it from whatever list it is now on to do the LRU accounting at that point. This makes it -really hard- to change the LRU implementation. The use of the per-sb LRU lock serialises movement of the dentries between the different lists and the removal of them, and this is the only reason that it works. If we want to break up the dentry LRU lock and lists into, say, per-node lists, we remove the only serialisation that allows this lru list/dispose list abuse to work. To make this work effectively, the dispose list has to be isolated from the LRU list - dentries have to be removed from the LRU *before* being placed on the dispose list. This means that the LRU accounting and isolation is completed before disposal is started, and that means we can change the LRU implementation freely in future. This means that dentries *must* be marked with DCACHE_SHRINK_LIST when they are placed on the dispose list so that we don't think that parent dentries found in try_prune_one_dentry() are on the LRU when they are actually on the dispose list. This would result in accounting the dentry to the LRU a second time. Hence dentry_lru_del() has to handle the DCACHE_SHRINK_LIST case. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-10dentry: move to per-sb LRU locksDave Chinner3-18/+20
With the dentry LRUs being per-sb structures, there is no real need for a global dentry_lru_lock. The locking can be made more fine-grained by moving to a per-sb LRU lock, isolating the LRU operations of different filesystems completely from each other. The need for this is independent of any performance consideration that may arise: in the interest of abstracting the lru operations away, it is mandatory that each lru works around its own lock instead of a global lock for all of them. [[email protected]: updated changelog] Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Al Viro <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Carlos Maiolino <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Stultz <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Thomas Hellstrom <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>