path: root/drivers/scsi/libsas
Age | Commit message | Author | Files | Lines
2012-02-29 | [SCSI] libsas: improve debug statements | Dan Williams | 2 | -32/+85
It's difficult to determine which domain_device is triggering error recovery, so convert messages like:

  sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
  sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
  ...
  ata7: sas eh calling libata port error handler
  ata8: sas eh calling libata port error handler

...into:

  sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
  sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
  ...
  sas: ata7: end_device-21:1: dev error handler
  sas: ata8: end_device-20:0:5: dev error handler

which shows attached link rate, device type, and associates a domain_device with its ata_port id to correlate messages emitted from libata-eh. As Doug notes, we can also take the opportunity to clarify expander phy routing capabilities.

[[email protected]: clarify table2table with 'U']
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: kill spurious sas_put_device | Maciej Trela | 1 | -2/+0
Holdover from a patch rework. Prior to the addition of SAS_DEV_DESTROY we were holding a reference while the destruct was pending in case the domain was torn down before the destruct event ran. That case is covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed memory, or worse, frees the memory while another agent holds a reference. Signed-off-by: Maciej Trela <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: fix sas_unregister_ports vs sas_drain_work | Dan Williams | 3 | -13/+25
We need to hold drain_mutex across the unregistration as port down events queue device removal as chained events, so we need to make sure no other drainers are active. [ 1118.673968] WARNING: at kernel/workqueue.c:996 __queue_work+0x11a/0x326() [ 1118.681982] Hardware name: S2600CP [ 1118.686193] Modules linked in: isci(-) libsas scsi_transport_sas nls_utf8 ipv6 uinput sg iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca sd_mod sr_mod cdrom ahci libahci libata [last unloaded: scsi_transport_sas] [ 1118.709893] Pid: 6831, comm: rmmod Not tainted 3.2.0-isci+ #1 [ 1118.716727] Call Trace: [ 1118.719867] [<ffffffff8103e9f5>] warn_slowpath_common+0x85/0x9d [ 1118.727000] [<ffffffff8103ea27>] warn_slowpath_null+0x1a/0x1c [ 1118.733942] [<ffffffff81056d44>] __queue_work+0x11a/0x326 [ 1118.740481] [<ffffffff81056f99>] queue_work_on+0x1b/0x22 [ 1118.746925] [<ffffffff81057106>] queue_work+0x37/0x3e [ 1118.753105] [<ffffffffa0120e05>] ? sas_discover_event+0x55/0x82 [libsas] [ 1118.761094] [<ffffffff813217c3>] scsi_queue_work+0x42/0x44 [ 1118.767717] [<ffffffffa0120e19>] sas_discover_event+0x69/0x82 [libsas] [ 1118.775509] [<ffffffffa0120f5b>] sas_unregister_dev+0xc3/0xcc [libsas] [ 1118.783319] [<ffffffffa0120fae>] sas_unregister_domain_devices+0x4a/0xc8 [libsas] [ 1118.792731] [<ffffffffa0120071>] sas_deform_port+0x60/0x1a6 [libsas] [ 1118.800339] [<ffffffffa01201ea>] sas_unregister_ports+0x33/0x44 [libsas] [ 1118.808342] [<ffffffffa011f7e5>] sas_unregister_ha+0x41/0x6b [libsas] [ 1118.816055] [<ffffffffa0134055>] isci_unregister+0x22/0x4d [isci] [ 1118.823384] [<ffffffffa0143040>] isci_pci_remove+0x2e/0x60 [isci] Reported-by: Jacek Danecki <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: route local link resets through ata-eh | Dan Williams | 3 | -20/+37
Similar to the conversion of the transport-class reset we want bsg initiated resets to be managed by libata. Reported-by: Jacek Danecki <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: fix mixed topology recovery | Dan Williams | 2 | -12/+9
If we have a domain with sas and sata devices there may still be sas recovery actions to take after peeling off the commands to send to libata. Reported-by: Andrzej Jakowski <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: close scsi_remove_target() vs libata-eh race | Dan Williams | 3 | -3/+10
ata_port lifetime in libata follows the host. In libsas it follows the scsi_target. Once scsi_remove_device() has caused all commands to be completed it allows scsi_remove_target() to immediately proceed to freeing the ata_port causing bug reports like: [ 848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107 [ 848.400262] general protection fault: 0000 [#1] SMP [ 848.406244] CPU 4 [ 848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan] [ 848.432060] [ 848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP [ 848.445310] RIP: 0010:[<ffffffff8126a68c>] [<ffffffff8126a68c>] spin_dump+0x5e/0x8c [ 848.454787] RSP: 0018:ffff8807f868dca0 EFLAGS: 00010002 [ 848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0 [ 848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002 [ 848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b [ 848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b [ 848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e [ 848.503067] FS: 0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000 [ 848.512899] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0 [ 848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000) [ 848.555327] Stack: [ 848.557959] ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0 [ 848.567072] ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703 [ 848.576190] ffff8808027dc520 0000000000000286 
ffff8807f868dd10 ffffffff814af1bb [ 848.585281] Call Trace: [ 848.588409] [<ffffffff8126a6e0>] spin_bug+0x26/0x28 [ 848.594357] [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88 [ 848.601283] [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65 [ 848.609089] [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata] [ 848.618331] [<ffffffff81061813>] ? async_schedule+0x17/0x17 [ 848.625060] [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas] [ 848.632655] [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125 [ 848.639670] [<ffffffff81057439>] process_one_work+0x207/0x38d [ 848.646577] [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d [ 848.653681] [<ffffffff810576f7>] worker_thread+0x138/0x21c [ 848.660305] [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d [ 848.667493] [<ffffffff8105b098>] kthread+0x9d/0xa5 [ 848.673382] [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166 [ 848.681304] [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10 [ 848.688324] [<ffffffff814af534>] ? retint_restore_args+0x13/0x13 [ 848.695530] [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b [ 848.702929] [<ffffffff814b7700>] ? gs_change+0x13/0x13 [ 848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89 [ 848.732467] RIP [<ffffffff8126a68c>] spin_dump+0x5e/0x8c [ 848.738905] RSP <ffff8807f868dca0> [ 848.743743] ---[ end trace 143161646eee8caa ]--- ...so arrange for the ata_port to have the same end of life as the domain device. Reported-by: Marcin Tomczak <[email protected]> Acked-by: Jeff Garzik <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: mark all domain devices gone if root port disappears | Dan Williams | 2 | -5/+7
If the top level expander is hot removed, mark all child devices as gone before unregistration to short circuit futile recovery. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: pre-clean commands that won the eh vs completion race | Dan Williams | 1 | -9/+16
When scrolling forward through the eh list (in a clear_q scenario) it is possible to encounter commands that won the completion vs eh race. Rather than sprinkle more "if (!task)" throughout the handler just make a pass through the list and delete the race winners before handling the rest. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
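The pre-clean pass described above can be illustrated with a small userspace sketch. This is not the libsas code: struct cmd, its task field, and prune_race_winners() are hypothetical stand-ins, assuming only that a command whose task pointer is already NULL has been finished by normal completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued command and its associated task. */
struct cmd {
    void *task;        /* NULL once normal completion has freed the task */
    struct cmd *next;
};

/*
 * Make one pass over the list and unlink every command whose task is
 * already gone (it "won" the race against error handling).
 * Returns the number of commands dropped.
 */
static int prune_race_winners(struct cmd **head)
{
    int dropped = 0;
    struct cmd **link = head;

    while (*link) {
        struct cmd *c = *link;

        if (c->task == NULL) {
            *link = c->next;   /* unlink; *link now names the next entry */
            c->next = NULL;
            dropped++;
        } else {
            link = &c->next;   /* keep this entry and move on */
        }
    }
    return dropped;
}
```

After such a pass, the rest of a handler could walk the list without re-checking for a missing task on every entry.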
2012-02-29 | [SCSI] isci: stop interpreting ->lldd_lu_reset() as an ata soft-reset | Dan Williams | 1 | -0/+2
Driving resets from libsas-eh is premature as libata will make a decision about performing a softreset. Currently libata determines whether to perform a softreset based on ata_eh_followup_srst_needed(), and none of those conditions apply to isci. Remove the srst implementation and translate ->lldd_lu_reset() for ata devices as a request to drive a reset via libata-eh. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: don't recover 'gone' devices in sas_ata_hard_reset() | Dan Williams | 1 | -0/+3
The commands that timeout when a disk is forcibly removed may trigger libata to attempt recovery of the device. If libsas has decided to remove the device don't permit ata to continue to issue resets to its last known phy. The primary motivation for this patch is hotplug testing by writing 0 to /sys/class/sas_phy/phyX/enable. Without this check this test leads to libata issuing a reset and re-enabling the device that wants to be torn down. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: fix sas_find_local_phy(), take phy references | Dan Williams | 6 | -27/+55
In the direct-attached case this routine returns the phy on which this device was first discovered, which is broken if we want to support wide-targets, as this phy reference can become stale even though the port is still active. In the expander-attached case this routine tries to look up the phy by scanning the attached sas addresses of the parent expander, and BUG_ONs if it can't find it. However, since eh and the libsas workqueue run independently, we can still be attempting device recovery via eh after libsas has recorded the device as detached. This is even easier to hit now that eh is blocked while device domain rediscovery takes place, and that libata is fed more timed out commands, increasing the chances that it will try to recover the ata device. Arrange for dev->phy to always point to a last known good phy; it may be stale after the port is torn down, but it will catch up for wide port reconfigurations, and never be NULL. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: check for 'gone' expanders in smp_execute_task() | Dan Williams | 1 | -0/+5
No sense in issuing or retrying commands to an expander that has been removed. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: don't mark expanders as gone when a child device is removed | Dan Williams | 1 | -1/+0
Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that have been hot-removed" marked the parent device of an end-device as gone when all the phys to the end device have been deleted. The expander device is still present until its parent is removed. This is a benign change until the smp_execute_task() path is taught to check ->gone. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-29 | [SCSI] libsas: poll for ata device readiness after reset | Dan Williams | 3 | -35/+82
Use ata_wait_after_reset() to poll for link recovery after a reset. This combined with sas_ha->eh_mutex prevents expander rediscovery from probing phys in an intermediate state. Local discovery does not have a mechanism to filter link status changes during this timeout, so it remains the responsibility of lldds to prevent premature port teardown, although once all lldds support ->lldd_ata_check_ready() that could be used as a gate to local port teardown. The signature fis is re-transmitted when the link comes back so we should be revalidating the ata device class, but that is left to a future patch. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: async ata-eh | Dan Williams | 1 | -3/+14
Once sas_ata_hard_reset() starts honoring the 'deadline' parameter a pathological configuration could take 25 seconds per ata device (serialized) to recover. Run per-port recoveries in parallel. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: add mutex for SMP task execution | Jeff Skirvin | 2 | -28/+32
SAS does not tag SMP requests, and at least one lldd (isci) does not permit more than one in-flight request at a time. [jejb: fix sas_init_dev tab issues while we're at it] Signed-off-by: Jeff Skirvin <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: Remove redundant phy state notification calls. | Jeff Skirvin | 1 | -4/+1
In the case of an explicit sas_phy_enable call to disable a phy, the LLDD provides the calls to sas_phy_disconnected and the PHYE_LOSS_OF_SIGNAL event. NOTE: This assumes that the lldd(s) generate the notification, which appears to be the case, but has only been verified on isci. Signed-off-by: Jeff Skirvin <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: sas_phy_enable via transport_sas_phy_reset | Dan Williams | 3 | -9/+52
Execute the link-reset triggered by sas_phy_enable via transport_sas_phy_reset so that it can be managed by libata. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: execute transport link resets with libata-eh via host workqueue | Dan Williams | 4 | -2/+68
Link resets leave ata affiliations intact, so arrange for libsas to make an effort to avoid dropping the device due to a slow-to-recover link. Towards this end carry out reset in the host workqueue so that it can check for ata devices and kick the reset request to libata. Hard resets, in contrast, bypass libata since they are meant for associating an ata device with another initiator in the domain (tears down affiliations). Need to add a new transport_sas_phy_reset() since the current sas_phy_reset() is a utility function to libsas lldds. They are not prepared for it to loop back into eh. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: perform sas-transport resets in shost->workq context | Dan Williams | 3 | -2/+69
Extend the sas transport class to allow transport users to attach extra data to a sas_phy (->hostdata). Use this area in libsas to move resets to workq context in preparation for scheduling ata device resets through libata-eh. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: use libata-eh-reset for sata rediscovery fis transmit failures | Dan Williams | 2 | -5/+58
Since sata devices can take several seconds to recover the link on reset the 0.5 seconds that libsas currently waits may not be enough. Instead if we are rediscovering a phy that was previously attached to a sata device let libata handle any resets to encourage the device to transmit the initial fis. Once sas_ata_hard_reset() and lldds learn how to honor 'deadline' libsas should stop encountering phys in an intermediate state, until then this will loop until the fis is transmitted or ->attached_sas_addr gets cleared, but in the more likely initial discovery case we keep existing behavior. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata | Dan Williams | 2 | -12/+6
lldds use the SAS_TASK_NEED_DEV_RESET interface to request that eh perform a reset. In the sata device case defer the commands that triggered the reset to libata-eh context so it can perform its pre and post reset management. In the sas_ata_post_internal() case the reset request is falling on deaf ears as the sas_task is immediately destroyed without any reset action. Since it is currently a nop, and likely superfluous given the conversion to new-style libata-eh, just drop the request. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: let libata handle command timeouts | Dan Williams | 2 | -2/+21
If libsas-eh successfully aborts an ata command, it will hide the timeout condition (AC_ERR_TIMEOUT) from libata. The command likely completes with the all-zero task->task_status it started with. Instead, interpret a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd around for libata-eh to handle. Tested-by: Andrzej Jakowski <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: fix timeout vs completion race | Dan Williams | 3 | -75/+65
Until we have told the lldd to forget a task a timed out operation can return from the hardware at any time. Since completion frees the task we need to make sure that no tasks run their normal completion handler once eh has decided to manage the task. Similar to ata_scsi_cmd_error_handler() freeze completions to let eh judge the outcome of the race. Task collector mode is problematic because it presents a situation where a task can be timed out and aborted before the lldd has even seen it. For this case we need to guarantee that a task that an lldd has been told to forget does not get queued after the lldd says "never seen it". With sas_scsi_timed_out we achieve this with the ->task_queue_flush mutex, rather than adding more time. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: prevent double completion of scmds from eh | Dan Williams | 1 | -28/+33
We invoke task->task_done() to free the task in the eh case, but at this point we are prepared for scsi_eh_flush_done_q() to finish off the scmd. Introduce sas_end_task() to capture the final response status from the lldd and free the task. Also take the opportunity to kill this warning. drivers/scsi/libsas/sas_scsi_host.c: In function ‘sas_end_task’: drivers/scsi/libsas/sas_scsi_host.c:102:3: warning: case value ‘2’ not in enumerated type ‘enum exec_status’ [-Wswitch] Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: close error handling vs sas_ata_task_done() race | Dan Williams | 2 | -53/+75
Since sas_ata does not implement ->freeze(), completions for scmds and internal commands can still arrive concurrent with ata_scsi_cmd_error_handler() and sas_ata_post_internal() respectively. By the time either of those is called libata has committed to completing the qc, and the ATA_PFLAG_FROZEN flag tells sas_ata_task_done() it has lost the race. In the sas_ata_post_internal() case we take on the additional responsibility of freeing the sas_task to close the race with sas_ata_task_done() freeing the task while sas_ata_post_internal() is in the process of invoking ->lldd_abort_task(). Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done | Dan Williams | 1 | -14/+0
Prior to the conversion to the new-style libata-eh sas_ata_task_done() may have been the last opportunity to clean up the scmd, but now libata-eh explicitly handles this case. It also races against sas-eh. If a lldd completes a task after SAS_TASK_STATE_ABORTED is set it could trigger a spurious decrement of shost->host_failed. Current lldds have the band-aid of checking SAS_TASK_STATE_ABORTED before calling ->task_done(), but better to just let the scmds escalate to libata for race free cleanup. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: use ->set_dmamode to notify lldds of NCQ parameters | Dan Williams | 2 | -309/+20
sas_discover_sata() notifies lldds of sata devices twice. Once to allow the 'identify' to be sent, and a second time to allow aic94xx (the only libsas driver that cares about sata_dev.identify) to setup NCQ parameters before the device becomes known to the midlayer. Replace this double notification and intervening 'identify' with an explicit ->lldd_ata_set_dmamode notification. With this change all ata internal commands are issued by libata, so we no longer need sas_issue_ata_cmd(). The data from the identify command only needs to be cached in one location so ata_device.id replaces domain_device.sata_dev.identify. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: prevent domain rediscovery competing with ata error handling | Dan Williams | 7 | -13/+143
libata error handling provides for a timeout for link recovery. libsas must not rescan for previously known devices in this interval otherwise it may remove a device that is simply waiting for its link to recover. Let libata-eh make the determination of when the link is stable and prevent libsas (host workqueue) from taking action while this determination is pending. Using a mutex (ha->disco_mutex) to flush and disable revalidation while eh is running requires any discovery action that may block on eh be moved to its own context outside the lock. Probing ATA devices explicitly waits on ata-eh and the cache-flush-io issued during device removal may also pend awaiting eh completion. Essentially any rphy add/remove activity needs to run outside the lock. This adds two new cleanup states for sas_unregister_domain_devices() 'allocated-but-not-probed', and 'flagged-for-destruction'. In the 'allocated-but-not-probed' state dev->rphy points to a rphy that is known to have not been through a sas_rphy_add() event. At domain teardown check if this device is still pending probe and cleanup accordingly. Similarly if a device has already been queued for removal then sas_unregister_domain_devices has nothing to do. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: convert dev->gone to flags | Dan Williams | 4 | -6/+6
In preparation for adding tracking of another device state "destroy". Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: remove ata_port.lock management duties from lldds | Dan Williams | 2 | -17/+25
Each libsas driver (mvsas, pm8001, and isci) has invented a different method for managing the ap->lock. The lock is held by the ata ->queuecommand() path. mvsas drops it prior to acquiring any internal locks which allows it to hold its internal lock across calls to task->task_done(). This capability is important as it is the only way the driver can flush task->task_done() instances to guarantee that it no longer has any in-flight references to a domain_device at ->lldd_dev_gone() time. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: introduce sas_drain_work() | Dan Williams | 4 | -18/+81
When an lldd invokes ->notify_port_event() it can trigger a chain of libsas events to:

1/ form the port and find the direct attached device
2/ if the attached device is an expander perform domain discovery

A call to flush_workqueue() will only flush the initial port formation work. Currently libsas users need to call scsi_flush_work() up to the max depth of chain (which will grow from 2 to 3 when ata discovery is moved to its own discovery event). Instead of open coding multiple calls switch to use drain_workqueue() to flush sas work.

drain_workqueue() does not handle new work submitted during the drain so libsas needs a bit of infrastructure to hold off unchained work submissions while a drain is in flight. A lldd ->notify() event is considered 'unchained' while a sas_discover_event() is 'chained'.

As Tejun notes: "For now, I think it would be best to add private wrapper in libsas to support deferring unchained work items while draining."

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
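The deferral idea described above can be modelled in a few lines of single-threaded userspace C. Everything here is an assumption made for illustration: struct host, its fields, and the queue_*/drain_* helpers are hypothetical names, and a real implementation would need locking around the draining flag and the deferred list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 8

/* Hypothetical host state; field names are illustrative only. */
struct host {
    bool   draining;                 /* a drain is in flight */
    int    queued;                   /* work items handed to the workqueue */
    int    deferred[MAX_DEFERRED];   /* unchained events parked during a drain */
    size_t ndeferred;
};

/* Chained work (queued by work that is already running) always goes through. */
static void queue_chained(struct host *h)
{
    h->queued++;
}

/*
 * Unchained work is parked while a drain is in flight.
 * Returns true if the work was queued immediately, false otherwise.
 */
static bool queue_unchained(struct host *h, int event)
{
    if (h->draining) {
        if (h->ndeferred < MAX_DEFERRED)
            h->deferred[h->ndeferred++] = event;
        return false;
    }
    h->queued++;
    return true;
}

static void drain_begin(struct host *h)
{
    h->draining = true;
}

/* Once the drain has finished, resubmit everything that was parked. */
static void drain_end(struct host *h)
{
    h->draining = false;
    for (size_t i = 0; i < h->ndeferred; i++)
        h->queued++;
    h->ndeferred = 0;
}
```

The point of the split is that the drain only has to wait for work that running work can still generate, while externally submitted events wait until the drain completes.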
2012-02-19 | [SCSI] libsas: convert ha->state to flags | Dan Williams | 2 | -3/+3
In preparation for adding new states (SAS_HA_DRAINING, SAS_HA_FROZEN), convert ha->state into a set of flags. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: replace event locks with atomic bitops | Dan Williams | 6 | -57/+23
The locks only served to make sure the pending event bitmask was updated consistently. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
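As a rough userspace analogue of that change, the sketch below keeps a pending-event bitmask consistent with C11 atomics instead of a lock. The event numbers and helper names are hypothetical; the kernel itself would use its own bitops such as test_and_set_bit() and clear_bit() rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event numbers; the real libsas enums are not reproduced here. */
enum { EV_PORT_FORMED = 0, EV_REVALIDATE = 1 };

/*
 * Atomically set the bit and report whether it was already set,
 * in the spirit of the kernel's test_and_set_bit().
 */
static bool pending_test_and_set(atomic_ulong *pending, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    return (atomic_fetch_or(pending, mask) & mask) != 0;
}

/* Atomically clear the bit so the event can be queued again. */
static void pending_clear(atomic_ulong *pending, unsigned int bit)
{
    atomic_fetch_and(pending, ~(1UL << bit));
}

/* Returns true if the caller should queue work, false if already pending. */
static bool queue_event(atomic_ulong *pending, unsigned int event)
{
    return !pending_test_and_set(pending, event);
}
```

Because the read-modify-write of the mask is a single atomic operation, no separate lock is needed just to keep the bitmask consistent.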
2012-02-19 | [SCSI] libsas: fix leak of dev->sata_dev.identify_[packet_]device | Dan Williams | 1 | -0/+6
These are never freed in the nominal path. Since a domain_device has a different lifetime than a sas_rphy, we need a dev->rphy independent way of identifying sata devices. Reviewed-by: Jack Wang <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: fix domain_device leak | Dan Williams | 4 | -26/+55
Arrange for the deallocation of a struct domain_device object when it no longer has:

1/ any children
2/ references by any scsi_targets
3/ references by a lldd

The comment about domain_device lifetime in Documentation/scsi/libsas.txt is stale as it appears mainline never had a version of a struct domain_device that was registered as a kobject. We now manage domain_device reference counts on behalf of external agents.

Reviewed-by: Jack Wang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
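The lifetime rule above amounts to reference counting: each child, scsi_target, or lldd holder takes a reference, and the object goes away when the last one is dropped. A minimal userspace sketch follows, assuming a hypothetical struct dev with dev_get()/dev_put() helpers; it is not the real struct domain_device, and kernel code would normally build this on the kref API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified device; not the real struct domain_device. */
struct dev {
    atomic_int refs;   /* one per child, scsi_target, or lldd holder */
    bool freed;        /* stands in for the actual deallocation */
};

static void dev_init(struct dev *d)
{
    atomic_init(&d->refs, 1);   /* the creator holds the first reference */
    d->freed = false;
}

/* Take an additional reference on behalf of a new holder. */
static void dev_get(struct dev *d)
{
    atomic_fetch_add(&d->refs, 1);
}

/* Drop one reference; whoever drops the last one releases the device. */
static void dev_put(struct dev *d)
{
    if (atomic_fetch_sub(&d->refs, 1) == 1)
        d->freed = true;
}
```

With this shape, an unbalanced extra put shows up as a release while another holder still has a reference, so every put needs a matching get.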
2012-02-19 | [SCSI] libsas: kill sas_slave_destroy | Dan Williams | 1 | -9/+0
Per commit 3e4ec344 "libata: kill ATA_FLAG_DISABLED" needing to set ATA_DEV_NONE is a holdover from before libsas converted to the "new-style" ata-eh. Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-02-19 | [SCSI] libsas: remove unused ata_task_resp fields | Dan Williams | 1 | -4/+0
Commit 1e34c838 "[SCSI] libsas: remove spurious sata control register read/write" removed the routines to fake the presence of the sata control registers, now remove the unused data structure fields to kill any remaining confusion. Acked-by: Jack Wang <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2011-10-31 | scsi: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required | Paul Gortmaker | 3 | -0/+3
For the basic SCSI infrastructure files that are exporting symbols but not modules themselves, add in the basic export.h header file to allow the exports. Signed-off-by: Paul Gortmaker <[email protected]>
2011-10-28 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 | Linus Torvalds | 5 | -107/+234
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
  [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
  [SCSI] ipr: Fix BUG on adapter dump timeout
  [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
  [SCSI] hpsa: change confusing message to be more clear
  [SCSI] iscsi class: fix vlan configuration
  [SCSI] qla4xxx: fix data alignment and use nl helpers
  [SCSI] iscsi class: fix link local mispelling
  [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
  [SCSI] aacraid: use lower snprintf() limit
  [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
  [SCSI] lpfc 8.3.27: T10 additions for SLI4
  [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
  [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
  [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
  [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
  [SCSI] megaraid_sas: Changelog and version update
  [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
  [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
  [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
  [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
  ...
2011-10-16 | [SCSI] libsas: fix port->dev_list locking | Dan Williams | 2 | -11/+17
port->dev_list maintains a list of devices attached to a given sas root port. It needs to be mutated under a lock as contexts outside of the single-threaded-libsas-workqueue access the list via sas_find_dev_by_rphy(). Fixup locations where the list was being mutated without a lock.

This is a follow-up to commit 5911e963 "[SCSI] libsas: remove expander from dev list on error", where Luben noted [1]:

> 2/ We have unlocked list manipulations in sas_ex_discover_end_dev(),
> sas_unregister_common_dev(), and sas_ex_discover_end_dev()

Yes, I can see that and that is very unfortunate.

[1]: http://marc.info/?l=linux-scsi&m=131480962006471&w=2

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
2011-10-02 | [SCSI] libsas: fix panic when single phy is disabled on a wide port | Mark Salyzyn | 1 | -4/+6
When a wide port is being utilized to a target, if one disables only one of the phys, we get an OS crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000238 IP: [<ffffffff814ca9b1>] mutex_lock+0x21/0x50 PGD 4103f5067 PUD 41dba9067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/bus/pci/slots/5/address CPU 0 Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3 jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001] Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3 jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001] Pid: 5146, comm: scsi_wq_5 Not tainted 2.6.32-71.29.1.el6.lustre.7.x86_64 #1 Storage Server RIP: 0010:[<ffffffff814ca9b1>] [<ffffffff814ca9b1>] mutex_lock+0x21/0x50 RSP: 0018:ffff8803e4e33d30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000238 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8803e664c800 RDI: 0000000000000238 RBP: ffff8803e4e33d40 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000238 R14: ffff88041acb7200 R15: ffff88041c51ada0 FS: 0000000000000000(0000) 
GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000238 CR3: 0000000410143000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process scsi_wq_5 (pid: 5146, threadinfo ffff8803e4e32000, task ffff8803e4e294a0) Stack: ffff8803e664c800 0000000000000000 ffff8803e4e33d70 ffffffffa001f06e <0> ffff8803e4e33d60 ffff88041c51ada0 ffff88041acb7200 ffff88041bc0aa00 <0> ffff8803e4e33d90 ffffffffa0032b6c 0000000000000014 ffff88041acb7200 Call Trace: [<ffffffffa001f06e>] sas_port_delete_phy+0x2e/0xa0 [scsi_transport_sas] [<ffffffffa0032b6c>] sas_unregister_devs_sas_addr+0xac/0xe0 [libsas] [<ffffffffa0034914>] sas_ex_revalidate_domain+0x204/0x330 [libsas] [<ffffffffa00307f0>] ? sas_revalidate_domain+0x0/0x90 [libsas] [<ffffffffa0030855>] sas_revalidate_domain+0x65/0x90 [libsas] [<ffffffff8108c7d0>] worker_thread+0x170/0x2a0 [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108c660>] ? worker_thread+0x0/0x2a0 [<ffffffff81091b36>] kthread+0x96/0xa0 [<ffffffff810141ca>] child_rip+0xa/0x20 [<ffffffff81091aa0>] ? kthread+0x0/0xa0 [<ffffffff810141c0>] ? child_rip+0x0/0x20 Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 92 f4 ff ff 48 89 df <f0> ff 0f 79 05 e8 25 00 00 00 65 48 8b 04 25 08 cc 00 00 48 2d RIP [<ffffffff814ca9b1>] mutex_lock+0x21/0x50 RSP <ffff8803e4e33d30> CR2: 0000000000000238 The following patch is admittedly a band-aid, and does not solve the root cause, but it still is a good candidate for hardening as a pointer check before reference. Signed-off-by: Mark Salyzyn <[email protected]> Tested-by: Jack Wang <[email protected]> Cc: [email protected] Signed-off-by: James Bottomley <[email protected]>
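The hardening described above amounts to testing a pointer before handing it to code that will lock through it. A minimal sketch in plain C follows. The names `demo_port`, `demo_phy` and `demo_detach_phy` are hypothetical stand-ins for illustration only; this is not the libsas code touched by the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a port and a phy; not the libsas structures. */
struct demo_port {
	int num_phys;
};

struct demo_phy {
	struct demo_port *port;	/* NULL once the phy has left its port */
};

/*
 * Detach a phy from its port. Returns 0 if the phy was detached and
 * -1 if there was no port to detach from.
 */
static int demo_detach_phy(struct demo_phy *phy)
{
	/* Check the pointer before use instead of dereferencing NULL. */
	if (phy == NULL || phy->port == NULL)
		return -1;

	phy->port->num_phys--;
	phy->port = NULL;
	return 0;
}
```

Calling `demo_detach_phy()` a second time on the same phy returns -1 rather than faulting, which is the behaviour a check-before-reference is meant to provide. As the commit message says, this kind of guard does not address why the pointer was NULL in the first place.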
2011-10-02[SCSI] isci: export phy events via ->lldd_control_phy()Dan Williams1-4/+9
Allow the sas-transport-class to update events for local phys via a new PHY_FUNC_GET_EVENTS command to ->lldd_control_phy(). Fixup drivers that are not prepared for new enum phy_func values, and unify ->lldd_control_phy() error codes.

These are the SAS defined phy events that are reported in a smp-report-phy-error-log command:
* /sys/class/sas_phy/<phyX>/invalid_dword_count
* /sys/class/sas_phy/<phyX>/running_disparity_error_count
* /sys/class/sas_phy/<phyX>/loss_of_dword_sync_count
* /sys/class/sas_phy/<phyX>/phy_reset_problem_count

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
2011-10-02[SCSI] isci: atapi supportDan Williams1-1/+1
Based on original implementation from Jiangbi Liu and Maciej Trela.

ATAPI transfers happen in two-to-three stages. The two stage atapi commands are those that include a dma data transfer. The data transfer portion of these operations is handled by the hardware packet-dma acceleration. The three-stage commands do not have a data transfer and are handled without hardware assistance in raw frame mode.

stage1: transmit host-to-device fis to notify the device of an incoming atapi cdb. Upon reception of the pio-setup-fis repost the task_context to perform the dma transfer of the cdb+data (go to stage 3), or repost the task_context to transmit the cdb as a raw frame (go to stage 2).
stage2: wait for hardware notification of the cdb transmission and then go to stage 3.
stage3: wait for the arrival of the terminating device-to-host fis and terminate the command.

To keep the implementation simple we only support ATAPI packet-dma protocol (for commands with data) to avoid needing to handle the data transfer manually (like we do for SATA-PIO). This may affect compatibility for a small number of devices (see ATA_HORKAGE_ATAPI_MOD16_DMA).

If the data-transfer underruns, or encounters an error, the device-to-host fis is expected to arrive in the unsolicited frame queue to pass to libata for disposition. However, in the DONE_UNEXP_FIS (data underrun) case it appears we need to craft a response. In the DONE_REG_ERR case we do receive the UF and propagate it to libsas.

Signed-off-by: Maciej Trela <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
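The stage progression described above can be read as a small state machine. The sketch below models only the transitions named in the commit message. All identifiers (`demo_atapi_next`, the `DEMO_*` enumerators, the `has_data` flag) are hypothetical and chosen for illustration; this is not the isci driver's implementation.

```c
/* Hypothetical stage and event names modelled on the commit message. */
enum demo_stage {
	DEMO_STAGE1,	/* host-to-device fis sent, waiting for pio-setup-fis */
	DEMO_STAGE2,	/* cdb sent as a raw frame, waiting for tx notification */
	DEMO_STAGE3,	/* waiting for the terminating device-to-host fis */
	DEMO_DONE	/* command terminated */
};

enum demo_event {
	DEMO_EV_PIO_SETUP_FIS,
	DEMO_EV_CDB_TX_DONE,
	DEMO_EV_D2H_FIS
};

/*
 * Return the next stage for the given event. has_data is non-zero for
 * commands with a dma data transfer (the two-stage case). Events that
 * are not expected in the current stage leave the stage unchanged.
 */
static enum demo_stage demo_atapi_next(enum demo_stage stage,
				       enum demo_event ev, int has_data)
{
	switch (stage) {
	case DEMO_STAGE1:
		if (ev == DEMO_EV_PIO_SETUP_FIS)
			return has_data ? DEMO_STAGE3 : DEMO_STAGE2;
		break;
	case DEMO_STAGE2:
		if (ev == DEMO_EV_CDB_TX_DONE)
			return DEMO_STAGE3;
		break;
	case DEMO_STAGE3:
		if (ev == DEMO_EV_D2H_FIS)
			return DEMO_DONE;
		break;
	case DEMO_DONE:
		break;
	}
	return stage;
}
```

A command with data skips stage 2 because the cdb and data go out together via packet-dma, while a command without data passes through all three stages. The error and underrun handling mentioned in the commit message is left out of the sketch.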
2011-10-02[SCSI] libsas: set sas_address and device type of rphyJack Wang1-0/+2
libsas forgets to set the sas_address and device type of the rphy, so the files under /sys/class/sas_x show wrong values. Fix that. Signed-off-by: Jack Wang <[email protected]> Tested-by: Crystal Yu <[email protected]> Cc: [email protected] Signed-off-by: James Bottomley <[email protected]>
2011-10-02[SCSI] libsas: dynamic queue depthDan Williams1-21/+18
The queue-depth for libsas-attached devices initializes to 32 and can only be increased manually via sysfs to a max of 64, while mpt2sas attached devices initialize to 254 and dynamically float via the midlayer ->change_queue_depth interface. No performance regression was observed with this change on the isci driver. Tested-by: Dave Jiang <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2011-10-02[SCSI] libsas,libata: fix ->change_queue_{depth|type} for sata devicesDan Williams1-1/+10
Pass queue_depth change requests to libata, and prevent queue_type changes for ATA devices. Otherwise:
1/ we do not honor the libata specific restrictions on the queue depth
2/ libsas drivers that do not set sdev->tagged_supported are unable to change the queue_depth of ata devices via sysfs

Signed-off-by: Dan Williams <[email protected]>
Acked-by: Jeff Garzik <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
2011-10-02[SCSI] libsas: Allow expander T-T attachmentsLuben Tuikov1-6/+14
Allow expander table-to-table attachments for expanders that support it. Signed-off-by: Luben Tuikov <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2011-09-22[SCSI] libsas: sgpio write supportDan Williams1-2/+101
Add SFF-8485 v0.7 / SAS-1 smp-write-gpio register support to libsas. Defer SAS-2 support unless/until it defines an sgpio interface.

Minimum implementation needed to get the lights blinking. try_test_sas_gpio_gp_bit() provides a common method to parse the incoming write data (raw bitstream), and the to_sas_gpio_gp_bit() helper routine can be used as a basis for the set/clear operations for the 'read' implementation.

Host implementations parse as many bits (ODx.[012]) as are locally supported and report the number of registers successfully written. If the submitted data overruns the internal number of registers available report the write as a success with the number of bytes remaining reported in ->resid_len.

Example (assuming an active backplane) set the "identify" pattern for the first 21 devices:

smp_write_gpio --count=2 --data=92,49,24,92,24,92,49,24 -t 4 --index=1 /dev/bsg/sas_hostX

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
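To make the "raw bitstream" parsing concrete, here is a minimal sketch of testing one output-data bit in a buffer of GPIO registers. It assumes each register is four bytes stored most-significant byte first, and it ignores the register index handling. The function name `demo_gpio_test_bit` is hypothetical; this is not the libsas helper named in the commit message.

```c
#include <stddef.h>

/*
 * Test output-data bit 'od' in 'count' GPIO registers held in 'data'.
 * Assumption: each register is 4 bytes, most-significant byte first.
 * Returns 1 or 0 for the bit value, or -1 if 'od' lies past the last
 * register.
 */
static int demo_gpio_test_bit(const unsigned char *data, size_t count,
			      unsigned int od)
{
	size_t reg = od / 32;		/* which 32-bit register */
	unsigned int in_reg = od % 32;	/* bit position inside it */
	size_t byte;

	if (reg >= count)
		return -1;

	/* Bits 0..7 sit in the last byte of a register, 24..31 in the first. */
	byte = reg * 4 + (3 - in_reg / 8);
	return (data[byte] >> (in_reg % 8)) & 1;
}
```

Under this assumed layout, the eight data bytes from the example command (hex 92,49,24,92,24,92,49,24 with a count of 2) have the middle bit of each three-bit group set for devices 0 through 20 and every other bit clear.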
2011-09-22[SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.Mark Salyzyn1-1/+1
In an enclosure model where there are chaining expanders to a large body of storage, it was discovered that libsas, responding to a broadcast event change, would only revalidate the domain of the first child expander in the list. The issue is that the pointer value to the discovered source device was used to break out of the loop, rather than the content of the pointer. This still remains non-compliant as the revalidate domain code is supposed to loop through all child expanders, and not stop at the first one it finds that reports a change count. However, the design of this routine does not allow multiple device discoveries and that would be a more complicated set of patches reserved for another day. We are fixing the glaring bug rather than refactoring the code. Signed-off-by: Mark Salyzyn <[email protected]> Cc: [email protected] Signed-off-by: James Bottomley <[email protected]>
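The class of bug described above, testing a pointer's own value where its content was meant, can be illustrated with a short sketch. The names `demo_dev`, `demo_scan_buggy` and `demo_scan_fixed` are hypothetical; this shows the pattern only and is not the libsas routine that was patched.

```c
#include <stddef.h>

/* Hypothetical child device; 'changed' marks a reported change. */
struct demo_dev {
	int changed;
};

/*
 * Buggy variant: 'found' is the address of the caller's result slot and
 * is never NULL, so the loop always stops after the first child.
 * Returns the number of children visited.
 */
static size_t demo_scan_buggy(struct demo_dev *children, size_t n,
			      struct demo_dev **found)
{
	size_t i, visited = 0;

	for (i = 0; i < n; i++) {
		visited++;
		if (children[i].changed)
			*found = &children[i];
		if (found)	/* tests the pointer value: always true */
			break;
	}
	return visited;
}

/* Fixed variant: stop only once a device has actually been stored. */
static size_t demo_scan_fixed(struct demo_dev *children, size_t n,
			      struct demo_dev **found)
{
	size_t i, visited = 0;

	for (i = 0; i < n; i++) {
		visited++;
		if (children[i].changed)
			*found = &children[i];
		if (*found)	/* tests the content of the pointer */
			break;
	}
	return visited;
}
```

With three children of which only the last reports a change, the buggy variant visits one child and finds nothing, while the fixed variant visits all three and returns the third. Like the patch, the fixed variant still stops at the first change it finds.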