aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-10-22scsi: ufs: Fix a deadlock between PM and the SCSI error handlerBart Van Assche1-0/+24
The following deadlock has been observed on multiple test setups: * ufshcd_wl_suspend() is waiting for blk_execute_rq(START STOP UNIT) to complete while ufshcd_wl_suspend() holds host_sem. * The SCSI error handler is activated, changes the host state to SHOST_RECOVERY, ufshcd_eh_host_reset_handler() and ufshcd_err_handler() are called and the latter function tries to obtain host_sem. This is a deadlock because blk_execute_rq() can't execute SCSI commands while the host is in the SHOST_RECOVERY state and because the error handler cannot make progress because host_sem is held by another thread. Fix this deadlock as follows: * Fail attempts to suspend the system while the SCSI error handler is in progress by setting the SCMD_FAIL_IF_RECOVERING flag for START STOP UNIT commands. * If the system is suspending and a START STOP UNIT command times out, handle the SCSI command timeout from inside the context of the SCSI timeout handler instead of activating the SCSI error handler. The runtime power management code is not affected by this deadlock since hba->host_sem is not touched by the runtime power management functions in the UFS driver. Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: ufs: Introduce the function ufshcd_execute_start_stop()Bart Van Assche1-5/+34
Open-code scsi_execute() because a later patch will modify scmd->flags and because scsi_execute() does not support setting scmd->flags. No functionality is changed. Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Christie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: ufs: Track system suspend / resume activityBart Van Assche2-1/+6
Add a new boolean variable that tracks whether the system is suspending, suspended or resuming. This information will be used in a later commit to fix a deadlock between the SCSI error handler and the suspend code. Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: ufs: Try harder to change the power modeBart Van Assche1-3/+5
Instead of only retrying the START STOP UNIT command if a unit attention is reported, repeat it if any SCSI error is reported by the device or if the command timed out. Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: ufs: Reduce the START STOP UNIT timeoutBart Van Assche1-8/+1
Reduce the START STOP UNIT command timeout to one second since on Android devices a kernel panic is triggered if an attempt to suspend the system takes more than 20 seconds. One second should be enough for the START STOP UNIT command since this command completes in less than a millisecond for the UFS devices I have access to. Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: ufs: Use 'else' in ufshcd_set_dev_pwr_mode()Bart Van Assche1-3/+2
Convert if (ret) { ... } if (!ret) { ... } into if (ret) { ... } else { ... }. Reviewed-by: Bean Huo <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: ufs: Remove an outdated commentBart Van Assche1-1/+0
Although the host lock had to be held by ufshcd_clk_scaling_start_busy() callers when that function was introduced, that is no longer the case today. Hence remove the comment that claims that callers of this function must hold the host lock. Reviewed-by: Bean Huo <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: core: Support failing requests while recoveringBart Van Assche2-4/+7
The current behavior for SCSI commands submitted while error recovery is ongoing is to retry command submission after error recovery has finished. See also the scsi_host_in_recovery() check in scsi_host_queue_ready(). Add support for failing SCSI commands while host recovery is in progress. This functionality will be used to fix a deadlock in the UFS driver. Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: John Garry <[email protected]> Cc: Mike Christie <[email protected]> Cc: Hannes Reinecke <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Christie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: core: Change the return type of .eh_timed_out()Bart Van Assche15-59/+77
Commit 6600593cbd93 ("block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE") made it impossible for .eh_timed_out() implementations to call scsi_done() without causing a crash. Restore support for SCSI timeout handlers to call scsi_done() as follows: * Change all .eh_timed_out() handlers as follows: - Change the return type into enum scsi_timeout_action. - Change BLK_EH_RESET_TIMER into SCSI_EH_RESET_TIMER. - Change BLK_EH_DONE into SCSI_EH_NOT_HANDLED. * In scsi_timeout(), convert the SCSI_EH_* values into BLK_EH_* values. Reviewed-by: Lee Duncan <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: John Garry <[email protected]> Cc: Mike Christie <[email protected]> Cc: Hannes Reinecke <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Christie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: core: Fix a race between scsi_done() and scsi_timeout()Bart Van Assche1-11/+3
If there is a race between scsi_done() and scsi_timeout() and if scsi_timeout() loses the race, scsi_timeout() should not reset the request timer. Hence change the return value for this case from BLK_EH_RESET_TIMER into BLK_EH_DONE. Although the block layer holds a reference on a request (req->ref) while calling a timeout handler, restarting the timer (blk_add_timer()) while a request is being completed is racy. Reviewed-by: Mike Christie <[email protected]> Cc: Keith Busch <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: John Garry <[email protected]> Cc: Hannes Reinecke <[email protected]> Reported-by: Adrian Hunter <[email protected]> Fixes: 15f73f5b3e59 ("blk-mq: move failure injection out of blk_mq_complete_request") Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: lpfc: Update lpfc version to 14.2.0.8Justin Tee1-1/+1
Update lpfc version to 14.2.0.8 Signed-off-by: Justin Tee <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: lpfc: Create a sysfs entry called lpfc_xcvr_data for transceiver infoJustin Tee4-2/+252
The DUMP_MEMORY mailbox command is implemented for page A0 and A2 to retrieve transceiver information from firmware. The mailbox command output is then formatted to print raw data values for userspace to parse via sysfs. Signed-off-by: Justin Tee <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: lpfc: Log when congestion management limits are in effectJustin Tee1-0/+12
When bandwidth reduces from or recovers back to 100% due to congestion management, log the event. Signed-off-by: Justin Tee <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: lpfc: Fix hard lockup when reading the rx_monitor from debugfsJustin Tee1-3/+3
During I/O and simultaneous cat of /sys/kernel/debug/lpfc/fnX/rx_monitor, a hard lockup similar to the call trace below may occur. The spin_lock_bh in lpfc_rx_monitor_report is not protecting from timer interrupts as expected, so change the strength of the spin lock to _irq. Kernel panic - not syncing: Hard LOCKUP CPU: 3 PID: 110402 Comm: cat Kdump: loaded exception RIP: native_queued_spin_lock_slowpath+91 [IRQ stack] native_queued_spin_lock_slowpath at ffffffffb814e30b _raw_spin_lock at ffffffffb89a667a lpfc_rx_monitor_record at ffffffffc0a73a36 [lpfc] lpfc_cmf_timer at ffffffffc0abbc67 [lpfc] __hrtimer_run_queues at ffffffffb8184250 hrtimer_interrupt at ffffffffb8184ab0 smp_apic_timer_interrupt at ffffffffb8a026ba apic_timer_interrupt at ffffffffb8a01c4f [End of IRQ stack] apic_timer_interrupt at ffffffffb8a01c4f lpfc_rx_monitor_report at ffffffffc0a73c80 [lpfc] lpfc_rx_monitor_read at ffffffffc0addde1 [lpfc] full_proxy_read at ffffffffb83e7fc3 vfs_read at ffffffffb833fe71 ksys_read at ffffffffb83402af do_syscall_64 at ffffffffb800430b entry_SYSCALL_64_after_hwframe at ffffffffb8a000ad Signed-off-by: Justin Tee <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: lpfc: Set sli4_param's cmf option to zero when CMF is turned offJustin Tee1-0/+1
Add missed clearing of phba->sli4_hba.pc_sli4_params.cmf when CMF is turned off. Signed-off-by: Justin Tee <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: qedf: Remove set but unused variable 'page'Jiapeng Chong1-3/+0
The variable page is not used in the function, so delete it. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2348 Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: pm80xx: Remove unused reset_in_progress flag logicIgor Pylypiv2-5/+0
The reset_in_progress flag was never set. Signed-off-by: Igor Pylypiv <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Andrew Konecki <[email protected]> Acked-by: Jack Wang <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: mvsas: Use sas_task_find_rq() for taggingJohn Garry4-23/+29
The request associated with a SCSI command coming from the block layer has a unique tag, so use that when possible for getting a slot. Unfortunately we don't support reserved commands in the SCSI midlayer yet. As such, SMP tasks - as an example - will not have a request associated, so in the interim continue to manage those tags for that type of sas_task internally. We reserve an arbitrary 4 tags for these internal tags. Indeed, we already decrement MVS_RSVD_SLOTS by 2 for the shost can_queue when flag MVF_FLAG_SOC is set. This change was made in commit 20b09c2992fe ("[SCSI] mvsas: add support for 94xx; layout change; bug fixes"), but what those 2 slots are used for is not obvious. Also make the tag management functions static, where possible. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: mvsas: Delete mvs_tag_init()John Garry3-10/+0
All mvs_tag_init() does is zero the tag bitmap, but this is already done with the kzalloc() call to alloc the tags, so delete this unneeded function. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: pm8001: Use sas_task_find_rq() for taggingJohn Garry4-31/+24
The request associated with a SCSI command coming from the block layer has a unique tag, so use that when possible for getting a CCB. Unfortunately we don't support reserved commands in the SCSI midlayer yet, so in the interim continue to manage those tags internally (along with tags for private commands). Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: pm8001: Remove pm8001_tag_init()Igor Pylypiv3-10/+0
In commit 5a141315ed7c ("scsi: pm80xx: Increase the number of outstanding I/O supported to 1024") the pm8001_ha->tags allocation was moved into pm8001_init_ccb_tag(). This changed the execution order of allocation. pm8001_tag_init() used to be called after the pm8001_ha->tags allocation and now it is called before the allocation. Before: pm8001_pci_probe() `--> pm8001_pci_alloc() `--> pm8001_alloc() `--> pm8001_ha->tags = kzalloc(...) `--> pm8001_tag_init(pm8001_ha); // OK: tags are allocated After: pm8001_pci_probe() `--> pm8001_pci_alloc() | `--> pm8001_alloc() | `--> pm8001_tag_init(pm8001_ha); // NOK: tags are not allocated | `--> pm8001_init_ccb_tag() `--> pm8001_ha->tags = kzalloc(...) // today it is bitmap_zalloc() Since pm8001_ha->tags_num is zero when pm8001_tag_init() is called it does nothing. Tags memory is allocated with bitmap_zalloc() so there is no need to manually clear each bit with pm8001_tag_free(). Reviewed-by: Changyuan Lyu <[email protected]> Signed-off-by: Igor Pylypiv <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: hisi_sas: Put reserved tags in lower region of tagsetJohn Garry1-7/+7
To be consistent with blk-mq, put the reserved tags in the lower region of the tagset. Eventually we hope to get rid of all this reserved tag management. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: hisi_sas: Use sas_task_find_rq()John Garry1-18/+8
Use sas_task_find_rq() to lookup the request per task for its driver tag. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: libsas: Add sas_task_find_rq()John Garry1-0/+18
blk-mq already provides a unique tag per request. Some libsas LLDDs - like hisi_sas - already use this tag as the unique per-I/O HW tag. Add a common function to provide the request associated with a sas_task for all libsas LLDDs. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Jason Yan <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-22scsi: target: Remove the unused function transport_lba_64_ext()Jiapeng Chong1-8/+0
The function transport_lba_64_ext() is defined in the target_core_sbc.c file, but not called elsewhere, so remove this unused function. drivers/target/target_core_sbc.c:276:34: warning: unused function 'transport_lba_64_ext'. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2427 Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Use sas_phy_match_port_addr() instead of open coding itJason Yan1-2/+1
The SAS address comparison of asd_sas_port and expander phy is open coded. Replace it with sas_phy_match_port_addr(). Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Use sas_phy_addr_match() instead of open coding itJason Yan1-2/+1
The SAS address comparison of expander phys is open coded. Replace it with sas_phy_addr_match(). Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Use sas_phy_match_dev_addr() instead of open coding itJason Yan1-12/+6
The SAS address comparison of domain device and expander phy is open coded. Replace it with sas_phy_match_dev_addr(). Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: hisi_sas: Use sas_find_attathed_phy_id() instead of open coding itJason Yan1-11/+3
The attached phy finding is open coded. Replace it with sas_find_attached_phy_id(). To keep things consistent, the return value of hisi_sas_dev_found() is also changed to -ENODEV after calling sas_find_attathed_phy_id() failed. Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: John Garry <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: mvsas: Use sas_find_attached_phy_id() instead of open coding itJason Yan1-12/+5
The attached phy finding is open coded. Replace it with sas_find_attached_phy_id(). To keep things consistent, the return value of mvs_dev_found_notify() is also changed to -ENODEV after calling sas_find_attathed_phy_id() failed. Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: pm8001: Use sas_find_attached_phy_id() instead of open coding itJason Yan1-12/+6
The attached phy id finding is open coded. Replace it with sas_find_attached_phy_id(). To keep things consistent, the return value of pm8001_dev_found_notify() is also changed to -ENODEV after calling sas_find_attathed_phy_id() failed. Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Introduce sas_find_attached_phy_id() helperJason Yan2-0/+18
LLDDs are all implementing their own attached phy ID finding code. Factor it out to libsas. Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: John Garry <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Introduce SAS address comparison helpersJason Yan1-0/+17
SAS address comparison is widely used in libsas. However they are all opencoded and to avoid the line spill over 80 columns, are mostly split into multi-lines. Introduce some helpers to prepare for some refactoring. Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: John Garry <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: core: Release SCSI devices synchronouslyBart Van Assche3-21/+4
All upstream scsi_device_put() calls happen from thread context. Hence simplify scsi_device_put() by always calling the release function synchronously. This commit prepares for constifying the SCSI host template by removing an assignment that clears the module pointer in the SCSI host template. scsi_device_dev_release_usercontext() was introduced in 2006 via commit 65110b216895 ("[SCSI] fix wrong context bugs in SCSI"). Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: John Garry <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: core: Remove the put_device() call from scsi_device_get()Bart Van Assche1-5/+5
scsi_device_get() may be called from atomic context, e.g. by shost_for_each_device(). A later commit will allow put_device() to sleep for SCSI devices. Hence remove the put_device() call from scsi_device_get(). According to Rusty Russell's "Module Refcount and Stuff mini-FAQ", calling module_put() from atomic context is allowed since considerable time. See also https://lkml.org/lkml/2002/11/18/330. Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: ufs: Simplify ufshcd_set_dev_pwr_mode()Bart Van Assche1-7/+2
Simplify the code for incrementing the SCSI device reference count in ufshcd_set_dev_pwr_mode(). This commit removes one scsi_device_put() call that happens from atomic context. Reviewed-by: Adrian Hunter <[email protected]> Cc: Avri Altman <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: core: Rework scsi_single_lun_run()Bart Van Assche1-17/+15
Use __starget_for_each_device() instead of open-coding starget_for_each_device(). Run the queues asynchronously instead of synchronously. This commit removes code that calls scsi_device_put() from atomic context. Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: John Garry <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: core: Introduce a new list for SCSI proc directory entriesBart Van Assche3-35/+102
Instead of using scsi_host_template members to track the SCSI proc directory entries, track these entries in a list. This changes the time needed for looking up the proc dir pointer from O(1) into O(n). This is considered acceptable since the number of SCSI host adapter types per host is usually small (less than ten). This change has been tested by attaching two USB storage devices to a qemu host: $ grep -aH . /proc/scsi/usb-storage/* /proc/scsi/usb-storage/7: Host scsi7: usb-storage /proc/scsi/usb-storage/7: Vendor: QEMU /proc/scsi/usb-storage/7: Product: QEMU USB HARDDRIVE /proc/scsi/usb-storage/7:Serial Number: 1-0000:00:02.1:00.0-6 /proc/scsi/usb-storage/7: Protocol: Transparent SCSI /proc/scsi/usb-storage/7: Transport: Bulk /proc/scsi/usb-storage/7: Quirks: SANE_SENSE /proc/scsi/usb-storage/8: Host scsi8: usb-storage /proc/scsi/usb-storage/8: Vendor: QEMU /proc/scsi/usb-storage/8: Product: QEMU USB HARDDRIVE /proc/scsi/usb-storage/8:Serial Number: 1-0000:00:02.1:00.0-7 /proc/scsi/usb-storage/8: Protocol: Transparent SCSI /proc/scsi/usb-storage/8: Transport: Bulk /proc/scsi/usb-storage/8: Quirks: SANE_SENSE This commit prepares for constifying most SCSI host templates. Reviewed-by: John Garry <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: core: Fail host creation if creating the proc directory failsBart Van Assche3-7/+13
Users expect that the contents of /proc/scsi is in sync with the contents of /sys/class/scsi_host. Hence fail host creation if creating the proc directory fails. Suggested-by: John Garry <[email protected]> Reviewed-by: John Garry <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: esas2r: Introduce scsi_template_proc_dir()Bart Van Assche3-6/+28
Prepare for removing the 'proc_dir' and 'present' members from the SCSI host template. This commit does not change any functionality. Reviewed-by: John Garry <[email protected]> Cc: Bradley Grove <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: esas2r: Initialize two host template members implicitlyBart Van Assche1-2/+0
Prepare for removing the 'proc_dir' and 'present' members from the SCSI host template by implicitly initializing 'present' and 'emulated' in 'driver_template'. Reviewed-by: John Garry <[email protected]> Cc: Bradley Grove <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Mike Christie <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Update SATA dev FIS in sas_ata_task_done()John Garry1-2/+2
In sas_ata_task_done(), for commands which complete with error we set the SATA dev FIS status field with ATA_ERR. In ata_eh_analyze_tf() this would be interpreted as a HSM error. Set ATA_DRDY, which will lead libata to judge as a device error, which is a safer bet. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Niklas Cassel <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() privateJohn Garry3-7/+4
We have no users outside libsas any longer, so make sas_alloc_task(), sas_alloc_slow_task(), and sas_free_task() private. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Damien Le Moal <[email protected]> Tested-by: Niklas Cassel <[email protected]> # pm80xx Reviewed-by: Jason Yan <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errorsJohn Garry4-326/+19
In commit c6b9ef5779c3 ("[SCSI] pm80xx: NCQ error handling changes") the driver had support added to handle NCQ errors but much of what is done in this handling is duplicated from the libata EH. In that named commit we handle in 2x main steps: a. Issue read log ext10 to examine and clear the errors b. Issue SATA_ABORT all command Indeed, in libata EH, we do similar to above: a. ata_do_eh() -> ata_eh_autopsy() -> ata_eh_link_autopsy() -> ata_eh_analyze_ncq_error() -> ata_eh_read_log_10h() b. ata_do_eh() -> ata_eh_recover() which will issue a device soft reset or hard reset Since there is so much duplication, use sas_ata_device_link_abort() which will abort all pending IOs and kick of ATA EH which will do the steps, above. However we will not follow the advisory to send the SATA_ABORT all command after the autopsy in read log ext10. Indeed, in libsas EH, we already send a per-task SATA_ABORT command, and this is prior to the ATA EH kicking in and issuing the read log ext10 in the recovery process. I judge that this is ok as the SATA_ABORT command does not actually send any protocol on the link to abort I/O on the other side, so would not change any state on the disk (for the read log ext10 command). Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Damien Le Moal <[email protected]> Tested-by: Niklas Cassel <[email protected]> # pm80xx Acked-by: Jack Wang <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: pm8001: Modify task abort handling for SATA taskJohn Garry3-6/+31
When we try to abort a SATA task, the CCB of the task which we are trying to avoid may still complete. In this case, we should not touch the task associated with that CCB as we can race with libsas freeing the last later in sas_eh_handle_sas_errors() -> sas_eh_finish_cmd() for when TASK_IS_ABORTED is returned from sas_scsi_find_task() Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Damien Le Moal <[email protected]> Tested-by: Niklas Cassel <[email protected]> # pm80xx Acked-by: Jack Wang <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: hisi_sas: Modify v3 HW SATA disk error state completion processingXingui Yang1-1/+4
When an NCQ error occurs, the controller will abnormally complete the I/Os that are newly delivered to disk, and bit8 in CQ dw3 will be set which indicates that the SATA disk is in error state. The current processing flow is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will set FIS stat to ATA_ERR. After analyzing the I/O by ata_eh_analyze_tf(), err_mask will set to AC_ERR_HSM. If media error occurs for four times within 10 minutes and the chip rejects new I/Os for four times, NCQ will be disabled due to excessive errors, which is undesirable. Therefore, use sas_task_abort() to handle abnormally completed I/Os when SATA disk is in error state, as these abnormally completed I/Os are already processed by sas_ata_device_link_abort() and qc->flag are set to ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be modified in EH. Unlike the current process flow, it will not increase the count of ECAT_TOUT_HSM and not turn off NCQ. Like other I/Os on the disk that do not have an error but do not return after the NCQ error, they are retried after the EH. Signed-off-by: Xingui Yang <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hwXingui Yang3-3/+64
When CQ header dw3 SATA_DISK_ERR is set it means this SATA disk is in error state and the current IPTT is invalid. An invalid IPTT does not correspond to any slot. In this scenario, new I/Os that delivered to disk will be rejected by the controller and all I/Os remaining in the disk should be aborted, which we add here with the sas_ata_device_link_abort() call. In hisi_sas_abort_task() we don't want to issue a soft reset as it may cause info to be lost in the target disk for the ATA EH autopsy. In this case, just release resources - the disk won't return other I/Os normally after NCQ Error, so this is safe. Signed-off-by: Xingui Yang <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()Xingui Yang1-5/+3
Each branch currently defines a slot variable independently, and it is neater to move it to the function head. Signed-off-by: Xingui Yang <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-18scsi: libsas: Add sas_ata_device_link_abort()John Garry2-0/+21
Similar to how AHCI handles NCQ errors in ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort(), add an NCQ error handler for LLDDs to call to initiate a link abort. This will mark all outstanding QCs as failed and kick-off EH. Note: A "force reset" argument is added for drivers which require the ATA error handling to always reset the device. A driver may require this feature for when SATA device per-SCSI cmnd resources are only released during reset for ATA EH. As such, we need an option to force reset to be done, regardless of what any EH autopsy decides. The SATA device FIS fields are set to indicate a device error from ata_eh_analyze_tf(). Suggested-by: Damien Le Moal <[email protected]> Suggested-by: Niklas Cassel <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Damien Le Moal <[email protected]> Tested-by: Niklas Cassel <[email protected]> # pm80xx Reviewed-by: Jason Yan <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2022-10-16Linux 6.1-rc1Linus Torvalds1-2/+2