Age | Commit message | Author | Files | Lines
2022-04-18 | scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion | James Smart | 3 | -3/+9
During an NVMe target reboot, the target may initialize itself as FCP only during the first RSCN and shortly after trigger a second RSCN claiming NVMe support. The second RSCN can arrive before the FCP-PRLI for the first RSCN completes, leading to discovery issues over NVMe. Change RSCN and NVME-PRLI send logic based on a new FC_RSCN_MEMENTO flag that signals when lpfc_end_rscn() is completed and serves as a memento that discovery was started from RSCN. Link: https://lore.kernel.org/r/20220412222008.126521-20-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Register for Application Services FC-4 type in Fabric topology | James Smart | 2 | -30/+32
Add new FC-4 type 0x60 Application Services for fabric registration when VMID is enabled. Modified rft structure to indicate __be format. Removed redundant ipReg variable as it was not used. Link: https://lore.kernel.org/r/20220412222008.126521-19-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Remove false FDMI NVMe FC-4 support for NPIV ports | James Smart | 1 | -1/+3
FDMI FC-4 Active Type for vports mistakenly shows NVMe support. Add a check to only set the NVMe support bit for the physical port. Link: https://lore.kernel.org/r/20220412222008.126521-18-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Revise FDMI reporting of supported port speed for trunk groups | James Smart | 1 | -20/+48
Trunk port FDMI supported port speed shows single port supported speed rather than the trunked port speed. Modify supported port speed logic calculation during registration. Link: https://lore.kernel.org/r/20220412222008.126521-17-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Fix call trace observed during I/O with CMF enabled | James Smart | 1 | -2/+2
The following was seen with CMF enabled: BUG: using smp_processor_id() in preemptible code: systemd-udevd/31711 kernel: caller is lpfc_update_cmf_cmd+0x214/0x420 [lpfc] kernel: CPU: 12 PID: 31711 Comm: systemd-udevd kernel: Call Trace: kernel: <TASK> kernel: dump_stack_lvl+0x44/0x57 kernel: check_preemption_disabled+0xbf/0xe0 kernel: lpfc_update_cmf_cmd+0x214/0x420 [lpfc] kernel: lpfc_nvme_fcp_io_submit+0x23b4/0x4df0 [lpfc] this_cpu_ptr() calls smp_processor_id() in a preemptible context. Fix by using per_cpu_ptr() with raw_smp_processor_id() instead. Link: https://lore.kernel.org/r/20220412222008.126521-16-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Correct CRC32 calculation for congestion stats | James Smart | 1 | -1/+1
lpfc_cgn_calc_crc32() is returning 32 bits, and lpfc_cgn_update_stat() was using u16 to store the crc32 value. Correct by redeclaring the local variable to u32. Link: https://lore.kernel.org/r/20220412222008.126521-15-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
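The effect of the narrowing store described above can be reproduced in user space. This is only a sketch, not the driver's code: crc32_sketch() is a generic reflected CRC-32 (polynomial 0xEDB88320) standing in for lpfc_cgn_calc_crc32(), whose actual seed and polynomial are not shown in this log, and keep_crc_u16()/keep_crc_u32() are hypothetical names for the pre-fix and post-fix shape of the local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic reflected CRC-32 (polynomial 0xEDB88320); a stand-in only. */
uint32_t crc32_sketch(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Pre-fix shape: a 16-bit local silently keeps only the low half. */
uint32_t keep_crc_u16(const uint8_t *buf, size_t len)
{
    uint16_t crc = (uint16_t)crc32_sketch(buf, len);

    return crc;
}

/* Post-fix shape: the local is as wide as the value it stores. */
uint32_t keep_crc_u32(const uint8_t *buf, size_t len)
{
    uint32_t crc = crc32_sketch(buf, len);

    return crc;
}

/* Compares both shapes on the standard CRC-32 check string. */
void crc_width_demo(void)
{
    static const uint8_t msg[] = "123456789";

    assert(keep_crc_u32(msg, 9) == 0xCBF43926u);
    assert(keep_crc_u16(msg, 9) == 0x3926u); /* upper 16 bits lost */
}
```

Calling crc_width_demo() shows that the 16-bit variant retains only the low half of the checksum, so any consumer comparing it against a full 32-bit CRC would see a mismatch.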
2022-04-18 | scsi: lpfc: Move MI module parameter check to handle dynamic disable | James Smart | 2 | -5/+7
lpfc_refresh_params() can be called for an async event handler. This could potentially override the value initialized by lpfc_cmf_setup(). Move module parameter check to lpfc_refresh_params(). Link: https://lore.kernel.org/r/20220412222008.126521-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Remove unnecessary NULL pointer assignment for ELS_RDF path | James Smart | 1 | -1/+0
The command IOCB ndlp pointer is overwritten in lpfc_issue_els_rdf(), and the original ndlp pointer is stored ahead of time. This null ptr assignment can be safely removed. Link: https://lore.kernel.org/r/20220412222008.126521-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Transition to NPR state upon LOGO cmpl if link down or aborted | James Smart | 1 | -0/+3
In P2P topology, a target controller reboot sometimes results in not reestablishing a login because the ndlp is stuck in LOGO state. Fix by transitioning to NPR state if we get link down before LOGO completes. Link: https://lore.kernel.org/r/20220412222008.126521-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Update fc_prli_sent outstanding only after guaranteed IOCB submit | James Smart | 1 | -18/+11
If lpfc_sli_issue_iocb() fails, then the fc_prli_sent is never decremented. Move the fc_prli_sent++ to after a guaranteed IOCB submit. Link: https://lore.kernel.org/r/20220412222008.126521-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
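The counter leak described above can be sketched in a few lines of user-space C. Everything here is hypothetical and simplified: struct vport_sketch, issue_iocb_sketch() and the two issue_prli_*() functions only model the ordering of the increment relative to the submit call, and ignore the locking and completion handling the real driver needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-port outstanding-PRLI counter. */
struct vport_sketch {
    int prli_sent;
};

/* Hypothetical stand-in for lpfc_sli_issue_iocb(): 0 on success. */
static int issue_iocb_sketch(bool submit_ok)
{
    return submit_ok ? 0 : -1;
}

/* Pre-fix ordering: count first, then submit. */
int issue_prli_buggy(struct vport_sketch *vp, bool submit_ok)
{
    vp->prli_sent++;
    if (issue_iocb_sketch(submit_ok))
        return -1; /* no completion will ever decrement the counter */
    return 0;
}

/* Post-fix ordering: count only once the submit has succeeded. */
int issue_prli_fixed(struct vport_sketch *vp, bool submit_ok)
{
    if (issue_iocb_sketch(submit_ok))
        return -1;
    vp->prli_sent++;
    return 0;
}

/* A failed submit leaves a stale count only in the pre-fix ordering. */
void prli_count_demo(void)
{
    struct vport_sketch a = { 0 }, b = { 0 };

    assert(issue_prli_buggy(&a, false) == -1 && a.prli_sent == 1);
    assert(issue_prli_fixed(&b, false) == -1 && b.prli_sent == 0);
}
```

In the buggy ordering a failed submit leaves prli_sent at 1 with nothing outstanding, which is the stuck count the commit message refers to.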
2022-04-18 | scsi: lpfc: Protect memory leak for NPIV ports sending PLOGI_RJT | James Smart | 2 | -2/+25
There is a potential memory leak in lpfc_ignore_els_cmpl() and lpfc_els_rsp_reject() that was allocated from NPIV PLOGI_RJT (lpfc_rcv_plogi()'s login_mbox). Check if cmdiocb->context_un.mbox was allocated in lpfc_ignore_els_cmpl(), and then free it back to phba->mbox_mem_pool along with mbox->ctx_buf for service parameters. For lpfc_els_rsp_reject() failure, free both the ctx_buf for service parameters and the login_mbox. Link: https://lore.kernel.org/r/20220412222008.126521-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Fix null pointer dereference after failing to issue FLOGI and PLOGI | James Smart | 1 | -16/+35
If lpfc_issue_els_flogi() fails and returns non-zero status, the node reference count is decremented to trigger the release of the nodelist structure. However, if there is a prior registration or dev-loss-evt work pending, the node may be released prematurely. When dev-loss-evt completes, the released node is referenced causing a use-after-free null pointer dereference. Similarly, when processing non-zero ELS PLOGI completion status in lpfc_cmpl_els_plogi(), the ndlp flags are checked for a transport registration before triggering node removal. If dev-loss-evt work is pending, the node may be released prematurely and a subsequent call to lpfc_dev_loss_tmo_handler() results in a use after free ndlp dereference. Add test for pending dev-loss before decrementing the node reference count for FLOGI, PLOGI, PRLI, and ADISC handling. Link: https://lore.kernel.org/r/20220412222008.126521-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Clear fabric topology flag before initiating a new FLOGI | James Smart | 1 | -0/+2
Previous topologies may no longer be in fabric mode, so clear FC_FABRIC in fc_flag for every new FLOGI. Link: https://lore.kernel.org/r/20220412222008.126521-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock | James Smart | 1 | -18/+15
During stress I/O tests with 500+ vports, hard LOCKUP call traces are observed.

CPU A:
 native_queued_spin_lock_slowpath+0x192
 _raw_spin_lock_irqsave+0x32
 lpfc_handle_fcp_err+0x4c6
 lpfc_fcp_io_cmd_wqe_cmpl+0x964
 lpfc_sli4_fp_handle_cqe+0x266
 __lpfc_sli4_process_cq+0x105
 __lpfc_sli4_hba_process_cq+0x3c
 lpfc_cq_poll_hdler+0x16
 irq_poll_softirq+0x76
 __softirqentry_text_start+0xe4
 irq_exit+0xf7
 do_IRQ+0x7f

CPU B:
 native_queued_spin_lock_slowpath+0x5b
 _raw_spin_lock+0x1c
 lpfc_abort_handler+0x13e
 scmd_eh_abort_handler+0x85
 process_one_work+0x1a7
 worker_thread+0x30
 kthread+0x112
 ret_from_fork+0x1f

Diagram of lockup:

CPUA                            CPUB
----                            ----
lpfc_cmd->buf_lock
                                phba->hbalock
                                lpfc_cmd->buf_lock
phba->hbalock

Fix by reordering the taking of the lpfc_cmd->buf_lock and phba->hbalock in lpfc_abort_handler routine so that it tries to take the lpfc_cmd->buf_lock first before phba->hbalock.

Link: https://lore.kernel.org/r/20220412222008.126521-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
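The lockup above is a classic lock-order inversion: one path nests phba->hbalock inside lpfc_cmd->buf_lock while the other nests them the opposite way. As a rough illustration only, the following user-space sketch records nesting orders in a tiny table and flags the inversion. The names order_graph, record_nested(), BUF_LOCK and HBALOCK are hypothetical and are not driver or lockdep APIs; no real locks are taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two locks named in the commit. */
enum lock_id { BUF_LOCK, HBALOCK, NLOCKS };

/* nested[a][b] is true once some path took b while holding a. */
struct order_graph {
    bool nested[NLOCKS][NLOCKS];
};

/*
 * Record "inner taken while holding outer". Returns false when the
 * opposite nesting was already recorded, i.e. an ABBA inversion.
 */
bool record_nested(struct order_graph *g, enum lock_id outer,
                   enum lock_id inner)
{
    if (g->nested[inner][outer])
        return false;
    g->nested[outer][inner] = true;
    return true;
}

/* Models the two code paths before and after the reordering. */
void lock_order_demo(void)
{
    struct order_graph before = { 0 }, after = { 0 };

    /* Completion path: buf_lock first, then hbalock. */
    assert(record_nested(&before, BUF_LOCK, HBALOCK));
    /* Old abort handler: hbalock first, then buf_lock: inversion. */
    assert(!record_nested(&before, HBALOCK, BUF_LOCK));

    /* After the fix both paths take buf_lock before hbalock. */
    assert(record_nested(&after, BUF_LOCK, HBALOCK));
    assert(record_nested(&after, BUF_LOCK, HBALOCK));
}
```

Once every path agrees on one order, neither CPU can hold the lock the other is waiting for while itself waiting, which is why reordering the abort handler is sufficient.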
2022-04-18 | scsi: lpfc: Requeue SCSI I/O to upper layer when fw reports link down | James Smart | 1 | -0/+1
During heavy I/O stress tests with 100+ vports and cable pulls, it may take a while before the vport logs back into the fabric to resume I/O. Currently, the driver immediately fails the I/O with DID_ERROR. Change behavior to return DID_REQUEUE, and rely on SCSI layer's max retry of 5 before erroring out the I/O. Link: https://lore.kernel.org/r/20220412222008.126521-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Zero SLI4 fcp_cmnd buffer's fcpCntl0 field | James Smart | 1 | -1/+1
It's possible that the fcpCntl0 reserved field is allocated non-zero. This could cause problems for certain target storage arrays that expect reserved fields to be all zero. SLI3 path already allocates fcp_cmnd buffer with dma_pool_zalloc() in lpfc_new_scsi_buf_s3. The fcpCntl0 field itself is never proactively set throughout the SCSI I/O path. Thus, we only change the SLI4 fcp_cmnd buffer allocation to dma_pool_zalloc. Link: https://lore.kernel.org/r/20220412222008.126521-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Fix diagnostic fw logging after a function reset | James Smart | 2 | -3/+3
The lpfc_sli4_ras_setup() routine is only called from the lpfc_pci_probe_one_s4() routine, which means diagnostic fw logging initialization only occurs during probing. Thus, any path involving a reset of the HBA that restarts the state of the SLI port does not reinitialize diagnostic fw logging. Move lpfc_sli4_ras_setup() into lpfc_sli4_hba_setup() so that the LOWLEVEL_SET_DIAG_LOG_OPTIONS mailbox command can be sent after a function reset. Link: https://lore.kernel.org/r/20220412222008.126521-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() | James Smart | 2 | -31/+4
In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard lockup call trace hangs the system. Call Trace: _raw_spin_lock_irqsave+0x32/0x40 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc] lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc] lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc] lpfc_els_flush_cmd+0x43c/0x670 [lpfc] lpfc_els_flush_all_cmd+0x37/0x60 [lpfc] lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc] lpfc_do_work+0x1485/0x1d70 [lpfc] kthread+0x112/0x130 ret_from_fork+0x1f/0x40 Kernel panic - not syncing: Hard LOCKUP The same CPU tries to claim the phba->port_list_lock twice. Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and lpfc_printf_log() macros before calling lpfc_dmp_dbg(). There is no need to take the phba->port_list_lock within lpfc_dmp_dbg(). Link: https://lore.kernel.org/r/20220412222008.126521-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: lpfc: Tweak message log categories for ELS/FDMI/NVMe rescan | James Smart | 3 | -6/+6
Several log message categories were updated:
- Enable msg 4623 (Xmit of ECD) to display for ELS logging.
- Change msg 0220 (FDMI cmd failed) to display for ELS logging.
- Change msg 6460 (FDMI RPA failure) to be warning not hard error.
- Change msg 6172 (NVME rescan of DID) to be logged under NVMe discovery.
Link: https://lore.kernel.org/r/20220412222008.126521-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | Merge branch '5.18/scsi-fixes' into 5.19/scsi-staging | Martin K. Petersen | 51 | -1137/+747
Pull in 5.18 fixes branch which contains a bunch of fixes required for the lpfc driver update. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18 | scsi: ufs: core: Remove redundant HPB unmap | Po-Wen Kao | 1 | -7/+0
Since the HPB mapping is already reset in ufshpb_init() by setting flag QUERY_FLAG_IDN_HPB_RESET, there is no need to do so again in ufshpb_hpb_lu_prepared(). This also resolves the issue where HPB WRITE BUFFER is issued before UAC is cleared. Link: https://lore.kernel.org/r/20220412073131.10644-1-powen.kao@mediatek.com Acked-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: MAINTAINERS: Add Mike Christie as co-maintainer | Mike Christie | 1 | -0/+1
I've been doing a lot of iscsi patches because Oracle is paying me to work on iSCSI again. It was supposed to be a temp assignment, but my co-worker that was working on iscsi moved to a new group so it looks like I'm back on this code again. After talking to Chris and Lee this patch adds me back as co-maintainer, so I can help them and people remember to cc me on issues. Link: https://lore.kernel.org/r/20220408001314.5014-11-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Acked-by: Lee Duncan <lduncan@suse.com> Acked-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: qedi: Fix failed disconnect handling | Mike Christie | 1 | -35/+34
We set the qedi_ep state to EP_STATE_OFLDCONN_START when the ep is created. Then in qedi_set_path we kick off the offload work. If userspace times out the connection and calls ep_disconnect, qedi will only flush the offload work if the qedi_ep state has transitioned away from EP_STATE_OFLDCONN_START. If we can't connect we will not have transitioned state and will leave the offload work running, and we will free the qedi_ep from under it. This patch just has us init the work when we create the ep, then always flush it. Link: https://lore.kernel.org/r/20220408001314.5014-10-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Acked-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Fix NOP handling during conn recovery | Mike Christie | 2 | -2/+7
If an offload driver doesn't use the xmit workqueue, then when we are doing ep_disconnect libiscsi can still inject PDUs to the driver. This adds a check that the connection is bound before trying to inject PDUs. Link: https://lore.kernel.org/r/20220408001314.5014-9-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Merge suspend fields | Mike Christie | 6 | -20/+21
Move the tx and rx suspend fields into one flags field. Link: https://lore.kernel.org/r/20220408001314.5014-8-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Fix unbound endpoint error handling | Mike Christie | 1 | -29/+36
If a driver raises a connection error before the connection is bound, we can leave a cleanup_work queued that can later run and disconnect/stop a connection that is logged in. The problem is that drivers can call iscsi_conn_error_event for endpoints that are connected but not yet bound when something like the network port they are using is brought down. iscsi_cleanup_conn_work_fn will check for this and exit early, but if the cleanup_work is stuck behind other works, it might not get run until after userspace has done ep_disconnect. Because the endpoint is not yet bound there was no way for ep_disconnect to flush the work. The bug of leaving stop_conns queued was added in: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") and: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") was supposed to fix it, but left this case. This patch moves the conn state check to before we even queue the work so we can avoid queueing. Link: https://lore.kernel.org/r/20220408001314.5014-7-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Fix conn cleanup and stop race during iscsid restart | Mike Christie | 2 | -0/+19
If iscsid is doing a stop_conn at the same time the kernel is starting error recovery we can hit a race that allows the cleanup work to run on a valid connection. In the race, iscsi_if_stop_conn sees the cleanup bit set, but it calls flush_work on the clean_work before iscsi_conn_error_event has queued it. The flush then returns before the queueing and so the cleanup_work can run later and disconnect/stop a conn while it's in a connected state. The patch: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") added the late stop_conn call bug originally, and the patch: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") attempted to fix it but only fixed the normal EH case and left the above race for the iscsid restart case. For the normal EH case we don't hit the race because we only signal userspace to start recovery after we have done the queueing, so the flush will always catch the queued work or see it completed. For iscsid restart cases like boot, we can hit the race because iscsid will call down to the kernel before the kernel has signaled any error, so both code paths can be running at the same time. This adds a lock around the setting of the cleanup bit and queueing so they happen together. Link: https://lore.kernel.org/r/20220408001314.5014-6-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Fix endpoint reuse regression | Mike Christie | 1 | -1/+11
This patch fixes a bug where, when using iSCSI offload, we can free an endpoint while userspace still thinks it's active. That then causes the endpoint ID to be reused for a new connection's endpoint while userspace still thinks the ID is for the original connection. Userspace will then end up disconnecting a running connection's endpoint or trying to bind to another connection's endpoint. This bug is a regression added in: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") where we added an in-kernel ep_disconnect call to fix a bug in: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") where we would call stop_conn without having done ep_disconnect. This early ep_disconnect call will then free the endpoint and its ID while userspace still thinks the ID is valid. Fix the early release of the ID by having the in-kernel recovery code keep a reference to the endpoint until userspace has called into the kernel to finish cleaning up the endpoint/connection. It requires the previous commit "scsi: iscsi: Release endpoint ID when its freed" which moved the freeing of the ID until when the endpoint is released. Link: https://lore.kernel.org/r/20220408001314.5014-5-michael.christie@oracle.com Fixes: 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Release endpoint ID when its freed | Mike Christie | 2 | -37/+36
We can't release the endpoint ID until all references to the endpoint have been dropped or it could be allocated while in use. This has us use an idr instead of looping over all conns to find a free ID and then free the ID when all references have been dropped instead of when the device is only deleted. Link: https://lore.kernel.org/r/20220408001314.5014-4-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Reviewed-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Fix offload conn cleanup when iscsid restarts | Mike Christie | 1 | -20/+28
When userspace restarts during boot or upgrades it won't know about the offload driver's endpoint and connection mappings. iscsid will start by cleaning up the old session by doing a stop_conn call. Later, if we are able to create a new connection, we clean up the old endpoint during the binding stage. The problem is that if we do stop_conn before doing the ep_disconnect call, offload drivers can still be executing I/O. We then might free tasks from under the card/driver. This moves the ep_disconnect call to before we do the stop_conn call for this case. It will then work and look like a normal recovery/cleanup procedure from the driver's point of view. Link: https://lore.kernel.org/r/20220408001314.5014-3-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: iscsi: Move iscsi_ep_disconnect() | Mike Christie | 1 | -19/+19
This patch moves iscsi_ep_disconnect() so it can be called earlier in the next patch. Link: https://lore.kernel.org/r/20220408001314.5014-2-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: pm80xx: Enable upper inbound, outbound queues | Ajish Koshy | 1 | -0/+11
Running the driver on servers with more than 32 CPUs resulted in command timeouts. This is because we were not getting completions for commands submitted on IQ32 - IQ63. Set E64Q bit to enable upper inbound and outbound queues 32 to 63 in the MPI main configuration table. Added 500ms delay after successful MPI initialization as mentioned in the controller datasheet. Link: https://lore.kernel.org/r/20220411064603.668448-3-Ajish.Koshy@microchip.com Fixes: 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63 | Ajish Koshy | 1 | -9/+13
When upper inbound and outbound queues 32-63 are enabled, we see upper vectors 32-63 in interrupt service routine. We need corresponding registers to handle masking and unmasking of these upper interrupts. To achieve this, we use registers MSGU_ODMR_U(0x34) to mask and MSGU_ODMR_CLR_U(0x3C) to unmask the interrupts. In these registers bit 0-31 represents interrupt vectors 32-63. Link: https://lore.kernel.org/r/20220411064603.668448-2-Ajish.Koshy@microchip.com Fixes: 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
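The vector-to-register mapping described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the driver's code: the upper-bank offsets 0x34 and 0x3C are quoted from the commit message, while the lower-bank offsets 0x30 and 0x38, the struct reg_write type and the vector_mask_write() helper are assumptions made for this example. The sketch only computes which register and bit would be written; it performs no MMIO and assumes vec is below 64.

```c
#include <assert.h>
#include <stdint.h>

/* Upper-bank offsets as quoted in the commit message. */
#define MSGU_ODMR_U     0x34u /* mask vectors 32-63 */
#define MSGU_ODMR_CLR_U 0x3Cu /* unmask vectors 32-63 */
/* Lower-bank offsets: assumed here for illustration only. */
#define MSGU_ODMR       0x30u
#define MSGU_ODMR_CLR   0x38u

/* Hypothetical description of one register write. */
struct reg_write {
    uint32_t offset;
    uint32_t value;
};

/* Pick register and bit for vector vec (0-63); mask != 0 masks it. */
struct reg_write vector_mask_write(unsigned int vec, int mask)
{
    struct reg_write w;
    int upper = vec >= 32;

    if (mask)
        w.offset = upper ? MSGU_ODMR_U : MSGU_ODMR;
    else
        w.offset = upper ? MSGU_ODMR_CLR_U : MSGU_ODMR_CLR;
    /* Bits 0-31 of an upper register stand for vectors 32-63. */
    w.value = UINT32_C(1) << (vec % 32);
    return w;
}

/* Vector 40 lives in the upper bank as bit 8. */
void vector_mask_demo(void)
{
    struct reg_write w = vector_mask_write(40, 1);

    assert(w.offset == MSGU_ODMR_U && w.value == 0x100u);
}
```

Keeping the selection logic in one helper means the mask and unmask paths cannot disagree about which bank a vector belongs to.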
2022-04-11 | Revert "scsi: scsi_debug: Address races following module load" | Bart Van Assche | 1 | -146/+51
Revert the patch mentioned in the subject since it blocks I/O after module unload has started while this is a legitimate use case. For e.g. blktests test case srp/001 that patch causes a command timeout to be triggered for the following call stack: __schedule+0x4c3/0xd20 schedule+0x82/0x110 schedule_timeout+0x122/0x200 io_schedule_timeout+0x7b/0xc0 __wait_for_common+0x2bc/0x380 wait_for_completion_io_timeout+0x1d/0x20 blk_execute_rq+0x1db/0x200 __scsi_execute+0x1fb/0x310 sd_sync_cache+0x155/0x2c0 [sd_mod] sd_shutdown+0xbb/0x190 [sd_mod] sd_remove+0x5b/0x80 [sd_mod] device_remove+0x9a/0xb0 device_release_driver_internal+0x2c5/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1aa/0x270 device_del+0x2d4/0x640 __scsi_remove_device+0x168/0x1a0 scsi_forget_host+0xa8/0xb0 scsi_remove_host+0x9b/0x150 sdebug_driver_remove+0x3d/0x140 [scsi_debug] device_remove+0x6f/0xb0 device_release_driver_internal+0x2c5/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1aa/0x270 device_del+0x2d4/0x640 device_unregister+0x18/0x70 sdebug_do_remove_host+0x138/0x180 [scsi_debug] scsi_debug_exit+0x45/0xd5 [scsi_debug] __do_sys_delete_module.constprop.0+0x210/0x320 __x64_sys_delete_module+0x1f/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lore.kernel.org/r/20220409043704.28573-1-bvanassche@acm.org Fixes: 2aad3cd85370 ("scsi: scsi_debug: Address races following module load") Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Yi Zhang <yi.zhang@redhat.com> Cc: Bob Pearson <rpearsonhpe@gmail.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-11 | scsi: megaraid_sas: Remove unnecessary memset | Wan Jiabing | 1 | -2/+0
instance->cmd_list is allocated by kcalloc(). The memory is already set to zero. It is unnecessary to call memset again. Link: https://lore.kernel.org/r/20220407072442.4137977-1-wanjiabing@vivo.com Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
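A user-space analogue of the point above: calloc(), like kcalloc(), returns zero-filled memory, so a following memset() to zero does no extra work. The names struct cmd_sketch, alloc_cmd_list() and list_is_clear() are hypothetical and only mirror the shape of an array of command pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver command structure. */
struct cmd_sketch {
    int index;
};

/* Allocate an array of command pointers, already zero-filled. */
struct cmd_sketch **alloc_cmd_list(size_t max_cmd)
{
    /* No memset() needed: calloc() zeroes the whole array. */
    return calloc(max_cmd, sizeof(struct cmd_sketch *));
}

/*
 * Returns 1 when every slot is NULL. All-zero bytes read back as
 * NULL on mainstream platforms, which is assumed here.
 */
int list_is_clear(struct cmd_sketch **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] != NULL)
            return 0;
    return 1;
}

/* A freshly allocated list is clear without any explicit memset(). */
void cmd_list_demo(void)
{
    struct cmd_sketch **list = alloc_cmd_list(16);

    assert(list != NULL);
    assert(list_is_clear(list, 16));
    free(list);
}
```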
2022-04-06 | scsi: vmw_pvscsi: No need to clear memory after a dma_alloc_coherent() call | Christophe JAILLET | 1 | -1/+0
dma_alloc_coherent() already clears the allocated memory, so there is no need to explicitly call memset(). Since 'config_page' and 'header' are the same, a memset() call can be avoided. Link: https://lore.kernel.org/r/cd1220c628c89465dcfcbf4aa9bd53110898a529.1648067518.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06 | scsi: ufs: ufshcd-pltfrm: Simplify pdev->dev usage | Krzysztof Kozlowski | 1 | -5/+5
The 'struct device' pointer is already cached as a local variable in ufshcd_pltfrm_init(), so use it. Link: https://lore.kernel.org/r/20220401085050.119323-1-krzysztof.kozlowski@linaro.org Reviewed-by: Chanho Park <chanho61.park@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06 | scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan | Chandrakanth patil | 2 | -0/+10
The megaraid_sas driver supports a single LUN for RAID devices, namely LUN 0. All other LUNs are unsupported. When a device scan on a logical target with an invalid LUN number is invoked through sysfs, that target ends up getting removed. Add LUN ID validation in the slave destroy function to avoid the target deletion. Link: https://lore.kernel.org/r/20220324094711.48833-1-chandrakanth.patil@broadcom.com Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06 | scsi: ufs: ufshpb: Fix a NULL check on list iterator | Xiaomeng Tong | 1 | -6/+5
The list iterator is always non-NULL so the check 'if (!rgn)' is always false and the dev_err() is never called. Move the check outside the loop and determine if 'victim_rgn' is NULL, to fix this bug. Link: https://lore.kernel.org/r/20220320150733.21824-1-xiam0nd.tong@gmail.com Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read") Reviewed-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
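The pitfall above comes from sentinel-style circular lists: when the loop runs to completion the cursor ends up at the list head rather than at NULL, so a NULL test on the cursor can never fire. The following user-space sketch models that with a hand-rolled circular list. struct region, the pinned flag and both pick_victim*() functions are hypothetical; the real driver uses the kernel list macros and its own selection criteria.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region node on a circular list with a sentinel head. */
struct region {
    struct region *next;
    int pinned; /* non-zero: must not be evicted */
    int id;
};

/* Pre-fix shape: relies on the cursor, which is never NULL. */
struct region *pick_victim_buggy(struct region *head)
{
    struct region *rgn;

    for (rgn = head->next; rgn != head; rgn = rgn->next)
        if (!rgn->pinned)
            break;
    return rgn; /* equals head, not NULL, when nothing was found */
}

/* Post-fix shape: a separate pointer records whether one was found. */
struct region *pick_victim(struct region *head)
{
    struct region *rgn, *victim = NULL;

    for (rgn = head->next; rgn != head; rgn = rgn->next) {
        if (rgn->pinned)
            continue;
        victim = rgn;
        break;
    }
    return victim; /* NULL means no candidate; check this outside */
}

/* With every region pinned, only the fixed variant reports "none". */
void victim_demo(void)
{
    struct region head = { &head, 0, -1 };
    struct region a = { &head, 1, 1 };

    head.next = &a;
    assert(pick_victim(&head) == NULL);
    assert(pick_victim_buggy(&head) == &head);
}
```

A caller testing the buggy variant's result against NULL would treat the sentinel as a valid region, which is why the error path was unreachable before the fix.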
2022-04-06 | scsi: sd: Clean up gendisk if device_add_disk() failed | Wenchao Hao | 1 | -0/+1
We forgot to call blk_cleanup_disk() when device_add_disk() failed. This would cause a memory leak of gendisk and sched_tags allocated in elevator_init_mq(). Reference: https://syzkaller.appspot.com/x/log.txt?x=13b41dcb700000 Reported-and-tested-by: syzbot+f08c77040fa163a75a46@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20220401011018.1026553-1-haowenchao@huawei.com Signed-off-by: Wenchao Hao <haowenchao@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: target: Allow changing dbroot if there are no registered devicesMaurizio Lombardi1-19/+28
The target driver prevents users from changing the database root directory if a target module like ib_srpt has been registered. This makes it difficult for users to set their preferred database directory if the module gets loaded during system boot. Let users modify dbroot if there are no registered devices. Link: https://lore.kernel.org/r/20220328103940.19977-1-mlombard@redhat.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: message: fusion: Remove redundant variable dmpColin Ian King1-2/+2
Variable dmp is assigned a value that is never read; the variable is redundant and can be removed. Cleans up clang scan build warning: drivers/message/fusion/mptbase.c:6667:39: warning: Although the value stored to 'dmp' is used in the enclosing expression, the value is never actually read from 'dmp' [deadcode.DeadStores] Link: https://lore.kernel.org/r/20220318003927.81471-1-colin.i.king@gmail.com Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: mvsas: Add PCI ID of RocketRaid 2640Alexey Galakhov1-0/+1
The HighPoint RocketRaid 2640 is a low-cost SAS controller based on a Marvell chip. The chip in question was already supported by the kernel; only the PCI ID of this particular board was missing. Link: https://lore.kernel.org/r/20220309212535.402987-1-agalakhov@gmail.com Signed-off-by: Alexey Galakhov <agalakhov@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: sd: sd_read_cpr() requires VPD pagesMartin K. Petersen1-1/+1
As such it should be called inside the scsi_device_supports_vpd() conditional. Link: https://lore.kernel.org/r/20220302053559.32147-13-martin.petersen@oracle.com Fixes: e815d36548f0 ("scsi: sd: add concurrent positioning ranges support") Cc: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: mpt3sas: Fail reset operation if config request timed outSreekanth Reddy1-3/+6
As part of the controller reset operation, the driver issues a config request command. If this command times out, fail the controller reset operation instead of retrying it. Link: https://lore.kernel.org/r/20220405120637.20528-1-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: sym53c500_cs: Stop using struct scsi_pointerFinn Thain1-27/+25
This driver doesn't use SCp.ptr to save a SCSI command data pointer which means "scsi pointer" is a complete misnomer here. Only a few members of struct scsi_pointer are needed so move those to private command data. Link: https://lore.kernel.org/r/accf71e293ba3aed6d18c8baeb405de8dfe7c935.1649235939.git.fthain@linux-m68k.org Cc: Bart Van Assche <bvanassche@acm.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: ufs: ufs-pci: Add support for Intel MTLAdrian Hunter1-0/+17
Add PCI ID and callbacks to support Intel Meteor Lake (MTL). Link: https://lore.kernel.org/r/20220404055038.2208051-1-adrian.hunter@intel.com Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: mpt3sas: Fix mpt3sas_check_same_4gb_region() kdoc commentDamien Le Moal1-2/+1
The start_address argument of mpt3sas_check_same_4gb_region() was misnamed in the function kdoc comment, resulting in the following warning when compiling with W=1. drivers/scsi/mpt3sas/mpt3sas_base.c:5728: warning: Function parameter or member 'start_address' not described in 'mpt3sas_check_same_4gb_region' drivers/scsi/mpt3sas/mpt3sas_base.c:5728: warning: Excess function parameter 'reply_pool_start_address' description in 'mpt3sas_check_same_4gb_region' Fix the argument name in the function kdoc comment to avoid it. While at it, remove a useless blank line between the kdoc and function code. Link: https://lore.kernel.org/r/20220404050041.594774-1-damien.lemoal@opensource.wdc.com Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-06scsi: scsi_debug: Fix sdebug_blk_mq_poll() in_use_bm bitmap useDamien Le Moal1-3/+5
The in_use_bm bitmap of struct sdebug_queue should be accessed under protection of the qc_lock spinlock. Make sure that this lock is taken before calling find_first_bit() at the beginning of the function sdebug_blk_mq_poll(). Link: https://lore.kernel.org/r/20220404045547.579887-1-damien.lemoal@opensource.wdc.com Fixes: 3fd07aecb750 ("scsi: scsi_debug: Fix qc_lock use in sdebug_blk_mq_poll()") Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
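The locking rule described above can be sketched in userspace C. This is an illustration under stated assumptions, not the scsi_debug code: `struct demo_queue`, `QUEUE_DEPTH`, and the helper names are hypothetical, a C11 `atomic_flag` spin loop stands in for the kernel spinlock, and a hand-written scan stands in for find_first_bit(). The point is only that every read and write of the bitmap happens with the lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_DEPTH 64u  /* hypothetical queue depth, one bit per slot */

/* Hypothetical stand-in for a queue with a lock-protected bitmap. */
struct demo_queue {
	atomic_flag lock;             /* plays the role of the spinlock */
	unsigned long long in_use_bm; /* bit i set means slot i is in use */
};

static void demo_queue_init(struct demo_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->in_use_bm = 0;
}

static void demo_lock(struct demo_queue *q)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void demo_unlock(struct demo_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void mark_in_use(struct demo_queue *q, unsigned int idx)
{
	assert(idx < QUEUE_DEPTH);
	demo_lock(q);
	q->in_use_bm |= 1ULL << idx;
	demo_unlock(q);
}

/*
 * Return the index of the first slot in use, or QUEUE_DEPTH if none.
 * The lock is taken before the bitmap is scanned, not after.
 */
static unsigned int first_in_use(struct demo_queue *q)
{
	unsigned int i;

	assert(q != NULL);
	demo_lock(q);
	for (i = 0; i < QUEUE_DEPTH; i++) {
		if (q->in_use_bm & (1ULL << i))
			break;
	}
	demo_unlock(q);
	return i;
}
```

In a real poll loop the lock would typically stay held while the result of the scan is acted on, since the bitmap may change as soon as the lock is dropped; the sketch releases it immediately only to keep the helper small.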
2022-04-06Merge branch '5.18/scsi-queue' into 5.18/scsi-fixesMartin K. Petersen31-782/+393
Pull the remaining commits from 5.18/scsi-queue into fixes. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>