Age | Commit message (Collapse) | Author | Files | Lines |
|
Eliminate the following smatch warning:
drivers/scsi/megaraid/megaraid_sas_fusion.c:5104 megasas_reset_fusion()
warn: inconsistent indenting
Link: https://lore.kernel.org/r/20220225011605.130927-1-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Eliminate the following smatch warning:
drivers/scsi/aacraid/linit.c:867 aac_eh_tmf_hard_reset_fib() warn:
inconsistent indenting.
Link: https://lore.kernel.org/r/20220309005031.126504-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A page fault was encountered in mpt3sas on a LUN reset error path:
[ 145.763216] mpt3sas_cm1: Task abort tm failed: handle(0x0002),timeout(30) tr_method(0x0) smid(3) msix_index(0)
[ 145.778932] scsi 1:0:0:0: task abort: FAILED scmd(0x0000000024ba29a2)
[ 145.817307] scsi 1:0:0:0: attempting device reset! scmd(0x0000000024ba29a2)
[ 145.827253] scsi 1:0:0:0: [sg1] tag#2 CDB: Receive Diagnostic 1c 01 01 ff fc 00
[ 145.837617] scsi target1:0:0: handle(0x0002), sas_address(0x500605b0000272b9), phy(0)
[ 145.848598] scsi target1:0:0: enclosure logical id(0x500605b0000272b8), slot(0)
[ 149.858378] mpt3sas_cm1: Poll ReplyDescriptor queues for completion of smid(0), task_type(0x05), handle(0x0002)
[ 149.875202] BUG: unable to handle page fault for address: 00000007fffc445d
[ 149.885617] #PF: supervisor read access in kernel mode
[ 149.894346] #PF: error_code(0x0000) - not-present page
[ 149.903123] PGD 0 P4D 0
[ 149.909387] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 149.917417] CPU: 24 PID: 3512 Comm: scsi_eh_1 Kdump: loaded Tainted: G S O 5.10.89-altav-1 #1
[ 149.934327] Hardware name: DDN 200NVX2 /200NVX2-MB , BIOS ATHG2.2.02.01 09/10/2021
[ 149.951871] RIP: 0010:_base_process_reply_queue+0x4b/0x900 [mpt3sas]
[ 149.961889] Code: 0f 84 22 02 00 00 8d 48 01 49 89 fd 48 8d 57 38 f0 0f b1 4f 38 0f 85 d8 01 00 00 49 8b 45 10 45 31 e4 41 8b 55 0c 48 8d 1c d0 <0f> b6 03 83 e0 0f 3c 0f 0f 85 a2 00 00 00 e9 e6 01 00 00 0f b7 ee
[ 149.991952] RSP: 0018:ffffc9000f1ebcb8 EFLAGS: 00010246
[ 150.000937] RAX: 0000000000000055 RBX: 00000007fffc445d RCX: 000000002548f071
[ 150.011841] RDX: 00000000ffff8881 RSI: 0000000000000001 RDI: ffff888125ed50d8
[ 150.022670] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffff7fff
[ 150.033445] R10: ffffc9000f1ebb68 R11: ffffc9000f1ebb60 R12: 0000000000000000
[ 150.044204] R13: ffff888125ed50d8 R14: 0000000000000080 R15: 34cdc00034cdea80
[ 150.054963] FS: 0000000000000000(0000) GS:ffff88dfaf200000(0000) knlGS:0000000000000000
[ 150.066715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.076078] CR2: 00000007fffc445d CR3: 000000012448a006 CR4: 0000000000770ee0
[ 150.086887] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 150.097670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 150.108323] PKRU: 55555554
[ 150.114690] Call Trace:
[ 150.120497] ? printk+0x48/0x4a
[ 150.127049] mpt3sas_scsih_issue_tm.cold.114+0x2e/0x2b3 [mpt3sas]
[ 150.136453] mpt3sas_scsih_issue_locked_tm+0x86/0xb0 [mpt3sas]
[ 150.145759] scsih_dev_reset+0xea/0x300 [mpt3sas]
[ 150.153891] scsi_eh_ready_devs+0x541/0x9e0 [scsi_mod]
[ 150.162206] ? __scsi_host_match+0x20/0x20 [scsi_mod]
[ 150.170406] ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[ 150.178925] ? blk_mq_tagset_busy_iter+0x45/0x60
[ 150.186638] ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[ 150.195087] scsi_error_handler+0x3a5/0x4a0 [scsi_mod]
[ 150.203206] ? __schedule+0x1e9/0x610
[ 150.209783] ? scsi_eh_get_sense+0x210/0x210 [scsi_mod]
[ 150.217924] kthread+0x12e/0x150
[ 150.224041] ? kthread_worker_fn+0x130/0x130
[ 150.231206] ret_from_fork+0x1f/0x30
This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q
pointer outside of the list_for_each_entry() loop. At the end of the full
list traversal the pointer is invalid.
Move the _base_process_reply_queue() call inside of the loop.
Link: https://lore.kernel.org/r/d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com
Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset")
Cc: stable@vger.kernel.org
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use the common libsas internal abort functionality.
In addition, this driver has special handling for internal abort timeouts -
specifically whether to reset the controller in that instance, so extend
the API for that.
Timeout is now increased to 20 * Hz from 6 * Hz.
We also retry for failure now, but this should not make a difference.
Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
New special handling is added for SAS_PROTOCOL_INTERNAL_ABORT proto so that
we may use the common queue command API.
Link: https://lore.kernel.org/r/1647001432-239276-4-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add support for a "device" variant of internal abort, which will abort all
pending I/Os for a specific device.
Link: https://lore.kernel.org/r/1647001432-239276-3-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The internal abort feature is common to hisi_sas and pm8001 HBAs, and the
driver support is similar also, so add a common handler.
Two modes of operation will be supported:
- single: Abort a single tagged command
- device: Abort all commands associated with a specific domain device
A new protocol is added, SAS_PROTOCOL_INTERNAL_ABORT, so the common queue
command API may be re-used.
Only add "single" support as a first step.
Link: https://lore.kernel.org/r/1647001432-239276-2-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The soft_wwpn/soft_wwn functionality, which allows the driver to modify
service parameters in an attempt to override the adapter-assigned WWN, was
originally attempted to be removed roughly 6 yrs ago as new fabric features
were being introduced that clashed with the implementation. In the end,
the feature was left in with the user being responsible if things went
south.
We've reached a point where soft_wwn is no longer functional and is failing
in almost all production use cases. Use of Fabric features such as Fabric
Assigned WWPN and Automatic DPORT is now prevalent and the features require
coordination between the adapter and driver that can't be solved by the
simplistic update of the service parameters. As it is no longer functional,
the feature is to be removed.
There are still ways to override the adapter-assigned WWN but they require
the admin to invoke bios/efi level menus.
Link: https://lore.kernel.org/r/20220310154845.11125-1-jsmart2021@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When ufs initializes without scmd->device->sector_size set, scsi_get_lba()
will get a wrong shift number and trigger an ubsan error. The shift
exponent 4294967286 is too large for the 64-bit type 'sector_t' (aka
'unsigned long long').
Call scsi_get_lba() only when opcode is READ_10/WRITE_10/UNMAP.
Link: https://lore.kernel.org/r/20220307111752.10465-1-peter.wang@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver must perform its 4GB boundary check using the pool's DMA address
instead of using the virtual address.
Link: https://lore.kernel.org/r/20220303140230.13098-1-sreekanth.reddy@broadcom.com
Fixes: d6adc251dd2f ("scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region")
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When scsi_dma_map() fails by returning a sges_left value less than zero,
the amount of logging produced can be extremely high. In a recent end-user
environment, 1200 messages per second were being sent to the log buffer.
This eventually overwhelmed the system and it stalled.
These error messages are not needed. Remove them.
Link: https://lore.kernel.org/r/20220303140203.12642-1-sreekanth.reddy@broadcom.com
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
fc_exch_release(ep) will decrease the ep's reference count. When the
reference count reaches zero, it is freed. But ep is still used in the
following code, which will lead to a use after free.
Return after the fc_exch_release() call to avoid use after free.
Link: https://lore.kernel.org/r/20220303015115.459778-1-niejianglei2021@163.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The use of the 'locked' boolean variable to control locking and unlocking
of the qc_lock spinlock of struct sdebug_queue confuses sparse, leading to
a warning about an unexpected unlock. Simplify the qc_lock lock/unlock
handling code of this function to avoid this warning by removing the
'locked' boolean variable. This change also fixes unlocked access to the
in_use_bm bitmap with the find_first_bit() function.
Link: https://lore.kernel.org/r/20220301113009.595857-3-damien.lemoal@opensource.wdc.com
Fixes: b05d4e481eff ("scsi: scsi_debug: Refine sdebug_blk_mq_poll()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The return statement inside the sdeb_read_lock(), sdeb_read_unlock(),
sdeb_write_lock() and sdeb_write_unlock() confuse sparse, leading to many
warnings about unexpected unlocks in the resp_xxx() functions.
Modify the lock/unlock functions using the __acquire() and __release()
inline annotations for the sdebug_no_rwlock == true case to avoid these
warnings.
Link: https://lore.kernel.org/r/20220301113009.595857-2-damien.lemoal@opensource.wdc.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Simplify the refcounting and remove the need to clear disk->private_data
by implementing the ->free_disk method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Implement the ->free_disk method to to put struct scsi_disk when the last
gendisk reference count goes away. This removes the need to clear
->private_data and thus freeze the queue on unbind.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Call free_opal_dev from scsi_disk_release as the opal_dev field is accessed
from the ioctl handler, which isn't synchronized vs sd_release and thus
can be accessed during or after sd_release was called.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
sd_zbc_release_disk accesses disk->device, so ensure that actually still has
a valid reference.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
dev is very hard to grep for. Give the field a more descriptive name and
documents its purpose.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Requiring every ULP to have the scsi_drive as first member of the
private data is rather fragile and not necessary anyway. Just use
the driver hanging off the SCSI device instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_try_end_foreign_access() and check the
success of that operation instead.
This is CVE-2022-23038 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- use gnttab_try_end_foreign_access()
|
|
Kernel messages produced during runtime PM can cause a never-ending cycle
because user space utilities (e.g. journald or rsyslog) write the messages
back to storage, causing runtime resume, more messages, and so on.
Messages that tell of things that are expected to happen, are arguably
unnecessary, so suppress them.
UFS driver messages are changes to from dev_err() to dev_dbg() which means
they will not display unless activated by dynamic debug of building with
-DDEBUG.
sdev->silence_suspend is set to skip messages from sd_suspend_common()
"Synchronizing SCSI cache", "Stopping disk" and scsi_report_sense()
"Power-on or device reset occurred" message (Note, that message appears
when the LUN is accessed after runtime PM, not during runtime PM)
Example messages from Ubuntu 21.10:
$ dmesg | tail
[ 1620.380071] ufshcd 0000:00:12.5: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
[ 1620.408825] ufshcd 0000:00:12.5: ufshcd_print_pwr_info:[RX, TX]: gear=[4, 4], lane[2, 2], pwr[FAST MODE, FAST MODE], rate = 2
[ 1620.409020] ufshcd 0000:00:12.5: ufshcd_find_max_sup_active_icc_level: Regulator capability was not set, actvIccLevel=0
[ 1620.409524] sd 0:0:0:0: Power-on or device reset occurred
[ 1622.938794] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1622.939184] ufs_device_wlun 0:0:0:49488: Power-on or device reset occurred
[ 1625.183175] ufshcd 0000:00:12.5: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
[ 1625.208041] ufshcd 0000:00:12.5: ufshcd_print_pwr_info:[RX, TX]: gear=[4, 4], lane[2, 2], pwr[FAST MODE, FAST MODE], rate = 2
[ 1625.208311] ufshcd 0000:00:12.5: ufshcd_find_max_sup_active_icc_level: Regulator capability was not set, actvIccLevel=0
[ 1625.209035] sd 0:0:0:0: Power-on or device reset occurred
Note for stable: depends on patch "scsi: core: sd: Add silence_suspend flag
to suppress some PM messages".
Link: https://lore.kernel.org/r/20220228113652.970857-3-adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Kernel messages produced during runtime PM can cause a never-ending cycle
because user space utilities (e.g. journald or rsyslog) write the messages
back to storage, causing runtime resume, more messages, and so on.
Messages that tell of things that are expected to happen are arguably
unnecessary, so add a flag to suppress them. This flag is used by the UFS
driver.
Link: https://lore.kernel.org/r/20220228113652.970857-2-adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We only need the rport structure for lpfc_chk_tgt_mapped().
Link: https://lore.kernel.org/r/20220301143718.40913-6-hare@suse.de
Cc: James Smart <james.smart@broadcom.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Instead of passing in a scsi_cmnd we should be using the rport; we already
have the target and LUN ID as parameters, so there's no need to pass the
scsi_cmnd too.
Link: https://lore.kernel.org/r/20220301143718.40913-5-hare@suse.de
Cc: James Smart <james.smart@broadcom.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use fc_block_rport() instead of fc_block_scsi_eh() as the SCSI command will
be removed as argument for SCSI EH functions.
Link: https://lore.kernel.org/r/20220301143718.40913-4-hare@suse.de
Cc: James Smart <james.smart@broadcom.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The default SCSI EH action for a non-existing EH callback is to return
FAILED, so having a callback just returning FAILED is pointless.
Link: https://lore.kernel.org/r/20220301143718.40913-3-hare@suse.de
Cc: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
lpfc_bus_reset_handler() is really just a loop calling
lpfc_target_reset_handler() over all targets, which is what the error
handler will be doing anyway.
Link: https://lore.kernel.org/r/20220301143718.40913-2-hare@suse.de
Cc: James Smart <james.smart@broadcom.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During the process of driver probing, the probe function should return < 0
for failure, otherwise the kernel will treat value >= 0 as success.
Set 'err' to the error value returned by dma_set_mask() in case of failure.
Link: https://lore.kernel.org/r/1646060055-11361-1-git-send-email-zheyuma97@gmail.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When the workqueue code was created it didn't allow variable args so we
have been using a temp buffer. Drop that.
Link: https://lore.kernel.org/r/20220226230435.38733-7-michael.christie@oracle.com
Reviewed-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use the session workqueue for recovery and unbinding. If there are delays
during device blocking/cleanup then it will no longer affect other
sessions.
Link: https://lore.kernel.org/r/20220226230435.38733-6-michael.christie@oracle.com
Reviewed-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We currently allocate a workqueue per host and only use it for removing the
target. For the session per host case we could be using this workqueue to
be able to do recoveries (block, unblock, timeout handling) in parallel. To
also allow offload drivers to do their session recoveries in parallel, this
drops the per host workqueue and replaces it with a per session one.
Link: https://lore.kernel.org/r/20220226230435.38733-5-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
qla4xxx does not use iscsi_scan_finished() anymore so remove it.
Link: https://lore.kernel.org/r/20220226230435.38733-4-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When the iSCSI class was added upstream, blocking a queue was fast because
it just set some flag bits and didn't handle I/O that was in the process of
being sent to the driver. That's no longer the case so blocking a queue is
expensive and we can end up with a backlog of blocks by the time we have
relogged in and are trying to start the queues.
For the session unblock case, this has try to cancel the block and recovery
work in case they are still queued so we can avoid unneeded queue
manipulations. For removal, we also now try to cancel all the recovery
related works since a couple lines down we will set the session and device
state so running those functions are not necessary.
Link: https://lore.kernel.org/r/20220226230435.38733-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If the user sets the iscsi_eh_timer_workq/iscsi_eh workqueue's max_active
to greater than 1, the recovery_work could be running when
__iscsi_unblock_session() runs. The cancel_delayed_work() will then not
wait for the running work and we can race where we end up with the wrong
session state and scsi_device state set.
This replaces the cancel_delayed_work() with the sync version.
Link: https://lore.kernel.org/r/20220226230435.38733-2-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the original FPIN commit, stats were incremented by the event_count.
Event_count is the minimum # of events that must occur before an FPIN is
sent. Thus, its not the actual number of events, and could be significantly
off (too low) as it doesn't reflect anything not reported. Rather than
attempt to count events, have the statistic count how many FPINS cross the
threshold and were reported.
Link: https://lore.kernel.org/r/20220301175536.60250-1-jsmart2021@gmail.com
Fixes: 3dcfe0de5a97 ("scsi: fc: Parse FPIN packets and update statistics")
Cc: <stable@vger.kernel.org> # v5.11+
Cc: Shyam Sundar <ssundar@marvell.com>
Cc: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Sparse throws a warning about context imbalance ("different lock contexts
for basic block") in sas_form_port() as it gets confused with the fact that
a port is locked within one of the two search loops and unlocked afterward
outside of the search loops once the phy is added to the port. Since this
code is not easy to follow, improve it by factoring out the code adding the
phy to the port once the port is locked into the helper function
sas_form_port_add_phy(). This helper can then be called directly within the
port search loops, avoiding confusion and clearing the sparse warning.
Link: https://lore.kernel.org/r/20220228094857.557329-1-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This header is empty now except for an include of <linux/blk-mq.h>, so
remove it.
Link: https://lore.kernel.org/r/20220224175552.988286-9-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Let submitters initialize the scmd->allowed field directly instead of
indirecting through struct scsi_request and remove the now superfluous
structure.
Link: https://lore.kernel.org/r/20220224175552.988286-8-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Prepare for removing the scsi_request structure by moving the result field
to struct scsi_cmnd.
Link: https://lore.kernel.org/r/20220224175552.988286-7-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
scsi_cmnd
Prepare for removing the scsi_request structure by moving the resid_len
field to struct scsi_cmnd.
Link: https://lore.kernel.org/r/20220224175552.988286-6-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Just use the sense_buffer field in struct scsi_cmnd for the sense data and
move the sense_len field over to struct scsi_cmnd.
Link: https://lore.kernel.org/r/20220224175552.988286-5-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Now that each scsi_request is backed by a scsi_cmnd, there is no need to
indirect the CDB storage. Change all submitters of SCSI passthrough
requests to store the CDB information directly in the scsi_cmnd, and while
doing so allocate the full 32 bytes that cover all Linux supported SCSI
hosts instead of requiring dynamic allocation for > 16 byte CDBs. On
64-bit systems this does not change the size of the scsi_cmnd at all, while
on 32-bit systems it slightly increases it for now, but that increase will
be made up by the removal of the remaining scsi_request fields.
Link: https://lore.kernel.org/r/20220224175552.988286-4-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replace the big fat memset that requires saving and restoring various
fields with just initializing those fields that need initialization.
All the clearing to 0 is moved to scsi_prepare_cmd() as scsi_ioctl_reset()
alreadly uses kzalloc() to allocate a pre-zeroed command.
This is still conservative and can probably be optimized further.
Link: https://lore.kernel.org/r/20220224175552.988286-3-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Function queue_work() returns a bool, so use a bool to hold this value for
the return code from callers, which should make the code a tiny bit more
clear.
Also take this opportunity to condense the code of the those callers, such
as sas_queue_work(), as suggested by Damien.
Link: https://lore.kernel.org/r/1645786656-221630-3-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Nobody checks the return codes, so make them return void. Indeed, if the
LLDD cannot send an event, nothing much can be done in the LLDD about it.
Also remove prototype for sas_notify_phy_event() in sas_internal.h, which
should not be there.
Link: https://lore.kernel.org/r/1645786656-221630-2-git-send-email-john.garry@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In case of SSP underflow allow the response frame IU to be examined for
setting the response stat value rather than always setting
SAS_DATA_UNDERRUN.
This will mean that we call sas_ssp_task_response() in those scenarios and
may send sense data to upper layer.
Such a condition would be for bad blocks were we just reporting an
underflow error to upper layer, but now the sense data will tell
immediately that the media is faulty.
Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add a file operation for "cnt" file under bist directory, so users can only
read "cnt" or clear "cnt" to zero, but cannot randomly modify.
Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
To avoid doubt, rename the error labels to indicate the action they will
take.
Link: https://lore.kernel.org/r/1645703489-87194-5-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If the driver probe fails to request the channel IRQ or fatal IRQ, the
driver will free the IRQ vectors before freeing the IRQs in free_irq(),
and this will cause a kernel BUG like this:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:369!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Call trace:
free_msi_irqs+0x118/0x13c
pci_disable_msi+0xfc/0x120
pci_free_irq_vectors+0x24/0x3c
hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw]
local_pci_probe+0x44/0xb0
work_for_cpu_fn+0x20/0x34
process_one_work+0x1d0/0x340
worker_thread+0x2e0/0x460
kthread+0x180/0x190
ret_from_fork+0x10/0x20
---[ end trace b88990335b610c11 ]---
So we use devm_add_action() to control the order in which we free the
vectors.
Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|