aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-24IB/hfi1: Add atomic triggered sleep/wakeupMike Marciniszyn2-20/+53
When running iperf in a two host configuration the following trace can occur: [ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out The issue happens because the current implementation relies on the netif txq being stopped to control the flushing of the tx list. There are two resources that the transmit logic can wait on and stop the txq: - SDMA descriptors - Ring space to hold completions The ring space is tested on the sending side and relieved when the ring is consumed in the napi tx reaping. Unfortunately, that reaping can run conncurrently with the workqueue flushing of the txlist. If the txq is started just before the workitem executes, the txlist will never be flushed, leading to the txq being stuck. Fix by: - Adding sleep/wakeup wrappers * Use an atomic to control the call to the netif routines inside the wrappers - Use another atomic to record ring space exhaustion * Only wakeup when the a ring space exhaustion has happened and it relieved Add additional wrappers to clarify the ring space resource handling. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kaike Wan <[email protected]> Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Correct -EBUSY handling in tx codeMike Marciniszyn1-17/+18
The current code mishandles -EBUSY in two ways: - The flow change doesn't test the return from the flush and runs on to process the current packet racing with the wakeup processing - The -EBUSY handling for a single packet inserts the tx into the txlist after the submit call, racing with the same wakeup processing Fix the first by dropping the skb and returning NETDEV_TX_OK. Fix the second by insuring the the list entry within the txreq is inited when allocated. This enables the sleep routine to detect that the txreq has used the non-list api and queue the packet to the txlist. Both flaws can lead to having the flushing thread executing in causing two threads to manipulate the txlist. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kaike Wan <[email protected]> Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Fix module use count flaw due to leftover module put callsDennis Dalessandro1-17/+2
When the try_module_get calls were removed from opening and closing of the i2c debugfs file, the corresponding module_put calls were missed. This results in an inaccurate module use count that requires a power cycle to fix. Fixes: 09fbca8e6240 ("IB/hfi1: No need to use try_module_get for debugfs") Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Reviewed-by: Kaike Wan <[email protected]> Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Restore kfree in dummy_netdev cleanupDennis Dalessandro1-1/+1
We need to do some rework on the dummy netdev. Calling the free_netdev() would normally make sense, and that will be addressed in an upcoming patch. For now just revert the behavior to what it was before keeping the unused variable removal part of the patch. The dd->dumm_netdev is mainly used for packet receiving through alloc_netdev_mqs() for typical net devices. A a result, it should be freed with kfree instead of free_netdev() that leads to a crash when unloading the hfi1 module: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 RIP: 0010:__hw_addr_flush+0x12/0x80 Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84 RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298 RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549 R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298 R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000 FS: 00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dev_addr_flush+0x15/0x30 free_netdev+0x7e/0x130 hfi1_netdev_free+0x59/0x70 [hfi1] remove_one+0x65/0x110 [hfi1] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0xec/0x1b0 driver_detach+0x46/0x90 bus_remove_driver+0x58/0xd0 pci_unregister_driver+0x26/0xa0 hfi1_mod_cleanup+0xc/0xd54 [hfi1] __x64_sys_delete_module+0x16c/0x260 ? exit_to_usermode_loop+0xa4/0xc0 do_syscall_64+0x5b/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 193ba03141bb ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24nvme-multipath: fix bogus request queue reference putSagi Grimberg1-0/+8
The mpath disk node takes a reference on the request mpath request queue when adding live path to the mpath gendisk. However if we connected to an inaccessible path device_add_disk is not called, so if we disconnect and remove the mpath gendisk we endup putting an reference on the request queue that was never taken [1]. Fix that to check if we ever added a live path (using NVME_NS_HEAD_HAS_DISK flag) and if not, clear the disk->queue reference. [1]: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xa6/0xf0 RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: disk_release+0xa2/0xc0 device_release+0x28/0x80 kobject_put+0xa5/0x1b0 nvme_put_ns_head+0x26/0x70 [nvme_core] nvme_put_ns+0x30/0x60 [nvme_core] nvme_remove_namespaces+0x9b/0xe0 [nvme_core] nvme_do_delete_ctrl+0x43/0x5c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write+0xc1/0x1a0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: Anton Eidelman <[email protected]> Tested-by: Anton Eidelman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-multipath: fix deadlock due to head->lockAnton Eidelman2-2/+4
In the following scenario scan_work and ana_work will deadlock: When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on ANY ctrl will not be able to complete nvme_mpath_set_live() on the same ns->head, which is required in order to update the new accessible path and remove NVME_NS_ANA_PENDING.. Therefore IO never completes: deadlock [1]. Fix: Move device_add_disk out of the head->lock and protect it with an atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit. [1]: kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:2 D 0 160 2 0x80004000 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core] kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_read_ana_log+0x76/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:4 D 0 439 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x256/0x390 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme: don't protect ns mutation with ns->head->lockSagi Grimberg1-8/+4
Right now ns->head->lock is protecting namespace mutation which is wrong and unneeded. Move it to only protect against head mutations. While we're at it, remove unnecessary ns->head reference as we already have head pointer. The problem with this is that the head->lock spans mpath disk node I/O that may block under some conditions (if for example the controller is disconnecting or the path became inaccessible), The locking scheme does not allow any other path to enable itself, preventing blocked I/O to complete and forward-progress from there. This is a preparation patch for the fix in a subsequent patch where the disk I/O will also be done outside the head->lock. Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-multipath: fix deadlock between ana_work and scan_workAnton Eidelman1-8/+16
When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. In order to recover and complete the IO ana_work on the same ctrl should be able to update the path state and remove NVME_NS_ANA_PENDING. The deadlock occurs because scan_work keeps holding ana_lock, so ana_work hangs [1]. Fix: Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy of the ANA group desc, and then calls nvme_update_ns_ana_state() without holding ana_lock. [1]: kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? select_task_rq_fair+0x1aa/0x5c0 kernel: ? kvm_sched_clock_read+0x11/0x20 kernel: ? sched_clock+0x9/0x10 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme: fix possible deadlock when I/O is blockedSagi Grimberg1-1/+0
Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") When adding a new namespace to the head disk (via nvme_mpath_set_live) we will see partition scan which triggers I/O on the mpath device node. This process will usually be triggered from the scan_work which holds the scan_lock. If I/O blocks (if we got ana change currently have only available paths but none are accessible) this can deadlock on the head disk bd_mutex as both partition scan I/O takes it, and head disk revalidation takes it to check for resize (also triggered from scan_work on a different path). See trace [1]. The mpath disk revalidation was originally added to detect online disk size change, but this is no longer needed since commit cb224c3af4df ("nvme: Convert to use set_capacity_revalidate_and_notify") which already updates resize info without unnecessarily revalidating the disk (the mpath disk doesn't even implement .revalidate_disk fop). [1]: -- kernel: INFO: task kworker/u65:9:494 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:9 D 0 494 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: revalidate_disk+0x63/0xa0 kernel: __nvme_revalidate_disk+0xfe/0x110 [nvme_core] kernel: nvme_revalidate_disk+0xa4/0x160 [nvme_core] kernel: ? evict+0x14c/0x1b0 kernel: revalidate_disk+0x2b/0xa0 kernel: nvme_validate_ns+0x49/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: ? __nvme_submit_sync_cmd+0xbe/0x1e0 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 ... kernel: INFO: task kworker/u65:1:2630 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:1 D 0 2630 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? file_fdatawait_range+0x30/0x30 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: ? kmem_cache_alloc_trace+0x19c/0x230 kernel: efi_partition+0x1e6/0x708 kernel: ? vsnprintf+0x39e/0x4e0 kernel: ? snprintf+0x49/0x60 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: ? nvme_update_ns_ana_state+0x60/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 -- Fixes: fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") Signed-off-by: Anton Eidelman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-rdma: assign completion vector correctlyMax Gurtovoy1-1/+1
The completion vector index that is given during CQ creation can't exceed the number of support vectors by the underlying RDMA device. This violation currently can accure, for example, in case one will try to connect with N regular read/write queues and M poll queues and the sum of N + M > num_supported_vectors. This will lead to failure in establish a connection to remote target. Instead, in that case, share a completion vector between queues. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-loop: initialize tagset numa value to the value of the ctrlMax Gurtovoy1-2/+2
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-tcp: initialize tagset numa value to the value of the ctrlMax Gurtovoy1-2/+2
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-pci: initialize tagset numa value to the value of the ctrlMax Gurtovoy1-2/+2
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme-pci: override the value of the controller's numa nodeMax Gurtovoy1-0/+2
Set the node value according to the PCI device numa node. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24nvme: set initial value for controller's numa nodeMax Gurtovoy1-0/+1
Initialize the node to NUMA_NO_NODE value. Transports that are aware of numa node affinity can override it (e.g. RDMA transport set the affinity according to the RDMA HCA). Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-06-24usb: renesas_usbhs: getting residue from callback_resultYoshihiro Shimoda2-11/+14
This driver assumed that dmaengine_tx_status() could return the residue even if the transfer was completed. However, this was not correct usage [1] and this caused to break getting the residue after the commit 24461d9792c2 ("dmaengine: virt-dma: Fix access after free in vchan_complete()") actually. So, this is possible to get wrong received size if the usb controller gets a short packet. For example, g_zero driver causes "bad OUT byte" errors. The usb-dmac driver will support the callback_result, so this driver can use it to get residue correctly. Note that even if the usb-dmac driver has not supported the callback_result yet, this patch doesn't cause any side-effects. [1] https://lore.kernel.org/dmaengine/20200616165550.GP2324254@vkoul-mobl/ Reported-by: Hien Dang <[email protected]> Fixes: 24461d9792c2 ("dmaengine: virt-dma: Fix access after free in vchan_complete()") Signed-off-by: Yoshihiro Shimoda <[email protected]> Link: https://lore.kernel.org/r/1592482277-19563-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24block: release bip in a right way in error pathChengguang Xu1-9/+14
Release bip using kfree() in error path when that was allocated by kmalloc(). Signed-off-by: Chengguang Xu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-24drm/radeon: fix fb_div check in ni_init_smc_spll_table()Denis Efremov1-1/+1
clk_s is checked twice in a row in ni_init_smc_spll_table(). fb_div should be checked instead. Fixes: 69e0b57a91ad ("drm/radeon/kms: add dpm support for cayman (v5)") Cc: [email protected] Signed-off-by: Denis Efremov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-06-24Revert "usb: dwc3: exynos: Add support for Exynos5422 suspend clk"Anand Moon1-9/+0
This reverts commit 07f6842341abe978e6375078f84506ec3280ece5. Since SCLK_SCLK_USBD300 suspend clock need to be configured for phy module, I wrongly mapped this clock to DWC3 code. Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Anand Moon <[email protected]> Cc: stable <[email protected]> Fixes: 07f6842341ab ("usb: dwc3: exynos: Add support for Exynos5422 suspend clk") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24xhci: Poll for U0 after disabling USB2 LPMKai-Heng Feng1-0/+3
USB2 devices with LPM enabled may interrupt the system suspend: [ 932.510475] usb 1-7: usb suspend, wakeup 0 [ 932.510549] hub 1-0:1.0: hub_suspend [ 932.510581] usb usb1: bus suspend, wakeup 0 [ 932.510590] xhci_hcd 0000:00:14.0: port 9 not suspended [ 932.510593] xhci_hcd 0000:00:14.0: port 8 not suspended .. [ 932.520323] xhci_hcd 0000:00:14.0: Port change event, 1-7, id 7, portsc: 0x400e03 .. [ 932.591405] PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -16 [ 932.591414] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -16 [ 932.591418] PM: Device 0000:00:14.0 failed to suspend async: error -16 During system suspend, USB core will let HC suspends the device if it doesn't have remote wakeup enabled and doesn't have any children. However, from the log above we can see that the usb 1-7 doesn't get bus suspended due to not in U0. After a while the port finished U2 -> U0 transition, interrupts the suspend process. The observation is that after disabling LPM, port doesn't transit to U0 immediately and can linger in U2. xHCI spec 4.23.5.2 states that the maximum exit latency for USB2 LPM should be BESL + 10us. The BESL for the affected device is advertised as 400us, which is still not enough based on my testing result. So let's use the maximum permitted latency, 10000, to poll for U0 status to solve the issue. Cc: [email protected] Signed-off-by: Kai-Heng Feng <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24xhci: Return if xHCI doesn't support LPMKai-Heng Feng1-1/+4
Just return if xHCI is quirked to disable LPM. We can save some time from reading registers and doing spinlocks. Add stable tag as we want this patch together with the next one, "Poll for U0 after disabling USB2 LPM" which fixes a suspend issue for some USB2 LPM devices Cc: [email protected] Signed-off-by: Kai-Heng Feng <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24usb: host: xhci-mtk: avoid runtime suspend when removing hcdMacpaul Lin1-2/+3
When runtime suspend was enabled, runtime suspend might happen when xhci is removing hcd. This might cause kernel panic when hcd has been freed but runtime pm suspend related handle need to reference it. Signed-off-by: Macpaul Lin <[email protected]> Reviewed-by: Chunfeng Yun <[email protected]> Cc: [email protected] Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24xhci: Fix enumeration issue when setting max packet size for FS devices.Al Cooper1-0/+1
Unable to complete the enumeration of a USB TV Tuner device. Per XHCI spec (4.6.5), the EP state field of the input context shall be cleared for a set address command. In the special case of an FS device that has "MaxPacketSize0 = 8", the Linux XHCI driver does not do this before evaluating the context. With an XHCI controller that checks the EP state field for parameter context error this causes a problem in cases such as the device getting reset again after enumeration. When that field is cleared, the problem does not occur. This was found and fixed by Sasi Kumar. Cc: [email protected] Signed-off-by: Al Cooper <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24xhci: Fix incorrect EP_STATE_MASKMathias Nyman1-1/+1
EP_STATE_MASK should be 0x7 instead of 0xf xhci spec 6.2.3 shows that the EP state field in the endpoint context data structure consist of bits [2:0]. The old value included a bit from the next field which fortunately is a RsvdZ region. So hopefully this hasn't caused too much harm Cc: [email protected] Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24usb: cdns3: ep0: add spinlock for cdns3_check_new_setupPeter Chen1-3/+4
The other thread may access other endpoints when the cdns3_check_new_setup is handling, add spinlock to protect it. Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Cc: <[email protected]> Reviewed-by: Pawel Laszczak <[email protected]> Signed-off-by: Peter Chen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24usb: cdns3: trace: using correct dir valuePeter Chen1-1/+1
It should use the correct direction value from register, not depends on previous software setting. It fixed the EP number wrong issue at trace when the TRBERR interrupt occurs for EP0IN. When the EP0IN IOC has finished, software prepares the setup packet request, the expected direction is OUT, but at that time, the TRBERR for EP0IN may occur since it is DMULT mode, the DMA does not stop until TRBERR has met. Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Cc: <[email protected]> Reviewed-by: Pawel Laszczak <[email protected]> Signed-off-by: Peter Chen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24usb: cdns3: ep0: fix the test mode set incorrectlyPeter Chen1-1/+2
The 'tmode' is ctrl->wIndex, changing it as the real test mode value for register assignment. Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Cc: <[email protected]> Reviewed-by: Jun Li <[email protected]> Reviewed-by: Pawel Laszczak <[email protected]> Signed-off-by: Peter Chen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-06-24kselftest: arm64: Remove redundant clean targetMark Brown1-4/+0
The arm64 signal tests generate warnings during build since both they and the toplevel lib.mk define a clean target: Makefile:25: warning: overriding recipe for target 'clean' ../../lib.mk:126: warning: ignoring old recipe for target 'clean' Since the inclusion of lib.mk is in the signal Makefile there is no situation where this warning could be avoided so just remove the redundant clean target. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-06-24arm64: kpti: Add KRYO{3, 4}XX silver CPU cores to kpti safelistSai Prakash Ranjan1-0/+2
QCOM KRYO{3,4}XX silver/LITTLE CPU cores are based on Cortex-A55 and are meltdown safe, hence add them to kpti_safe_list[]. Signed-off-by: Sai Prakash Ranjan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-06-24arm64: Don't insert a BTI instruction at inner labelsJean-Philippe Brucker1-6/+0
Some ftrace features are broken since commit 714a8d02ca4d ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI"). For example the function_graph tracer: $ echo function_graph > /sys/kernel/debug/tracing/current_tracer [ 36.107016] WARNING: CPU: 0 PID: 115 at kernel/trace/ftrace.c:2691 ftrace_modify_all_code+0xc8/0x14c When ftrace_modify_graph_caller() attempts to write a branch at ftrace_graph_call, it finds the "BTI J" instruction inserted by SYM_INNER_LABEL() instead of a NOP, and aborts. It turns out we don't currently need the BTI landing pads inserted by SYM_INNER_LABEL: * ftrace_call and ftrace_graph_call are only used for runtime patching of the active tracer. The patched code is not reached from a branch. * install_el2_stub is reached from a CBZ instruction, which doesn't change PSTATE.BTYPE. * __guest_exit is reached from B instructions in the hyp-entry vectors, which aren't subject to BTI checks either. Remove the BTI annotation from SYM_INNER_LABEL. Fixes: 714a8d02ca4d ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI") Signed-off-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-06-24arm64: vdso: Don't use gcc plugins for building vgettimeofday.cAlexander Popov1-1/+1
Don't use gcc plugins for building arch/arm64/kernel/vdso/vgettimeofday.c to avoid unneeded instrumentation. Signed-off-by: Alexander Popov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-06-24ALSA: usb-audio: Fix OOB access of mixer element listTakashi Iwai3-7/+20
The USB-audio mixer code holds a linked list of usb_mixer_elem_list, and several operations are performed for each mixer element. A few of them (snd_usb_mixer_notify_id() and snd_usb_mixer_interrupt_v2()) assume each mixer element being a usb_mixer_elem_info object that is a subclass of usb_mixer_elem_list, cast via container_of() and access it members. This may result in an out-of-bound access when a non-standard list element has been added, as spotted by syzkaller recently. This patch adds a new field, is_std_info, in usb_mixer_elem_list to indicate that the element is the usb_mixer_elem_info type or not, and skip the access to such an element if needed. Reported-by: [email protected] Reported-by: [email protected] Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-06-24arm64: vdso: Only pass --no-eh-frame-hdr when linker supports itWill Deacon1-2/+3
Commit 87676cfca141 ("arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline") unconditionally passes the '--no-eh-frame-hdr' option to the linker when building the native vDSO in an attempt to prevent generation of the .eh_frame_hdr section, the presence of which has been implicated in segfaults originating from the libgcc unwinder. Unfortunately, not all versions of binutils support this option, which has been shown to cause build failures in linux-next: | CALL scripts/atomic/check-atomics.sh | CALL scripts/checksyscalls.sh | LD arch/arm64/kernel/vdso/vdso.so.dbg | ld: unrecognized option '--no-eh-frame-hdr' | ld: use the --help option for usage information | arch/arm64/kernel/vdso/Makefile:64: recipe for target | 'arch/arm64/kernel/vdso/vdso.so.dbg' failed | make[1]: *** [arch/arm64/kernel/vdso/vdso.so.dbg] Error 1 | arch/arm64/Makefile:175: recipe for target 'vdso_prepare' failed | make: *** [vdso_prepare] Error 2 Only link the vDSO with '--no-eh-frame-hdr' when the linker supports it. If we end up with the section due to linker defaults, the absence of CFI information in the sigreturn trampoline will prevent the unwinder from breaking. Link: https://lore.kernel.org/r/[email protected] Fixes: 87676cfca141 ("arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline") Reported-by: Shaokun Zhang <[email protected]> Tested-by: Jon Hunter <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2020-06-24habanalabs: increase h/w timer when checking idleOmer Shpigelman1-0/+2
In GAUDI the current timer value for the hardware to check if it is in IDLE state is too low. As a result, there are occasions where the H/W wrongly reports it is not IDLE. The driver checks that before submitting work on behalf of the driver during initialization, so a false report might cause the driver to fail during device initialization. Signed-off-by: Omer Shpigelman <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2020-06-24Revert "usb: dwc3: exynos: Add support for Exynos5422 suspend clk"Anand Moon1-9/+0
This reverts commit 07f6842341abe978e6375078f84506ec3280ece5. Since SCLK_SCLK_USBD300 suspend clock need to be configured for phy module, I wrongly mapped this clock to DWC3 code. Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Anand Moon <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24usb: gadget: udc: Potential Oops in error handling codeDan Carpenter1-1/+2
If this is in "transceiver" mode the the ->qwork isn't required and is a NULL pointer. This can lead to a NULL dereference when we call destroy_workqueue(udc->qwork). Fixes: 3517c31a8ece ("usb: gadget: mv_udc: use devm_xxx for probe") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24usb: phy: tegra: Fix unnecessary check in tegra_usb_phy_probe()Tang Bin1-5/+1
In the function tegra_usb_phy_probe(), if usb_add_phy_dev() failed, the return value will be given to err, and if usb_add_phy_dev() succeed, the return value will be zero. Thus it is unnecessary to repeated check here. Signed-off-by: Zhang Shengju <[email protected]> Signed-off-by: Tang Bin <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24usb: dwc3: pci: Fix reference count leak in dwc3_pci_resume_workAditya Pakki1-1/+3
dwc3_pci_resume_work() calls pm_runtime_get_sync() that increments the reference counter. In case of failure, decrement the reference before returning. Signed-off-by: Aditya Pakki <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24usb: cdns3: ep0: add spinlock for cdns3_check_new_setupPeter Chen1-3/+4
The other thread may access other endpoints when the cdns3_check_new_setup is handling, add spinlock to protect it. Cc: <[email protected]> Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Reviewed-by: Pawel Laszczak <[email protected]> Signed-off-by: Peter Chen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24usb: cdns3: trace: using correct dir valuePeter Chen1-1/+1
It should use the correct direction value from register, not depends on previous software setting. It fixed the EP number wrong issue at trace when the TRBERR interrupt occurs for EP0IN. When the EP0IN IOC has finished, software prepares the setup packet request, the expected direction is OUT, but at that time, the TRBERR for EP0IN may occur since it is DMULT mode, the DMA does not stop until TRBERR has met. Cc: <[email protected]> Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Reviewed-by: Pawel Laszczak <[email protected]> Signed-off-by: Peter Chen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24usb: cdns3: ep0: fix the test mode set incorrectlyPeter Chen1-1/+2
The 'tmode' is ctrl->wIndex, changing it as the real test mode value for register assignment. Cc: <[email protected]> Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Reviewed-by: Jun Li <[email protected]> Signed-off-by: Peter Chen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2020-06-24soc: imx8m: fix build warningPeng Fan1-1/+1
Fix the build warning with x86_64-randconfig >> drivers/soc/imx/soc-imx8m.c:150:34: warning: unused variable >> 'imx8_soc_match' [-Wunused-const-variable] static const struct of_device_id imx8_soc_match[] = { ^ Fixes: fc40200ebf82 ("soc: imx: increase build coverage for imx8m soc driver") Reported-by: kernel test robot <[email protected]> Signed-off-by: Peng Fan <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2020-06-24habanalabs: Correct handling when failing to enqueue CBOfir Bitton1-0/+13
The fence release flow is different if the CS was never submitted. In that case, we don't have an hw_sob object attached that we need to "put". While if the CS was aborted, we do need to "put" the hw_sob. Signed-off-by: Ofir Bitton <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2020-06-24habanalabs: increase GAUDI QMAN ARB WDT timeoutOded Gabbay1-1/+1
The current timeout is too low for some of the workloads and we see false errors as a result. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2020-06-24habanalabs: rename mmu_write() to mmu_asid_va_write()Oded Gabbay1-2/+2
The function name conflicts with a static inline function in arch/m68k/include/asm/mcfmmu.h Reported-by: kernel test robot <[email protected]> Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2020-06-24habanalabs: use PI in MMU cache invalidationOmer Shpigelman2-0/+11
The PS flow for MMU cache invalidation caused timeouts in stress tests. Use PS + PI flow so no timeouts should happen whatsoever. Signed-off-by: Omer Shpigelman <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2020-06-24habanalabs: block scalar load_and_exe on external queueOded Gabbay2-1/+27
In Gaudi, the user can't execute scalar load_and_exe on external queue because it can be a security hole. The driver doesn't parse the commands being loaded and it can be msg_prot, which the user isn't allowed to use. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2020-06-24scsi: mptscsih: Fix read sense data sizeTomas Henzl1-3/+1
The sense data buffer in sense_buf_pool is allocated with size of MPT_SENSE_BUFFER_ALLOC(64) (multiplied by req_depth) while SNS_LEN(sc)(96) is used when reading the data. That may lead to a read from unallocated area, sometimes from another (unallocated) page. To fix this, limit the read size to MPT_SENSE_BUFFER_ALLOC. Link: https://lore.kernel.org/r/[email protected] Co-developed-by: Stanislav Saner <[email protected]> Signed-off-by: Stanislav Saner <[email protected]> Signed-off-by: Tomas Henzl <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-06-24scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP actionSteffen Maier1-2/+11
Suppose that, for unrelated reasons, FSF requests on behalf of recovery are very slow and can run into the ERP timeout. In the case at hand, we did adapter recovery to a large degree. However due to the slowness a LUN open is pending so the corresponding fc_rport remains blocked. After fast_io_fail_tmo we trigger close physical port recovery for the port under which the LUN should have been opened. The new higher order port recovery dismisses the pending LUN open ERP action and dismisses the pending LUN open FSF request. Such dismissal decouples the ERP action from the pending corresponding FSF request by setting zfcp_fsf_req->erp_action to NULL (among other things) [zfcp_erp_strategy_check_fsfreq()]. If now the ERP timeout for the pending open LUN request runs out, we must not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use timer_setup()"). Before that we intentionally only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Note: The lifetime of the corresponding zfcp_fsf_req object continues until a (late) response or an (unrelated) adapter recovery. Just like the regular response path ignores dismissed requests [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the ERP timeout handler now needs to ignore dismissed requests. So simply return early in the ERP timeout handler if the FSF request is marked as dismissed in its status flags. To protect against the race where zfcp_erp_strategy_check_fsfreq() dismisses and sets zfcp_fsf_req->erp_action to NULL after our previous status flag check, return early if zfcp_fsf_req->erp_action is NULL. After all, the former ERP action does not need to be woken up as that was already done as part of the dismissal above [zfcp_erp_action_dismiss()]. This fixes the following panic due to kernel page fault in IRQ context: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d Oops: 0004 ilc:2 [#1] SMP Modules linked in: ... CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G E X ... Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp]) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000 000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02 0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000 0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8 Krnl Code: 001fffff80549bd4: a7180000 lhi %r1,0 001fffff80549bd8: 4120a0f0 la %r2,240(%r10) #001fffff80549bdc: a53e0003 llilh %r3,3 >001fffff80549be0: ba132000 cs %r1,%r3,0(%r2) 001fffff80549be4: a7740037 brc 7,1fffff80549c52 001fffff80549be8: e320b0180004 lg %r2,24(%r11) 001fffff80549bee: e31020e00004 lg %r1,224(%r2) 001fffff80549bf4: 412020e0 la %r2,224(%r2) Call Trace: [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp] [<00000985915e26f0>] call_timer_fn+0x38/0x190 [<00000985915e2944>] expire_timers+0xfc/0x190 [<00000985915e2ac4>] run_timer_softirq+0xec/0x218 [<0000098591ca7c4c>] __do_softirq+0x144/0x398 [<00000985915110aa>] do_softirq_own_stack+0x72/0x88 [<0000098591551b58>] irq_exit+0xb0/0xb8 [<0000098591510c6a>] do_IRQ+0x82/0xb0 [<0000098591ca7140>] ext_int_handler+0x128/0x12c [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60 ([<000009859172ae4c>] clear_huge_page+0xec/0x250) [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768 [<000009859172a712>] __handle_mm_fault+0x88a/0x900 [<000009859172a860>] handle_mm_fault+0xd8/0x1b0 [<0000098591529ef6>] do_dat_exception+0x136/0x3e8 [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220 Last Breaking-Event-Address: [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt Link: https://lore.kernel.org/r/[email protected] Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()") Cc: <[email protected]> #4.15+ Reviewed-by: Julian Wiedmann <[email protected]> Signed-off-by: Steffen Maier <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-06-23scsi: lpfc: Avoid another null dereference in lpfc_sli4_hba_unset()SeongJae Park1-1/+2
Commit cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu") has introduced static checker warnings for potential null dereferences in 'lpfc_sli4_hba_unset()' and commit 1ffdd2c0440d ("scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset") has tried to fix it. However, yet another potential null dereference is remaining. This commit fixes it. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Link: https://lore.kernel.org/r/[email protected] Fixes: 1ffdd2c0440d ("scsi: lpfc: resolve static checker warning inlpfc_sli4_hba_unset") Fixes: cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu") Reviewed-by: James Smart <[email protected]> Signed-off-by: SeongJae Park <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>