aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
diff options
context:
space:
mode:
authorTao Chiu <taochiu@synology.com>2021-04-26 10:53:55 +0800
committerChristoph Hellwig <hch@lst.de>2021-05-04 09:35:50 +0200
commitd4060d2be1132596154f31f4d57976bd103e969d (patch)
tree0b6e872a10d8e4431e462c2b149847c458c2ea97 /drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
parenta97157440e1e69c35d7804d3b72da0c626ef28e6 (diff)
nvme-pci: fix controller reset hang when racing with nvme_timeout
reset_work() in nvme-pci may hang forever in the following scenario: 1) A reset caused by a command timeout occurs due to a controller being temporarily irresponsive. 2) nvme_reset_work() restarts admin queue at nvme_alloc_admin_tags(). At the same time, a user-submitted admin command is queued and waiting for completion. Then, reset_work() changes its state to CONNECTING, and submits an identify command. 3) However, the controller does still not respond to any command, causing a timeout being fired at the user-submitted command. Unfortunately, nvme_timeout() does not see the completion on cq, and any timeout that takes place under CONNECTING state causes a controller shutdown. 4) Normally, the identify command in reset_work() would be canceled with SC_HOST_ABORTED by nvme_dev_disable(), then reset_work can tear down the controller accordingly. But the controller happens to return online and respond the identify command before nvme_dev_disable() should have been reaped it off. 5) reset_work() continues to setup_io_queues() as it observes no error in init_identify(). However, the admin queue has already been quiesced in dev_disable(). Thus, any following commands would be blocked forever in blk_execute_rq(). This can be fixed by restricting usercmd commands when controller is not in a LIVE state in nvme_queue_rq(), as what has been done previously in fabrics. ``` nvme_reset_work(): | nvme_alloc_admin_tags() | | nvme_submit_user_cmd(): nvme_init_identify(): | ... __nvme_submit_sync_cmd(): | ... | ... ---------------------------------------> nvme_timeout(): (Controller starts reponding commands) | nvme_dev_disable(, true): nvme_setup_io_queues(): | __nvme_submit_sync_cmd(): | (hung in blk_execute_rq | since run_hw_queue sees | queue quiesced) | ``` Signed-off-by: Tao Chiu <taochiu@synology.com> Signed-off-by: Cody Wong <codywong@synology.com> Reviewed-by: Leon Chien <leonchien@synology.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c')
0 files changed, 0 insertions, 0 deletions