Age | Commit message (Collapse) | Author | Files | Lines |
|
Don't forget to update wqe->hash_tail after cancelling a pending work
item, if it was hashed.
Cc: [email protected] # 5.7+
Reported-by: Dmitry Shulyak <[email protected]>
Fixes: 86f3cd1b589a1 ("io-wq: handle hashed writes in chains")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If an application is doing reads on signalfd, and we arm the poll handler
because there's no data available, then the wakeup can recurse on the
tasks sighand->siglock as the signal delivery from task_work_add() will
use TWA_SIGNAL and that attempts to lock it again.
We can detect the signalfd case pretty easily by comparing the poll->head
wait_queue_head_t with the target task signalfd wait queue. Just use
normal task wakeup for this case.
Cc: [email protected] # v5.7+
Signed-off-by: Jens Axboe <[email protected]>
|
|
Some device drivers call libusb_clear_halt when target ep queue
is not empty. (eg. spice client connected to qemu for usb redir)
Before commit f5249461b504 ("xhci: Clear the host side toggle
manually when endpoint is soft reset"), that works well.
But now, we got the error log:
EP not empty, refuse reset
xhci_endpoint_reset failed and left ep_state's EP_SOFT_CLEAR_TOGGLE
bit still set
So all the subsequent urb sumbits to the ep will fail with the
warn log:
Can't enqueue URB while manually clearing toggle
We need to clear ep_state EP_SOFT_CLEAR_TOGGLE bit after
xhci_endpoint_reset, even if it failed.
Fixes: f5249461b504 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
Cc: stable <[email protected]> # v4.17+
Signed-off-by: Ding Hui <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Sometimes re-plugging a USB device during system sleep renders the device
useless:
[ 173.418345] xhci_hcd 0000:00:14.0: Get port status 2-4 read: 0x14203e2, return 0x10262
...
[ 176.496485] usb 2-4: Waited 2000ms for CONNECT
[ 176.496781] usb usb2-port4: status 0000.0262 after resume, -19
[ 176.497103] usb 2-4: can't resume, status -19
[ 176.497438] usb usb2-port4: logical disconnect
Because PLS equals to XDEV_RESUME, xHCI driver reports U3 to usbcore,
despite of CAS bit is flagged.
So proritize CAS over XDEV_RESUME to let usbcore handle warm-reset for
the port.
Cc: stable <[email protected]>
Signed-off-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
dci is 0 based and xhci_get_ep_ctx() will do ep index increment to get
the ep context.
[rename dci to ep_index -Mathias]
Cc: stable <[email protected]> # v4.15+
Fixes: 02b6fdc2a153 ("usb: xhci: Add debugfs interface for xHCI driver")
Signed-off-by: Li Jun <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://people.freedesktop.org/~gabbayo/linux into char-misc-linus
Oded writes:
This tag contains the following bug fixes for 5.9-rc2/3:
- Correct cleanup of PCI bar mapping in case of failure during
initialization.
- Several security fixes:
- Validating user addresses before mapping them
- Validating packet id (from user) before using it as index for array.
- Validating F/W file size before coping it.
- Prevent possible overflow when validating address from user in
profiler.
- Validate queue index (from user) before using it as index for array.
- Check for correct vmalloc return code
- Fix memory corruption in debugfs entry
- Fix a loop in gaudi_extract_ecc_info()
- Fix the set clock gating function in gaudi code
- Set maximum power to F/W according to the card type
- Cix incorrect check on failed workqueue create
- Correctly report error when configuring the PCI controller
* tag 'misc-habanalabs-fixes-2020-08-22' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: correctly report inbound pci region cfg error
habanalabs: check correct vmalloc return code
habanalabs: validate FW file size
habanalabs: fix incorrect check on failed workqueue create
habanalabs: set max power according to card type
habanalabs: proper handling of alloc size in coresight
habanalabs: set clock gating according to mask
habanalabs: verify user input in cs_ioctl_signal_wait
habanalabs: Fix a loop in gaudi_extract_ecc_info()
habanalabs: Fix memory corruption in debugfs
habanalabs: validate packet id during CB parse
habanalabs: Validate user address before mapping
habanalabs: unmap PCI bars upon iATU failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
"Fix reference counting and clean up exit paths"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_epoll_ctl(): clean the failure exits up a bit
epoll: Keep a reference on files added to the check list
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
When adding a new fd to an epoll, and that this new fd is an
epoll fd itself, we recursively scan the fds attached to it
to detect cycles, and add non-epool files to a "check list"
that gets subsequently parsed.
However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disapearing from under our feet.
Cc: [email protected]
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Currently the nexthop code will use an empty NHA_GROUP attribute, but it
requires at least 1 entry in order to function properly. Otherwise we
end up derefencing null or random pointers all over the place due to not
having any nh_grp_entry members allocated, nexthop code relies on having at
least the first member present. Empty NHA_GROUP doesn't make any sense so
just disallow it.
Also add a WARN_ON for any future users of nexthop_create_group().
BUG: kernel NULL pointer dereference, address: 0000000000000080
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:fib_check_nexthop+0x4a/0xaa
Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
FS: 00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
Call Trace:
fib_create_info+0x64d/0xaf7
fib_table_insert+0xf6/0x581
? __vma_adjust+0x3b6/0x4d4
inet_rtm_newroute+0x56/0x70
rtnetlink_rcv_msg+0x1e3/0x20d
? rtnl_calcit.isra.0+0xb8/0xb8
netlink_rcv_skb+0x5b/0xac
netlink_unicast+0xfa/0x17b
netlink_sendmsg+0x334/0x353
sock_sendmsg_nosec+0xf/0x3f
____sys_sendmsg+0x1a0/0x1fc
? copy_msghdr_from_user+0x4c/0x61
___sys_sendmsg+0x63/0x84
? handle_mm_fault+0xa39/0x11b5
? sockfd_lookup_light+0x72/0x9a
__sys_sendmsg+0x50/0x6e
do_syscall_64+0x54/0xbe
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f10dacc0bb7
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
Modules linked in:
CR2: 0000000000000080
CC: David Ahern <[email protected]>
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Reported-by: [email protected]
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The patch reorganizing the set_freq function made it so the gmu resume
doesn't always set the frequency, because a6xx_gmu_set_freq() exits early
when the frequency hasn't been changed. Note this always happens when
resuming GMU after recovering from a hang.
Use a simple workaround to prevent this from happening.
Fixes: 1f60d11423db ("drm: msm: a6xx: send opp instead of a frequency")
Signed-off-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
For production devices, the debugbus sections will typically be fused
off and empty in the gpu device coredump. But since this may contain
data like cache contents, don't capture it by default.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
Backport note: maybe wait some time for the crashdec MR[1] to look for
both the old typo'd name and the corrected name to land in mesa 20.2
[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6242
Fixes: 1707add81551 ("drm/msm/a6xx: Add a6xx gpu state")
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
New Qualcomm firmware has changed a way it reports back the 'started'
event. Support new register values.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- move -Wsign-compare warning from W=2 to W=3
- fix the keyword _restrict to __restrict in genksyms
- fix more bugs in qconf
* tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: qconf: replace deprecated QString::sprintf() with QTextStream
kconfig: qconf: remove redundant help in the info view
kconfig: qconf: remove qInfo() to get back Qt4 support
kconfig: qconf: remove unused colNr
kconfig: qconf: fix the popup menu in the ConfigInfoView window
kconfig: qconf: fix signal connection to invalid slots
genksyms: keywords: Use __restrict not _restrict
kbuild: remove redundant patterns in filter/filter-out
extract-cert: add static to local data
Makefile.extrawarn: Move sign-compare from W=2 to W=3
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Allow booting of late secondary CPUs affected by erratum 1418040
(currently they are parked if none of the early CPUs are affected by
this erratum).
- Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
vdso_install' installs the 32-bit compat vdso when it is compiled.
- Print a warning that untrusted guests without a CPU erratum
workaround (Cortex-A57 832075) may deadlock the affected system.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ARM64: vdso32: Install vdso32 from vdso_install
KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
arm64: Allow booting of late CPUs affected by erratum 1418040
arm64: Move handling of erratum 1418040 into C code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- a couple of fixes for storage key handling relevant for debugging
- add cond_resched into potentially slow subchannels scanning loop
- fixes for PF/VF linking and to ignore stale PCI configuration request
events
* tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: fix PF/VF linking on hot plug
s390/pci: re-introduce zpci_remove_device()
s390/pci: fix zpci_bus_link_virtfn()
s390/ptrace: fix storage key handling
s390/runtime_instrumentation: fix storage key handling
s390/pci: ignore stale configuration request event
s390/cio: add cond_resched() in the slow_eval_known_fn() loop
|
|
Pull kvm fixes from Paolo Bonzini:
- PAE and PKU bugfixes for x86
- selftests fix for new binutils
- MMU notifier fix for arm64
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
KVM: x86: fix access code passed to gva_to_gpa
selftests: kvm: Use a shorter encoding to clear RAX
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"23 fixes in 5 drivers (qla2xxx, ufs, scsi_debug, fcoe, zfcp). The bulk
of the changes are in qla2xxx and ufs and all are mostly small and
definitely don't impact the core"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe"
Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
scsi: qla2xxx: Check if FW supports MQ before enabling
scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba
scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime
scsi: qla2xxx: Reduce noisy debug message
scsi: qla2xxx: Fix login timeout
scsi: qla2xxx: Indicate correct supported speeds for Mezz card
scsi: qla2xxx: Flush I/O on zone disable
scsi: qla2xxx: Flush all sessions on zone disable
scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values
scsi: scsi_debug: Fix scp is NULL errors
scsi: zfcp: Fix use-after-free in request timeout handlers
scsi: ufs: No need to send Abort Task if the task in DB was cleared
scsi: ufs: Clean up completed request without interrupt notification
scsi: ufs: Improve interrupt handling for shared interrupts
scsi: ufs: Fix interrupt error message for shared interrupts
scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL
scsi: ufs-mediatek: Fix incorrect time to wait link status
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
"Another set of DT fixes:
- restore range parsing error check
- workaround PCI range parsing with missing 'device_type' now
required
- correct description of 'phy-connection-type'
- fix erroneous matching on 'snps,dw-pcie' by 'intel,lgm-pcie' schema
- a couple of grammar and whitespace fixes
- update Shawn Guo's email"
* tag 'devicetree-fixes-for-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: vendor-prefixes: Remove trailing whitespace
dt-bindings: net: correct description of phy-connection-type
dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances
of: address: Work around missing device_type property in pcie nodes
dt: writing-schema: Miscellaneous grammar fixes
dt-bindings: Use Shawn Guo's preferred e-mail for i.MX bindings
of/address: check for invalid range.cpu_addr
|
|
During inbound iATU configuration we can get errors while
configuring PCI registers, there is a certain scenario in which these
errors are not reflected and driver is loaded with wrong configuration.
Signed-off-by: Ofir Bitton <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
vmalloc can return different return code than NULL and a valid
pointer. We must validate it in order to dereference a non valid
pointer.
Signed-off-by: Ofir Bitton <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
We must validate FW size in order not to corrupt memory in case
a malicious FW file will be present in system.
Signed-off-by: Ofir Bitton <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
The null check on a failed workqueue create is currently null checking
hdev->cq_wq rather than the pointer hdev->cq_wq[i] and so the test
will never be true on a failed workqueue create. Fix this by checking
hdev->cq_wq[i].
Addresses-Coverity: ("Dereference before null check")
Fixes: 5574cb2194b1 ("habanalabs: Assign each CQ with its own work queue")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
In Gaudi, the default max power setting is different between PCI and PMC
cards. Therefore, the driver need to set the default after knowing what is
the card type.
The current code has a bug where it limits the maximum power of the PMC
card to 200W after a reset occurs.
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Allocation size can go up to 64bit but truncated to 32bit,
we should make sure it is not truncated and validate no address
overflow.
Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Once clock gating is set we enable clock gating according to mask,
we should also disable clock gating according to relevant bits.
Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
User input must be validated before using it to
access internal structures.
Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
The condition was reversed. It should have been less than instead of
greater than. The result is that we never enter the loop.
Fixes: fcc6a4e60678 ("habanalabs: Extract ECC information from FW")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
This has to be a long instead of a u32 because we write a long value.
On 64 bit systems, this will cause memory corruption.
Fixes: c216477363a3 ("habanalabs: add debugfs support")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
During command buffer parsing, driver extracts packet id
from user buffer. Driver must validate this packet id, since it is
being used in order to extract information from internal structures.
Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
User address must be validated before driver performs address map.
Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
In case the driver fails to configure the PCI controller iATU, it needs to
unmap the PCI bars before exiting so if the driver is removed, the bars
won't be left mapped.
Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
This has roughly the same effect as drm_atomic_helper_wait_for_vblanks(),
basically just ensuring that vblank accounting is enabled so that we get
valid timestamp/seqn on pageflip events.
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Stephen Boyd <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
REQ_FUA should be checked using rq->cmd_flags instead of req_op().
Fixes: deb78b419dfda ("nullb: emulate cache")
Signed-off-by: Hou Pu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Based on nvme spec, when keep alive timeout is set to zero
the keep-alive timer should be disabled.
Signed-off-by: Amit Engel <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If a command send through nvme-multipath failed on a dying queue, resend it
on another path.
Signed-off-by: Chao Leng <[email protected]>
[hch: rebased on top of the completion refactoring]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Check the SCT sub-field for a path related status instead of enumerating
invididual status code. As of NVMe 1.4 this adds "Internal Path Error"
and "Controller Pathing Error" to the list, but it also future proofs for
additional status codes added to the category.
Suggested-by: Chao Leng <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Lift all the code to decide the dispostition of a completed command
from nvme_complete_rq and nvme_failover_req into a new helper, which
returns an emum of the potential actions. nvme_complete_rq then
just switches on those and calls the proper helper for the action.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
nvme_end_request is a bit misnamed, as it wraps around the
blk_mq_complete_* API. It's semantics also are non-trivial, so give it
a more descriptive name and add a comment explaining the semantics.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Zoned block devices reuse the chunk_sectors queue limit to define zone
boundaries. If a such a device happens to also report an optimal
boundary, do not use that to define the chunk_sectors as that may
intermittently interfere with io splitting and zone size queries.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
All operations are based on the controller, not the host page size.
Switch the dma pool to use the controller page size as well to avoid
massive overallocations on large page size systems.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Recently nvme_dev.q_depth was changed from an int to u16 type.
This falls over for the queue depth calculation in nvme_pci_enable(),
where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow as a u16, as
NVME_CAP_MQES() is a 16b number also. That happens for me, and this is the
result:
root@ubuntu:/home/john# [148.272996] Unable to handle kernel NULL pointer
dereference at virtual address 0000000000000010
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000a27bf3c9000
[0000000000000010] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in: nvme nvme_core
CPU: 56 PID: 256 Comm: kworker/u195:0 Not tainted
5.8.0-next-20200812 #27
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
V1.16.01 03/15/2019
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
pstate: 80c00009 (Nzcv daif +PAN +UAO BTYPE=--)
pc : __sg_alloc_table_from_pages+0xec/0x238
lr : __sg_alloc_table_from_pages+0xc8/0x238
sp : ffff800013ccbad0
x29: ffff800013ccbad0 x28: ffff0a27b3d380a8
x27: 0000000000000000 x26: 0000000000002dc2
x25: 0000000000000dc0 x24: 0000000000000000
x23: 0000000000000000 x22: ffff800013ccbbe8
x21: 0000000000000010 x20: 0000000000000000
x19: 00000000fffff000 x18: ffffffffffffffff
x17: 00000000000000c0 x16: fffffe289eaf6380
x15: ffff800011b59948 x14: ffff002bc8fe98f8
x13: ff00000000000000 x12: ffff8000114ca000
x11: 0000000000000000 x10: ffffffffffffffff
x9 : ffffffffffffffc0 x8 : ffff0a27b5f9b6a0
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0a27b5f9b680 x4 : 0000000000000000
x3 : ffff0a27b5f9b680 x2 : 0000000000000000
x1 : 0000000000000001 x0 : 0000000000000000
Call trace:
__sg_alloc_table_from_pages+0xec/0x238
sg_alloc_table_from_pages+0x18/0x28
iommu_dma_alloc+0x474/0x678
dma_alloc_attrs+0xd8/0xf0
nvme_alloc_queue+0x114/0x160 [nvme]
nvme_reset_work+0xb34/0x14b4 [nvme]
process_one_work+0x1e8/0x360
worker_thread+0x44/0x478
kthread+0x150/0x158
ret_from_fork+0x10/0x34
Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
---[ end trace 89bb2b72d59bf925 ]---
Fix by making onto a u32.
Also use u32 for nvme_dev.q_depth, as we assign this value from
nvme_dev.q_depth, and nvme_dev.q_depth will possibly hold 65536 - this
avoids the same crash as above.
Fixes: 61f3b8963097 ("nvme-pci: use unsigned for io queue depth")
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When locking the ctrl->lock spinlock IRQs need to be disabled to avoid a
dead lock. The new spin_lock() calls recently added produce the
following lockdep warning when running the blktest nvme/003:
================================
WARNING: inconsistent lock state
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/22 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888276a8c4c0 (&ctrl->lock){+.?.}-{2:2}, at: nvme_keep_alive_end_io+0x50/0xc0
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0x164/0x500
_raw_spin_lock+0x28/0x40
nvme_get_effects_log+0x37/0x1c0
nvme_init_identify+0x9e4/0x14f0
nvme_reset_work+0xadd/0x2360
process_one_work+0x66b/0xb70
worker_thread+0x6e/0x6c0
kthread+0x1e7/0x210
ret_from_fork+0x22/0x30
irq event stamp: 1449221
hardirqs last enabled at (1449220): [<ffffffff81c58e69>] ktime_get+0xf9/0x140
hardirqs last disabled at (1449221): [<ffffffff83129665>] _raw_spin_lock_irqsave+0x25/0x60
softirqs last enabled at (1449210): [<ffffffff83400447>] __do_softirq+0x447/0x595
softirqs last disabled at (1449215): [<ffffffff81b489b5>] run_ksoftirqd+0x35/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ctrl->lock);
<Interrupt>
lock(&ctrl->lock);
*** DEADLOCK ***
no locks held by ksoftirqd/2/22.
stack backtrace:
CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 5.8.0-rc4-eid-vmlocalyes-dbg-00157-g7236657c6b3a #1450
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xc8/0x11a
print_usage_bug.cold.63+0x235/0x23e
mark_lock+0xa9c/0xcf0
__lock_acquire+0xd9a/0x2b50
lock_acquire+0x164/0x500
_raw_spin_lock_irqsave+0x40/0x60
nvme_keep_alive_end_io+0x50/0xc0
blk_mq_end_request+0x158/0x210
nvme_complete_rq+0x146/0x500
nvme_loop_complete_rq+0x26/0x30 [nvme_loop]
blk_done_softirq+0x187/0x1e0
__do_softirq+0x118/0x595
run_ksoftirqd+0x35/0x50
smpboot_thread_fn+0x1d3/0x310
kthread+0x1e7/0x210
ret_from_fork+0x22/0x30
Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Tested-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of calling blk_put_request() which calls blk_mq_free_request(),
call blk_mq_free_request() directly for NVMeOF passthru. This is to
mainly avoid an extra function call in the completion path
nvmet_passthru_req_done().
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the existing NVMeOF Passthru core command handling on failure of
nvme_alloc_request() it errors out with rq value set to NULL. In the
error handling path it calls blk_put_request() without checking if
rq is set to NULL or not which produces following Oops:-
[ 1457.346861] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1457.347838] #PF: supervisor read access in kernel mode
[ 1457.348464] #PF: error_code(0x0000) - not-present page
[ 1457.349085] PGD 0 P4D 0
[ 1457.349402] Oops: 0000 [#1] SMP NOPTI
[ 1457.349851] CPU: 18 PID: 10782 Comm: kworker/18:2 Tainted: G OE 5.8.0-rc4nvme-5.9+ #35
[ 1457.350951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
[ 1457.352347] Workqueue: events nvme_loop_execute_work [nvme_loop]
[ 1457.353062] RIP: 0010:blk_mq_free_request+0xe/0x110
[ 1457.353651] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8
[ 1457.355975] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282
[ 1457.356636] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 1457.357526] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000
[ 1457.358416] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000
[ 1457.359317] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000
[ 1457.360424] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8
[ 1457.361322] FS: 0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000
[ 1457.362337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1457.363058] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0
[ 1457.363973] Call Trace:
[ 1457.364296] nvmet_passthru_execute_cmd+0x150/0x2c0 [nvmet]
[ 1457.364990] process_one_work+0x24e/0x5a0
[ 1457.365493] ? __schedule+0x353/0x840
[ 1457.365957] worker_thread+0x3c/0x380
[ 1457.366426] ? process_one_work+0x5a0/0x5a0
[ 1457.366948] kthread+0x135/0x150
[ 1457.367362] ? kthread_create_on_node+0x60/0x60
[ 1457.367934] ret_from_fork+0x22/0x30
[ 1457.368388] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corer
[ 1457.368414] ata_piix crc32c_intel virtio_pci libata virtio_ring serio_raw t10_pi virtio floppy dm_]
[ 1457.380849] CR2: 0000000000000000
[ 1457.381288] ---[ end trace c6cab61bfd1f68fd ]---
[ 1457.381861] RIP: 0010:blk_mq_free_request+0xe/0x110
[ 1457.382469] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8
[ 1457.384749] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282
[ 1457.385393] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 1457.386264] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000
[ 1457.387142] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000
[ 1457.388029] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000
[ 1457.388914] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8
[ 1457.389798] FS: 0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000
[ 1457.390796] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1457.391508] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0
[ 1457.392525] Kernel panic - not syncing: Fatal exception
[ 1457.394138] Kernel Offset: disabled
[ 1457.394677] ---[ end Kernel panic - not syncing: Fatal exception ]---
We fix this Oops by adding a new goto label out_put_req and reordering
the blk_put_request call to avoid calling blk_put_request() with rq
value is set to NULL. Here we also update the rest of the code
accordingly.
Fixes: 06b7164dfdc0 ("nvmet: add passthru code to process commands")
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the current implementation before submitting the passthru cmd we
may come across error e.g. getting ns from passthru controller,
allocating a request from passthru controller, etc. For all the failure
cases it only uses single goto label fail_out.
In the target code, we follow the pattern to have a separate label for
each error out the case when setting up multiple things before the actual
action.
This patch follows the same pattern and renames generic fail_out label
to out_put_ns and updates the error out cases in the
nvmet_passthru_execute_cmd() where it is needed.
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we find an optimized path, we quit the loop immediately. Thus we can use
just one variable for the next path, slighly simplifying the code.
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If there's only one usable, non-optimized path, nvme_round_robin_path()
returns NULL, which is wrong. Fix it by falling back to "old", like in
the single optimized path case. Also, if the active path isn't changed,
there's no need to re-assign the pointer.
Fixes: 3f6e3246db0e ("nvme-multipath: fix logic for non-optimized paths")
Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin George <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
On an error exit path, a negative error code should be returned
instead of a positive return value.
Fixes: e399441de9115 ("nvme-fabrics: Add host support for FC transport")
Cc: James Smart <[email protected]>
Signed-off-by: Tianjia Zhang <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|