Age | Commit message | Author | Files | Lines
2017-09-25 | nvme-fcloop: fix port deletes and callbacks | James Smart | 1 | -64/+38
Now that there are potentially long delays between when a remoteport or targetport delete call is made and when the callback occurs (dev_loss_tmo timeout), no longer block in the delete routines and move the final nport puts to the callbacks. Moved the fcloop_nport_get/put/free routines to avoid forward declarations. Ensure port_info structs used in registrations are nulled in case fields are not set (e.g. devloss_tmo values). Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet-fc: sync header templates with comments | James Smart | 1 | -5/+8
Comments were incorrect: - defer_rcv was in host port template. moved to target port template - Added Mandatory statements for target port template items Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet-fc: ensure target queue id within range. | James Smart | 1 | -0/+3
When searching for queue id's ensure they are within the expected range. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet-fc: on port remove call put outside lock | James Smart | 1 | -1/+5
Avoid calling the put routine, as it may traverse to free routines while holding the target lock. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme-rdma: don't fully stop the controller in error recovery | Sagi Grimberg | 1 | -1/+1
Calling nvme_stop_ctrl on an already failed controller will wait for the scan work to complete (only by identify timeout expiration, which is 60 seconds). This is unnecessary when we already know that the controller has failed. Reported-by: Yi Zhang <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme-rdma: give up reconnect if state change fails | Sagi Grimberg | 1 | -1/+6
If we failed to transition to state LIVE after a successful reconnect, then controller deletion already started. In this case there is no point moving forward with reconnect. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme-core: Use nvme_wq to queue async events and fw activation | Sagi Grimberg | 1 | -2/+2
async_event_work might race as it is executed from two different workqueues at the moment. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme: fix sqhd reference when admin queue connect fails | James Smart | 1 | -1/+2
Fix bug in sqhd patch. It wasn't the sq that was at risk. In the case where the admin queue connect command fails, the sq->size field is not set. Therefore, this becomes a divide by zero error. Add a quick check to bypass under this failure condition. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
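The failure mode described above can be sketched in user-space C. The struct and function names are hypothetical stand-ins rather than the kernel's definitions, and the sketch assumes the head advances modulo the queue size, so a size of zero has to bypass the update.

```c
#include <stdint.h>

/* Hypothetical stand-in for the target submission queue state. */
struct sq_sketch {
	uint16_t size;	/* stays 0 if the admin queue connect failed */
	uint16_t sqhd;	/* submission queue head reported in completions */
};

/* Advance the head, skipping the modulo when size was never set. */
static uint16_t sq_advance_head(struct sq_sketch *sq)
{
	if (sq->size)
		sq->sqhd = (uint16_t)((sq->sqhd + 1) % sq->size);
	return sq->sqhd;
}
```

Without the `if (sq->size)` check, the modulo would divide by zero on a queue whose connect command failed.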
2017-09-25 | watchdog/hardlockup/perf: Cure UP damage | Thomas Gleixner | 1 | -1/+6
for_each_cpu() unintuitively reports CPU0 as set independent of the actual cpumask content on UP kernels. That leads to a NULL pointer dereference when the cleanup function is invoked and there is no event to clean up. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2017-09-25 | gfs2: Fix debugfs glocks dump | Andreas Gruenbacher | 1 | -9/+5
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a single buffer: the right function for restarting an rhashtable iteration from the beginning of the hash table is rhashtable_walk_enter; rhashtable_walk_stop + rhashtable_walk_start will just resume from the current position. Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Bob Peterson <[email protected]> Cc: [email protected] # v4.3+
2017-09-25 | selftests: timers: set-timer-lat: Fix hang when testing unsupported alarms | Shuah Khan | 1 | -3/+6
When timer_create() fails on a boottime or realtime clock, setup_timer() returns 0 as if the timer has been set. Callers wait forever for the timer to expire. This hang is seen on a system that doesn't have support for: CLOCK_REALTIME_ALARM ABSTIME missing CAP_WAKE_ALARM? : [UNSUPPORTED] Test hangs waiting for a timer that hasn't been set to expire. Fix setup_timer() to return 1, add handling in callers to detect the unsupported case and return 0 without waiting to not fail the test. Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | selftests: timers: set-timer-lat: fix hang when std out/err are redirected | Shuah Khan | 1 | -3/+1
do_timer_oneshot() uses select() as a timer with FD_SETSIZE and readfds is cleared with FD_ZERO without FD_SET. When stdout and stderr are redirected, the test hangs in select forever. Fix the problem by calling select() with readfds empty and nfds zero. This is sufficient for using select() as a timer. With this fix "./set-timer-lat > /dev/null 2>&1" no longer hangs. Signed-off-by: Shuah Khan <[email protected]> Acked-by: Greg Hackmann <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
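A minimal sketch of using select() purely as a timer, as the message above describes. The function name `sleep_with_select` is hypothetical and is not taken from the selftest; only the `select()` call shape (empty read set, nfds of zero) follows the text.

```c
#include <stddef.h>
#include <sys/select.h>

/* Sleep for the given interval using select() with no descriptors. */
static int sleep_with_select(long sec, long usec)
{
	struct timeval tv = { .tv_sec = sec, .tv_usec = usec };
	fd_set readfds;

	FD_ZERO(&readfds);	/* empty set: no descriptor is watched */

	/* nfds == 0, so only the timeout (or a signal) can end the call. */
	return select(0, &readfds, NULL, NULL, &tv);
}
```

When the timeout expires, select() returns 0; a return of -1 would indicate an error such as an interrupting signal.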
2017-09-25 | selftests/memfd: correct run_tests.sh permission | Li Zhijian | 1 | -0/+0
to fix the following issue: ------------------ TAP version 13 selftests: run_tests.sh ======================================== selftests: Warning: file run_tests.sh is not executable, correct this. not ok 1..1 selftests: run_tests.sh [FAIL] ------------------ Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | selftests/seccomp: Support glibc 2.26 siginfo_t.h | Kees Cook | 1 | -5/+13
The 2.26 release of glibc changed how siginfo_t is defined, and the earlier work-around of using the kernel definition is no longer needed. The old way needs to stay around for a while, though. Reported-by: Seth Forshee <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Cc: Shuah Khan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Tested-by: Seth Forshee <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | selftests: futex: Makefile: fix for loops in targets to run silently | Shuah Khan | 1 | -3/+3
Fix for loops in targets to run silently to avoid cluttering the test results. Suppresses the following from targets: for DIR in functional; do \ BUILD_TARGET=./tools/testing/selftests/futex/$DIR; \ mkdir $BUILD_TARGET -p; \ make OUTPUT=$BUILD_TARGET -C $DIR all;\ done ./tools/testing/selftests/futex/run.sh Signed-off-by: Shuah Khan <[email protected]> Reviewed-by: Darren Hart (VMware) <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | selftests: Makefile: fix for loops in targets to run silently | Shuah Khan | 1 | -7/+7
Fix for loops in targets to run silently to avoid cluttering the test results. Suppresses the following from targets: e.g run from breakpoints for TARGET in breakpoints; do \ BUILD_TARGET=$BUILD/$TARGET; \ mkdir $BUILD_TARGET -p; \ make OUTPUT=$BUILD_TARGET -C $TARGET;\ done; Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | selftests: mqueue: Use full path to run tests from Makefile | Shuah Khan | 1 | -2/+2
Use full path including $(OUTPUT) to run tests from Makefile for normal case when objects reside in the source tree as well as when objects are relocated with make O=dir. In both cases $(OUTPUT) will be set correctly by lib.mk. Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | selftests: futex: copy sub-dir test scripts for make O=dir run | Shuah Khan | 1 | -1/+4
For make O=dir run_tests to work, test scripts from sub-directories need to be copied over to the object directory. Running tests from the object directory is necessary to avoid making the source tree dirty. Signed-off-by: Shuah Khan <[email protected]> Reviewed-by: Darren Hart (VMware) <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2017-09-25 | PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build | Geert Uytterhoeven | 1 | -0/+2
If CONFIG_PCI=n and gcc (e.g. 4.1.2) decides not to inline get_pci_function_alias_group(), the build fails with: drivers/iommu/iommu.o: In function `get_pci_function_alias_group': iommu.c:(.text+0xfdc): undefined reference to `pci_acs_enabled' Due to the various dummies for PCI calls in the CONFIG_PCI=n case, pci_acs_enabled() is never called, but not all versions of gcc are smart enough to realize that. While explicitly marking get_pci_function_alias_group() inline would fix the build, this would inflate the code for the CONFIG_PCI=y case, as get_pci_function_alias_group() is a not-so-small function called from two places. Hence fix the issue by introducing a dummy for pci_acs_enabled() instead. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Alex Williamson <[email protected]>
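The dummy pattern the message refers to can be illustrated in plain C. Everything below is a hypothetical mimic: `CONFIG_PCI_SKETCH`, `struct pci_dev_sketch` and `acs_enabled_sketch()` are invented names for illustration, not the kernel's header contents.

```c
#include <stdbool.h>
#include <stddef.h>

struct pci_dev_sketch;	/* hypothetical opaque device type */

#ifdef CONFIG_PCI_SKETCH
/* With the subsystem built in, the real function lives in another file. */
bool acs_enabled_sketch(struct pci_dev_sketch *pdev, unsigned short flags);
#else
/* Dummy for the compiled-out case: callers still link, answer is "no". */
static inline bool acs_enabled_sketch(struct pci_dev_sketch *pdev,
				      unsigned short flags)
{
	(void)pdev;
	(void)flags;
	return false;
}
#endif
```

Because the dummy is `static inline` in the header, a caller that the compiler fails to optimize away still resolves the symbol, which is the property the fix relies on.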
2017-09-25 | IB/mlx5: Fix NULL deference on mlx5_ib_update_xlt failure | Ilya Lesokhin | 1 | -10/+17
mlx5_ib_reg_user_mr called mlx5_ib_dereg_mr in case of MR population failure. This resulted in a NULL dereference as ibmr->device wasn't initialized yet. We address this by adding an internal dereg_mr function that can handle partially initialized MRs, and fixing clean_mr to work on partially initialized MRs. Fixes: ff740aefecb9 ("IB/mlx5: Decouple MR allocation and population flows") Signed-off-by: Ilya Lesokhin <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-09-25 | IB/mlx5: Simplify mlx5_ib_cont_pages | Ilya Lesokhin | 1 | -30/+17
The patch simplifies mlx5_ib_cont_pages and fixes the following issues in the original implementation: The first issue is related to alignment of the PFNs. After the check base + p != PFN, the alignment of the PFN wasn't checked. So the PFN sequence 0, 1, 1, 2 would result in a page_shift of 13 even though the 3rd PFN is not 8KB aligned. This wasn't actually a bug because it was supported by all the existing mlx5 compatible devices, but we don't want to require this support in all future devices. The second issue is that the inner loop didn't advance PFN, so the test "if (base + p != pfn)" always failed for an SGE with len > (1<<page_shift). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Ilya Lesokhin <[email protected]> Reviewed-by: Eli Cohen <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
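The alignment rule from the first issue can be shown with a simplified model. This is not the driver's algorithm: the function names are hypothetical, the base page size is assumed to be 4 KiB (shift 12), and the model only accepts a larger page size when the PFN list splits into runs that are both contiguous and aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if pfn[] splits into aligned, contiguous runs of 2^m pages. */
static int runs_ok(const uint64_t *pfn, size_t n, unsigned int m)
{
	uint64_t block = (uint64_t)1 << m;
	size_t i;

	if (n % block)
		return 0;
	for (i = 0; i < n; i++) {
		if (i % block == 0) {
			if (pfn[i] % block)	/* run must start aligned */
				return 0;
		} else if (pfn[i] != pfn[i - 1] + 1) {
			return 0;		/* run must be contiguous */
		}
	}
	return 1;
}

/* Largest page shift the model allows for this PFN list. */
static unsigned int largest_page_shift(const uint64_t *pfn, size_t n)
{
	unsigned int m = 0;

	while (((size_t)1 << (m + 1)) <= n && runs_ok(pfn, n, m + 1))
		m++;
	return 12 + m;	/* 12 assumes 4 KiB base pages */
}
```

For the sequence 0, 1, 1, 2 the model stays at shift 12, because the third PFN (1) would have to start an 8 KiB run but is not 8 KiB aligned.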
2017-09-25 | IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdev | Alex Vesker | 2 | -6/+19
Call free_rdma_netdev instead of free_netdev each time we want to release a netdevice. This call is also relevant for future freeing of offloaded child interfaces. This patch also adds a missing call for free netdevice when releasing a parent interface that has child interfaces using ipoib_remove_one. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-09-25 | IB/ipoib: Fix sysfs Pkey create<->remove possible deadlock | Shalom Lagziel | 1 | -6/+14
A possible ABBA lock can happen with RTNL and vlan_rwsem. For example: Flow A: Device Flush __ipoib_ib_dev_flush down_read(vlan_rwsem) // Lock A ipoib_flush_ah flush_workqueue(priv->wq) // Wait for completion A work on shared WQ (Mcast carrier) ipoib_mcast_carrier_on_task while (!rtnl_trylock()) // Wait for lock B Flow B: Sysfs PKEY delete ipoib_vlan_delete lock(RTNL) // Lock B down_write(vlan_rwsem) // Wait for lock A This can happen with PKEY creates as well. The solution is to release the RTNL lock in sysfs functions in case it is not possible to lock VLAN RW semaphore and reset the SYS call. Fixes: 69956d83267e ("IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock") Signed-off-by: Shalom Lagziel <[email protected]> Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-09-25 | IB: Correct MR length field to be 64-bit | Parav Pandit | 4 | -5/+5
The ib_mr->length represents the length of the MR in bytes as per the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION). Currently the ib_mr->length field is defined as only a 32-bit field. This might result in truncation and failed WRs for consumers who register memory regions larger than 4GB and whose WRs access such MRs. This patch makes the length 64-bit to avoid such truncation. Cc: Sagi Grimberg <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Faisal Latif <[email protected]> Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Signed-off-by: Ilya Lesokhin <[email protected]> Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
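The truncation is easy to reproduce with fixed-width integers. The helper names below are hypothetical; they only model storing a byte length in a 32-bit versus a 64-bit field.

```c
#include <stdint.h>

/* Model of a 32-bit length field: bits above 2^32 are silently dropped. */
static uint32_t store_len_32(uint64_t len)
{
	return (uint32_t)len;
}

/* Model of a 64-bit length field: the full length survives. */
static uint64_t store_len_64(uint64_t len)
{
	return len;
}
```

A 5 GiB region (`5ULL << 30`) stored through the 32-bit model comes back as 1 GiB, which is the kind of mismatch that would make work requests against the region fail.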
2017-09-25 | IB/core: Fix qp_sec use after free access | Parav Pandit | 1 | -1/+3
When security_ib_alloc_security fails, qp->qp_sec memory is freed. However, ib_destroy_qp still tries to access this memory, which results in a kernel crash. So it is initialized to NULL to avoid such access. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
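The general pattern, freeing a member and resetting it to NULL so that a later teardown path can detect the absence, can be sketched as follows. The struct and functions are hypothetical user-space stand-ins, not the InfiniBand core code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an object with an optional security blob. */
struct qp_sketch {
	void *sec;
};

/* Error path: free the blob and clear the pointer. */
static void release_sec(struct qp_sketch *qp)
{
	free(qp->sec);
	qp->sec = NULL;	/* later teardown sees NULL, not a dangling pointer */
}

/* Teardown: returns 1 if there was security state to release, else 0. */
static int destroy_qp_sketch(struct qp_sketch *qp)
{
	if (!qp->sec)
		return 0;
	release_sec(qp);
	return 1;
}
```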
2017-09-25 | IB/core: Fix typo in the name of the tag-matching cap struct | Leon Romanovsky | 4 | -15/+15
The tag matching functionality is implemented by mlx5 driver by extending XRQ, however this internal kernel information was exposed to user space applications with *xrq* name instead of *tm*. This patch renames *xrq* to *tm* to handle that. Fixes: 8d50505ada72 ("IB/uverbs: Expose XRQ capabilities") Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-09-25 | perf tools: Fix syscalltbl build failure | Akemi Yagi | 1 | -1/+1
The build of kernel v4.14-rc1 for i686 fails on RHEL 6 with the error in tools/perf: util/syscalltbl.c:157: error: expected ';', ',' or ')' before '__maybe_unused' mv: cannot stat `util/.syscalltbl.o.tmp': No such file or directory Fix it by placing/moving: #include <linux/compiler.h> outside of #ifdef HAVE_SYSCALL_TABLE block. Signed-off-by: Akemi Yagi <[email protected]> Cc: Alan Bartlett <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-09-25 | perf report: Fix debug messages with --call-graph option | Mengting Zhang | 1 | -14/+21
With --call-graph option, perf report can display call chains using type, min percent threshold, optional print limit and order. And the default call-graph parameter is 'graph,0.5,caller,function,percent'. Before this patch, 'perf report --call-graph' shows incorrect debug messages as below: # perf report --call-graph Invalid callchain mode: 0.5 Invalid callchain order: 0.5 Invalid callchain sort key: 0.5 Invalid callchain config key: 0.5 Invalid callchain mode: caller Invalid callchain mode: function Invalid callchain order: function Invalid callchain mode: percent Invalid callchain order: percent Invalid callchain sort key: percent That is because in function __parse_callchain_report_opt(), each field of the call-graph parameter is passed to parse_callchain_{mode,order,sort_key,value} in turn until it meets the matching value. For example, the order field "caller" is passed to parse_callchain_mode() first and obviously it doesn't match any mode field. Therefore parse_callchain_mode() will show the debug message "Invalid callchain mode: caller", which could confuse users. The patch fixes this issue by moving the warning out of the function parse_callchain_{mode,order,sort_key,value}. Signed-off-by: Mengting Zhang <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Krister Johansen <[email protected]> Cc: Li Bin <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: Yao Jin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
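The shape of the fix, keeping the individual parsers silent and warning only once every parser has rejected a field, can be sketched like this. The functions are simplified, hypothetical versions covering just two of the four parsers, and the accepted keywords are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Silent parser: 0 if tok names a display mode, -1 otherwise. */
static int parse_mode(const char *tok)
{
	return (!strcmp(tok, "graph") || !strcmp(tok, "flat") ||
		!strcmp(tok, "fractal") || !strcmp(tok, "folded")) ? 0 : -1;
}

/* Silent parser: 0 if tok names a call chain order, -1 otherwise. */
static int parse_order(const char *tok)
{
	return (!strcmp(tok, "caller") || !strcmp(tok, "callee")) ? 0 : -1;
}

/* Try each parser in turn; report only when none accepts the token. */
static int parse_field(const char *tok)
{
	if (!parse_mode(tok) || !parse_order(tok))
		return 0;
	fprintf(stderr, "Invalid callchain field: %s\n", tok);
	return -1;
}
```

With this structure, passing "caller" to `parse_field()` no longer produces a spurious "invalid mode" message on its way to the order parser.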
2017-09-25 | dm ioctl: fix alignment of event number in the device list | Mikulas Patocka | 4 | -17/+35
The size of struct dm_name_list is different on 32-bit and 64-bit kernels (so "(nl + 1)" differs between 32-bit and 64-bit kernels). This mismatch caused some harmless difference in padding when using 32-bit or 64-bit kernel. Commit 23d70c5e52dd ("dm ioctl: report event number in DM_LIST_DEVICES") added reporting event number in the output of DM_LIST_DEVICES_CMD. This difference in padding makes it impossible for userspace to determine the location of the event number (the location would be different when running on 32-bit and 64-bit kernels). Fix the padding by using offsetof(struct dm_name_list, name) instead of sizeof(struct dm_name_list) to determine the location of entries. Also, the ioctl version number is incremented to 37 so that userspace can use the version number to determine that the event number is present and correctly located. In addition, a global event is now raised when a DM device is created, removed, renamed or when table is swapped, so that the user can monitor for device changes. Reported-by: Eugene Syromiatnikov <[email protected]> Fixes: 23d70c5e52dd ("dm ioctl: report event number in DM_LIST_DEVICES") Cc: [email protected] # 4.13 Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
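The difference between `sizeof` and `offsetof` here comes from tail padding. The struct below is a hypothetical model with the same general layout (a 64-bit field, a 32-bit field, then a variable-length name); it is not copied from the kernel header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a variable-length list entry. */
struct name_list_sketch {
	uint64_t dev;
	uint32_t next;
	char name[];	/* flexible array member: the name starts here */
};

/* Offset of the name: depends only on the preceding fields. */
static size_t name_offset(void)
{
	return offsetof(struct name_list_sketch, name);
}

/* Size of the struct: may include ABI-dependent tail padding. */
static size_t struct_size(void)
{
	return sizeof(struct name_list_sketch);
}
```

On a typical 64-bit ABI `struct_size()` is 16 while on a typical 32-bit x86 ABI it is 12, whereas `name_offset()` is 12 on both, which is why computing entry positions from the offset gives a layout that userspace can rely on.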
2017-09-25 | block: fix a crash caused by wrong API | Shaohua Li | 1 | -1/+1
part_stat_show takes a part device not a disk, so we should use part_to_disk. Fixes: d62e26b3ffd2("block: pass in queue to inflight accounting") Cc: Bart Van Assche <[email protected]> Cc: Omar Sandoval <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | fs: Fix page cache inconsistency when mixing buffered and AIO DIO | Lukas Czerner | 3 | -21/+67
Currently when mixing buffered reads and asynchronous direct writes it is possible to end up with the situation where we have stale data in the page cache while the new data is already written to disk. This is permanent until the affected pages are flushed away. Despite the fact that mixing buffered and direct IO is ill-advised, it does pose a threat to data integrity, is unexpected and should be fixed. Fix this by deferring completion of asynchronous direct writes to a process context in the case that there are mapped pages to be found in the inode. Later before the completion in dio_complete() invalidate the pages in question. This ensures that after the completion the pages in the written area are either unmapped, or populated with up-to-date data. Also do the same for the iomap case which uses iomap_dio_complete() instead. This has a side effect of deferring the completion to a process context for every AIO DIO that happens on an inode that has pages mapped. However since the consensus is that this is ill-advised practice the performance implication should not be a problem. This was based on a proposal from Jeff Moyer, thanks! Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Jeff Moyer <[email protected]> Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet: implement valid sqhd values in completions | James Smart | 3 | -6/+12
To support sqhd, for initiators that are following the spec and paying attention to sqhd vs their sqtail values: - add sqhd to struct nvmet_sq - initialize sqhd to 0 in nvmet_sq_setup - rather than propagate the 0's-based qsize value from the connect message which requires a +1 in every sqhd update, and as nothing else references it, convert to 1's-based value in nvmet_sq/cq_setup() calls. - validate connect message sqsize being non-zero per spec. - assign sqhd for every completion that goes back. Also remove handling the NULL sq case in __nvmet_req_complete, as it can't happen with the current code. Signed-off-by: James Smart <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme-fabrics: Allow 0 as KATO value | Guilherme G. Piccoli | 1 | -9/+9
Currently, driver code allows user to set 0 as KATO (Keep Alive TimeOut), but this is not being respected. This patch enforces the expected behavior. Signed-off-by: Guilherme G. Piccoli <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme: allow timed-out ios to retry | James Smart | 1 | -2/+0
Currently the nvme_req_needs_retry() applies several checks to see if a retry is allowed. One of those is whether the current time has exceeded the start time of the io plus the timeout length. This check, if an io times out, means there is never a retry allowed for the io. Which means applications see the io failure. Remove this check and allow the io to timeout, like it does on other protocols, and retries to be made. On the FC transport, a frame can be lost for an individual io, and there may be no other errors that escalate for the connection/association. The io will timeout, which causes the transport to escalate into creating a new association, but the io that timed out, due to this retry logic, has already failed back to the application and things are hosed. Signed-off-by: James Smart <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme: stop aer posting if controller state not live | James Smart | 1 | -2/+3
If an nvme async_event command completes, in most cases, a new async event is posted. However, if the controller enters a resetting or reconnecting state, there is nothing to block the scheduled work element from posting the async event again. Nor are there calls from the transport to stop async events when an association dies. In the case of FC, where the association is torn down, the aer must be aborted on the FC link and completes through the normal job completion path. Thus the terminated async event ends up being rescheduled even though the controller isn't in a valid state for the aer, and the reposting gets the transport into a partially torn down data structure. It's possible to hit the scenario on rdma, although much less likely due to an aer completing right as the association is terminated and as the association teardown reclaims the blk requests via nvme_cancel_request() so it's immediate, not a link-related action like on FC. Fix by putting controller state checks in both the async event completion routine where it schedules the async event and in the async event work routine before it calls into the transport. It's effectively a "stop_async_events()" behavior. The transport, when it creates a new association with the subsystem will transition the state back to live and is already restarting the async event posting. Signed-off-by: James Smart <[email protected]> [hch: remove taking a lock over reading the controller state] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme-pci: Print invalid SGL only once | Keith Busch | 1 | -12/+18
The WARN_ONCE macro returns true if the condition is true, not if the warn was raised, so we're printing the scatter list every time it's invalid. This is excessive and makes debugging harder, so this patch prints it just once. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
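The pitfall can be reproduced with a small user-space mimic. `warn_once_sketch()` below is a hypothetical function that copies only the return convention described above (warn at most once, return the condition every time); the separate-flag fix is one possible approach and is not claimed to be the patch's exact code.

```c
#include <stdbool.h>
#include <stdio.h>

static int dumps_printed;	/* counts detailed dumps, for illustration */

/* Warns at most once, but returns cond on every call. */
static bool warn_once_sketch(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Buggy pattern: the dump is gated on the return value, so it repeats. */
static void check_buggy(bool invalid)
{
	if (warn_once_sketch(invalid, "invalid SGL"))
		dumps_printed++;	/* stands in for printing the list */
}

/* Fixed pattern: a separate flag makes the dump happen only once. */
static void check_fixed(bool invalid)
{
	static bool dumped;

	if (!invalid)
		return;
	warn_once_sketch(true, "invalid SGL");
	if (!dumped) {
		dumped = true;
		dumps_printed++;
	}
}
```

Calling `check_buggy(true)` three times bumps the counter three times, while three calls to `check_fixed(true)` bump it once.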
2017-09-25 | nvme-pci: initialize queue memory before interrupts | Keith Busch | 1 | -2/+2
A spurious interrupt before the nvme driver has initialized the completion queue may inadvertently cause the driver to believe it has a completion to process. This may result in a NULL dereference since the nvmeq's tags are not set at this point. The patch initializes the host's CQ memory so that a spurious interrupt isn't mistaken for a real completion. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet-fc: fix failing max io queue connections | James Smart | 1 | -3/+3
fc transport is treating NVMET_NR_QUEUES as maximum queue count, e.g. admin queue plus NVMET_NR_QUEUES-1 io queues. But NVMET_NR_QUEUES is the number of io queues, so maximum queue count is really NVMET_NR_QUEUES+1. Fix the handling in the target fc transport Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
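The off-by-one can be made concrete with a small range check. The constant `NR_IO_QUEUES_SKETCH` and its value are hypothetical; the sketch only assumes, as the message states, that the constant counts io queues and that the admin queue (queue id 0) comes on top of it.

```c
#include <stdbool.h>

#define NR_IO_QUEUES_SKETCH 64	/* hypothetical number of io queues */

/* qid 0 is the admin queue; qids 1..NR_IO_QUEUES_SKETCH are io queues. */
static bool qid_in_range(unsigned int qid)
{
	return qid <= NR_IO_QUEUES_SKETCH;
}

/* Total queues: all io queues plus the admin queue. */
static unsigned int max_queue_count(void)
{
	return NR_IO_QUEUES_SKETCH + 1;
}
```

Treating the constant itself as the total count would reject the last io queue, which matches the failure in the commit subject.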
2017-09-25 | nvme-fc: use transport-specific sgl format | James Smart | 1 | -6/+7
Sync with NVM Express spec change and FC-NVME 1.18. FC transport sets SGL type to Transport SGL Data Block Descriptor and subtype to transport-specific value 0x0A. Removed the warn-on's on the PRP fields. They are unneeded. They were to check for values from the upper layer that weren't set right, and for the most part were fine. But, with Async events, which reuse the same structure and 2nd time issued the SGL overlay converted them to the Transport SGL values - the warn-on's were errantly firing. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme: add transport SGL definitions | James Smart | 1 | -0/+6
Add transport SGL definitions from NVMe TP 4008, required for the final NVMe-FC standard. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme.h: remove FC transport-specific error values | James Smart | 1 | -13/+0
The NVM Express group rescinded the reserved range for the transport. Remove the FC-centric values that had been defined. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | qla2xxx: remove use of FC-specific error codes | James Smart | 1 | -1/+1
The qla2xxx driver uses the FC-specific error when it needed to return an error to the FC-NVME transport. Convert to use a generic value instead. Signed-off-by: James Smart <[email protected]> Acked-by: Himanshu Madhani <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | lpfc: remove use of FC-specific error codes | James Smart | 1 | -1/+1
The lpfc driver uses the FC-specific error when it needed to return an error to the FC-NVME transport. Convert to use a generic value instead. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet-fcloop: remove use of FC-specific error codes | James Smart | 1 | -1/+1
The FC-NVME transport loopback test module used the FC-specific error codes in cases where it emulated a transport abort case. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvmet-fc: remove use of FC-specific error codes | James Smart | 1 | -6/+3
The FC-NVME target transport used the FC-specific error codes in return codes when the transport or lldd failed. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nvme-fc: remove use of FC-specific error codes | James Smart | 1 | -4/+4
The FC-NVME transport used the FC-specific error codes in cases where it had to fabricate an error to go back up stack. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | loop: remove union of use_aio and ref in struct loop_cmd | Omar Sandoval | 1 | -4/+2
When the request is completed, lo_complete_rq() checks cmd->use_aio. However, if this is in fact an aio request, cmd->use_aio will have already been reused as cmd->ref by lo_rw_aio*. Fix it by not using a union. On x86_64, there's a hole after the union anyways, so this doesn't make struct loop_cmd any bigger. Fixes: 92d773324b7e ("block/loop: fix use after free") Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
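Why the union loses the flag can be shown with a simplified model. Both structs below are hypothetical: they use plain `int` members instead of the kernel's types, and the functions merely replay the sequence the message describes (set the flag, reuse the storage as a reference count, drop the references, then read the flag).

```c
/* Union layout: the flag and the reference count share storage (C11). */
struct cmd_union_sketch {
	union {
		int use_aio;
		int ref;
	};
};

/* Fixed layout: separate members, so neither overwrites the other. */
struct cmd_fixed_sketch {
	int use_aio;
	int ref;
};

/* Replays the aio path on the union layout; returns the flag as read. */
static int complete_union(void)
{
	struct cmd_union_sketch cmd;

	cmd.use_aio = 1;
	cmd.ref = 2;		/* overwrites use_aio: same storage */
	cmd.ref--;
	cmd.ref--;
	return cmd.use_aio;	/* reads 0: the flag is gone */
}

/* Replays the same path on the fixed layout. */
static int complete_fixed(void)
{
	struct cmd_fixed_sketch cmd;

	cmd.use_aio = 1;
	cmd.ref = 2;
	cmd.ref--;
	cmd.ref--;
	return cmd.use_aio;	/* still 1 */
}
```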
2017-09-25 | blktrace: Fix potential deadlock between delete & sysfs ops | Waiman Long | 3 | -6/+16
The lockdep code had reported the following unsafe locking scenario: CPU0: lock(s_active#228); CPU1: lock(&bdev->bd_mutex/1); CPU1: lock(s_active#228); CPU0: lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, requires a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opened sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, a new blk_trace_mutex is now added to the request_queue structure to protect access to the blk_trace structure. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Waiman Long <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Fix typo in patch subject line, and prune a comment detailing how the code used to work. Signed-off-by: Jens Axboe <[email protected]>
2017-09-25 | nbd: ignore non-nbd ioctl's | Josef Bacik | 1 | -0/+6
In testing we noticed that nbd would spew if you ran a fio job against the raw device itself. This is because fio calls a block device specific ioctl, however the block layer will first pass this back to the driver ioctl handler in case the driver wants to do something special. Since the device was setup using netlink this caused us to spew every time fio called this ioctl. Since we don't have special handling, just error out for any non-nbd specific ioctl's that come in. This fixes the spew. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
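One way to reject foreign ioctls early is to compare the type field encoded in the command number. The sketch below is an assumption about the approach, not a copy of the driver: `filter_ioctl()` and `NBD_IOC_TYPE_SKETCH` are hypothetical names, the type value 0xab is taken to be the one used by the NBD ioctls, and the code needs the Linux `<linux/ioctl.h>` header for `_IOC_TYPE`.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define NBD_IOC_TYPE_SKETCH 0xab	/* assumed ioctl type of NBD commands */

/* Return 0 for commands of the expected type, -EINVAL for all others. */
static int filter_ioctl(unsigned int cmd)
{
	if (_IOC_TYPE(cmd) != NBD_IOC_TYPE_SKETCH)
		return -EINVAL;
	return 0;
}
```

A generic block-layer command such as one built with `_IO(0x12, 96)` is turned away before the driver's own handling, so it can no longer trigger per-call log output.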
2017-09-25 | bsg-lib: don't free job in bsg_prepare_job | Christoph Hellwig | 1 | -1/+0
The job structure is allocated as part of the request, so we should not free it in the error path of bsg_prepare_job. Signed-off-by: Christoph Hellwig <[email protected]> Cc: [email protected] Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>