Age | Commit message (Collapse) | Author | Files | Lines |
|
Kernel receive queues oversubscribe CPU cores on multi-HFI systems.
To prevent this, the kernel receive queues are separated onto
different cores, and the SDMA engine interrupts are constrained to
a lesser number of cores.
hfi1s_on_numa_node*krcvqs is the number of CPU cores that are
reserved for kernel receive queues for all HFIs. Each HFI initializes
its kernel receive queues to one of the reserved CPU cores. If there
ends up being 0 CPU cores leftover for SDMA engines, use the same
CPU cores as receive contexts.
In addition, general and control contexts are assigned to their own
CPU core, however, both types of contexts tend to have low traffic.
To save CPU cores, collapse general and control contexts to one CPU
core for all HFI units. This change prevents SDMA engine interrupts
from wrapping around general contexts.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
When HFI units get initialized, they each use their own mask copy for
affinity assignments. On a multi-HFI system, affinity assignments
overbook CPU cores as each HFI doesn't have knowledge of affinity
assignments for other HFI units. Therefore, some CPU cores are never
used for interrupt handlers in systems with high number of CPU cores
per NUMA node.
For multi-HFI systems, SDMA engine interrupt assignments start all over
from the first CPU in the local NUMA node after the first HFI
initialization. This change allows assignments to continue where the
last HFI unit left off.
Add global structure for affinity assignments for multiple HFIs to share
affinity mask.
Reviewed-by: Jianxin Xiong <[email protected]>
Reviewed-by: Jubin John <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The q-counter-id is given in modify-QP command associates
the QP with the counter. The offset to which the counter
ID was set is incorrect, causing IB port counters not to
count on QP.
Fixes: 0837e86a7a34 ('IB/mlx5: Add per port counters')
Signed-off-by: Alex Vesker <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Tested-by: Mark Bloch <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Number of outstanding_pi may overflow and as a result may indicate that
there are no elements in the queue. The effect of doing this is that the
MAD layer will get stuck waiting for completions. The MAD layer will
think that the QP is full - because it didn't receive these completions.
This fix changes it so the outstanding_pi number is increased
with 32-bit wraparound and is not limited to max_send_wr so
that the difference between outstanding_pi and outstanding_ci will
really indicate the number of outstanding completions.
Cc: Stable <[email protected]>
Fixes: ea6dc2036224 ('IB/mlx5: Reorder GSI completions')
Signed-off-by: Slava Shwartsman <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Haggai Eran <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
i40iw_puda_get_listbuf may return NULL if the list is empty.
Add NULL check prior to accessing the pointer.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Change dup_ack_thressh to u8 since it is a 3 bit field.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
In i40iw_cq_poll_completion, we always move the tail. So there is
no reason to check for overflow everytime we move the head.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Replace a subtract and multiply with an add; while populating fragments
in SQ wqe.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Post_cq parameter passed to i40iw_cq_poll_completion is always
true; so remove.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Child_listen_node pointer is used in a debug print after kfree.
Move the print before kfree.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Fix size parameter passed to i40iw_reg_phys_mr and use it to
register memory.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Fix incorrect usage of ENOSYS and other return codes.
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Remove the complicated logic to free the iw_cm_id inside iw_cm
event handlers vs when an application thread destroys the cm_id.
Also remove the block in iw_destroy_cm_id() to block the application
until all references are removed. This block can cause a deadlock when
disconnecting or destroying cm_ids inside an rdma_cm event handler.
Simply allowing the last deref of the iw_cm_id to free the memory
is cleaner and avoids this potential deadlock. Also a flag is added,
IW_CM_DROP_EVENTS, that is set when the cm_id is marked for destruction.
If any events are pending on this iw_cm_id, then as they are processed
they will be dropped vs posted upstream if IW_CM_DROP_EVENTS is set.
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Blocking in c4iw_destroy_qp() causes a deadlock when apps destroy a qp
or disconnect a cm_id from their cm event handler function. There is
no need to block here anyway, so just replace the refcnt atomic with a
kref object and free the memory on the last put.
Signed-off-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This forces the connection to abort if the application failed to
disconnect before flushing. This is aligned with how the common
flush services work.
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
There exists a race where the application can setup a connection
and then disconnect it before iw_cxgb4 processes the fw4_ack
message. For passive side connections, the fw4_ack message is
used to know when to stop the ep timer for MPA_REPLY messages.
If the application disconnects before the fw4_ack is handled then
c4iw_ep_disconnect() needs to clean up the timer state and stop the
timer before restarting it for the disconnect timer. Failure to do this
results in a "timer already started" message and a premature stopping
of the disconnect timer.
Fixes: e4b76a2 ("RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation")
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
During connection establishment with a large number of
connections, it is possible that the connection requests
might fail. Adding flow control prevents this failure.
Change ibnl_unicast to use blocking to enable flow control.
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Faisal Latif <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Elsewhere the sin_family field holds a value with a name of the form
AF_..., so it seems reasonable to do so here as well. Also the values
of PF_INET and AF_INET are the same.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@
struct sockaddr_in sip;
@@
(
sip.sin_family ==
- PF_INET
+ AF_INET
|
sip.sin_family !=
- PF_INET
+ AF_INET
|
sip.sin_family =
- PF_INET
+ AF_INET
)
// </smpl>
Signed-off-by: Amitoj Kaur Chawla <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The commit 0f8ab0b6e91b4d53 ("RDMA/iw_cxgb4: Low resource fixes for Memory
registration") from Jun 10, 2016, leads to the following static checker
warning:
drivers/infiniband/hw/cxgb4/mem.c:612 c4iw_alloc_mw()
error: use kfree_skb() here instead of kfree(mhp->dereg_skb)
Also fixes skb leak in c4iw_dealloc_mw
Fixes: 0f8ab0b6e91b4d53 ("RDMA/iw_cxgb4: Low resource fixes for Memory registration")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Fixes the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:2574:25: warning: duplicate const
Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
- some cleanup for the hugetlbfs pte/pmd conversion functions
- the code to check for the minimum CPU type is converted from
assembler to C and an informational message is added in case the CPU
is not new enough to run the kernel
- bug fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ftrace/jprobes: Fix conflict between jprobes and function graph tracing
s390: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
s390/zcrypt: fix possible memory leak in ap_module_init()
s390/numa: only set possible nodes within node_possible_map
s390/als: fix compile with gcov enabled
s390/facilities: do not generate DWORDS define anymore
s390/als: print missing facilities on facility mismatch
s390/als: print machine type on facility mismatch
s390/als: convert architecture level set code to C
s390/sclp: move uninitialized data to data section
s390/zcrypt: Fix zcrypt suspend/resume behavior
s390/cio: fix premature wakeup during chp configure
s390/cio: convert cfg_lock mutex to spinlock
s390/mm: clean up pte/pmd encoding
|
|
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Steve Wise <[email protected]>
Tested-by: Laurence Oberman <[email protected]>
Cc: Parav Pandit <[email protected]>
Cc: Nicholas Bellinger <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Initialize first_wr to &send_wr. This allows to remove a ternary
operator and an else branch. This patch does not change the behavior
of srpt_queue_response().
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Steve Wise <[email protected]>
Tested-by: Laurence Oberman <[email protected]>
Cc: Parav Pandit <[email protected]>
Cc: Nicholas Bellinger <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Limit the number of SG elements per work request to what the HCA
and the queue pair support.
Fixes: 34693573fde0 ("IB/srpt: Reduce QP buffer size")
Reported-by: Parav Pandit <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Parav Pandit <[email protected]>
Cc: Nicholas Bellinger <[email protected]>
Cc: Laurence Oberman <[email protected]>
Cc: <[email protected]> #v4.7+
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Compute the SGE limit for RDMA READ and WRITE requests in
ib_create_qp(). Use that limit in the RDMA RW API implementation.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Parav Pandit <[email protected]>
Cc: Nicholas Bellinger <[email protected]>
Cc: Laurence Oberman <[email protected]>
Cc: <[email protected]> #v4.7+
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Some but not all callers of rdma_rw_ctx_init() zero-initialize
struct rdma_rw_ctx. Hence make rdma_rw_ctx_init() initialize all
work request fields that will be read by ib_post_send().
Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Steve Wise <[email protected]>
Tested-by: Laurence Oberman <[email protected]>
Cc: Parav Pandit <[email protected]>
Cc: Nicholas Bellinger <[email protected]>
Cc: <[email protected]> #v4.7+
Signed-off-by: Doug Ledford <[email protected]>
|
|
Add sw counter to track dropped unsupported packets.
Report unsupported packets drop as the RcvError.
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jakub Pawlak <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Add per VL XmitDiscards counters to the opapmaquery
status and error response.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Jakub Pawlak <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Fix sparse errors by making sure the fast assign destinations
are host cpu typed.
For the void __iomem *, just make the field match source
data.
Fix a bug where the hw_free trace printed the pointer vs.
the dereferenced value.
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The ftrace infrastructure used to evaluate the TRACE_SYSTEM
macro on every DEFINE_EVENT() macro. Now the TRACE_SYSTEM
macro only gets evaluated when trace/define_trace.h is
included, so the group event information is lost. This was
introduced in
commit acd388fd3af3 ("tracing: Give system name a pointer")
Therefore, each system tracepoint must be on its own file.
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Fix a copy and paste typo in comment.
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Tadeusz Struk <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Simple code clean up of hfi1_write_iter.
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The definition of port state changed mid development and the
old structure was kept accidentally. Remove this dead code.
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Use kvfree() instead of open-coding it.
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
vringh is pulled in by caif and mic, but the other
vhost config does not need to be there.
In particular, it makes no sense to have vhost net/scsi/sock
under caif/mic.
Create a separate Kconfig file and put vringh bits there.
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Detect and fail early if long wrap around is triggered.
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
This patch tries to implement an device IOTLB for vhost. This could be
used with userspace(qemu) implementation of DMA remapping
to emulate an IOMMU for the guest.
The idea is simple, cache the translation in a software device IOTLB
(which is implemented as an interval tree) in vhost and use vhost_net
file descriptor for reporting IOTLB miss and IOTLB
update/invalidation. When vhost meets an IOTLB miss, the fault
address, size and access can be read from the file. After userspace
finishes the translation, it writes the translated address to the
vhost_net file to update the device IOTLB.
When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq
addresses set by ioctl are treated as iova instead of virtual address and
the accessing can only be done through IOTLB instead of direct userspace
memory access. Before each round or vq processing, all vq metadata is
prefetched in device IOTLB to make sure no translation fault happens
during vq processing.
In most cases, virtqueues are contiguous even in virtual address space.
The IOTLB translation for virtqueue itself may make it a little
slower. We might add fast path cache on top of this patch.
Signed-off-by: Jason Wang <[email protected]>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
[mst: fix build warnings ]
Signed-off-by: Michael S. Tsirkin <[email protected]>
[ weiyj.lk: missing unlock on error ]
Signed-off-by: Wei Yongjun <[email protected]>
|
|
vringh isn't used by vhost net or scsi - it's used
by CAIF only at the moment. Drop the dependency.
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Fix to return error code -ENOMEM from the workqueue alloc error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Brian King <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Add the missing destroy_workqueue() before return from fcoe_init() in
the fcoe transport register failed error handling case.
Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Check for the existence of piocb->vport before accessing it.
Signed-off-by: Johannes Thumshirn <[email protected]>
Acked-by: James Smart <[email protected]>
Reviewed-by: Tyrel Datwyler <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
100g support is not available in MSI mode. Failing the driver load in this scenario.
Please consider applying this to `net'.
Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
This commit fixes both local (in stmmac_axi_setup) and global
(plat->phy_node) device_node for this issue, and using the
correct device node when tries to put node at stmmac_probe_config_dt
for error path.
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
calling of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
of_parse_phandle
of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|