aboutsummaryrefslogtreecommitdiff
path: root/include/linux/mlx5
AgeCommit message (Collapse)AuthorFilesLines
2019-07-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-3/+7
Pull rdma updates from Jason Gunthorpe: "A smaller cycle this time. Notably we see another new driver, 'Soft iWarp', and the deletion of an ancient unused driver for nes. - Revise and simplify the signature offload RDMA MR APIs - More progress on hoisting object allocation boiler plate code out of the drivers - Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib, i40iw - Tree wide cleanups: struct_size, put_user_page, xarray, rst doc conversion - Removal of obsolete ib_ucm chardev and nes driver - netlink based discovery of chardevs and autoloading of the modules providing them - Move more of the rdamvt/hfi1 uapi to include/uapi/rdma - New driver 'siw' for software based iWarp running on top of netdev, much like rxe's software RoCE. - mlx5 feature to report events in their raw devx format to userspace - Expose per-object counters through rdma tool - Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core from netdev" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits) RMDA/siw: Require a 64 bit arch RDMA/siw: Mark expected switch fall-throughs RDMA/core: Fix -Wunused-const-variable warnings rdma/siw: Remove set but not used variable 's' rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS RDMA/siw: Add missing rtnl_lock around access to ifa rdma/siw: Use proper enumerated type in map_cqe_status RDMA/siw: Remove unnecessary kthread create/destroy printouts IB/rdmavt: Fix variable shadowing issue in rvt_create_cq RDMA/core: Fix race when resolving IP address RDMA/core: Make rdma_counter.h compile stand alone IB/core: Work on the caller socket net namespace in nldev_newlink() RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM RDMA/mlx5: Set RDMA DIM to be enabled by default RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink RDMA/core: Provide RDMA DIM support for ULPs linux/dim: Implement RDMA adaptive moderation (DIM) IB/mlx5: Report correctly tag matching rendezvous capability docs: infiniband: add it to the driver-api bookset IB/mlx5: Implement VHCA tunnel mechanism in DEVX ...
2019-07-11Merge tag 'mlx5-fixes-2019-07-11' of ↵David S. Miller1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-07-11 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.15 ('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn') For -stable v5.1 ('net/mlx5e: Fix port tunnel GRE entropy control') ('net/mlx5e: Rx, Fix checksum calculation for new hardware') ('net/mlx5e: Fix return value from timeout recover function') ('net/mlx5e: Fix error flow in tx reporter diagnose') For -stable v5.2 ('net/mlx5: E-Switch, Fix default encap mode') Conflict note: This pull request will produce a small conflict when merged with net-next. In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c Take the hunk from net and replace: esw_offloads_steering_init(esw, vf_nvports, total_nvports); with: esw_offloads_steering_init(esw); ==================== Signed-off-by: David S. Miller <[email protected]>
2019-07-11net/mlx5e: Rx, Fix checksum calculation for new hardwareSaeed Mahameed1-1/+2
CQE checksum full mode in new HW, provides a full checksum of rx frame. Covering bytes starting from eth protocol up to last byte in the received frame (frame_size - ETH_HLEN), as expected by the stack. Fixing up skb->csum by the driver is not required in such case. This fix is to avoid wrong checksum calculation in drivers which already support the new hardware with the new checksum mode. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-08Merge branch 'vhca-tunnel' into rdma.git for-nextJason Gunthorpe1-2/+4
Max Gurtovoy says: ==================== Those two patches introduce VHCA tunnel mechanism to DEVX interface needed for Bluefield SOC. See extensive commit messages for more information. ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'vcha-tunnel': IB/mlx5: Implement VHCA tunnel mechanism in DEVX net/mlx5: Introduce VHCA tunnel device capability
2019-07-07net/mlx5: Introduce VHCA tunnel device capabilityMax Gurtovoy1-2/+4
When using the device emulation feature (introduced in Bluefield-1 SOC), a privileged function (the device emulation manager) will be able to create a channel to execute commands on behalf of the emulated function. This channel will be a general object of type VHCA_TUNNEL that will have a unique ID for each emulated function. This ID will be passed in each cmd that will be issued by the emulation SW in a well known offset in the command header. This channel is needed since the emulated function doesn't have a normal command interface to the HCA HW, but some basic configuration for that function is needed (e.g. initialize and enable the HCA). For that matter, a specific command-set was defined and only those commands will be issued by the HCA. Signed-off-by: Max Gurtovoy <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-05net/mlx5: Kconfig, Better organize compilation flagsTariq Toukan1-1/+1
Always contain all acceleration functions declarations in 'accel' files, independent to the flags setting. For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs where needed. This obsoletes the need for stubs in 'fpga' files. Remove them. Also use the new flags in Makefile, to decide whether to compile TLS-specific or IPSEC-specific objects, or not. Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-05IB/mlx5: Support set qp counterMark Zhang1-0/+1
Support bind a qp with counter. If counter is null then bind the qp to the default counter. Different QP state has different operation: - RESET: Set the counter field so that it will take effective during RST2INIT change; - RTS: Issue an RTS2RTS change to update the QP counter; - Other: Set the counter field and mark the counter_pending flag, when QP is moved to RTS state and this flag is set, then issue an RTS2RTS modification to update the counter. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05Merge mlx5-next into rdma for-nextJason Gunthorpe6-13/+131
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. * mlx5-next: net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap net/mlx5: Properly name the generic WQE control field net/mlx5: Introduce TLS TX offload hardware bits and structures net/mlx5: Refactor mlx5_esw_query_functions for modularity net/mlx5: E-Switch prepare functions change handler to be modular net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
2019-07-04Merge branch 'mlx5-next' of ↵Saeed Mahameed8-34/+344
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch: 1) Add the required HW definitions and structures for upcoming TLS support. 2) Add support for MCQI and MCQS hardware registers for fw version query. 3) Added hardware bits and structures definitions for sub-functions 4) Small code cleanup and improvement for PF pci driver. 5) Bluefield (ECPF) updates and refactoring for better E-Switch management on ECPF embedded CPU NIC: 5.1) Consolidate querying eswitch number of VFs 5.2) Register event handler at the correct E-Switch init stage 5.3) Setup PF's inline mode and vlan pop when the ECPF is the E-Swtich manager ( the host PF is basically a VF ). 5.4) Handle Vport UC address changes in switchdev mode. 6) Cleanup the rep and netdev reference when unloading IB rep. Signed-off-by: Saeed Mahameed <[email protected]> i# All conflicts fixed but you are still merging.
2019-07-04net/mlx5: Add rts2rts_qp_counters_set_id field in hca capMark Zhang1-1/+3
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS qp modification can be used to change the counter of a QP. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-03net/mlx5: Properly name the generic WQE control fieldTariq Toukan1-1/+6
A generic WQE control field is used for different purposes in different cases. Use union to allow using the proper name in each case. Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-03net/mlx5: Introduce TLS TX offload hardware bits and structuresEran Ben Elisha2-4/+114
Add TLS offload related IFC structs, layouts and enumerations. Signed-off-by: Eran Ben Elisha <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-03net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()Parav Pandit3-7/+8
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports(). mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF vports as well. Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to more generic vport.h header file. Such exposure is not desired. Hence a mlx5_eswitch_get_total_vports() is introduced. Given that mlx5_eswitch_get_total_vports() API wants to work on const mlx5_core_dev*, change its helper functions also to accept const *dev. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-03Merge mlx5-next into rdma for-nextJason Gunthorpe9-49/+294
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. Resolved the conflicts: - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports() version - esw_offloads_steering_init() drop the cap test - esw_offloads_init() drop the extra function arguments * branch 'mlx5-next': (39 commits) net/mlx5: Expose device definitions for object events net/mlx5: Report EQE data upon CQ completion net/mlx5: Report a CQ error event only when a handler was set net/mlx5: mlx5_core_create_cq() enhancements net/mlx5: Expose the API to register for ANY event net/mlx5: Use event mask based on device capabilities net/mlx5: Fix mlx5_core_destroy_cq() error flow net/mlx5: E-Switch, Handle UC address change in switchdev mode net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop net/mlx5: E-Switch, Use iterator for vlan and min-inline setups net/mlx5: E-Switch, Reg/unreg function changed event at correct stage net/mlx5: E-Switch, Consolidate eswitch function number of VFs net/mlx5: E-Switch, Refactor eswitch SR-IOV interface net/mlx5: Handle host PF vport mac/guid for ECPF net/mlx5: E-Switch, Use correct flags when configuring vlan net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs net/mlx5: Don't handle VF func change if host PF is disabled net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices net/mlx5: Move pci status reg access mutex to mlx5_pci_init net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type ... Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-03net/mlx5: Expose device definitions for object eventsYishai Hadas1-0/+21
Expose an extra device definitions for objects events. It includes: object_type values for legacy objects and generic data header for any other object. Signed-off-by: Yishai Hadas <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-03net/mlx5: Report EQE data upon CQ completionYishai Hadas1-2/+2
Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-03net/mlx5: mlx5_core_create_cq() enhancementsYishai Hadas1-1/+1
Enhance mlx5_core_create_cq() to get the command out buffer from the callers to let them use the output. Signed-off-by: Yishai Hadas <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-03net/mlx5: Expose the API to register for ANY eventYishai Hadas1-0/+2
Expose the API to register for ANY event, mlx5_ib will be able to use this functionality for its needs. Signed-off-by: Yishai Hadas <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-03net/mlx5: Use event mask based on device capabilitiesYishai Hadas3-5/+16
Use the reported device capabilities for the supported user events (i.e. affiliated and un-affiliated) to set the EQ mask. As the event mask can be up to 256 defined by 4 entries of u64 change the applicable code to work accordingly. Signed-off-by: Yishai Hadas <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2019-07-01net/mlx5: E-Switch, Consider host PF for inline mode and vlan popBodong Wang1-0/+1
When ECPF is the eswitch manager, host PF is treated like other VFs. Driver should do the same for inline mode and vlan pop. Add new iterators to include host PF if ECPF is the eswitch manager. Signed-off-by: Bodong Wang <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang1-3/+3
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: Handle host PF vport mac/guid for ECPFBodong Wang1-1/+2
When ECPF is eswitch manager, it has the privilege to query and configure the mac and node guid of host PF. While vport number of host PF is 0, the vport command should be issued with other_vport set in this case as the cmd is issued by ECPF vport(0xfffe). Add a specific function to query own vport mac. Low level functions are used by vport manager to query/modify any vport mac and node guid. Signed-off-by: Bodong Wang <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: Reduce dependency on enabled_vfs counter and num_vfsParav Pandit1-1/+0
While enabling SR-IOV, PCI core already checks that if SR-IOV is already enabled, it returns failure error code. Hence, remove such duplicate check from mlx5_core driver. While at it, make mlx5_device_disable_sriov() to perform cleanup of VFs in reverse order of mlx5_device_enable_sriov(). Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: Don't handle VF func change if host PF is disabledBodong Wang1-1/+2
When ECPF eswitch manager is at offloads mode, it monitors functions changed event from host PF side and acts according to the number of VFs enabled/disabled. As ECPF and host PF work in two independent hosts, it's possible that host PF OS reboots but ECPF system is still kept on and continues monitoring events from host PF. When kernel from host PF side is booting, PCI iov driver does sriov_init and compute_max_vf_buses by iterating over all valid num of VFs. This triggers FLR and generates functions changed events, even though host PF HCA is not enabled at this time. However, ECPF is not aware of this information, and still handles these events as usual. ECPF system will see massive number of reps are created, but destroyed immediately once creation finished. To eliminate this noise, a bit is added to host parameter context to indicate host PF is disabled. ECPF will not handle the VF changed event if this bit is set. Signed-off-by: Bodong Wang <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_typeHuy Nguyen1-3/+8
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5 device types. mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep mlx5_coredev_type in mlx5_core_dev structure. Signed-off-by: Huy Nguyen <[email protected]> Signed-off-by: Vu Pham <[email protected]> Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01{IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mappingBodong Wang1-0/+2
In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <[email protected]> Suggested-by: Parav Pandit <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: Added MCQI and MCQS registers' description to ifcShay Agroskin2-2/+58
Given a fw component index, the MCQI register allows us to query this component's information (e.g. its version and capabilities). Given a fw component index, the MCQS register allows us to query the status of a fw component, including its type and state (e.g. PRESET/IN_USE). It can be used to find the index of a component of a specific type, by sequentially increasing the component index, and querying each time the type of the returned component. If max component index is reached, 'last_index_flag' is set by the HCA. These registers' description was added to query the running and pending fw version of the HCA. Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-07-01net/mlx5: Add hardware definitions for sub functionsParav Pandit1-3/+96
Update mlx5 device interface data structures for: 1. New command definitions for allocating, deallocating SF 2. Query SF partition 3. Eswitch SF fields 4. HCA CAP SF fields 5. Extend Eswitch functions command for SF Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe1-3/+3
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <[email protected]>
2019-06-28net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_createArnd Bergmann1-1/+1
Putting an empty 'mlx5_flow_spec' structure on the stack is a bit wasteful and causes a warning on 32-bit architectures when building with clang -fsanitize-coverage: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create': drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Since the structure is never written to, we can statically allocate it to avoid the stack usage. To be on the safe side, mark all subsequent function arguments that we pass it into as 'const' as well. Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions") Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Acked-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-28Merge branch 'mlx5-next' of ↵Saeed Mahameed6-49/+105
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch: 1) E-Switch vport metadata support for source vport matching 2) Convert mkey_table to XArray 3) Shared IRQs and to use single IRQ for all async EQs Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-26net/mlx5e: Specifying known origin of packets matching the flowJianbo Liu1-0/+1
In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-26net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ↵Jianbo Liu1-0/+17
ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Eli Britstein <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-26net/mlx5: Add flow context for flow tagJianbo Liu1-4/+11
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-26net/mlx5: Introduce vport metadata matching bits and enum constantsJianbo Liu1-7/+49
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport. So we replace the match on source port with the match on metadata that was configured in ingress ACL, and that metadata will be passed further also to the NIC RX table of the eswitch manager. Introduce vport metadata matching bits and enum constants as a pre-step towards metadata matching. o metadata type C registers in the misc parameters 2 fields. o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports ingress ACL for the uplink vport. o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to support propagating the metadata to the nic rx through the loopback path. o flow_source in flow context, to indicate the known origin of packets. o enum constants, to support the above bits. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Eli Britstein <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-24net/mlx5: Convert mkey_table to XArrayMatthew Wilcox2-16/+2
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-24RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work requestMax Gurtovoy1-1/+2
This new WR will be used to perform PI (protection information) handover using the new API. Using the new API, the user will post a single WR that will internally perform all the needed actions to complete PI operation. This new WR will use a memory region that was allocated as IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the registration. In the old API, in order to perform a signature handover operation, each ULP should perform the following: 1. Map and register the data buffers. 2. Map and register the protection buffers. 3. Post a special reg WR to configure the signature handover operation layout. 4. Invalidate the signature memory key. 5. Invalidate protection buffers memory key. 6. Invalidate data buffers memory key. In the new API, the mapping of both data and protection buffers is performed using a single call to ib_map_mr_sg_pi function. Also the registration of the buffers and the configuration of the signature operation layout is done by a single new work request called IB_WR_REG_MR_INTEGRITY. This patch implements this operation for mlx5 devices that are capable to offload data integrity generation/validation while performing the actual buffer transfer. This patch will not remove the old signature API that is used by the iSER initiator and target drivers. This will be done in the future. In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work request, we are using a single UMR operation to register both data and protection buffers using KLM's. Afterwards, another UMR operation will describe the strided block format. These will be followed by 2 SET_PSV operations to set the memory/wire domains initial signature parameters passed by the user. In the end of the whole transaction, only the signature memory key (the one that exposed for the RDMA operation) will be invalidated. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Israel Rukshin <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-06-16net/mlx5: Expose eswitch encap modeMaor Gottlieb1-0/+12
Add API to get the current Eswitch encap mode. It will be used in downstream patches to check if flow table can be created with encap support or not. Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Petr Vorel <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Parav Pandit <[email protected]>
2019-06-13net/mlx5: Report devlink health on FW fatal issuesMoshe Shemesh1-1/+1
Report devlink health on FW fatal issues via fw_fatal_reporter. The driver recover flow for FW fatal error is now being handled by the devlink health. Having the recovery controlled by devlink health, the user has the ability to cancel the auto-recovery for debug session and run it manually. Call mlx5_enter_error_state() before calling devlink_health_report() to ensure entering device error state even if auto-recovery is off. Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Add fw fatal devlink_health_reporterMoshe Shemesh1-0/+1
Create mlx5_devlink_health_reporter for fw fatal reporter. The fw fatal reporter is added in addition to the fw reporter and implements the recover callback. The point of having two reporters for FW issues, is that we don't want to run FW recover on any issue, but only fatal ones. Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Eran Ben Elisha <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Report devlink health on FW issuesMoshe Shemesh1-1/+2
Use devlink_health_report() to report any symptom of FW issue as FW counter miss or new health syndrome. The FW issues detected in mlx5 during poll_health which is called in timer atomic context and so health work queue is used to schedule the reports. Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Eran Ben Elisha <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Create FW devlink_health_reporterMoshe Shemesh1-0/+2
Create mlx5_devlink_health_reporter for FW reporter. The FW reporter implements devlink_health_reporter diagnose callback. The fw reporter diagnose command can be triggered any time by the user to check current fw status. In healthy status, it will return clear syndrome. Otherwise it will return the syndrome and description of the error type. Command example and output on healthy status: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 0 Command example and output on non healthy status: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 8 Description: unrecoverable hardware error Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Eran Ben Elisha <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Issue SW reset on FW assertFeras Daoud2-1/+10
If a FW assert is considered fatal, indicated by a new bit in the health buffer, reset the FW. After the reset go through the normal recovery flow. Only one PF needs to issue the reset, so an attempt is made to prevent the 2nd function from also issuing the reset. It's not an error if that happens, it just slows recovery. Signed-off-by: Feras Daoud <[email protected]> Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Daniel Jurgens <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Handle SW reset of FW in error flowFeras Daoud1-1/+1
New mlx5 adapters allow the driver to reset the FW in the event of an error, this action called "SW Reset". When an SW reset is issued on any PF all PFs enter reset state which is a recoverable condition. The existing recovery flow was designed to allow the recovery of a VF after a PF driver reload. This patch adds the sw reset to the NIC states as a preparation for sw reset handling. When a software reset is issued the following occurs: 1. The NIC interface mode is set to 7 while the reset is in progress. 2. Once the reset completes the NIC interface mode is set to 1. Signed-off-by: Feras Daoud <[email protected]> Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Daniel Jurgens <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Add Crdump supportAlex Vesker1-0/+1
Crdump allows the driver to retrieve a dump of the FW PCI crspace. This is useful in case of catastrophic issues which may require FW reset. The crspace dump can be used for later debug. Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Feras Daoud <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Add Vendor Specific Capability access gatewayAlex Vesker1-0/+1
The Vendor Specific Capability (VSC) is used to activate a gateway interfacing with the device. The gateway is used to read or write device configurations, which are organized in different domains (spaces). A configuration access may result in multiple actions, reads, writes. Example usages are accessing the Crspace domain to read the crspace or locking a device semaphore using the Semaphore domain. The configuration access use pci_cfg_access to prevent parallel access to the VSC space by the driver and userspace calls. Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Feras Daoud <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Add EQ enable/disable APIYuval Avnery1-1/+4
Previously, EQ joined the chain notifier on creation. This forced the caller to be ready to handle events before creating the EQ through eq_create_generic interface. To help the caller control when the created EQ will be attached to the IRQ, add enable/disable API. Signed-off-by: Yuval Avnery <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Use a single IRQ for all async EQsAriel Levkovich1-12/+2
The patch modifies the IRQ allocation so that all async EQs are assigned to the same IRQ resulting in more available IRQs for completion EQs. The changes are using the support for IRQ sharing and EQ polling budget that was introduced in previous patches so when the shared interrupt is triggered, the kernel will serially call the handler of each of the sharing EQs with a certain budget of EQEs to poll in order to prevent starvation. Signed-off-by: Ariel Levkovich <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Separate IRQ data from EQ table dataYuval Avnery1-0/+3
IRQ table should only exist for mlx5_core_dev for PF and VF only. EQ table of mediated devices should hold a pointer to the IRQ table of the parent PCI device. Signed-off-by: Yuval Avnery <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-06-13net/mlx5: Separate IRQ request/free from EQ life cycleYuval Avnery1-2/+1
Instead of requesting IRQ with eq creation, IRQs will be requested before EQ table creation. Instead of freeing the IRQs after EQ destroy, free IRQs after eq table destroy. Signed-off-by: Yuval Avnery <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>