aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2020-10-02dpaa2-eth: add support for devlink parser error drop trapsIoana Ciornei3-1/+385
Add support for the new group of devlink traps - PARSER_ERROR_DROPS. This consists of registering the array of parser error drops supported, controlling their action through the .trap_group_action_set() callback and reporting an erroneous skb received on the error queue appropriately. DPAA2 devices do not support controlling the action of independent parser error traps, thus the .trap_action_set() callback just returns an EOPNOTSUPP while .trap_group_action_set() actually notifies the hardware what it should do with a frame marked as having a header error. Signed-off-by: Ioana Ciornei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02dpaa2-eth: add basic devlink supportIoana Ciornei4-1/+119
Add basic support in dpaa2-eth for devlink. For the moment, just register the device with devlink, add the corresponding devlink port and implement the .info_get() callback. Signed-off-by: Ioana Ciornei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: add new bad firmware error codeShannon Nelson2-0/+3
If the new firmware image downladed for update is corrupted or is a bad format, the download process will report a status code specifically for that. Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: use lif ident for filter countShannon Nelson1-9/+10
Use the lif's ident information for the uc and mc filter counts rather than the ionic's version, to be sure we're getting the info that is specific to this lif. While we're thinking about it, add some missing error checking where we get the lif's identity information. Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: refill lif identity after fw_upShannon Nelson3-14/+22
After we do a fw upgrade and refill the ionic->ident.dev, we also need to update the other identity info. Since the lif identity needs to be updated each time the ionic identity is refreshed, we can pull it into ionic_identify(). The debugfs entry is moved so that it doesn't cause an error message when the data is refreshed after the fw upgrade. Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: disable all queue napi contexts on timeoutShannon Nelson1-26/+21
Some time ago we short-circuited the queue disables on a timeout error in order to not have to wait on every queue when we already know it will time out. However, this meant that we're not properly stopping all the interrupts and napi contexts. This changes queue disable to always call ionic_qcq_disable() and to give it an argument to know when to not do the adminq request. Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: check qcq ptr in ionic_qcq_disableShannon Nelson1-11/+20
There are a couple of error recovery paths that can come through ionic_qcq_disable() without having set up the qcq, so we need to make sure we have a valid qcq pointer before using it. Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: clear linkcheck bit on alloc failShannon Nelson1-1/+3
Clear our link check requested flag on an allocation error. We end up dropping this link check request, but that should be fine as our watchdog will come back a few seconds later and request it again. Reported-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: drain the work queueShannon Nelson1-10/+13
Check through our work list for additional items. This normally will only have one item, but occasionally may have another job waiting. There really is no need reschedule ourself here. Reported-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02ionic: contiguous memory for notifyqShannon Nelson1-22/+47
The event notification queue is set up a little differently in the NIC and so the notifyq q and cq descriptor structures need to be contiguous, which got missed in an earlier patch that separated out the q and cq descriptor allocations. That patch was aimed at making the big tx and rx descriptor queue allocations easier to manage - the notifyq is much smaller and doesn't need to be split. This patch simply adds an if/else and slightly different code for the notifyq descriptor allocation. Fixes: ea5a8b09dc3a ("ionic: reduce contiguous memory allocation requirement") Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02Merge tag 'mlx5-fixes-2020-09-30' of ↵David S. Miller12-119/+347
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux From: Saeed Mahameed <[email protected]> ==================== This series introduces some fixes to mlx5 driver. v1->v2: - Patch #1 Don't return while mutex is held. (Dave) v2->v3: - Drop patch #1, will consider a better approach (Jakub) - use cpu_relax() instead of cond_resched() (Jakub) - while(i--) to reveres a loop (Jakub) - Drop old mellanox email sign-off and change the committer email (Jakub) Please pull and let me know if there is any problem. For -stable v4.15 ('net/mlx5e: Fix VLAN cleanup flow') ('net/mlx5e: Fix VLAN create flow') For -stable v4.16 ('net/mlx5: Fix request_irqs error flow') For -stable v5.4 ('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU') ('net/mlx5: Avoid possible free of command entry while timeout comp handler') For -stable v5.7 ('net/mlx5e: Fix return status when setting unsupported FEC mode') For -stable v5.8 ('net/mlx5e: Fix race condition on nhe->n pointer in neigh update') ==================== Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: offload redirect action to VCAP IS2Vladimir Oltean1-3/+25
Via the OCELOT_MASK_MODE_REDIRECT flag put in the IS2 action vector, it is possible to replace previous forwarding decisions with the port mask installed in this rule. I have studied Table 54 "MASK_MODE and PORT_MASK Combinations" from the VSC7514 documentation and it appears to behave sanely when this rule is installed in either lookup 0 or 1. Namely, a redirect in lookup 1 will overwrite the forwarding decision taken by any entry in lookup 0. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: relax ocelot_exclusive_mac_etype_filter_rules()Vladimir Oltean1-14/+22
The issue which led to the introduction of this check was that MAC_ETYPE rules, such as filters on dst_mac and src_mac, would only match non-IP frames. There is a knob in VCAP_S2_CFG which forces all IP frames to be treated as non-IP, which is what we're currently doing if the user requested a dst_mac filter, in order to maintain sanity. But that knob is actually per IS2 lookup. And the good thing with exposing the lookups to the user via tc chains is that we're now able to offload MAC_ETYPE keys to one lookup, and IP keys to the other lookup. So let's do that. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: only install TCAM entries into a specific lookup and PAGVladimir Oltean2-5/+11
We were installing TCAM rules with the LOOKUP field as unmasked, meaning that all entries were matching on all lookups. Now that lookups are exposed as individual chains, let's make the LOOKUP explicit when offloading TCAM entries. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: offload egress VLAN rewriting to VCAP ES0Xiaoliang Yang4-10/+266
VCAP ES0 is an egress VCAP operating on all outgoing frames. This patch added ES0 driver to support vlan push action of tc filter. Usage: tc filter add dev swp1 egress protocol 802.1Q flower indev swp0 skip_sw \ vlan_id 1 vlan_prio 1 action vlan push id 2 priority 2 Signed-off-by: Xiaoliang Yang <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1Xiaoliang Yang4-0/+225
VCAP IS1 is a VCAP module which can filter on the most common L2/L3/L4 Ethernet keys, and modify the results of the basic QoS classification and VLAN classification based on those flow keys. There are 3 VCAP IS1 lookups, mapped over chains 10000, 11000 and 12000. Currently the driver is hardcoded to use IS1_ACTION_TYPE_NORMAL half keys. Note that the VLAN_MANGLE has been omitted for now. In hardware, the VCAP_IS1_ACT_VID_REPLACE_ENA field replaces the classified VLAN (metadata associated with the frame) and not the VLAN from the header itself. There are currently some issues which need to be addressed when operating in standalone, or in bridge with vlan_filtering=0 modes, because in those cases the switch ports have VLAN awareness disabled, and changing the classified VLAN to anything other than the pvid causes the packets to be dropped. Another issue is that on egress, we expect port tagging to push the classified VLAN, but port tagging is disabled in the modes mentioned above, so although the classified VLAN is replaced, it is not visible in the packet transmitted by the switch. Signed-off-by: Xiaoliang Yang <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: create TCAM skeleton from tc filter chainsVladimir Oltean3-32/+288
For Ocelot switches, there are 2 ingress pipelines for flow offload rules: VCAP IS1 (Ingress Classification) and IS2 (Security Enforcement). IS1 and IS2 support different sets of actions. The pipeline order for a packet on ingress is: Basic classification -> VCAP IS1 -> VCAP IS2 Furthermore, IS1 is looked up 3 times, and IS2 is looked up twice (each TCAM entry can be configured to match only on the first lookup, or only on the second, or on both etc). Because the TCAMs are completely independent in hardware, and because of the fixed pipeline, we actually have very limited options when it comes to offloading complex rules to them while still maintaining the same semantics with the software data path. This patch maps flow offload rules to ingress TCAMs according to a predefined chain index number. There is going to be a script in selftests that clarifies the usage model. There is also an egress TCAM (VCAP ES0, the Egress Rewriter), which is modeled on top of the default chain 0 of the egress qdisc, because it doesn't have multiple lookups. Suggested-by: Allan W. Nielsen <[email protected]> Co-developed-by: Xiaoliang Yang <[email protected]> Signed-off-by: Xiaoliang Yang <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: introduce conversion helpers between port and netdevVladimir Oltean3-0/+34
Since the mscc_ocelot_switch_lib is common between a pure switchdev and a DSA driver, the procedure of retrieving a net_device for a certain port index differs, as those are registered by their individual front-ends. Up to now that has been dealt with by always passing the port index to the switch library, but now, we're going to need to work with net_device pointers from the tc-flower offload, for things like indev, or mirred. It is not desirable to refactor that, so let's make sure that the flower offload core has the ability to translate between a net_device and a port index properly. Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Alexandre Belloni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net: mscc: ocelot: offload multiple tc-flower actions in same ruleVladimir Oltean3-55/+53
At this stage, the tc-flower offload of mscc_ocelot can only delegate rules to the VCAP IS2 security enforcement block. These rules have, in hardware, separate bits for policing and for overriding the destination port mask and/or copying to the CPU. So it makes sense that we attempt to expose some more of that low-level complexity instead of simply choosing between a single type of action. Something similar happens with the VCAP IS1 block, where the same action can contain enable bits for VLAN classification and for QoS classification at the same time. So model the action structure after the hardware description, and let the high-level ocelot_flower.c construct an action vector from multiple tc actions. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-02net/mlx5e: Fix race condition on nhe->n pointer in neigh updateVlad Buslov2-37/+50
Current neigh update event handler implementation takes reference to neighbour structure, assigns it to nhe->n, tries to schedule workqueue task and releases the reference if task was already enqueued. This results potentially overwriting existing nhe->n pointer with another neighbour instance, which causes double release of the instance (once in neigh update handler that failed to enqueue to workqueue and another one in neigh update workqueue task that processes updated nhe->n pointer instead of original one): [ 3376.512806] ------------[ cut here ]------------ [ 3376.513534] refcount_t: underflow; use-after-free. [ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c _core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw] [ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6 [ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0 [ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 [ 3376.536017] RSP: 0018:ffffc90002a97e30 EFLAGS: 00010286 [ 3376.536793] RAX: 0000000000000000 RBX: ffff8882de30d648 RCX: 0000000000000000 [ 3376.537718] RDX: ffff8882f5c28f20 RSI: ffff8882f5c18e40 RDI: ffff8882f5c18e40 [ 3376.538654] RBP: ffff8882cdf56c00 R08: 000000000000c580 R09: 0000000000001a4d [ 3376.539582] R10: 0000000000000731 R11: ffffc90002a97ccd R12: 0000000000000000 [ 3376.540519] R13: ffff8882de30d600 R14: ffff8882de30d640 R15: ffff88821e000900 [ 3376.541444] FS: 0000000000000000(0000) GS:ffff8882f5c00000(0000) knlGS:0000000000000000 [ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3376.543545] CR2: 0000556e5504b248 CR3: 00000002c6f10005 CR4: 0000000000770ee0 [ 3376.544483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3376.545419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3376.546344] PKRU: 55555554 [ 3376.546911] Call Trace: [ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core] [ 3376.548299] process_one_work+0x1d8/0x390 [ 3376.548977] worker_thread+0x4d/0x3e0 [ 3376.549631] ? rescuer_thread+0x3e0/0x3e0 [ 3376.550295] kthread+0x118/0x130 [ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3376.551675] ret_from_fork+0x1f/0x30 [ 3376.552312] ---[ end trace d84e8f46d2a77eec ]--- Fix the bug by moving work_struct to dedicated dynamically-allocated structure. This enabled every event handler to work on its own private neighbour pointer and removes the need for handling the case when task is already enqueued. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: Fix VLAN create flowAya Levin1-2/+4
When interface is attached while in promiscuous mode and with VLAN filtering turned off, both configurations are not respected and VLAN filtering is performed. There are 2 flows which add the any-vid rules during interface attach: VLAN creation table and set rx mode. Each is relaying on the other to add any-vid rules, eventually non of them does. Fix this by adding any-vid rules on VLAN creation regardless of promiscuous mode. Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: Fix VLAN cleanup flowAya Levin1-2/+6
Prior to this patch unloading an interface in promiscuous mode with RX VLAN filtering feature turned off - resulted in a warning. This is due to a wrong condition in the VLAN rules cleanup flow, which left the any-vid rules in the VLAN steering table. These rules prevented destroying the flow group and the flow table. The any-vid rules are removed in 2 flows, but none of them remove it in case both promiscuous is set and VLAN filtering is off. Fix the issue by changing the condition of the VLAN table cleanup flow to clean also in case of promiscuous mode. mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1 ... ... ------------[ cut here ]------------ FW pages counter is 11560 after reclaiming all pages WARNING: CPU: 1 PID: 28729 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660 mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_function_teardown+0x2f/0x90 [mlx5_core] mlx5_unload_one+0x71/0x110 [mlx5_core] remove_one+0x44/0x80 [mlx5_core] pci_device_remove+0x3e/0xc0 device_release_driver_internal+0xfb/0x1c0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x68/0x90 pci_stop_and_remove_bus_device+0x12/0x20 hv_eject_device_work+0x6f/0x170 [pci_hyperv] ? __schedule+0x349/0x790 process_one_work+0x206/0x400 worker_thread+0x34/0x3f0 ? process_one_work+0x400/0x400 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 ---[ end trace 6283bde8d26170dc ]--- Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: Fix return status when setting unsupported FEC modeAya Levin1-0/+3
Verify the configured FEC mode is supported by at least a single link mode before applying the command. Otherwise fail the command and return "Operation not supported". Prior to this patch, the command was successful, yet it falsely set all link modes to FEC auto mode - like configuring FEC mode to auto. Auto mode is the default configuration if a link mode doesn't support the configured FEC mode. Fixes: b5ede32d3329 ("net/mlx5e: Add support for FEC modes based on 50G per lane links") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Eran Ben Elisha <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: Fix driver's declaration to support GRE offloadAya Levin1-1/+18
Declare GRE offload support with respect to the inner protocol. Add a list of supported inner protocols on which the driver can offload checksum and GSO. For other protocols, inform the stack to do the needed operations. There is no noticeable impact on GRE performance. Fixes: 2729984149e6 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: CT, Fix coverity issueMaor Dickman1-1/+3
The cited commit introduced the following coverity issue at function mlx5_tc_ct_rule_to_tuple_nat: - Memory - corruptions (OVERRUN) Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte elements at element index 7 (byte offset 31) using index "ip6_offset" (which evaluates to 7). In case of IPv6 destination address rewrite, ip6_offset values are between 4 to 7, which will cause memory overrun of array "tuple->ip.src_v6.in6_u.u6_addr32" to array "tuple->ip.dst_v6.in6_u.u6_addr32". Fixed by writing the value directly to array "tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are between 4 to 7. Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTUAya Levin2-5/+58
Prior to this fix, in Striding RQ mode the driver was vulnerable when receiving packets in the range (stride size - headroom, stride size]. Where stride size is calculated by mtu+headroom+tailroom aligned to the closest power of 2. Usually, this filtering is performed by the HW, except for a few cases: - Between 2 VFs over the same PF with different MTUs - On bluefield, when the host physical function sets a larger MTU than the ARM has configured on its representor and uplink representor. When the HW filtering is not present, packets that are larger than MTU might be harmful for the RQ's integrity, in the following impacts: 1) Overflow from one WQE to the next, causing a memory corruption that in most cases is unharmful: as the write happens to the headroom of next packet, which will be overwritten by build_skb(). In very rare cases, high stress/load, this is harmful. When the next WQE is not yet reposted and points to existing SKB head. 2) Each oversize packet overflows to the headroom of the next WQE. On the last WQE of the WQ, where addresses wrap-around, the address of the remainder headroom does not belong to the next WQE, but it is out of the memory region range. This results in a HW CQE error that moves the RQ into an error state. Solution: Add a page buffer at the end of each WQE to absorb the leak. Actually the maximal overflow size is headroom but since all memory units must be of the same size, we use page size to comply with UMR WQEs. The increase in memory consumption is of a single page per RQ. Initialize the mkey with all MTTs pointing to a default page. When the channels are activated, UMR WQEs will redirect the RX WQEs to the actual memory from the RQ's pool, while the overflow MTTs remain mapped to the default page. Fixes: 73281b78a37a ("net/mlx5e: Derive Striding RQ size from MTU") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5e: Fix error path for RQ allocAya Levin1-15/+17
Increase granularity of the error path to avoid unneeded free/release. Fix the cleanup to be symmetric to the order of creation. Fixes: 0ddf543226ac ("xdp/mlx5: setup xdp_rxq_info") Fixes: 422d4c401edd ("net/mlx5e: RX, Split WQ objects for different RQ types") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5: Fix request_irqs error flowMaor Gottlieb1-1/+1
Fix error flow handling in request_irqs which try to free irq that we failed to request. It fixes the below trace. WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60 CPU: 1 PID: 7587 Comm: bash Tainted: G W OE 4.15.15-1.el7MELLANOXsmp-x86_64 #1 Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020 RIP: 0010:free_irq+0x4d/0x60 RSP: 0018:ffffc9000ef47af0 EFLAGS: 00010282 RAX: ffff88001476ae00 RBX: 0000000000000655 RCX: 0000000000000000 RDX: ffff88001476ae00 RSI: ffffc9000ef47ab8 RDI: ffff8800398bb478 RBP: ffff88001476a838 R08: ffff88001476ae00 R09: 000000000000156d R10: 0000000000000000 R11: 0000000000000004 R12: ffff88001476a838 R13: 0000000000000006 R14: ffff88001476a888 R15: 00000000ffffffe4 FS: 00007efeadd32740(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc9cc010008 CR3: 00000001a2380004 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mlx5_irq_table_create+0x38d/0x400 [mlx5_core] ? atomic_notifier_chain_register+0x50/0x60 mlx5_load_one+0x7ee/0x1130 [mlx5_core] init_one+0x4c9/0x650 [mlx5_core] pci_device_probe+0xb8/0x120 driver_probe_device+0x2a1/0x470 ? driver_allows_async_probing+0x30/0x30 bus_for_each_drv+0x54/0x80 __device_attach+0xa3/0x100 pci_bus_add_device+0x4a/0x90 pci_iov_add_virtfn+0x2dc/0x2f0 pci_enable_sriov+0x32e/0x420 mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core] ? kstrtoll+0x22/0x70 num_vf_store+0x4b/0x70 [mlx5_core] kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x140 ? rcu_all_qs+0x5/0x80 ? _cond_resched+0x15/0x30 ? __sb_start_write+0x41/0x80 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x60/0x110 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 24163189da48 ("net/mlx5: Separate IRQ request/free from EQ life cycle") Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Eran Ben Elisha <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessibleSaeed Mahameed2-9/+10
In case of pci is offline reclaim_pages_cmd() will still try to call the FW to release FW pages, cmd_exec() in this case will return a silent success without actually calling the FW. This is wrong and will cause page leaks, what we should do is to detect pci offline or command interface un-available before tying to access the FW and manually release the FW pages in the driver. In this patch we share the code to check for FW command interface availability and we call it in sensitive places e.g. reclaim_pages_cmd(). Alternative fix: 1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value, command success simulation list. 2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd(). Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5: Add retry mechanism to the command entry index allocationEran Ben Elisha1-1/+20
It is possible that new command entry index allocation will temporarily fail. The new command holds the semaphore, so it means that a free entry should be ready soon. Add one second retry mechanism before returning an error. Patch "net/mlx5: Avoid possible free of command entry while timeout comp handler" increase the possibility to bump into this temporarily failure as it delays the entry index release for non-callback commands. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5: poll cmd EQ in case of command timeoutEran Ben Elisha3-9/+86
Once driver detects a command interface command timeout, it warns the user and returns timeout error to the caller. In such case, the entry of the command is not evacuated (because only real event interrupt is allowed to clear command interface entry). If the HW event interrupt of this entry will never arrive, this entry will be left unused forever. Command interface entries are limited and eventually we can end up without the ability to post a new command. In addition, if driver will not consume the EQE of the lost interrupt and rearm the EQ, no new interrupts will arrive for other commands. Add a resiliency mechanism for manually polling the command EQ in case of a command timeout. In case resiliency mechanism will find non-handled EQE, it will consume it, and the command interface will be fully functional again. Once the resiliency flow finished, wait another 5 seconds for the command interface to complete for this command entry. Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow. Add an async EQ spinlock to avoid races between resiliency flows and real interrupts that might run simultaneously. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5: Avoid possible free of command entry while timeout comp handlerEran Ben Elisha1-38/+71
Upon command completion timeout, driver simulates a forced command completion. In a rare case where real interrupt for that command arrives simultaneously, it might release the command entry while the forced handler might still access it. Fix that by adding an entry refcount, to track current amount of allowed handlers. Command entry to be released only when this refcount is decremented to zero. Command refcount is always initialized to one. For callback commands, command completion handler is the symmetric flow to decrement it. For non-callback commands, it is wait_func(). Before ringing the doorbell, increment the refcount for the real completion handler. Once the real completion handler is called, it will decrement it. For callback commands, once the delayed work is scheduled, increment the refcount. Upon callback command completion handler, we will try to cancel the timeout callback. In case of success, we need to decrement the callback refcount as it will never run. In addition, gather the entry index free and the entry free into a one flow for all command types release. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-02net/mlx5: Fix a race when moving command interface to polling modeEran Ben Elisha1-0/+2
As part of driver unload, it destroys the commands EQ (via FW command). As the commands EQ is destroyed, FW will not generate EQEs for any command that driver sends afterwards. Driver should poll for later commands status. Driver commands mode metadata is updated before the commands EQ is actually destroyed. This can lead for double completion handle by the driver (polling and interrupt), if a command is executed and completed by FW after the mode was changed, but before the EQ was destroyed. Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee that only DESTROY_EQ command can be executed during this time period. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-10-01lib8390: Use netif_msg_init to initialize msg_enable bitsArmin Wolf1-6/+8
Use netif_msg_init() to process param settings and use only the proper initialized value of ei_local->msg_level for later processing; Signed-off-by: Armin Wolf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-01ravb: Add support for explicit internal clock delay configurationGeert Uytterhoeven2-9/+28
Some EtherAVB variants support internal clock delay configuration, which can add larger delays than the delays that are typically supported by the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps" properties). Historically, the EtherAVB driver configured these delays based on the "rgmii-*id" PHY mode. This caused issues with PHY drivers that implement PHY internal delays properly[1]. Hence a backwards-compatible workaround was added by masking the PHY mode[2]. Add proper support for explicit configuration of the MAC internal clock delays using the new "[rt]x-internal-delay-ps" properties. Fall back to the old handling if none of these properties is present. [1] Commit bcf3440c6dd78bfe ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY") [2] Commit 9b23203c32ee02cd ("ravb: Mask PHY mode to avoid inserting delays twice"). Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Sergei Shtylyov <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-01ravb: Split delay handling in parsing and applyingGeert Uytterhoeven2-6/+19
Currently, full delay handling is done in both the probe and resume paths. Split it in two parts, so the resume path doesn't have to redo the parsing part over and over again. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Sergei Shtylyov <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-01r8169: fix data corruption issue on RTL8402Heiner Kallweit1-0/+4
Petr reported that after resume from suspend RTL8402 partially truncates incoming packets, and re-initializing register RxConfig before the actual chip re-initialization sequence is needed to avoid the issue. Reported-by: Petr Tesarik <[email protected]> Proposed-by: Petr Tesarik <[email protected]> Tested-by: Petr Tesarik <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-10-01r8169: fix handling ether_clkHeiner Kallweit1-13/+19
Petr reported that system freezes on r8169 driver load on a system using ether_clk. The original change was done under the assumption that the clock isn't needed for basic operations like chip register access. But obviously that was wrong. Therefore effectively revert the original change, and in addition leave the clock active when suspending and WoL is enabled. Chip may not be able to process incoming packets otherwise. Fixes: 9f0b54cd1672 ("r8169: move switching optional clock on/off to pll power functions") Reported-by: Petr Tesarik <[email protected]> Tested-by: Petr Tesarik <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-30net/mlx5e: Fix potential null pointer dereferenceGustavo A. R. Silva1-4/+6
Calls to kzalloc() and kvzalloc() should be null-checked in order to avoid any potential failures. In this case, a potential null pointer dereference. Fix this by adding null checks for _parse_attr_ and _flow_ right after allocation. Addresses-Coverity-ID: 1497154 ("Dereference before null check") Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure") Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get()Dan Carpenter1-1/+3
This code frees "shared_counter" and then dereferences on the next line to get the error code. Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: Fix dereference on pointer attr after null checkAriel Levkovich1-2/+4
When removing a flow from the slow path fdb, a flow attr struct is allocated for the rule removal process. If the allocation fails the code prints a warning message but continues with the removal flow which include dereferencing a pointer which could be null. Fix this by exiting the function in case the attr allocation failed. Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Ariel Levkovich <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: Use dma device access helperParav Pandit12-34/+37
Use the PCI device directly for dma accesses as non PCI device unlikely support IOMMU and dma mappings. Introduce and use helper routine to access DMA device. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: E-Switch, Support flow source for local vportHamdan Igbaria1-2/+5
Set flow source as hint for local vport. Signed-off-by: Hamdan Igbaria <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: E-switch, Move devlink eswitch ports closer to eswitchParav Pandit6-92/+154
Currently devlink eswitch ports are registered and unregistered by the representor layer. However it is better to register them at eswitch layer so that in future user initiated command port add and delete commands can also register/unregister devlink ports without depending on representor layer. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: E-switch, Use helper function to load unload representorParav Pandit1-8/+21
To register and unregister devlink ports when loading/unload representors, refactor the code to helper functions. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: E-switch, Add helper to check egress ACL needParav Pandit2-5/+11
Currently only VF vports need egress ACL table. Add a generic helper to check whether a vport need egress ACL table or not. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: E-switch, Use PF num in metadata reg c0sunils1-18/+18
Currently only 256 vports can be supported as only 8 bits are reserved for them and 8 bits are reserved for vhca_ids in metadata reg c0. To support more than 256 vports, replace vhca_id with a unique shorter 4-bit PF number which covers upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095. This will continue to generate unique metadata even if multiple PCI devices have same switch_id. Signed-off-by: sunils <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: DR, Add support for rule creation with flow source hintHamdan Igbaria4-22/+26
Skip the rule according to flow arrival source, in case of RX and the source is local port skip and in case of TX and the source is uplink skip, we get this info according to the flow source hint we get from upper layers when creating the rule. This is needed because for example in case of FDB table which has a TX and RX tables and we are inserting a rule with an encap action which is only a TX action, in this case rule will fail on RX, so we can rely on the flow source hint and skip RX in such case. Until now we relied on metadata regc_0 that upper layer mapped the port in the regc_0, but the problem is that upper layer did not always use regc_0 for port mapping, so now we added support to flow source hint which upper layers will pass to SW steering when creating a rule. Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Hamdan Igbaria <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: DR, Call ste_builder directly with tag pointerYevgeny Kliteynik2-68/+33
Instead of getting the tag in each function, call the builder directly with the tag. This will allow to use the same function for building the tag and the bitmask. Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-09-30net/mlx5: DR, Remove unneeded local variableYevgeny Kliteynik1-3/+1
The misc3 variable is used only once and can be dropped. Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>