Age | Commit message (Collapse) | Author | Files | Lines |
|
Current implementation for nvm_attr configuration instructs the management
FW to load/unload the nvm-cfg image for each user-provided attribute in
the input file. This consumes lot of cycles even for few tens of
attributes.
This patch updates the implementation to perform load/commit of the config
for every 50 attributes. After loading the nvm-image, MFW expects that
config should be committed in a predefined timer value (5 sec), hence it's
not possible to write large number of attributes in a single load/commit
window. Hence performing the commits in chunks.
Fixes: 0dabbe1bb3a4 ("qed: Add driver API for flashing the config attributes.")
Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There is a spelling misake in a DP_NOTICE message. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-10-24
This series introduces misc fixes to mlx5 driver.
v1->v2:
- Dropped the kTLS counter documentation patch, Tariq will fix it and
send it later.
- Added a new fix for link speed mode reporting.
('net/mlx5e: Initialize link modes bitmap on stack')
For -stable v4.14
('net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budget')
For -stable v4.19
('net/mlx5e: Fix ethtool self test: link speed')
For -stable v5.2
('net/mlx5: Fix flow counter list auto bits struct')
('net/mlx5: Fix rtable reference leak')
For -stable v5.3
('net/mlx5e: Remove incorrect match criteria assignment line')
('net/mlx5e: Determine source port properly for vlan push action')
('net/mlx5e: Initialize link modes bitmap on stack')
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Use platform_get_irq_byname_optional() and platform_get_irq_optional()
instead of platform_get_irq_byname() and platform_get_irq() for optional
IRQs to avoid below error message during probe:
[ 0.795803] fec 30be0000.ethernet: IRQ pps not found
[ 0.800787] fec 30be0000.ethernet: IRQ index 3 not found
Signed-off-by: Anson Huang <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Failed to get irq using name is NOT fatal as driver will use index
to get irq instead, use platform_get_irq_byname_optional() instead
of platform_get_irq_byname() to avoid below error message during
probe:
[ 0.819312] fec 30be0000.ethernet: IRQ int0 not found
[ 0.824433] fec 30be0000.ethernet: IRQ int1 not found
[ 0.829539] fec 30be0000.ethernet: IRQ int2 not found
Signed-off-by: Anson Huang <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is due to error in over budget processing.
When dealing with high throughput, the used buffers
that exceeds the budget is not cleaned up. In addition,
it takes a lot of cycles to clean up the used buffer,
and then the buffer where the valid data is located can take effect.
Signed-off-by: Jiangfeng Xiao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Prior to this patch, the amount of counters guaranteed per VF in the
resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was
set regardless if the VF was single or dual port.
This caused several VFs to have no guaranteed counters although the
system could satisfy their request.
The fix is to dynamically guarantee counters, based on each VF
specification.
Fixes: 9de92c60beaa ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Initialize link modes bitmap on stack before using it, otherwise the
outcome of ethtool set link ksettings might have unexpected values.
Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes")
Signed-off-by: Aya Levin <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Ethtool self test contains a test for link speed. This test reads the
PTYS register and determines whether the current speed is valid or not.
Change current implementation to use the function mlx5e_port_linkspeed()
that does the same check and fails when speed is invalid. This code
redundancy lead to a bug when mlx5e_port_linkspeed() was updated with
expended speeds and the self test was not.
Fixes: 2c81bfd5ae56 ("net/mlx5e: Move port speed code from en_ethtool.c to en/port.c")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When CQE compression is enabled, compressed CQEs use the following
structure: a title is followed by one or many blocks, each containing 8
mini CQEs (except the last, which may contain fewer mini CQEs).
Due to NAPI budget restriction, a complete structure is not always
parsed in one NAPI run, and some blocks with mini CQEs may be deferred
to the next NAPI poll call - we have the mlx5e_decompress_cqes_cont call
in the beginning of mlx5e_poll_rx_cq. However, if the budget is
extremely low, some blocks may be left even after that, but the code
that follows the mlx5e_decompress_cqes_cont call doesn't check it and
assumes that a new CQE begins, which may not be the case. In such cases,
random memory corruptions occur.
An extremely low NAPI budget of 8 is used when busy_poll or busy_read is
active.
This commit adds a check to make sure that the previous compressed CQE
has been completely parsed after mlx5e_decompress_cqes_cont, otherwise
it prevents a new CQE from being fetched in the middle of a compressed
CQE.
This commit fixes random crashes in __build_skb, __page_pool_put_page
and other not-related-directly places, that used to happen when both CQE
compression and busy_poll/busy_read were enabled.
Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Geneve implementation changed mlx5 tc to user direct pointer to tunnel_key
action's internal struct ip_tunnel_info instance. However, this leads to
use-after-free error when initial filter that caused creation of new encap
entry is deleted or when tunnel_key action is manually overwritten through
action API. Moreover, with recent TC offloads API unlocking change struct
flow_action_entry->tunnel point to temporal copy of tunnel info that is
deallocated after filter is offloaded to hardware which causes bug to
reproduce every time new filter is attached to existing encap entry with
following KASAN bug:
[ 314.885555] ==================================================================
[ 314.886641] BUG: KASAN: use-after-free in memcmp+0x2c/0x60
[ 314.886864] Read of size 1 at addr ffff88886c746280 by task tc/2682
[ 314.887179] CPU: 22 PID: 2682 Comm: tc Not tainted 5.3.0-rc7+ #703
[ 314.887188] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 314.887195] Call Trace:
[ 314.887215] dump_stack+0x9a/0xf0
[ 314.887236] print_address_description+0x67/0x323
[ 314.887248] ? memcmp+0x2c/0x60
[ 314.887257] ? memcmp+0x2c/0x60
[ 314.887272] __kasan_report.cold+0x1a/0x3d
[ 314.887474] ? __mlx5e_tc_del_fdb_peer_flow+0x100/0x1b0 [mlx5_core]
[ 314.887484] ? memcmp+0x2c/0x60
[ 314.887509] kasan_report+0xe/0x12
[ 314.887521] memcmp+0x2c/0x60
[ 314.887662] mlx5e_tc_add_fdb_flow+0x51b/0xbe0 [mlx5_core]
[ 314.887838] ? mlx5e_encap_take+0x110/0x110 [mlx5_core]
[ 314.887902] ? lockdep_init_map+0x87/0x2c0
[ 314.887924] ? __init_waitqueue_head+0x4f/0x60
[ 314.888062] ? mlx5e_alloc_flow.isra.0+0x18c/0x1c0 [mlx5_core]
[ 314.888207] __mlx5e_add_fdb_flow+0x2d7/0x440 [mlx5_core]
[ 314.888359] ? mlx5e_tc_update_neigh_used_value+0x6f0/0x6f0 [mlx5_core]
[ 314.888374] ? match_held_lock+0x2e/0x240
[ 314.888537] mlx5e_configure_flower+0x830/0x16a0 [mlx5_core]
[ 314.888702] ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[ 314.888713] ? down_read+0x118/0x2c0
[ 314.888728] ? down_read_killable+0x300/0x300
[ 314.888882] ? mlx5e_rep_get_ethtool_stats+0x180/0x180 [mlx5_core]
[ 314.888899] tc_setup_cb_add+0x127/0x270
[ 314.888937] fl_hw_replace_filter+0x2ac/0x380 [cls_flower]
[ 314.888976] ? fl_hw_destroy_filter+0x1b0/0x1b0 [cls_flower]
[ 314.888990] ? fl_change+0xbcf/0x27ef [cls_flower]
[ 314.889030] ? fl_change+0xa57/0x27ef [cls_flower]
[ 314.889069] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.889135] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[ 314.889167] ? __radix_tree_lookup+0xa4/0x130
[ 314.889200] ? fl_get+0x169/0x240 [cls_flower]
[ 314.889218] ? fl_walk+0x230/0x230 [cls_flower]
[ 314.889249] tc_new_tfilter+0x5e1/0xd40
[ 314.889281] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[ 314.889309] ? tc_del_tfilter+0xa30/0xa30
[ 314.889335] ? __lock_acquire+0x5b5/0x2460
[ 314.889378] ? find_held_lock+0x85/0xa0
[ 314.889442] ? tc_del_tfilter+0xa30/0xa30
[ 314.889465] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.889488] ? rtnl_dellink+0x490/0x490
[ 314.889518] ? lockdep_hardirqs_on+0x260/0x260
[ 314.889538] ? netlink_deliver_tap+0xab/0x5a0
[ 314.889550] ? match_held_lock+0x1b/0x240
[ 314.889575] netlink_rcv_skb+0xd0/0x200
[ 314.889588] ? rtnl_dellink+0x490/0x490
[ 314.889605] ? netlink_ack+0x440/0x440
[ 314.889635] ? netlink_deliver_tap+0x161/0x5a0
[ 314.889648] ? lock_downgrade+0x360/0x360
[ 314.889657] ? lock_acquire+0xe5/0x210
[ 314.889686] netlink_unicast+0x296/0x350
[ 314.889707] ? netlink_attachskb+0x390/0x390
[ 314.889726] ? _copy_from_iter_full+0xe0/0x3a0
[ 314.889738] ? __virt_addr_valid+0xbb/0x130
[ 314.889771] netlink_sendmsg+0x394/0x600
[ 314.889800] ? netlink_unicast+0x350/0x350
[ 314.889817] ? move_addr_to_kernel.part.0+0x90/0x90
[ 314.889852] ? netlink_unicast+0x350/0x350
[ 314.889872] sock_sendmsg+0x96/0xa0
[ 314.889891] ___sys_sendmsg+0x482/0x520
[ 314.889919] ? copy_msghdr_from_user+0x250/0x250
[ 314.889930] ? __fput+0x1fa/0x390
[ 314.889941] ? task_work_run+0xb7/0xf0
[ 314.889957] ? exit_to_usermode_loop+0x117/0x120
[ 314.889972] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.889982] ? do_syscall_64+0x74/0xe0
[ 314.889992] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890012] ? mark_lock+0xac/0x9a0
[ 314.890028] ? __lock_acquire+0x5b5/0x2460
[ 314.890053] ? mark_lock+0xac/0x9a0
[ 314.890083] ? __lock_acquire+0x5b5/0x2460
[ 314.890112] ? match_held_lock+0x1b/0x240
[ 314.890144] ? __fget_light+0xa1/0xf0
[ 314.890166] ? sockfd_lookup_light+0x91/0xb0
[ 314.890187] __sys_sendmsg+0xba/0x130
[ 314.890201] ? __sys_sendmsg_sock+0xb0/0xb0
[ 314.890225] ? __blkcg_punt_bio_submit+0xd0/0xd0
[ 314.890264] ? lockdep_hardirqs_off+0xbe/0x100
[ 314.890274] ? mark_held_locks+0x24/0x90
[ 314.890286] ? do_syscall_64+0x1e/0xe0
[ 314.890308] do_syscall_64+0x74/0xe0
[ 314.890325] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890336] RIP: 0033:0x7f00ca33d7b8
[ 314.890348] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[ 314.890356] RSP: 002b:00007ffea2983928 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 314.890369] RAX: ffffffffffffffda RBX: 000000005d777d5b RCX: 00007f00ca33d7b8
[ 314.890377] RDX: 0000000000000000 RSI: 00007ffea2983990 RDI: 0000000000000003
[ 314.890384] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 314.890392] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 314.890400] R13: 000000000047f640 R14: 00007ffea2987b58 R15: 0000000000000021
[ 314.890529] Allocated by task 2687:
[ 314.890684] save_stack+0x1b/0x80
[ 314.890694] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 314.890705] __kmalloc_track_caller+0x102/0x340
[ 314.890721] kmemdup+0x1d/0x40
[ 314.890730] tc_setup_flow_action+0x731/0x2c27
[ 314.890743] fl_hw_replace_filter+0x23b/0x380 [cls_flower]
[ 314.890756] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.890765] tc_new_tfilter+0x5e1/0xd40
[ 314.890776] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.890786] netlink_rcv_skb+0xd0/0x200
[ 314.890796] netlink_unicast+0x296/0x350
[ 314.890805] netlink_sendmsg+0x394/0x600
[ 314.890815] sock_sendmsg+0x96/0xa0
[ 314.890825] ___sys_sendmsg+0x482/0x520
[ 314.890834] __sys_sendmsg+0xba/0x130
[ 314.890844] do_syscall_64+0x74/0xe0
[ 314.890854] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890937] Freed by task 2687:
[ 314.891076] save_stack+0x1b/0x80
[ 314.891086] __kasan_slab_free+0x12c/0x170
[ 314.891095] kfree+0xeb/0x2f0
[ 314.891106] tc_cleanup_flow_action+0x69/0xa0
[ 314.891119] fl_hw_replace_filter+0x2c5/0x380 [cls_flower]
[ 314.891132] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.891140] tc_new_tfilter+0x5e1/0xd40
[ 314.891151] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.891161] netlink_rcv_skb+0xd0/0x200
[ 314.891170] netlink_unicast+0x296/0x350
[ 314.891180] netlink_sendmsg+0x394/0x600
[ 314.891190] sock_sendmsg+0x96/0xa0
[ 314.891200] ___sys_sendmsg+0x482/0x520
[ 314.891208] __sys_sendmsg+0xba/0x130
[ 314.891218] do_syscall_64+0x74/0xe0
[ 314.891228] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.891315] The buggy address belongs to the object at ffff88886c746280
which belongs to the cache kmalloc-96 of size 96
[ 314.891762] The buggy address is located 0 bytes inside of
96-byte region [ffff88886c746280, ffff88886c7462e0)
[ 314.892196] The buggy address belongs to the page:
[ 314.892387] page:ffffea0021b1d180 refcount:1 mapcount:0 mapping:ffff88835d00ef80 index:0x0
[ 314.892398] flags: 0x57ffffc0000200(slab)
[ 314.892413] raw: 0057ffffc0000200 ffffea00219e0340 0000000800000008 ffff88835d00ef80
[ 314.892423] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 314.892430] page dumped because: kasan: bad access detected
[ 314.892515] Memory state around the buggy address:
[ 314.892707] ffff88886c746180: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.892976] ffff88886c746200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893251] >ffff88886c746280: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893522] ^
[ 314.893657] ffff88886c746300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893924] ffff88886c746380: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[ 314.894189] ==================================================================
Fix the issue by duplicating tunnel info into per-encap copy that is
deallocated with encap structure. Also, duplicate tunnel info in flow parse
attribute to support cases when flow might be attached asynchronously.
Fixes: 1f6da30697d0 ("net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commit refactored the encap id into a struct pointed from the
destination.
Bug fix for the case there is no encap for one of the destinations.
Fixes: 2b688ea5efde ("net/mlx5: Add flow steering actions to fs_cmd shim layer")
Signed-off-by: Eli Britstein <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
If the rt entry gateway family is not AF_INET for multipath device,
rtable reference is leaked.
Hence, fix it by releasing the reference.
Fixes: 5fb091e8130b ("net/mlx5e: Use hint to resolve route when in HW multipath mode")
Fixes: e32ee6c78efa ("net/mlx5e: Support tunnel encap over tagged Ethernet")
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When encap entry initialization completes successfully e->compl_result is
set to positive value and not zero, like mlx5e_rep_update_flows() assumes
at the moment. Fix the conditional to only skip encap flows update when
e->compl_result < 0.
Fixes: 2a1f1768fa17 ("net/mlx5e: Refactor neigh update for concurrent execution")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Memory allocated by kvzalloc should be freed by kvfree.
Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Driver have function, which enable match criteria for misc parameters
in dependence of eswitch capabilities.
Fixes: 4f5d1beadc10 ("Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux")
Signed-off-by: Dmytro Linkin <[email protected]>
Reviewed-by: Jianbo Liu <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Termination tables are used for vlan push actions on uplink ports.
To support RoCE dual port the source port value was placed in a register.
Fix the code to use an API method returning the source port according to
the FW capabilities.
Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Dmytro Linkin <[email protected]>
Reviewed-by: Jianbo Liu <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The switch driver keeps a "vid" variable per port, which signifies _the_
VLAN ID that is stripped on that port's egress (aka the native VLAN on a
trunk port).
That is the way the hardware is designed (mostly). The port->vid is
programmed into REW:PORT:PORT_VLAN_CFG:PORT_VID and the rewriter is told
to send all traffic as tagged except the one having port->vid.
There exists a possibility of finer-grained egress untagging decisions:
using the VCAP IS1 engine, one rule can be added to match every
VLAN-tagged frame whose VLAN should be untagged, and set POP_CNT=1 as
action. However, the IS1 can hold at most 512 entries, and the VLANs are
in the order of 6 * 4096.
So the code is fine for now. But this sequence of commands:
$ bridge vlan add dev swp0 vid 1 pvid untagged
$ bridge vlan add dev swp0 vid 2 untagged
makes untagged and pvid-tagged traffic be sent out of swp0 as tagged
with VID 1, despite user's request.
Prevent that from happening. The user should temporarily remove the
existing untagged VLAN (1 in this case), add it back as tagged, and then
add the new untagged VLAN (2 in this case).
Cc: Antoine Tenart <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Fixes: 7142529f1688 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Background information: the driver operates the hardware in a mode where
a single VLAN can be transmitted as untagged on a particular egress
port. That is the "native VLAN on trunk port" use case. Its value is
held in port->vid.
Consider the following command sequence (no network manager, all
interfaces are down, debugging prints added by me):
$ ip link add dev br0 type bridge vlan_filtering 1
$ ip link set dev swp0 master br0
Kernel code path during last command:
br_add_slave -> ocelot_netdevice_port_event (NETDEV_CHANGEUPPER):
[ 21.401901] ocelot_vlan_port_apply: port 0 vlan aware 0 pvid 0 vid 0
br_add_slave -> nbp_vlan_init -> switchdev_port_attr_set -> ocelot_port_attr_set (SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING):
[ 21.413335] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 0 vid 0
br_add_slave -> nbp_vlan_init -> nbp_vlan_add -> br_switchdev_port_vlan_add -> switchdev_port_obj_add -> ocelot_port_obj_add -> ocelot_vlan_vid_add
[ 21.667421] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 1
So far so good. The bridge has replaced the driver's default pvid used
in standalone mode (0) with its own default_pvid (1). The port's vid
(native VLAN) has also changed from 0 to 1.
$ ip link set dev swp0 up
[ 31.722956] 8021q: adding VLAN 0 to HW filter on device swp0
do_setlink -> dev_change_flags -> vlan_vid_add -> ocelot_vlan_rx_add_vid -> ocelot_vlan_vid_add:
[ 31.728700] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 0
The 8021q module uses the .ndo_vlan_rx_add_vid API on .ndo_open to make
ports be able to transmit and receive 802.1p-tagged traffic by default.
This API is supposed to offload a VLAN sub-interface, which for a switch
port means to add a VLAN that is not a pvid, and tagged on egress.
But the driver implementation of .ndo_vlan_rx_add_vid is wrong: it adds
back vid 0 as "egress untagged". Now back to the initial paragraph:
there is a single untagged VID that the driver keeps track of, and that
has just changed from 1 (the pvid) to 0. So this breaks the bridge
core's expectation, because it has changed vid 1 from untagged to
tagged, when what the user sees is.
$ bridge vlan
port vlan ids
swp0 1 PVID Egress Untagged
br0 1 PVID Egress Untagged
But curiously, instead of manifesting itself as "untagged and
pvid-tagged traffic gets sent as tagged on egress", the bug:
- is hidden when vlan_filtering=0
- manifests as dropped traffic when vlan_filtering=1, due to this setting:
if (port->vlan_aware && !port->vid)
/* If port is vlan-aware and tagged, drop untagged and priority
* tagged frames.
*/
val |= ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA |
ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA |
ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA;
which would have made sense if it weren't for this bug. The setting's
intention was "this is a trunk port with no native VLAN, so don't accept
untagged traffic". So the driver was never expecting to set VLAN 0 as
the value of the native VLAN, 0 was just encoding for "invalid".
So the fix is to not send 802.1p traffic as untagged, because that would
change the port's native vlan to 0, unbeknownst to the bridge, and
trigger unexpected code paths in the driver.
Cc: Antoine Tenart <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Fixes: 7142529f1688 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When rmmod hip04_eth.ko, we can get the following warning:
Task track: rmmod(1623)>bash(1591)>login(1581)>init(1)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1623 at kernel/irq/manage.c:1557 __free_irq+0xa4/0x2ac()
Trying to free already-free IRQ 200
Modules linked in: ping(O) pramdisk(O) cpuinfo(O) rtos_snapshot(O) interrupt_ctrl(O) mtdblock mtd_blkdevrtfs nfs_acl nfs lockd grace sunrpc xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables nf_reject_ipv
CPU: 0 PID: 1623 Comm: rmmod Tainted: G O 4.4.193 #1
Hardware name: Hisilicon A15
[<c020b408>] (rtos_unwind_backtrace) from [<c0206624>] (show_stack+0x10/0x14)
[<c0206624>] (show_stack) from [<c03f2be4>] (dump_stack+0xa0/0xd8)
[<c03f2be4>] (dump_stack) from [<c021a780>] (warn_slowpath_common+0x84/0xb0)
[<c021a780>] (warn_slowpath_common) from [<c021a7e8>] (warn_slowpath_fmt+0x3c/0x68)
[<c021a7e8>] (warn_slowpath_fmt) from [<c026876c>] (__free_irq+0xa4/0x2ac)
[<c026876c>] (__free_irq) from [<c0268a14>] (free_irq+0x60/0x7c)
[<c0268a14>] (free_irq) from [<c0469e80>] (release_nodes+0x1c4/0x1ec)
[<c0469e80>] (release_nodes) from [<c0466924>] (__device_release_driver+0xa8/0x104)
[<c0466924>] (__device_release_driver) from [<c0466a80>] (driver_detach+0xd0/0xf8)
[<c0466a80>] (driver_detach) from [<c0465e18>] (bus_remove_driver+0x64/0x8c)
[<c0465e18>] (bus_remove_driver) from [<c02935b0>] (SyS_delete_module+0x198/0x1e0)
[<c02935b0>] (SyS_delete_module) from [<c0202ed0>] (__sys_trace_return+0x0/0x10)
---[ end trace bb25d6123d849b44 ]---
Currently "rmmod hip04_eth.ko" call free_irq more than once
as devres_release_all and hip04_remove both call free_irq.
This results in a 'Trying to free already-free IRQ' warning.
To solve the problem free_irq has been moved out of hip04_remove.
Signed-off-by: Jiangfeng Xiao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We are calling the checksum helper after the dma_map_single()
call to map the packet. This is incorrect as the checksumming
code will touch the packet from the CPU. This means the cache
won't be properly flushes (or the bounce buffering will leave
us with the unmodified packet to DMA).
This moves the calculation of the checksum & vlan tags to
before the DMA mapping.
This also has the side effect of fixing another bug: If the
checksum helper fails, we goto "drop" to drop the packet, which
will not unmap the DMA mapping.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Fixes: 05690d633f30 ("ftgmac100: Upgrade to NETIF_F_HW_CSUM")
Reviewed-by: Vijay Khemka <[email protected]>
Tested-by: Vijay Khemka <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch corrects the SPDX License Identifier style in
header files related to DPAA2 Ethernet driver supporting
Freescale SoCs with DPAA2. For C header files
Documentation/process/license-rules.rst mandates C-like comments
(opposed to C source files where C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <[email protected]>
Signed-off-by: Nishad Kamdar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For adapters which support the SGE Doorbell Queue Timer facility,
we configured the Ethernet TX Queues to send CIDX Updates to the
Associated Ethernet RX Response Queue with CPL_SGE_EGR_UPDATE
messages to allow us to respond more quickly to the CIDX Updates.
But, this was adding load to PCIe Link RX bandwidth and,
potentially, resulting in higher CPU Interrupt load.
This patch requests the HW to deliver the CIDX updates to the TX
queue status page rather than generating an ingress queue message
(as an interrupt). With this patch, the load on RX bandwidth is
reduced and a substantial improvement in BW is noticed at lower
IO sizes.
Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer")
Signed-off-by: Raju Rangoju <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch corrects the SPDX License Identifier style in
header file related to ethernet driver for Cortina Gemini
devices. For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <[email protected]>
Signed-off-by: Nishad Kamdar <[email protected]>
Acked-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If the CONFIG_MVNET_BA is not set, then make the stub functions
static inline to avoid trying to export them, and remove hte
following sparse warnings:
drivers/net/ethernet/marvell/mvneta_bm.h:163:6: warning: symbol 'mvneta_bm_pool_destroy' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:165:6: warning: symbol 'mvneta_bm_bufs_free' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:167:5: warning: symbol 'mvneta_bm_construct' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:168:5: warning: symbol 'mvneta_bm_pool_refill' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:170:23: warning: symbol 'mvneta_bm_pool_use' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:181:18: warning: symbol 'mvneta_bm_get' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:182:6: warning: symbol 'mvneta_bm_put' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
disabled device.
With the recently added error recovery logic, the device may already
be disabled if the firmware recovery is unsuccessful. In
bnxt_remove_one(), check that the device is still enabled first
before calling pci_disable_device().
Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Minor formatting changes to diagnose cb for FW devlink health
reporter.
Suggested-by: Jiri Pirko <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When firmware indicates that driver needs to invoke firmware reset
which is common for both error recovery and live firmware reset path,
driver needs a different time to wait before polling for firmware
readiness.
Modify the wait time to fw_reset_min_dsecs, which is initialised to
correct timeout for error recovery and firmware reset.
Fixes: 4037eb715680 ("bnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.")
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The current code does not do endian swapping between the devlink
parameter and the internal NVRAM representation. Define a union to
represent the little endian NVRAM data and add 2 helper functions to
copy to and from the NVRAM data with the proper byte swapping.
Fixes: 782a624d00fa ("bnxt_en: Add bnxt_en initial port params table and register it")
Cc: Jiri Pirko <[email protected]>
Reviewed-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The current code that rounds up the NVRAM parameter bit size to the next
byte size for the devlink parameter is not always correct. The MSIX
devlink parameters are 4 bytes and we don't get the correct size
using this method.
Fix it by adding a new dl_num_bytes member to the bnxt_dl_nvm_param
structure which statically provides bytesize information according
to the devlink parameter type definition.
Fixes: 782a624d00fa ("bnxt_en: Add bnxt_en initial port params table and register it")
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When the address width of DMA is greater than 32, the packet header occupies
a BD descriptor. The starting address of the data should be added to the
header length.
Fixes: a993db88d17d ("net: stmmac: Enable support for > 32 Bits addressing in XGMAC")
Cc: Eric Dumazet <[email protected]>
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Jose Abreu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: yuqi jin <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ionic driver started using dymamic_hex_dump(), but
that is not always defined:
drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration]
Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG
is disabled, printing nothing.
Fixes: 938962d55229 ("ionic: Add adminq action")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 kTLS fixes 18-10-2019
This series introduces kTLS related fixes to mlx5 driver from Tariq,
and two misc memory leak fixes form Navid Emamdoost.
Please pull and let me know if there is any problem.
I would appreciate it if you queue up kTLS fixes from the list below to
stable kernel v5.3 !
For -stable v4.13:
nett/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull networking fixes from David Miller:
"I was battling a cold after some recent trips, so quite a bit piled up
meanwhile, sorry about that.
Highlights:
1) Fix fd leak in various bpf selftests, from Brian Vazquez.
2) Fix crash in xsk when device doesn't support some methods, from
Magnus Karlsson.
3) Fix various leaks and use-after-free in rxrpc, from David Howells.
4) Fix several SKB leaks due to confusion of who owns an SKB and who
should release it in the llc code. From Eric Biggers.
5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.
6) Jumbo packets don't work after resume on r8169, as the BIOS resets
the chip into non-jumbo mode during suspend. From Heiner Kallweit.
7) Corrupt L2 header during MPLS push, from Davide Caratti.
8) Prevent possible infinite loop in tc_ctl_action, from Eric
Dumazet.
9) Get register bits right in bcmgenet driver, based upon chip
version. From Florian Fainelli.
10) Fix mutex problems in microchip DSA driver, from Marek Vasut.
11) Cure race between route lookup and invalidation in ipv4, from Wei
Wang.
12) Fix performance regression due to false sharing in 'net'
structure, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
net: reorder 'struct net' fields to avoid false sharing
net: dsa: fix switch tree list
net: ethernet: dwmac-sun8i: show message only when switching to promisc
net: aquantia: add an error handling in aq_nic_set_multicast_list
net: netem: correct the parent's backlog when corrupted packet was dropped
net: netem: fix error path for corrupted GSO frames
macb: propagate errors when getting optional clocks
xen/netback: fix error path of xenvif_connect_data()
net: hns3: fix mis-counting IRQ vector numbers issue
net: usb: lan78xx: Connect PHY before registering MAC
vsock/virtio: discard packets if credit is not respected
vsock/virtio: send a credit update when buffer size is changed
mlxsw: spectrum_trap: Push Ethernet header before reporting trap
net: ensure correct skb->tstamp in various fragmenters
net: bcmgenet: reset 40nm EPHY on energy detect
net: bcmgenet: soft reset 40nm EPHYs before MAC init
net: phy: bcm7xxx: define soft_reset for 40nm EPHY
net: bcmgenet: don't set phydev->link from MAC
net: Update address for MediaTek ethernet driver in MAINTAINERS
ipv4: fix race condition between route lookup and invalidation
...
|
|
Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam. Showing it only
when the hw is not already in promiscous mode is equally informative
without being annoying.
Signed-off-by: Mans Rullgard <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
add an error handling in aq_nic_set_multicast_list, it may not
work when hw_multicast_list_set error; and at the same time
it will remove gcc Wunused-but-set-variable warning.
Signed-off-by: Chenwandun <[email protected]>
Reviewed-by: Igor Russkikh <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
marks clock as not available if it receives an error when trying to get
a clock. This is wrong, because a clock controller might return
-EPROBE_DEFER if a clock is not available, but will eventually become
available.
In these cases, the driver would probe successfully but will never be
able to adjust the clocks, because the clocks were not available during
probe, but became available later.
For example, the clock controller for the ZynqMP is implemented in the
PMU firmware and the clocks are only available after the firmware driver
has been probed.
Use devm_clk_get_optional() in instead of devm_clk_get() to get the
optional clock and propagate all errors to the calling function.
Signed-off-by: Michael Tretter <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Tested-by: Nicolas Ferre <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, the num_msi_left means the vector numbers of NIC,
but if the PF supported RoCE, it contains the vector numbers
of NIC and RoCE(Not expected).
This may cause interrupts lost in some case, because of the
NIC module used the vector resources which belongs to RoCE.
This patch adds a new variable num_nic_msi to store the vector
numbers of NIC, and adjust the default TQP numbers and rss_size
according to the value of num_nic_msi.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In mlx5_fw_fatal_reporter_dump if mlx5_crdump_collect fails the
allocated memory for cr_data must be released otherwise there will be
memory leak. To fix this, this commit changes the return instruction
into goto error handling.
Fixes: 9b1f29823605 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Navid Emamdoost <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In mlx5_fpga_conn_create_cq if mlx5_vector2eqn fails the allocated
memory should be released.
Fixes: 537a50574175 ("net/mlx5: FPGA, Add high-speed connection routines")
Signed-off-by: Navid Emamdoost <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The completion queue consumer index increments upon a call to
mlx5_cqwq_pop().
When dumping an error CQE, the index is already incremented.
Decrease one for the print command.
Fixes: 16cc14d81733 ("net/mlx5e: Dump xmit error completions")
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Once the kTLS TX resync function is called, it used to return
a binary value, for success or failure.
However, in case the TLS SKB is a retransmission of the connection
handshake, it initiates the resync flow (as the tcp seq check holds),
while regular packet handle is expected.
In this patch, we identify this case and skip the resync operation
accordingly.
Counters:
- Add a counter (tls_skip_no_sync_data) to monitor this.
- Bump the dump counters up as they are used more frequently.
- Add a missing counter descriptor declaration for tls_resync_bytes
in sq_stats_desc.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Do not assume the crypto info is accessible during the
connection lifetime. Save a copy of it in the private
TX context.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Cipher type is checked upon connection addition.
No need to recheck it per every TX resync invocation.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
HW expects the data size in DUMP WQEs to be up to MTU.
Make sure they are in range.
We elevate the frag page refcount by 'n-1', in addition to the
one obtained in tx_sync_info_get(), having an overall of 'n'
references. We bulk increments by using a single page_ref_add()
command, to optimize perfermance.
The refcounts are released one by one, by the corresponding completions.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Before posting the context params WQEs, make sure there is enough
contiguous room for them, and fill frag edge if needed.
When posting only a nop, no need for room check, as it needs a single
WQEBB, meaning no contiguity issue.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
All references for frag pages that are obtained in tx_sync_info_get()
should be released.
Release usually occurs in the corresponding CQE of the WQE.
In error flows, not all fragments have a WQE posted for them, hence
no matching CQE will be generated.
For these pages, release the reference in the error flow.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Access the record fragments only under the TLS ctx lock.
In the resync flow, save a copy of them to be used when
preparing and posting the required DUMP WQEs.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|