aboutsummaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)AuthorFilesLines
2022-08-22net/mlx5e: Fix wrong tc flag used when set hw-tc-offload offMaor Dickman1-1/+3
The cited commit reintroduced the ability to set hw-tc-offload in switchdev mode by reusing NIC mode calls without modifying it to support both modes, this can cause an illegal memory access when trying to turn hw-tc-offload off. Fix this by using the right TC_FLAG when checking if tc rules are installed while disabling hw-tc-offload. Fixes: d3cbd4254df8 ("net/mlx5e: Add ndo_set_feature for uplink representor") Signed-off-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5e: TC, Add missing policer validationRoi Dayan1-0/+4
There is a missing policer validation when offloading police action with tc action api. Add it. Fixes: 7d1a5ce46e47 ("net/mlx5e: TC, Support tc action api for police") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5e: Fix wrong application of the LRO stateAya Levin1-8/+0
Driver caches packet merge type in mlx5e_params instance which must be in perfect sync with the netdev_feature's bit. Prior to this patch, in certain conditions (*) LRO state was set in mlx5e_params, while netdev_feature's bit was off. Causing the LRO to be applied on the RQs (HW level). (*) This can happen only on profile init (mlx5e_build_nic_params()), when RQ expect non-linear SKB and PCI is fast enough in comparison to link width. Solution: remove setting of packet merge type from mlx5e_build_nic_params() as netdev features are not updated. Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: Avoid false positive lockdep warning by adding lock_class_keyMoshe Shemesh1-0/+4
Add a lock_class_key per mlx5 device to avoid a false positive "possible circular locking dependency" warning by lockdep, on flows which lock more than one mlx5 device, such as adding SF. kernel log: ====================================================== WARNING: possible circular locking dependency detected 5.19.0-rc8+ #2 Not tainted ------------------------------------------------------ kworker/u20:0/8 is trying to acquire lock: ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core] but task is already holding lock: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}: down_write+0x90/0x150 blocking_notifier_chain_register+0x53/0xa0 mlx5_sf_table_init+0x369/0x4a0 [mlx5_core] mlx5_init_one+0x261/0x490 [mlx5_core] probe_one+0x430/0x680 [mlx5_core] local_pci_probe+0xd6/0x170 work_for_cpu_fn+0x4e/0xa0 process_one_work+0x7c2/0x1340 worker_thread+0x6f6/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2fc7/0x6720 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 bus_for_each_drv+0x123/0x1a0 __device_attach+0x1a3/0x460 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 __auxiliary_device_add+0x88/0xc0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&notifier->n_head)->rwsem); lock(&dev->intf_state_mutex); lock(&(&notifier->n_head)->rwsem); lock(&dev->intf_state_mutex); *** DEADLOCK *** 4 locks held by kworker/u20:0/8: #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460 stack backtrace: CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? lock_chain_count+0x20/0x20 ? register_lock_class+0x1880/0x1880 __lock_acquire+0x2fc7/0x6720 ? register_lock_class+0x1880/0x1880 ? register_lock_class+0x1880/0x1880 lock_acquire+0x1c1/0x550 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 __mutex_lock+0x12c/0x14b0 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? _raw_read_unlock+0x1f/0x30 ? mutex_lock_io_nested+0x1320/0x1320 ? __ioremap_caller.constprop.0+0x306/0x490 ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core] ? iounmap+0x160/0x160 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 ? auxiliary_match_id+0xe9/0x140 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 ? driver_allows_async_probing+0x140/0x140 bus_for_each_drv+0x123/0x1a0 ? bus_for_each_dev+0x1a0/0x1a0 ? lockdep_hardirqs_on_prepare+0x286/0x400 ? trace_hardirqs_on+0x2d/0x100 __device_attach+0x1a3/0x460 ? device_driver_attach+0x1e0/0x1e0 ? kobject_uevent_env+0x22d/0xf10 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 ? dev_set_name+0xab/0xe0 ? __fw_devlink_link_to_suppliers+0x260/0x260 ? memset+0x20/0x40 ? lockdep_init_map_type+0x21a/0x7d0 __auxiliary_device_add+0x88/0xc0 ? auxiliary_device_init+0x86/0xa0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core] ? lock_downgrade+0x6e0/0x6e0 ? lockdep_hardirqs_on_prepare+0x286/0x400 process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: Fix cmd error logging for manage pages cmdRoy Novich1-3/+6
When the driver unloads, give/reclaim_pages may fail as PF driver in teardown flow, current code will lead to the following kernel log print 'failed reclaiming pages: err 0'. Fix it to get same behavior as before the cited commits, by calling mlx5_cmd_check before handling error state. mlx5_cmd_check will verify if the returned error is an actual error needed to be handled by the driver or not and will return an appropriate value. Fixes: 8d564292a166 ("net/mlx5: Remove redundant error on reclaim pages") Fixes: 4dac2f10ada0 ("net/mlx5: Remove redundant notify fail on give pages") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: Disable irq when locking lag_lockVlad Buslov1-22/+33
The lag_lock is taken from both process and softirq contexts which results lockdep warning[0] about potential deadlock. However, just disabling softirqs by using *_bh spinlock API is not enough since it will cause warning in some contexts where the lock is obtained with hard irqs disabled. To fix the issue save current irq state, disable them before obtaining the lock an re-enable irqs from saved state after releasing it. [0]: [Sun Aug 7 13:12:29 2022] ================================ [Sun Aug 7 13:12:29 2022] WARNING: inconsistent lock state [Sun Aug 7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted [Sun Aug 7 13:12:29 2022] -------------------------------- [Sun Aug 7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [Sun Aug 7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [Sun Aug 7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core] [Sun Aug 7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at: [Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550 [Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40 [Sun Aug 7 13:12:29 2022] mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core] [Sun Aug 7 13:12:29 2022] mlx5e_nic_enable+0x114/0x470 [mlx5_core] [Sun Aug 7 13:12:29 2022] mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core] [Sun Aug 7 13:12:29 2022] mlx5e_resume+0x105/0x160 [mlx5_core] [Sun Aug 7 13:12:29 2022] mlx5e_probe+0xac3/0x14f0 [mlx5_core] [Sun Aug 7 13:12:29 2022] auxiliary_bus_probe+0x9d/0xe0 [Sun Aug 7 13:12:29 2022] really_probe+0x1e0/0xaa0 [Sun Aug 7 13:12:29 2022] __driver_probe_device+0x219/0x480 [Sun Aug 7 13:12:29 2022] driver_probe_device+0x49/0x130 [Sun Aug 7 13:12:29 2022] __driver_attach+0x1e4/0x4d0 [Sun Aug 7 13:12:29 2022] bus_for_each_dev+0x11e/0x1a0 [Sun Aug 7 13:12:29 2022] bus_add_driver+0x3f4/0x5a0 [Sun Aug 7 13:12:29 2022] driver_register+0x20f/0x390 [Sun Aug 7 13:12:29 2022] __auxiliary_driver_register+0x14e/0x260 [Sun Aug 7 13:12:29 2022] mlx5e_init+0x38/0x90 [mlx5_core] [Sun Aug 7 13:12:29 2022] vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb] [Sun Aug 7 13:12:29 2022] do_one_initcall+0xc4/0x400 [Sun Aug 7 13:12:29 2022] do_init_module+0x18a/0x620 [Sun Aug 7 13:12:29 2022] load_module+0x563a/0x7040 [Sun Aug 7 13:12:29 2022] __do_sys_finit_module+0x122/0x1d0 [Sun Aug 7 13:12:29 2022] do_syscall_64+0x3d/0x90 [Sun Aug 7 13:12:29 2022] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [Sun Aug 7 13:12:29 2022] irq event stamp: 3596508 [Sun Aug 7 13:12:29 2022] hardirqs last enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100 [Sun Aug 7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100 [Sun Aug 7 13:12:29 2022] softirqs last enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170 [Sun Aug 7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170 [Sun Aug 7 13:12:29 2022] other info that might help us debug this: [Sun Aug 7 13:12:29 2022] Possible unsafe locking scenario: [Sun Aug 7 13:12:29 2022] CPU0 [Sun Aug 7 13:12:29 2022] ---- [Sun Aug 7 13:12:29 2022] lock(lag_lock); [Sun Aug 7 13:12:29 2022] <Interrupt> [Sun Aug 7 13:12:29 2022] lock(lag_lock); [Sun Aug 7 13:12:29 2022] *** DEADLOCK *** [Sun Aug 7 13:12:29 2022] 4 locks held by swapper/0/0: [Sun Aug 7 13:12:29 2022] #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core] [Sun Aug 7 13:12:29 2022] #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60 [Sun Aug 7 13:12:29 2022] #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570 [Sun Aug 7 13:12:29 2022] #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0 [Sun Aug 7 13:12:29 2022] stack backtrace: [Sun Aug 7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 [Sun Aug 7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [Sun Aug 7 13:12:29 2022] Call Trace: [Sun Aug 7 13:12:29 2022] <IRQ> [Sun Aug 7 13:12:29 2022] dump_stack_lvl+0x57/0x7d [Sun Aug 7 13:12:29 2022] mark_lock.part.0.cold+0x5f/0x92 [Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? unwind_next_frame+0x1c4/0x1b50 [Sun Aug 7 13:12:29 2022] ? secondary_startup_64_no_verify+0xcd/0xdb [Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? stack_access_ok+0x1d0/0x1d0 [Sun Aug 7 13:12:29 2022] ? start_kernel+0x3a7/0x3c5 [Sun Aug 7 13:12:29 2022] __lock_acquire+0x1260/0x6720 [Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880 [Sun Aug 7 13:12:29 2022] ? mark_lock.part.0+0xed/0x3060 [Sun Aug 7 13:12:29 2022] ? stack_trace_save+0x91/0xc0 [Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550 [Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x400/0x400 [Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720 [Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40 [Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core] [Sun Aug 7 13:12:29 2022] mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core] [Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550 [Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580 [Sun Aug 7 13:12:29 2022] atomic_notifier_call_chain+0xd7/0x1d0 [Sun Aug 7 13:12:29 2022] br_switchdev_fdb_notify+0xea/0x100 [Sun Aug 7 13:12:29 2022] ? br_switchdev_set_port_flag+0x310/0x310 [Sun Aug 7 13:12:29 2022] fdb_notify+0x11b/0x150 [Sun Aug 7 13:12:29 2022] br_fdb_update+0x34c/0x570 [Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? br_fdb_add_local+0x50/0x50 [Sun Aug 7 13:12:29 2022] ? br_allowed_ingress+0x5f/0x1070 [Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580 [Sun Aug 7 13:12:29 2022] br_handle_frame_finish+0x786/0x18e0 [Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580 [Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720 [Sun Aug 7 13:12:29 2022] ? sctp_inet_bind_verify+0x4d/0x190 [Sun Aug 7 13:12:29 2022] ? xlog_unpack_data+0x2e0/0x310 [Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20 [Sun Aug 7 13:12:29 2022] br_nf_hook_thresh+0x227/0x380 [br_netfilter] [Sun Aug 7 13:12:29 2022] ? setup_pre_routing+0x460/0x460 [br_netfilter] [Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter] [Sun Aug 7 13:12:29 2022] br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter] [Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20 [Sun Aug 7 13:12:29 2022] br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter] [Sun Aug 7 13:12:29 2022] ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter] [Sun Aug 7 13:12:29 2022] ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter] [Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter] [Sun Aug 7 13:12:29 2022] br_handle_frame+0x8a9/0x1270 [Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0 [Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880 [Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20 [Sun Aug 7 13:12:29 2022] ? bond_handle_frame+0xf9/0xac0 [bonding] [Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0 [Sun Aug 7 13:12:29 2022] __netif_receive_skb_core+0x7c0/0x2c70 [Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580 [Sun Aug 7 13:12:29 2022] ? generic_xdp_tx+0x5b0/0x5b0 [Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720 [Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880 [Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580 [Sun Aug 7 13:12:29 2022] __netif_receive_skb_list_core+0x2d7/0x8a0 [Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550 [Sun Aug 7 13:12:29 2022] ? process_backlog+0x960/0x960 [Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x129/0x400 [Sun Aug 7 13:12:29 2022] ? kvm_clock_get_cycles+0x14/0x20 [Sun Aug 7 13:12:29 2022] netif_receive_skb_list_internal+0x5f4/0xd60 [Sun Aug 7 13:12:29 2022] ? do_xdp_generic+0x150/0x150 [Sun Aug 7 13:12:29 2022] ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core] [Sun Aug 7 13:12:29 2022] napi_complete_done+0x188/0x710 [Sun Aug 7 13:12:29 2022] mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core] [Sun Aug 7 13:12:29 2022] ? __queue_work+0x53c/0xeb0 [Sun Aug 7 13:12:29 2022] __napi_poll+0x9f/0x540 [Sun Aug 7 13:12:29 2022] net_rx_action+0x420/0xb70 [Sun Aug 7 13:12:29 2022] ? napi_threaded_poll+0x470/0x470 [Sun Aug 7 13:12:29 2022] ? __common_interrupt+0x79/0x1a0 [Sun Aug 7 13:12:29 2022] __do_softirq+0x271/0x92c [Sun Aug 7 13:12:29 2022] irq_exit_rcu+0x11a/0x170 [Sun Aug 7 13:12:29 2022] common_interrupt+0x7d/0xa0 [Sun Aug 7 13:12:29 2022] </IRQ> [Sun Aug 7 13:12:29 2022] <TASK> [Sun Aug 7 13:12:29 2022] asm_common_interrupt+0x22/0x40 [Sun Aug 7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60 [Sun Aug 7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00 [Sun Aug 7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242 [Sun Aug 7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc [Sun Aug 7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac [Sun Aug 7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3 [Sun Aug 7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000 [Sun Aug 7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000 [Sun Aug 7 13:12:29 2022] ? default_idle_call+0xcc/0x460 [Sun Aug 7 13:12:29 2022] default_idle_call+0xec/0x460 [Sun Aug 7 13:12:29 2022] do_idle+0x394/0x450 [Sun Aug 7 13:12:29 2022] ? arch_cpu_idle_exit+0x40/0x40 [Sun Aug 7 13:12:29 2022] cpu_startup_entry+0x19/0x20 [Sun Aug 7 13:12:29 2022] rest_init+0x156/0x250 [Sun Aug 7 13:12:29 2022] arch_call_rest_init+0xf/0x15 [Sun Aug 7 13:12:29 2022] start_kernel+0x3a7/0x3c5 [Sun Aug 7 13:12:29 2022] secondary_startup_64_no_verify+0xcd/0xdb [Sun Aug 7 13:12:29 2022] </TASK> Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: Eswitch, Fix forwarding decision to uplinkEli Cohen1-1/+2
Make sure to modify the rule for uplink forwarding only for the case where destination vport number is MLX5_VPORT_UPLINK. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READYEli Cohen1-1/+1
Only set MLX5_LAG_FLAG_NDEVS_READY if both netdevices are registered. Doing so guarantees that both ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev have valid pointers when MLX5_LAG_FLAG_NDEVS_READY is set. The core issue is asymmetry in setting MLX5_LAG_FLAG_NDEVS_READY and clearing it. Setting it is done wrongly when both ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev are set; clearing it is done right when either of ldev->pf[i].netdev is cleared. Consider the following scenario: 1. PF0 loads and sets ldev->pf[MLX5_LAG_P0].dev to a valid pointer 2. PF1 loads and sets both ldev->pf[MLX5_LAG_P1].dev and ldev->pf[MLX5_LAG_P1].netdev with valid pointers. This results in MLX5_LAG_FLAG_NDEVS_READY is set. 3. PF0 is unloaded before setting dev->pf[MLX5_LAG_P0].netdev. MLX5_LAG_FLAG_NDEVS_READY remains set. Further execution of mlx5_do_bond() will result in null pointer dereference when calling mlx5_lag_is_multipath() This patch fixes the following call trace actually encountered: [ 1293.475195] BUG: kernel NULL pointer dereference, address: 00000000000009a8 [ 1293.478756] #PF: supervisor read access in kernel mode [ 1293.481320] #PF: error_code(0x0000) - not-present page [ 1293.483686] PGD 0 P4D 0 [ 1293.484434] Oops: 0000 [#1] SMP PTI [ 1293.485377] CPU: 1 PID: 23690 Comm: kworker/u16:2 Not tainted 5.18.0-rc5_for_upstream_min_debug_2022_05_05_10_13 #1 [ 1293.488039] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1293.490836] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core] [ 1293.492448] RIP: 0010:mlx5_lag_is_multipath+0x5/0x50 [mlx5_core] [ 1293.494044] Code: e8 70 40 ff e0 48 8b 14 24 48 83 05 5c 1a 1b 00 01 e9 19 ff ff ff 48 83 05 47 1a 1b 00 01 eb d7 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 a8 09 00 00 48 85 c0 74 26 48 83 05 a7 1b 1b 00 01 41 b8 [ 1293.498673] RSP: 0018:ffff88811b2fbe40 EFLAGS: 00010202 [ 1293.500152] RAX: ffff88818a94e1c0 RBX: ffff888165eca6c0 RCX: 0000000000000000 [ 1293.501841] RDX: 0000000000000001 RSI: ffff88818a94e1c0 RDI: 0000000000000000 [ 1293.503585] RBP: 0000000000000000 R08: ffff888119886740 R09: ffff888165eca73c [ 1293.505286] R10: 0000000000000018 R11: 0000000000000018 R12: ffff88818a94e1c0 [ 1293.506979] R13: ffff888112729800 R14: 0000000000000000 R15: ffff888112729858 [ 1293.508753] FS: 0000000000000000(0000) GS:ffff88852cc40000(0000) knlGS:0000000000000000 [ 1293.510782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1293.512265] CR2: 00000000000009a8 CR3: 00000001032d4002 CR4: 0000000000370ea0 [ 1293.514001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1293.515806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5e: Properly disable vlan strip on non-UL repsVlad Buslov1-0/+2
When querying mlx5 non-uplink representors capabilities with ethtool rx-vlan-offload is marked as "off [fixed]". However, it is actually always enabled because mlx5e_params->vlan_strip_disable is 0 by default when initializing struct mlx5e_params instance. Fix the issue by explicitly setting the vlan_strip_disable to 'true' for non-uplink representors. Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22ice: xsk: use Rx ring's XDP ring when picking NAPI contextMaciej Fijalkowski4-29/+48
Ice driver allocates per cpu XDP queues so that redirect path can safely use smp_processor_id() as an index to the array. At the same time though, XDP rings are used to pick NAPI context to call napi_schedule() or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and num_possible_cpus() of underlying platform is 44, then this means queue vectors with correlated NAPI contexts will carry several XDP queues. This in turn can result in a broken behavior where NAPI context of interest will never be scheduled and AF_XDP socket will not process any traffic. To fix this, let us change the way how XDP rings are assigned to Rx rings and use this information later on when setting ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated queue vector and walk through Tx ring's linked list. Once we stumble upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring. Previous [0] approach of fixing this issue was for txonly scenario because of the described grouping of XDP rings across queue vectors. So, relying on Rx ring meant that NAPI context could be scheduled with a queue vector without XDP ring with associated XSK pool. [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/ Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-22ice: xsk: prohibit usage of non-balanced queue idMaciej Fijalkowski1-0/+6
Fix the following scenario: 1. ethtool -L $IFACE rx 8 tx 96 2. xdpsock -q 10 -t -z Above refers to a case where user would like to attach XSK socket in txonly mode at a queue id that does not have a corresponding Rx queue. At this moment ice's XSK logic is tightly bound to act on a "queue pair", e.g. both Tx and Rx queues at a given queue id are disabled/enabled and both of them will get XSK pool assigned, which is broken for the presented queue configuration. This results in the splat included at the bottom, which is basically an OOB access to Rx ring array. To fix this, allow using the ids only in scope of "combined" queues reported by ethtool. However, logic should be rewritten to allow such configurations later on, which would end up as a complete rewrite of the control path, so let us go with this temporary fix. [420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082 [420160.566359] #PF: supervisor read access in kernel mode [420160.572657] #PF: error_code(0x0000) - not-present page [420160.579002] PGD 0 P4D 0 [420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI [420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10 [420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420160.740707] PKRU: 55555554 [420160.745960] Call Trace: [420160.750962] <TASK> [420160.755597] ? kmalloc_large_node+0x79/0x90 [420160.762703] ? __kmalloc_node+0x3f5/0x4b0 [420160.769341] xp_assign_dev+0xfd/0x210 [420160.775661] ? shmem_file_read_iter+0x29a/0x420 [420160.782896] xsk_bind+0x152/0x490 [420160.788943] __sys_bind+0xd0/0x100 [420160.795097] ? exit_to_user_mode_prepare+0x20/0x120 [420160.802801] __x64_sys_bind+0x16/0x20 [420160.809298] do_syscall_64+0x38/0x90 [420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd [420160.823731] RIP: 0033:0x7fa86a0dd2fb [420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48 [420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb [420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003 [420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000 [420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0 [420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000 [420160.919817] </TASK> [420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice] [420160.977576] CR2: 0000000000000082 [420160.985037] ---[ end trace 0000000000000000 ]--- [420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420161.257052] PKRU: 55555554 Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-22r8152: fix the RX FIFO settings when suspendingHayes Wang1-0/+10
The RX FIFO would be changed when suspending, so the related settings have to be modified, too. Otherwise, the flow control would work abnormally. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216333 Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net> Fixes: cdf0b86b250f ("r8152: fix a WOL issue") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-22r8152: fix the units of some registers for RTL8156AHayes Wang1-15/+2
The units of PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY are 16 bytes. Fixes: 195aae321c82 ("r8152: support new chips") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+1
Pull rdma fixes from Jason Gunthorpe: "A few minor fixes: - Fix buffer management in SRP to correct a regression with the login authentication feature from v5.17 - Don't iterate over non-present ports in mlx5 - Fix an error introduced by the foritify work in cxgb4 - Two bug fixes for the recently merged ERDMA driver - Unbreak RDMA dmabuf support, a regresion from v5.19" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA: Handle the return code from dma_resv_wait_timeout() properly RDMA/erdma: Correct the max_qp and max_cq capacities of the device RDMA/erdma: Using the key in FMR WR instead of MR structure RDMA/cxgb4: fix accept failure due to increased cpl_t5_pass_accept_rpl size RDMA/mlx5: Use the proper number of ports IB/iser: Fix login with authentication
2022-08-19Revert "net: macsec: update SCI upon MAC address change."Sabrina Dubroca1-6/+5
This reverts commit 6fc498bc82929ee23aa2f35a828c6178dfd3f823. Commit 6fc498bc8292 states: SCI should be updated, because it contains MAC in its first 6 octets. That's not entirely correct. The SCI can be based on the MAC address, but doesn't have to be. We can also use any 64-bit number as the SCI. When the SCI based on the MAC address, it uses a 16-bit "port number" provided by userspace, which commit 6fc498bc8292 overwrites with 1. In addition, changing the SCI after macsec has been setup can just confuse the receiver. If we configure the RXSC on the peer based on the original SCI, we should keep the same SCI on TX. When the macsec device is being managed by a userspace key negotiation daemon such as wpa_supplicant, commit 6fc498bc8292 would also overwrite the SCI defined by userspace. Fixes: 6fc498bc8292 ("net: macsec: update SCI upon MAC address change.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/9b1a9d28327e7eb54550a92eebda45d25e54dd0d.1660667033.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-19net: dpaa: Fix <1G ethernet on LS1046ARDBSean Anderson1-1/+5
As discussed in commit 73a21fa817f0 ("dpaa_eth: support all modes with rate adapting PHYs"), we must add a workaround for Aquantia phys with in-tree support in order to keep 1G support working. Update this workaround for the AQR113C phy found on revision C LS1046ARDB boards. Fixes: 12cf1b89a668 ("net: phy: Add support for AQR113C EPHY") Signed-off-by: Sean Anderson <sean.anderson@seco.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220818164029.2063293-1-sean.anderson@seco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18Merge tag 'net-6.0-rc2' of ↵Linus Torvalds35-452/+1645
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF socket maps get data out of the TCP stack) - tls: rx: react to strparser initialization errors - netfilter: nf_tables: fix scheduling-while-atomic splat - net: fix suspicious RCU usage in bpf_sk_reuseport_detach() Current release - new code bugs: - mlxsw: ptp: fix a couple of races, static checker warnings and error handling Previous releases - regressions: - netfilter: - nf_tables: fix possible module reference underflow in error path - make conntrack helpers deal with BIG TCP (skbs > 64kB) - nfnetlink: re-enable conntrack expectation events - net: fix potential refcount leak in ndisc_router_discovery() Previous releases - always broken: - sched: cls_route: disallow handle of 0 - neigh: fix possible local DoS due to net iface start/stop loop - rtnetlink: fix module refcount leak in rtnetlink_rcv_msg - sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu - virtio_net: fix endian-ness for RSS - dsa: mv88e6060: prevent crash on an unused port - fec: fix timer capture timing in `fec_ptp_enable_pps()` - ocelot: stats: fix races, integer wrapping and reading incorrect registers (the change of register definitions here accounts for bulk of the changed LoC in this PR)" * tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: moxa: MAC address reading, generating, validity checking tcp: handle pure FIN case correctly tcp: refactor tcp_read_skb() a bit tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() tcp: fix sock skb accounting in tcp_read_skb() igb: Add lock to avoid data race dt-bindings: Fix incorrect "the the" corrections net: genl: fix error path memory leak in policy dumping stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove() net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run net/mlx5e: Allocate flow steering storage during uplink initialization net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset net: mscc: ocelot: make struct ocelot_stat_layout array indexable net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work net: mscc: ocelot: turn stats_lock into a spinlock net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ...
2022-08-18ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnelsEyal Birger1-1/+2
Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was later used by the commit in the fixes tag to avoid dropping packets with sources that aren't locally configured when set in bpf_set_tunnel_key(). VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE were not. This commit fixes this omission by making ip_tunnel_init_flow() receive the flow flags from the tunnel key in the relevant collect_md paths. Fixes: b8fff748521c ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Paul Chaignon <paul@isovalent.com> Link: https://lore.kernel.org/bpf/20220818074118.726639-1-eyal.birger@gmail.com
2022-08-18net: moxa: MAC address reading, generating, validity checkingSergei Antonov1-6/+7
This device does not remember its MAC address, so add a possibility to get it from the platform. If it fails, generate a random address. This will provide a MAC address early during boot without user space being involved. Also remove extra calls to is_valid_ether_addr(). Made after suggestions by Andrew Lunn: 1) Use eth_hw_addr_random() to assign a random MAC address during probe. 2) Remove is_valid_ether_addr() from moxart_mac_open() 3) Add a call to platform_get_ethdev_address() during probe 4) Remove is_valid_ether_addr() from moxart_set_mac_address(). The core does this v1 -> v2: Handle EPROBE_DEFER returned from platform_get_ethdev_address(). Move MAC reading code to the beginning of the probe function. Signed-off-by: Sergei Antonov <saproj@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> CC: Yang Yingliang <yangyingliang@huawei.com> CC: Pavel Skripkin <paskripkin@gmail.com> CC: Guobin Huang <huangguobin4@huawei.com> CC: Yang Wei <yang.wei9@zte.com.cn> CC: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220818092317.529557-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18igb: Add lock to avoid data raceLin Ma2-1/+13
The commit c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()") places the unregister_netdev() call after the igb_disable_sriov() call to avoid functionality issue. However, it introduces several race conditions when detaching a device. For example, when .remove() is called, the below interleaving leads to use-after-free. (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] Moreover, the igb_disable_sriov() also suffers from data race with the requests from VF driver. (FREE from device detaching) | (USE from requests) igb_remove | igb_msix_other igb_disable_sriov | igb_msg_task kfree(adapter->vf_data) | vf < adapter->vfs_allocated_count adapter->vfs_allocated_count = 0 | To this end, this commit first eliminates the data races from netdev core by using rtnl_lock (similar to commit 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")). And then adds a spinlock to eliminate races from driver requests. (similar to commit 1e53834ce541 ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero") Fixes: c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()") Signed-off-by: Lin Ma <linma@zju.edu.cn> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220817184921.735244-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18Merge branch '100GbE' of ↵Jakub Kicinski6-18/+85
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-17 (ice) This series contains updates to ice driver only. Grzegorz prevents modifications to VLAN 0 when setting VLAN promiscuous as it will already be set. He also ignores -EEXIST error when attempting to set promiscuous and ensures promiscuous mode is properly cleared from the hardware when being removed. Benjamin ignores additional -EEXIST errors when setting promiscuous mode since the existing mode is the desired mode. Sylwester fixes VFs to allow sending of tagged traffic when no VLAN filters exist. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix VF not able to send tagged traffic with no VLAN filters ice: Ignore error message when setting same promiscuous mode ice: Fix clearing of promisc mode with bridge over bond ice: Ignore EEXIST when setting promisc mode ice: Fix double VLAN error when entering promisc mode ==================== Link: https://lore.kernel.org/r/20220817171329.65285-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18stmmac: intel: Add a missing clk_disable_unprepare() call in ↵Christophe JAILLET1-0/+1
intel_eth_pci_remove() Commit 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove paths") removed this clk_disable_unprepare() This was partly revert by commit ac322f86b56c ("net: stmmac: Fix clock handling on remove path") which removed this clk_disable_unprepare() because: " While unloading the dwmac-intel driver, clk_disable_unprepare() is being called twice in stmmac_dvr_remove() and intel_eth_pci_remove(). This causes kernel panic on the second call. " However later on, commit 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver") has updated stmmac_dvr_remove() which do not call clk_disable_unprepare() anymore. So this call should now be called from intel_eth_pci_remove(). Fixes: 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/d7c8c1dadf40df3a7c9e643f76ffadd0ccc1ad1b.1660659689.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_runLorenzo Bianconi1-1/+1
Fix possible NULL pointer dereference in mtk_xdp_run() if the ebpf program returns XDP_TX and xdp_convert_buff_to_frame routine fails returning NULL. Fixes: 5886d26fd25bb ("net: ethernet: mtk_eth_soc: add xmit XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/627a07d759020356b64473e09f0855960e02db28.1660659112.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net/mlx5e: Allocate flow steering storage during uplink initializationLeon Romanovsky1-8/+17
IPsec code relies on valid priv->fs pointer that is the case in NIC flow, but not correct in uplink. Before commit that mentioned in the Fixes line, that pointer was valid in all flows as it was allocated together with priv struct. In addition, the cleanup representors routine called to that not-initialized priv->fs pointer and its internals which caused NULL deference. So, move FS allocation to be as early as possible. Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/ae46fa5bed3c67f937bfdfc0370101278f5422f1.1660639564.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ↵Vladimir Oltean1-27/+26
ocelot->stats Rather than reading the stats64 counters directly from the 32-bit hardware, it's better to rely on the output produced by the periodic ocelot_port_update_stats(). It would be even better to call ocelot_port_update_stats() right from ocelot_get_stats64() to make sure we report the current values rather than the ones from 2 seconds ago. But we need to export ocelot_port_update_stats() from the switch lib towards the switchdev driver for that, and future work will largely undo that. There are more ocelot-based drivers waiting to be introduced, an example of which is the SPI-controlled VSC7512. In that driver's case, it will be impossible to call ocelot_port_update_stats() from ndo_get_stats64 context, since the latter is atomic, and reading the stats over SPI is sleepable. So the compromise taken here, which will also hold going forward, is to report 64-bit counters to stats64, which are not 100% up to date. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offsetVladimir Oltean5-285/+478
With so many counter addresses recently discovered as being wrong, it is desirable to at least have a central database of information, rather than two: one through the SYS_COUNT_* registers (used for ndo_get_stats64), and the other through the offset field of struct ocelot_stat_layout elements (used for ethtool -S). The strategy will be to keep the SYS_COUNT_* definitions as the single source of truth, but for that we need to expand our current definitions to cover all registers. Then we need to convert the ocelot region creation logic, and stats worker, to the read semantics imposed by going through SYS_COUNT_* absolute register addresses, rather than offsets of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been SYS_CNT, by the way). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: make struct ocelot_stat_layout array indexableVladimir Oltean4-298/+1146
The ocelot counters are 32-bit and require periodic reading, every 2 seconds, by ocelot_port_update_stats(), so that wraparounds are detected. Currently, the counters reported by ocelot_get_stats64() come from the 32-bit hardware counters directly, rather than from the 64-bit accumulated ocelot->stats, and this is a problem for their integrity. The strategy is to make ocelot_get_stats64() able to cherry-pick individual stats from ocelot->stats the way in which it currently reads them out from SYS_COUNT_* registers. But currently it can't, because ocelot->stats is an opaque u64 array that's used only to feed data into ethtool -S. To solve that problem, we need to make ocelot->stats indexable, and associate each element with an element of struct ocelot_stat_layout used by ethtool -S. This makes ocelot_stat_layout a fat (and possibly sparse) array, so we need to change the way in which we access it. We no longer need OCELOT_STAT_END as a sentinel, because we know the array's size (OCELOT_NUM_STATS). We just need to skip the array elements that were left unpopulated for the switch revision (ocelot, felix, seville). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_workVladimir Oltean1-0/+4
The 2 methods can run concurrently, and one will change the window of counters (SYS_STAT_CFG_STAT_VIEW) that the other sees. The fix is similar to what commit 7fbf6795d127 ("net: mscc: ocelot: fix mutex lock error during ethtool stats read") has done for ethtool -S. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: turn stats_lock into a spinlockVladimir Oltean2-8/+7
ocelot_get_stats64() currently runs unlocked and therefore may collide with ocelot_port_update_stats() which indirectly accesses the same counters. However, ocelot_get_stats64() runs in atomic context, and we cannot simply take the sleepable ocelot->stats_lock mutex. We need to convert it to an atomic spinlock first. Do that as a preparatory change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counterVladimir Oltean1-1/+1
This register, used as part of stats->tx_dropped in ocelot_get_stats64(), has a wrong address. At the address currently given, there is actually the c_tx_green_prio_6 counter. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: fix incorrect ndo_get_stats64 packet countersVladimir Oltean4-28/+38
Reading stats using the SYS_COUNT_* register definitions is only used by ocelot_get_stats64() from the ocelot switchdev driver, however, currently the bucket definitions are incorrect. Separately, on both RX and TX, we have the following problems: - a 256-1023 bucket which actually tracks the 256-511 packets - the 1024-1526 bucket actually tracks the 512-1023 packets - the 1527-max bucket actually tracks the 1024-1526 packets => nobody tracks the packets from the real 1527-max bucket Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS all track the wrong thing. However this doesn't seem to have any consequence, since ocelot_get_stats64() doesn't use these. Even though this problem only manifests itself for the switchdev driver, we cannot split the fix for ocelot and for DSA, since it requires fixing the bucket definitions from enum ocelot_reg, which makes us necessarily adapt the structures from felix and seville as well. Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet countersVladimir Oltean1-1/+2
What the driver actually reports as 256-511 is in fact 512-1023, and the TX packets in the 256-511 bucket are not reported. Fix that. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: sja1105: fix buffer overflow in sja1105_setup_devlink_regions()Rustam Subkhankulov1-1/+1
If an error occurs in dsa_devlink_region_create(), then 'priv->regions' array will be accessed by negative index '-1'. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rustam Subkhankulov <subkhankulov@ispras.ru> Fixes: bf425b82059e ("net: dsa: sja1105: expose static config as devlink region") Link: https://lore.kernel.org/r/20220817003845.389644-1-subkhankulov@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: microchip: ksz9477: fix fdb_dump last invalid entryArun Ramadoss1-0/+3
In the ksz9477_fdb_dump function it reads the ALU control register and exit from the timeout loop if there is valid entry or search is complete. After exiting the loop, it reads the alu entry and report to the user space irrespective of entry is valid. It works till the valid entry. If the loop exited when search is complete, it reads the alu table. The table returns all ones and it is reported to user space. So bridge fdb show gives ff:ff:ff:ff:ff:ff as last entry for every port. To fix it, after exiting the loop the entry is reported only if it is valid one. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220816105516.18350-1-arun.ramadoss@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17ice: Fix VF not able to send tagged traffic with no VLAN filtersSylwester Dziedziuch2-11/+57
VF was not able to send tagged traffic when it didn't have any VLAN interfaces and VLAN anti-spoofing was enabled. Fix this by allowing VFs with no VLAN filters to send tagged traffic. After VF adds a VLAN interface it will be able to send tagged traffic matching VLAN filters only. Testing hints: 1. Spawn VF 2. Send tagged packet from a VF 3. The packet should be sent out and not dropped 4. Add a VLAN interface on VF 5. Send tagged packet on that VLAN interface 6. Packet should be sent out and not dropped 7. Send tagged packet with id different than VLAN interface 8. Packet should be dropped Fixes: daf4dd16438b ("ice: Refactor spoofcheck configuration functions") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-17ice: Ignore error message when setting same promiscuous modeBenjamin Mikailenko1-4/+4
Commit 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling") introduced new checks when setting/clearing promiscuous mode. But if the requested promiscuous mode setting already exists, an -EEXIST error message would be printed. This is incorrect because promiscuous mode is either on/off and shouldn't print an error when the requested configuration is already set. This can happen when removing a bridge with two bonded interfaces and promiscuous most isn't fully cleared from VLAN VSI in hardware. Fix this by ignoring cases where requested promiscuous mode exists. Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling") Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-17ice: Fix clearing of promisc mode with bridge over bondGrzegorz Siwik2-2/+16
When at least two interfaces are bonded and a bridge is enabled on the bond, an error can occur when the bridge is removed and re-added. The reason for the error is because promiscuous mode was not fully cleared from the VLAN VSI in the hardware. With this change, promiscuous mode is properly removed when the bridge disconnects from bonding. [ 1033.676359] bond1: link status definitely down for interface enp95s0f0, disabling it [ 1033.676366] bond1: making interface enp175s0f0 the new active one [ 1033.676369] device enp95s0f0 left promiscuous mode [ 1033.676522] device enp175s0f0 entered promiscuous mode [ 1033.676901] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6 [ 1041.795662] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6 [ 1041.944826] bond1: link status definitely down for interface enp175s0f0, disabling it [ 1041.944874] device enp175s0f0 left promiscuous mode [ 1041.944918] bond1: now running without any active interface! Fixes: c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations") Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-17ice: Ignore EEXIST when setting promisc modeGrzegorz Siwik1-1/+1
Ignore EEXIST error when setting promiscuous mode. This fix is needed because the driver could set promiscuous mode when it still has not cleared properly. Promiscuous mode could be set only once, so setting it second time will be rejected. Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-17ice: Fix double VLAN error when entering promisc modeGrzegorz Siwik1-0/+7
Avoid enabling or disabling VLAN 0 when trying to set promiscuous VLAN mode if double VLAN mode is enabled. This fix is needed because the driver tries to add the VLAN 0 filter twice (once for inner and once for outer) when double VLAN mode is enabled. The filter program is rejected by the firmware when double VLAN is enabled, because the promiscuous filter only needs to be set once. This issue was missed in the initial implementation of double VLAN mode. Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-16i40e: Fix to stop tx_timeout recovery if GLOBR failsAlan Brady1-1/+3
When a tx_timeout fires, the PF attempts to recover by incrementally resetting. First we try a PFR, then CORER and finally a GLOBR. If the GLOBR fails, then we keep hitting the tx_timeout and incrementing the recovery level and issuing dmesgs, which is both annoying to the user and accomplishes nothing. If the GLOBR fails, then we're pretty much totally hosed, and there's not much else we can do to recover, so this makes it such that we just kill the VSI and stop hitting the tx_timeout in such a case. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-16i40e: Fix tunnel checksum offload with fragmented trafficPrzemyslaw Patynowski1-3/+5
Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic. Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev \ enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect Without this patch, TX descriptor was using wrong data, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register. Fixes: b4fb2d33514a ("i40e: Add support for MPLS + TSO") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-16RDMA/cxgb4: fix accept failure due to increased cpl_t5_pass_accept_rpl sizePotnuri Bharat Teja1-1/+1
Commit 'c2ed5611afd7' has increased the cpl_t5_pass_accept_rpl{} structure size by 8B to avoid roundup. cpl_t5_pass_accept_rpl{} is a HW specific structure and increasing its size will lead to unwanted adapter errors. Current commit reverts the cpl_t5_pass_accept_rpl{} back to its original and allocates zeroed skb buffer there by avoiding the memset for iss field. Reorder code to minimize chip type checks. Fixes: c2ed5611afd7 ("iw_cxgb4: Use memset_startat() for cpl_t5_pass_accept_rpl") Link: https://lore.kernel.org/r/20220809184118.2029-1-rahul.lakkireddy@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"Michael S. Tsirkin1-38/+4
This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7. This has been reported to trip up guests on GCP (Google Cloud). The reason is that virtio_find_vqs_ctx_size is broken on legacy devices. We can in theory fix virtio_find_vqs_ctx_size but in fact the patch itself has several other issues: - It treats unknown speed as < 10G - It leaves userspace no way to find out the ring size set by hypervisor - It tests speed when link is down - It ignores the virtio spec advice: Both \field{speed} and \field{duplex} can change, thus the driver is expected to re-read these values after receiving a configuration change notification. - It is not clear the performance impact has been tested properly Revert the patch for now. Reported-by: Andres Freund <andres@anarazel.de> Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()") Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Andres Freund <andres@anarazel.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Message-Id: <20220816053602.173815-2-mst@redhat.com>
2022-08-15Merge branch '40GbE' of ↵Jakub Kicinski2-7/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-12 (iavf) This series contains updates to iavf driver only. Przemyslaw frees memory for admin queues in initialization error paths, prevents freeing of vf_res which is causing null pointer dereference, and adjusts calls in error path of reset to avoid iavf_close() which could cause deadlock. Ivan Vecera avoids deadlock that can occur when driver if part of failover. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: Fix deadlock in initialization iavf: Fix reset error handling iavf: Fix NULL pointer dereference in iavf_get_link_ksettings iavf: Fix adminq error handling ==================== Link: https://lore.kernel.org/r/20220812172309.853230-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-15net: moxa: pass pdev instead of ndev to DMA functionsSergei Antonov1-10/+10
dma_map_single() calls fail in moxart_mac_setup_desc_ring() and moxart_mac_start_xmit() which leads to an incessant output of this: [ 16.043925] moxart-ethernet 92000000.mac eth0: DMA mapping error [ 16.050957] moxart-ethernet 92000000.mac eth0: DMA mapping error [ 16.058229] moxart-ethernet 92000000.mac eth0: DMA mapping error Passing pdev to DMA is a common approach among net drivers. Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver") Signed-off-by: Sergei Antonov <saproj@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220812171339.2271788-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-15mlxsw: spectrum_ptp: Forbid PTP enablement only in RX or in TXAmit Cohen1-0/+3
Currently mlxsw driver configures one global PTP configuration for all ports. The reason is that the switch behaves like a transparent clock between CPU port and front-panel ports. When time stamp is enabled in any port, the hardware is configured to update the correction field. The fact that the configuration of CPU port affects all the ports, makes the correction field update to be global for all ports. Otherwise, user will see odd values in the correction field, as the switch will update the correction field in the CPU port, but not in all the front-panel ports. The CPU port is relevant in both RX and TX, so to avoid problematic configuration, forbid PTP enablement only in one direction, i.e., only in RX or TX. Without the change: $ hwstamp_ctl -i swp1 -r 12 -t 0 current settings: tx_type 0 rx_filter 0 new settings: tx_type 0 rx_filter 2 $ echo $? 0 With the change: $ hwstamp_ctl -i swp1 -r 12 -t 0 current settings: tx_type 1 rx_filter 2 SIOCSHWTSTAMP failed: Invalid argument Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum_ptp: Protect PTP configuration with a mutexAmit Cohen1-7/+20
Currently the functions mlxsw_sp2_ptp_{configure, deconfigure}_port() assume that they are called when RTNL is locked and they warn otherwise. The deconfigure function can be called when port is removed, for example as part of device reload, then there is no locked RTNL and the function warns [1]. To avoid such case, do not assume that RTNL protects this code, add a dedicated mutex instead. The mutex protects 'ptp_state->config' which stores the existing global configuration in hardware. Use this mutex also to protect the code which configures the hardware. Then, there will be only one configuration in any time, which will be updated in 'ptp_state' and a race will be avoided. [1]: RTNL: assertion failed at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c (1600) WARNING: CPU: 1 PID: 1583493 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1600 mlxsw_sp2_ptp_hwtstamp_set+0x2d3/0x300 [mlxsw_spectrum] [...] CPU: 1 PID: 1583493 Comm: devlink Not tainted5.19.0-rc8-custom-127022-gb371dffda095 #789 Hardware name: Mellanox Technologies Ltd.MSN3420/VMOD0005, BIOS 5.11 01/06/2019 RIP: 0010:mlxsw_sp2_ptp_hwtstamp_set+0x2d3/0x300[mlxsw_spectrum] [...] Call Trace: <TASK> mlxsw_sp_port_remove+0x7e/0x190 [mlxsw_spectrum] mlxsw_sp_fini+0xd1/0x270 [mlxsw_spectrum] mlxsw_core_bus_device_unregister+0x55/0x280 [mlxsw_core] mlxsw_devlink_core_bus_device_reload_down+0x1c/0x30[mlxsw_core] devlink_reload+0x1ee/0x230 devlink_nl_cmd_reload+0x4de/0x580 genl_family_rcv_msg_doit+0xdc/0x140 genl_rcv_msg+0xd7/0x1d0 netlink_rcv_skb+0x49/0xf0 genl_rcv+0x1f/0x30 netlink_unicast+0x22f/0x350 netlink_sendmsg+0x208/0x440 __sys_sendto+0xf0/0x140 __x64_sys_sendto+0x1b/0x20 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Reported-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum: Clear PTP configuration after unregistering the netdeviceAmit Cohen1-1/+1
Currently as part of removing port, PTP API is called to clear the existing configuration and set the 'rx_filter' and 'tx_type' to zero. The clearing is done before unregistering the netdevice, which means that there is a window of time in which the user can reconfigure PTP in the port, and this configuration will not be cleared. Reorder the operations, clear PTP configuration after unregistering the netdevice. Fixes: 8748642751ede ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum_ptp: Fix compilation warningsAmit Cohen1-8/+10
In case that 'CONFIG_PTP_1588_CLOCK' is not enabled in the config file, there are implementations for the functions mlxsw_{sp,sp2}_ptp_txhdr_construct() as part of 'spectrum_ptp.h'. In this case, they should be defined as 'static' as they are not supposed to be used out of this file. Make the functions 'static', otherwise the following warnings are returned: "warning: no previous prototype for 'mlxsw_sp_ptp_txhdr_construct'" "warning: no previous prototype for 'mlxsw_sp2_ptp_txhdr_construct'" In addition, make the functions 'inline' for case that 'spectrum_ptp.h' will be included anywhere else and the functions would probably not be used, so compilation warnings about unused static will be returned. Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/netDavid S. Miller2-2/+4
-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-11 (ice) This series contains updates to ice driver only. Benjamin corrects a misplaced parenthesis for a WARN_ON check. Michal removes WARN_ON from a check as its recoverable and not warranting of a call trace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>