Age | Commit message (Collapse) | Author | Files | Lines |
|
Unlock before returning if mlx5_device_enable_sriov() fails.
Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Call mlx5e_fs_vlan_free(fs) before kvfree(fs).
Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Use the list_for_each_entry_safe() macro to prevent dereferencing "obj"
after it has been freed.
Fixes: c4dfe704f53f ("net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Unlock before returning on this error path.
Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commit reintroduced the ability to set hw-tc-offload
in switchdev mode by reusing NIC mode calls without modifying it
to support both modes, this can cause an illegal memory access
when trying to turn hw-tc-offload off.
Fix this by using the right TC_FLAG when checking if tc rules
are installed while disabling hw-tc-offload.
Fixes: d3cbd4254df8 ("net/mlx5e: Add ndo_set_feature for uplink representor")
Signed-off-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
There is a missing policer validation when offloading police action
with tc action api. Add it.
Fixes: 7d1a5ce46e47 ("net/mlx5e: TC, Support tc action api for police")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Driver caches packet merge type in mlx5e_params instance which must be
in perfect sync with the netdev_feature's bit.
Prior to this patch, in certain conditions (*) LRO state was set in
mlx5e_params, while netdev_feature's bit was off. Causing the LRO to
be applied on the RQs (HW level).
(*) This can happen only on profile init (mlx5e_build_nic_params()),
when RQ expect non-linear SKB and PCI is fast enough in comparison to
link width.
Solution: remove setting of packet merge type from
mlx5e_build_nic_params() as netdev features are not updated.
Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Reviewed-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.
kernel log:
======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc8+ #2 Not tainted
------------------------------------------------------
kworker/u20:0/8 is trying to acquire lock:
ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]
but task is already holding lock:
ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(¬ifier->n_head)->rwsem){++++}-{3:3}:
down_write+0x90/0x150
blocking_notifier_chain_register+0x53/0xa0
mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
mlx5_init_one+0x261/0x490 [mlx5_core]
probe_one+0x430/0x680 [mlx5_core]
local_pci_probe+0xd6/0x170
work_for_cpu_fn+0x4e/0xa0
process_one_work+0x7c2/0x1340
worker_thread+0x6f6/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
-> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
__lock_acquire+0x2fc7/0x6720
lock_acquire+0x1c1/0x550
__mutex_lock+0x12c/0x14b0
mlx5_init_one+0x2e/0x490 [mlx5_core]
mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
auxiliary_bus_probe+0x9d/0xe0
really_probe+0x1e0/0xaa0
__driver_probe_device+0x219/0x480
driver_probe_device+0x49/0x130
__device_attach_driver+0x1b8/0x280
bus_for_each_drv+0x123/0x1a0
__device_attach+0x1a3/0x460
bus_probe_device+0x1a2/0x260
device_add+0x9b1/0x1b40
__auxiliary_device_add+0x88/0xc0
mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
blocking_notifier_call_chain+0xd5/0x130
mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
process_one_work+0x7c2/0x1340
worker_thread+0x59d/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(¬ifier->n_head)->rwsem);
lock(&dev->intf_state_mutex);
lock(&(¬ifier->n_head)->rwsem);
lock(&dev->intf_state_mutex);
*** DEADLOCK ***
4 locks held by kworker/u20:0/8:
#0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
#1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
#2: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
#3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460
stack backtrace:
CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
check_noncircular+0x278/0x300
? print_circular_bug+0x460/0x460
? lock_chain_count+0x20/0x20
? register_lock_class+0x1880/0x1880
__lock_acquire+0x2fc7/0x6720
? register_lock_class+0x1880/0x1880
? register_lock_class+0x1880/0x1880
lock_acquire+0x1c1/0x550
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
__mutex_lock+0x12c/0x14b0
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? _raw_read_unlock+0x1f/0x30
? mutex_lock_io_nested+0x1320/0x1320
? __ioremap_caller.constprop.0+0x306/0x490
? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
? iounmap+0x160/0x160
mlx5_init_one+0x2e/0x490 [mlx5_core]
mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
auxiliary_bus_probe+0x9d/0xe0
really_probe+0x1e0/0xaa0
__driver_probe_device+0x219/0x480
? auxiliary_match_id+0xe9/0x140
driver_probe_device+0x49/0x130
__device_attach_driver+0x1b8/0x280
? driver_allows_async_probing+0x140/0x140
bus_for_each_drv+0x123/0x1a0
? bus_for_each_dev+0x1a0/0x1a0
? lockdep_hardirqs_on_prepare+0x286/0x400
? trace_hardirqs_on+0x2d/0x100
__device_attach+0x1a3/0x460
? device_driver_attach+0x1e0/0x1e0
? kobject_uevent_env+0x22d/0xf10
bus_probe_device+0x1a2/0x260
device_add+0x9b1/0x1b40
? dev_set_name+0xab/0xe0
? __fw_devlink_link_to_suppliers+0x260/0x260
? memset+0x20/0x40
? lockdep_init_map_type+0x21a/0x7d0
__auxiliary_device_add+0x88/0xc0
? auxiliary_device_init+0x86/0xa0
mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
blocking_notifier_call_chain+0xd5/0x130
mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
? lock_downgrade+0x6e0/0x6e0
? lockdep_hardirqs_on_prepare+0x286/0x400
process_one_work+0x7c2/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0x230/0x230
? rwlock_bug.part.0+0x90/0x90
worker_thread+0x59d/0xec0
? process_one_work+0x1340/0x1340
kthread+0x28f/0x330
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When the driver unloads, give/reclaim_pages may fail as PF driver in
teardown flow, current code will lead to the following kernel log print
'failed reclaiming pages: err 0'.
Fix it to get same behavior as before the cited commits,
by calling mlx5_cmd_check before handling error state.
mlx5_cmd_check will verify if the returned error is an actual error
needed to be handled by the driver or not and will return an
appropriate value.
Fixes: 8d564292a166 ("net/mlx5: Remove redundant error on reclaim pages")
Fixes: 4dac2f10ada0 ("net/mlx5: Remove redundant notify fail on give pages")
Signed-off-by: Roy Novich <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The lag_lock is taken from both process and softirq contexts which results
lockdep warning[0] about potential deadlock. However, just disabling
softirqs by using *_bh spinlock API is not enough since it will cause
warning in some contexts where the lock is obtained with hard irqs
disabled. To fix the issue save current irq state, disable them before
obtaining the lock an re-enable irqs from saved state after releasing it.
[0]:
[Sun Aug 7 13:12:29 2022] ================================
[Sun Aug 7 13:12:29 2022] WARNING: inconsistent lock state
[Sun Aug 7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
[Sun Aug 7 13:12:29 2022] --------------------------------
[Sun Aug 7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[Sun Aug 7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[Sun Aug 7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
[Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40
[Sun Aug 7 13:12:29 2022] mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_nic_enable+0x114/0x470 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_resume+0x105/0x160 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_probe+0xac3/0x14f0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] auxiliary_bus_probe+0x9d/0xe0
[Sun Aug 7 13:12:29 2022] really_probe+0x1e0/0xaa0
[Sun Aug 7 13:12:29 2022] __driver_probe_device+0x219/0x480
[Sun Aug 7 13:12:29 2022] driver_probe_device+0x49/0x130
[Sun Aug 7 13:12:29 2022] __driver_attach+0x1e4/0x4d0
[Sun Aug 7 13:12:29 2022] bus_for_each_dev+0x11e/0x1a0
[Sun Aug 7 13:12:29 2022] bus_add_driver+0x3f4/0x5a0
[Sun Aug 7 13:12:29 2022] driver_register+0x20f/0x390
[Sun Aug 7 13:12:29 2022] __auxiliary_driver_register+0x14e/0x260
[Sun Aug 7 13:12:29 2022] mlx5e_init+0x38/0x90 [mlx5_core]
[Sun Aug 7 13:12:29 2022] vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
[Sun Aug 7 13:12:29 2022] do_one_initcall+0xc4/0x400
[Sun Aug 7 13:12:29 2022] do_init_module+0x18a/0x620
[Sun Aug 7 13:12:29 2022] load_module+0x563a/0x7040
[Sun Aug 7 13:12:29 2022] __do_sys_finit_module+0x122/0x1d0
[Sun Aug 7 13:12:29 2022] do_syscall_64+0x3d/0x90
[Sun Aug 7 13:12:29 2022] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[Sun Aug 7 13:12:29 2022] irq event stamp: 3596508
[Sun Aug 7 13:12:29 2022] hardirqs last enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
[Sun Aug 7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
[Sun Aug 7 13:12:29 2022] softirqs last enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022]
other info that might help us debug this:
[Sun Aug 7 13:12:29 2022] Possible unsafe locking scenario:
[Sun Aug 7 13:12:29 2022] CPU0
[Sun Aug 7 13:12:29 2022] ----
[Sun Aug 7 13:12:29 2022] lock(lag_lock);
[Sun Aug 7 13:12:29 2022] <Interrupt>
[Sun Aug 7 13:12:29 2022] lock(lag_lock);
[Sun Aug 7 13:12:29 2022]
*** DEADLOCK ***
[Sun Aug 7 13:12:29 2022] 4 locks held by swapper/0/0:
[Sun Aug 7 13:12:29 2022] #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
[Sun Aug 7 13:12:29 2022] #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
[Sun Aug 7 13:12:29 2022] #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
[Sun Aug 7 13:12:29 2022]
stack backtrace:
[Sun Aug 7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
[Sun Aug 7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[Sun Aug 7 13:12:29 2022] Call Trace:
[Sun Aug 7 13:12:29 2022] <IRQ>
[Sun Aug 7 13:12:29 2022] dump_stack_lvl+0x57/0x7d
[Sun Aug 7 13:12:29 2022] mark_lock.part.0.cold+0x5f/0x92
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? unwind_next_frame+0x1c4/0x1b50
[Sun Aug 7 13:12:29 2022] ? secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? stack_access_ok+0x1d0/0x1d0
[Sun Aug 7 13:12:29 2022] ? start_kernel+0x3a7/0x3c5
[Sun Aug 7 13:12:29 2022] __lock_acquire+0x1260/0x6720
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? mark_lock.part.0+0xed/0x3060
[Sun Aug 7 13:12:29 2022] ? stack_trace_save+0x91/0xc0
[Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x400/0x400
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40
[Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] atomic_notifier_call_chain+0xd7/0x1d0
[Sun Aug 7 13:12:29 2022] br_switchdev_fdb_notify+0xea/0x100
[Sun Aug 7 13:12:29 2022] ? br_switchdev_set_port_flag+0x310/0x310
[Sun Aug 7 13:12:29 2022] fdb_notify+0x11b/0x150
[Sun Aug 7 13:12:29 2022] br_fdb_update+0x34c/0x570
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? br_fdb_add_local+0x50/0x50
[Sun Aug 7 13:12:29 2022] ? br_allowed_ingress+0x5f/0x1070
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] br_handle_frame_finish+0x786/0x18e0
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] ? sctp_inet_bind_verify+0x4d/0x190
[Sun Aug 7 13:12:29 2022] ? xlog_unpack_data+0x2e0/0x310
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] br_nf_hook_thresh+0x227/0x380 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? setup_pre_routing+0x460/0x460 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
[Sun Aug 7 13:12:29 2022] br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
[Sun Aug 7 13:12:29 2022] br_handle_frame+0x8a9/0x1270
[Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? bond_handle_frame+0xf9/0xac0 [bonding]
[Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug 7 13:12:29 2022] __netif_receive_skb_core+0x7c0/0x2c70
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] ? generic_xdp_tx+0x5b0/0x5b0
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] __netif_receive_skb_list_core+0x2d7/0x8a0
[Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] ? process_backlog+0x960/0x960
[Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x129/0x400
[Sun Aug 7 13:12:29 2022] ? kvm_clock_get_cycles+0x14/0x20
[Sun Aug 7 13:12:29 2022] netif_receive_skb_list_internal+0x5f4/0xd60
[Sun Aug 7 13:12:29 2022] ? do_xdp_generic+0x150/0x150
[Sun Aug 7 13:12:29 2022] ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
[Sun Aug 7 13:12:29 2022] napi_complete_done+0x188/0x710
[Sun Aug 7 13:12:29 2022] mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? __queue_work+0x53c/0xeb0
[Sun Aug 7 13:12:29 2022] __napi_poll+0x9f/0x540
[Sun Aug 7 13:12:29 2022] net_rx_action+0x420/0xb70
[Sun Aug 7 13:12:29 2022] ? napi_threaded_poll+0x470/0x470
[Sun Aug 7 13:12:29 2022] ? __common_interrupt+0x79/0x1a0
[Sun Aug 7 13:12:29 2022] __do_softirq+0x271/0x92c
[Sun Aug 7 13:12:29 2022] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022] common_interrupt+0x7d/0xa0
[Sun Aug 7 13:12:29 2022] </IRQ>
[Sun Aug 7 13:12:29 2022] <TASK>
[Sun Aug 7 13:12:29 2022] asm_common_interrupt+0x22/0x40
[Sun Aug 7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
[Sun Aug 7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
[Sun Aug 7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
[Sun Aug 7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
[Sun Aug 7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
[Sun Aug 7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
[Sun Aug 7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
[Sun Aug 7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
[Sun Aug 7 13:12:29 2022] ? default_idle_call+0xcc/0x460
[Sun Aug 7 13:12:29 2022] default_idle_call+0xec/0x460
[Sun Aug 7 13:12:29 2022] do_idle+0x394/0x450
[Sun Aug 7 13:12:29 2022] ? arch_cpu_idle_exit+0x40/0x40
[Sun Aug 7 13:12:29 2022] cpu_startup_entry+0x19/0x20
[Sun Aug 7 13:12:29 2022] rest_init+0x156/0x250
[Sun Aug 7 13:12:29 2022] arch_call_rest_init+0xf/0x15
[Sun Aug 7 13:12:29 2022] start_kernel+0x3a7/0x3c5
[Sun Aug 7 13:12:29 2022] secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug 7 13:12:29 2022] </TASK>
Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Make sure to modify the rule for uplink forwarding only for the case
where destination vport number is MLX5_VPORT_UPLINK.
Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Only set MLX5_LAG_FLAG_NDEVS_READY if both netdevices are registered.
Doing so guarantees that both ldev->pf[MLX5_LAG_P0].dev and
ldev->pf[MLX5_LAG_P1].dev have valid pointers when
MLX5_LAG_FLAG_NDEVS_READY is set.
The core issue is asymmetry in setting MLX5_LAG_FLAG_NDEVS_READY and
clearing it. Setting it is done wrongly when both
ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev are set;
clearing it is done right when either of ldev->pf[i].netdev is cleared.
Consider the following scenario:
1. PF0 loads and sets ldev->pf[MLX5_LAG_P0].dev to a valid pointer
2. PF1 loads and sets both ldev->pf[MLX5_LAG_P1].dev and
ldev->pf[MLX5_LAG_P1].netdev with valid pointers. This results in
MLX5_LAG_FLAG_NDEVS_READY is set.
3. PF0 is unloaded before setting dev->pf[MLX5_LAG_P0].netdev.
MLX5_LAG_FLAG_NDEVS_READY remains set.
Further execution of mlx5_do_bond() will result in null pointer
dereference when calling mlx5_lag_is_multipath()
This patch fixes the following call trace actually encountered:
[ 1293.475195] BUG: kernel NULL pointer dereference, address: 00000000000009a8
[ 1293.478756] #PF: supervisor read access in kernel mode
[ 1293.481320] #PF: error_code(0x0000) - not-present page
[ 1293.483686] PGD 0 P4D 0
[ 1293.484434] Oops: 0000 [#1] SMP PTI
[ 1293.485377] CPU: 1 PID: 23690 Comm: kworker/u16:2 Not tainted 5.18.0-rc5_for_upstream_min_debug_2022_05_05_10_13 #1
[ 1293.488039] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1293.490836] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
[ 1293.492448] RIP: 0010:mlx5_lag_is_multipath+0x5/0x50 [mlx5_core]
[ 1293.494044] Code: e8 70 40 ff e0 48 8b 14 24 48 83 05 5c 1a 1b 00 01 e9 19 ff ff ff 48 83 05 47 1a 1b 00 01 eb d7 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 a8 09 00 00 48 85 c0 74 26 48 83 05 a7 1b 1b 00 01 41 b8
[ 1293.498673] RSP: 0018:ffff88811b2fbe40 EFLAGS: 00010202
[ 1293.500152] RAX: ffff88818a94e1c0 RBX: ffff888165eca6c0 RCX: 0000000000000000
[ 1293.501841] RDX: 0000000000000001 RSI: ffff88818a94e1c0 RDI: 0000000000000000
[ 1293.503585] RBP: 0000000000000000 R08: ffff888119886740 R09: ffff888165eca73c
[ 1293.505286] R10: 0000000000000018 R11: 0000000000000018 R12: ffff88818a94e1c0
[ 1293.506979] R13: ffff888112729800 R14: 0000000000000000 R15: ffff888112729858
[ 1293.508753] FS: 0000000000000000(0000) GS:ffff88852cc40000(0000) knlGS:0000000000000000
[ 1293.510782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1293.512265] CR2: 00000000000009a8 CR3: 00000001032d4002 CR4: 0000000000370ea0
[ 1293.514001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.515806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When querying mlx5 non-uplink representors capabilities with ethtool
rx-vlan-offload is marked as "off [fixed]". However, it is actually always
enabled because mlx5e_params->vlan_strip_disable is 0 by default when
initializing struct mlx5e_params instance. Fix the issue by explicitly
setting the vlan_strip_disable to 'true' for non-uplink representors.
Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Ice driver allocates per cpu XDP queues so that redirect path can safely
use smp_processor_id() as an index to the array. At the same time
though, XDP rings are used to pick NAPI context to call napi_schedule()
or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and
num_possible_cpus() of underlying platform is 44, then this means queue
vectors with correlated NAPI contexts will carry several XDP queues.
This in turn can result in a broken behavior where NAPI context of
interest will never be scheduled and AF_XDP socket will not process any
traffic.
To fix this, let us change the way how XDP rings are assigned to Rx
rings and use this information later on when setting
ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
queue vector and walk through Tx ring's linked list. Once we stumble
upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.
Previous [0] approach of fixing this issue was for txonly scenario
because of the described grouping of XDP rings across queue vectors. So,
relying on Rx ring meant that NAPI context could be scheduled with a
queue vector without XDP ring with associated XSK pool.
[0]: https://lore.kernel.org/netdev/[email protected]/
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Fix the following scenario:
1. ethtool -L $IFACE rx 8 tx 96
2. xdpsock -q 10 -t -z
Above refers to a case where user would like to attach XSK socket in
txonly mode at a queue id that does not have a corresponding Rx queue.
At this moment ice's XSK logic is tightly bound to act on a "queue pair",
e.g. both Tx and Rx queues at a given queue id are disabled/enabled and
both of them will get XSK pool assigned, which is broken for the presented
queue configuration. This results in the splat included at the bottom,
which is basically an OOB access to Rx ring array.
To fix this, allow using the ids only in scope of "combined" queues
reported by ethtool. However, logic should be rewritten to allow such
configurations later on, which would end up as a complete rewrite of the
control path, so let us go with this temporary fix.
[420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082
[420160.566359] #PF: supervisor read access in kernel mode
[420160.572657] #PF: error_code(0x0000) - not-present page
[420160.579002] PGD 0 P4D 0
[420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI
[420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10
[420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420160.740707] PKRU: 55555554
[420160.745960] Call Trace:
[420160.750962] <TASK>
[420160.755597] ? kmalloc_large_node+0x79/0x90
[420160.762703] ? __kmalloc_node+0x3f5/0x4b0
[420160.769341] xp_assign_dev+0xfd/0x210
[420160.775661] ? shmem_file_read_iter+0x29a/0x420
[420160.782896] xsk_bind+0x152/0x490
[420160.788943] __sys_bind+0xd0/0x100
[420160.795097] ? exit_to_user_mode_prepare+0x20/0x120
[420160.802801] __x64_sys_bind+0x16/0x20
[420160.809298] do_syscall_64+0x38/0x90
[420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[420160.823731] RIP: 0033:0x7fa86a0dd2fb
[420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48
[420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb
[420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003
[420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000
[420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0
[420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000
[420160.919817] </TASK>
[420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice]
[420160.977576] CR2: 0000000000000082
[420160.985037] ---[ end trace 0000000000000000 ]---
[420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420161.257052] PKRU: 55555554
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When the pn532 uart device is detaching, the pn532_uart_remove()
is called. But there are no functions in pn532_uart_remove() that
could delete the cmd_timeout timer, which will cause use-after-free
bugs. The process is shown below:
(thread 1) | (thread 2)
| pn532_uart_send_frame
pn532_uart_remove | mod_timer(&pn532->cmd_timeout,...)
... | (wait a time)
kfree(pn532) //FREE | pn532_cmd_timeout
| pn532_uart_send_frame
| pn532->... //USE
This patch adds del_timer_sync() in pn532_uart_remove() in order to
prevent the use-after-free bugs. What's more, the pn53x_unregister_nfc()
is well synchronized, it sets nfc_dev->shutting_down to true and there
are no syscalls could restart the cmd_timeout timer.
Fixes: c656aa4c27b1 ("nfc: pn533: add UART phy driver")
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The RX FIFO would be changed when suspending, so the related settings
have to be modified, too. Otherwise, the flow control would work
abnormally.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216333
Reported-by: Mark Blakeney <[email protected]>
Fixes: cdf0b86b250f ("r8152: fix a WOL issue")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The units of PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY are 16 bytes.
Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As the possible failure of the kmalloc(), it should be better
to fix this error path, check and return '-ENOMEM' error code.
Signed-off-by: Li Qiong <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
The double `was' is duplicated in the comment, remove one.
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix"
* tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/loongson-liointc: Fix an error handling path in liointc_init()
irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse
irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI
irqchip/loongson-eiointc: Fix a build warning
irqchip/loongson-eiointc: Fix irq affinity setting
iommu/hyper-v: Use helper instead of directly accessing affinity
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A revert to fix a regression introduced this merge window and a fix
for proper error handling in the remove path of the iMX driver"
* tag 'i2c-for-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: Make sure to unregister adapter on remove()
Revert "i2c: scmi: Replace open coded device_get_match_data()"
|
|
If for whatever reasons pm_runtime_resume_and_get() fails and .remove() is
exited early, the i2c adapter stays around and the irq still calls its
handler, while the driver data and the register mapping go away. So if
later the i2c adapter is accessed or the irq triggers this results in
havoc accessing freed memory and unmapped registers.
So unregister the software resources even if resume failed, and only skip
the hardware access in that case.
Fixes: 588eb93ea49f ("i2c: imx: add runtime pm support to improve the performance")
Signed-off-by: Uwe Kleine-König <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
This reverts commit 9ae551ded5ba55f96a83cd0811f7ef8c2f329d0c. We got a
regression report, so ensure this machine boots again. We will come back
with a better version hopefully.
Reported-by: Josef Johansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wolfram Sang <[email protected]>
|
|
This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly, PTR_ERR
handling was introduced in the meantime. This is handled by appropriate
IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.
So when the failure occurs under memory pressure, the overlaying
filesystem such as ext2 (mounted by ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is proper
outcome if user sets up zram too large (in comparison to available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Alexey Romanov <[email protected]>
Cc: Dmitry Rokosov <[email protected]>
Cc: Lukas Czerner <[email protected]>
Cc: <[email protected]> [5.19]
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Fix a KVM crash on z12 and older machines caused by a wrong
assumption that Query AP Configuration Information is always
available.
- Lower severity of excessive Hypervisor filesystem error messages
when booting under KVM.
* tag 's390-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ap: fix crash on older machines based on QCI info missing
s390/hypfs: avoid error message under KVM
|
|
Pull rdma fixes from Jason Gunthorpe:
"A few minor fixes:
- Fix buffer management in SRP to correct a regression with the login
authentication feature from v5.17
- Don't iterate over non-present ports in mlx5
- Fix an error introduced by the foritify work in cxgb4
- Two bug fixes for the recently merged ERDMA driver
- Unbreak RDMA dmabuf support, a regresion from v5.19"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA: Handle the return code from dma_resv_wait_timeout() properly
RDMA/erdma: Correct the max_qp and max_cq capacities of the device
RDMA/erdma: Using the key in FMR WR instead of MR structure
RDMA/cxgb4: fix accept failure due to increased cpl_t5_pass_accept_rpl size
RDMA/mlx5: Use the proper number of ports
IB/iser: Fix login with authentication
|
|
Pull block fixes from Jens Axboe:
"A few fixes that should go into this release:
- Small series of patches for ublk (ZiyangZhang)
- Remove dead function (Yu)
- Fix for running a block queue in case of resource starvation
(Yufen)"
* tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block:
blk-mq: run queue no matter whether the request is the last request
blk-mq: remove unused function blk_mq_queue_stopped()
ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work
ublk_drv: update comment for __ublk_fail_req()
ublk_drv: check ubq_daemon_is_dying() in __ublk_rq_task_work()
ublk_drv: update iod->addr for UBLK_IO_NEED_GET_DATA
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
- Add a missing command name definition for ata_get_cmd_name(), from
me.
- A fix to address a performance regression due to the default
max_sectors queue limit for ATA devices connected to AHCI adapters
being too small, from John.
* tag 'ata-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata: Set __ATA_BASE_SHT max_sectors
ata: libata-eh: Add missing command name
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- meson-gx: Fix error handling in ->probe()
- mtk-sd: Fix a command problem when using cqe off/disable
- pxamci: Fix error handling in ->probe()
- sdhci-of-dwcmshc: Fix broken support for the BlueField-3 variant
* tag 'mmc-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-dwcmshc: Re-enable support for the BlueField-3 SoC
mmc: meson-gx: Fix an error handling path in meson_mmc_probe()
mmc: mtk-sd: Clear interrupts when cqe off/disable
mmc: pxamci: Fix another error handling path in pxamci_probe()
mmc: pxamci: Fix an error handling path in pxamci_probe()
|
|
Passthrough users will set the scsi_cmnd->allowed value and were expecting
up to $allowed retries. The problem is that before:
commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request")
we used to set the retries on the scsi_request then copy them over to
scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set
scsi_cmnd->allowed to 0 in scsi_prepare_cmd and overwrite what the
passthrough user set.
This moves the allowed initialization to after the blk_rq_is_passthrough()
check so it's only done for the non-passthrough path where the ULD
init_command will normally set an allowed value it prefers.
Link: https://lore.kernel.org/r/[email protected]
Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
The current power mode change timeout (180 s) is so large that it can cause
a watchdog timer to fire. Reduce the power mode change timeout to 10
seconds.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Stanley Chu <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
This reverts commit 6fc498bc82929ee23aa2f35a828c6178dfd3f823.
Commit 6fc498bc8292 states:
SCI should be updated, because it contains MAC in its first 6
octets.
That's not entirely correct. The SCI can be based on the MAC address,
but doesn't have to be. We can also use any 64-bit number as the
SCI. When the SCI based on the MAC address, it uses a 16-bit "port
number" provided by userspace, which commit 6fc498bc8292 overwrites
with 1.
In addition, changing the SCI after macsec has been setup can just
confuse the receiver. If we configure the RXSC on the peer based on
the original SCI, we should keep the same SCI on TX.
When the macsec device is being managed by a userspace key negotiation
daemon such as wpa_supplicant, commit 6fc498bc8292 would also
overwrite the SCI defined by userspace.
Fixes: 6fc498bc8292 ("net: macsec: update SCI upon MAC address change.")
Signed-off-by: Sabrina Dubroca <[email protected]>
Link: https://lore.kernel.org/r/9b1a9d28327e7eb54550a92eebda45d25e54dd0d.1660667033.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
As discussed in commit 73a21fa817f0 ("dpaa_eth: support all modes with
rate adapting PHYs"), we must add a workaround for Aquantia phys with
in-tree support in order to keep 1G support working. Update this
workaround for the AQR113C phy found on revision C LS1046ARDB boards.
Fixes: 12cf1b89a668 ("net: phy: Add support for AQR113C EPHY")
Signed-off-by: Sean Anderson <[email protected]>
Acked-by: Camelia Groza <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Include commits that weren't submitted during the 6.0 merge window.
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Although radeon card fence and wait for gpu to finish processing current batch rings,
there is still a corner case that radeon lockup work queue may not be fully flushed,
and meanwhile the radeon_suspend_kms() function has called pci_set_power_state() to
put device in D3hot state.
Per PCI spec rev 4.0 on 5.3.1.4.1 D3hot State.
> Configuration and Message requests are the only TLPs accepted by a Function in
> the D3hot state. All other received Requests must be handled as Unsupported Requests,
> and all received Completions may optionally be handled as Unexpected Completions.
This issue will happen in following logs:
Unable to handle kernel paging request at virtual address 00008800e0008010
CPU 0 kworker/0:3(131): Oops 0
pc = [<ffffffff811bea5c>] ra = [<ffffffff81240844>] ps = 0000 Tainted: G W
pc is at si_gpu_check_soft_reset+0x3c/0x240
ra is at si_dma_is_lockup+0x34/0xd0
v0 = 0000000000000000 t0 = fff08800e0008010 t1 = 0000000000010000
t2 = 0000000000008010 t3 = fff00007e3c00000 t4 = fff00007e3c00258
t5 = 000000000000ffff t6 = 0000000000000001 t7 = fff00007ef078000
s0 = fff00007e3c016e8 s1 = fff00007e3c00000 s2 = fff00007e3c00018
s3 = fff00007e3c00000 s4 = fff00007fff59d80 s5 = 0000000000000000
s6 = fff00007ef07bd98
a0 = fff00007e3c00000 a1 = fff00007e3c016e8 a2 = 0000000000000008
a3 = 0000000000000001 a4 = 8f5c28f5c28f5c29 a5 = ffffffff810f4338
t8 = 0000000000000275 t9 = ffffffff809b66f8 t10 = ff6769c5d964b800
t11= 000000000000b886 pv = ffffffff811bea20 at = 0000000000000000
gp = ffffffff81d89690 sp = 00000000aa814126
Disabling lock debugging due to kernel taint
Trace:
[<ffffffff81240844>] si_dma_is_lockup+0x34/0xd0
[<ffffffff81119610>] radeon_fence_check_lockup+0xd0/0x290
[<ffffffff80977010>] process_one_work+0x280/0x550
[<ffffffff80977350>] worker_thread+0x70/0x7c0
[<ffffffff80977410>] worker_thread+0x130/0x7c0
[<ffffffff80982040>] kthread+0x200/0x210
[<ffffffff809772e0>] worker_thread+0x0/0x7c0
[<ffffffff80981f8c>] kthread+0x14c/0x210
[<ffffffff80911658>] ret_from_kernel_thread+0x18/0x20
[<ffffffff80981e40>] kthread+0x0/0x210
Code: ad3e0008 43f0074a ad7e0018 ad9e0020 8c3001e8 40230101
<88210000> 4821ed21
So force lockup work queue flush to fix this problem.
Acked-by: Christian König <[email protected]>
Signed-off-by: Zhenneng Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The file amdgpu_dm_plane.c missed the header amdgpu_dm_plane.h, which
resulted on the following warning:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1046:5:
warning: no previous prototype for 'fill_dc_scaling_info'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1222:6:
warning: no previous prototype for 'handle_cursor_update'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:152:6:
warning: no previous prototype for 'modifier_has_dcc'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1576:5:
warning: no previous prototype for 'amdgpu_dm_plane_init'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:157:10:
warning: no previous prototype for 'modifier_gfx9_swizzle_mode'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:752:5:
warning: no previous prototype for 'fill_plane_buffer_attributes'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:83:31:
warning: no previous prototype for 'amd_get_format_info'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:88:6:
warning: no previous prototype for 'fill_blending_from_plane_state'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:992:5:
warning: no previous prototype for 'dm_plane_helper_check_state'
[-Wmissing-prototypes]
Therefore, include the missing header on the file and turn global functions
that are not used outside of the file into static functions.
Fixes: 5d945cbcd4b1 ("drm/amd/display: Create a file dedicated to planes")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The additional call is caused by merge conflict
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: shaoyunl <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to set up rb when no gfx rings.
Signed-off-by: Candice Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Only amdgpu_get_xgmi_hive but no amdgpu_put_xgmi_hive
which will leak the hive reference.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
psp_hw_fini
V1:
The amdgpu_xgmi_remove_device function will send unload command
to psp through psp ring to terminate xgmi, but psp ring has been
destroyed in psp_hw_fini.
V2:
1. Change the commit title.
2. Restore amdgpu_xgmi_remove_device to its original calling location.
Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to
psp_hw_fini.
Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable GFXOFF allow control when set the GFX power gating.
Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to make the ISA extension static keys writable after init. This
manifests at least as a crash when loading modules (including KVM).
- A fixup for a build warning related to a poorly formed comment in our
perf driver.
* tag 'riscv-for-linus-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
perf: riscv legacy: fix kerneldoc comment warning
riscv: Ensure isa-ext static keys are writable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only significant core change is ASoC DPCM fix for asymmetric
setup; other remaining changes are device-specific fixes, including
the hardening of string manipulations.
One change in platform/x86 is the patch I forgot to apply from a
series for CS35L41 codec"
* tag 'sound-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek: Add quirk for Clevo NS50PU, NS70PU
ALSA: info: Fix llseek return value when using callback
ALSA: hda/cs8409: Support new Dolphin Variants
platform/x86: serial-multi-instantiate: Add CLSA0101 Laptop
ALSA: hda/realtek: Add quirk for Lenovo Yoga7 14IAL7
ALSA: hda: cs35l41: Clarify support for CSC3551 without _DSD Properties
ALSA: hda/realtek: Add quirks for ASUS Zenbooks using CS35L41
ASoC: codec: tlv320aic32x4: fix mono playback via I2S
ASoC: rt5640: Fix the JD voltage dropping issue
ASoC: tas2770: Fix handling of mute/unmute
ASoC: tas2770: Drop conflicting set_bias_level power setting
ASoC: tas2770: Allow mono streams
ASoC: tas2770: Set correct FSYNC polarity
ASoC: Intel: fix sof_es8336 probe
ASoC: DPCM: Don't pick up BE without substream
ASoC: SOF: ipc3-topology: Fix clang -Wformat warning
ASoC: sh: rz-ssi: Improve error handling in rz_ssi_probe() error path
ASoC: SOF: Intel: hda: Fix potential buffer overflow by snprintf()
ASoC: SOF: debug: Fix potential buffer overflow by snprintf()
ASoC: Intel: avs: Fix potential buffer overflow by snprintf()
...
|
|
Pull drm fixes from Dave Airlie:
"Regular weekly fixes.
The nouveau patch just enables modesetting on GA103 hw which is like
other ampere cards that are already supported. amdgpu has 2 weeks of
fixes, as Alex was away, so a bit larger than usual, otherwise some
i915 and misc other fixes.
ttm:
- NULL ptr dereference
i915:
- disable pci resize on 32-bit systems
- don't leak the ccs state
- TLB invalidation fixes
nouveau:
- GA103 enablement
- off-by-one fix
amdgpu:
- Revert some DML stack changes
- Rounding fixes in KFD allocations
- atombios vram info table parsing fix
- DCN 3.1.4 fixes
- Clockgating fixes for various new IPs
- SMU 13.0.4 fixes
- DCN 3.1.4 FP fixes
- TMDS fixes for YCbCr420 4k modes
- DCN 3.2.x fixes
- USB 4 fixes
- SMU 13.0 fixes
- SMU driver unload memory leak fixes
- Display orientation fix
- Regression fix for generic fbdev conversion
- SDMA 6.x fixes
- SR-IOV fixes
- IH 6.x fixes
- Use after free fix in bo list handling
- Revert pipe1 support
- XGMI hive reset fix
amdkfd:
- Fix potential crach in kfd_create_indirect_link_prop()
imx:
- warning fix
meson:
- refcounting fix
lvds-codec:
- error check fix
sun4i:
- underflow fix
- dt-binding fix"
* tag 'drm-fixes-2022-08-19' of git://anongit.freedesktop.org/drm/drm: (109 commits)
Revert "drm/amd/amdgpu: add pipe1 hardware support"
drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
drm/amdgpu: Fix interrupt handling on ih_soft ring
drm/amdgpu: Add secure display TA load for Renoir
drm/amd/display: Include scaling factor for SubVP command
drm/amdgpu/vcn: Return void from the stop_dbg_mode
drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()
drm/amdgpu: Add decode_iv_ts helper for ih_v6 block
drm/amd/display: add chip revision to DCN32
drm/amd/display: avoid doing vm_init multiple time
drm/amd/display: Use pitch when calculating size to cache in MALL
drm/amd/display: Don't set DSC for phantom pipes
drm/amd/display: Update clock table policy for DCN314
drm/amd/display: Modify header inclusion pattern
drm/amd/display: Fix plug/unplug external monitor will hang while playback MPO video
drm/amd/display: Add debug parameter to retain default clock table
drm/amdgpu: Increase tlb flush timeout for sriov
drm/amd/display: do not compare integers of different widths
drm/amd/display: Add reserved dc_log_type.
drm/amd/display: Fix pixel clock programming
...
|
|
Currently we are assuming a one to one mapping between dmabuf and
GEM handle when releasing GEM handles.
But that is not always true, since we would create extra handles for the
GEM obj in cases like gem_open() and getfb{,2}().
A similar issue was reported at:
https://lore.kernel.org/all/[email protected]/
Another problem is that the imported dmabuf might not always have
gem_obj->dma_buf set, which would cause leaks in
drm_gem_remove_prime_handles().
Let's fix these for now by using handle to find the exact map to remove.
Signed-off-by: Jeffy Chen <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - regressions:
- tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF
socket maps get data out of the TCP stack)
- tls: rx: react to strparser initialization errors
- netfilter: nf_tables: fix scheduling-while-atomic splat
- net: fix suspicious RCU usage in bpf_sk_reuseport_detach()
Current release - new code bugs:
- mlxsw: ptp: fix a couple of races, static checker warnings and
error handling
Previous releases - regressions:
- netfilter:
- nf_tables: fix possible module reference underflow in error path
- make conntrack helpers deal with BIG TCP (skbs > 64kB)
- nfnetlink: re-enable conntrack expectation events
- net: fix potential refcount leak in ndisc_router_discovery()
Previous releases - always broken:
- sched: cls_route: disallow handle of 0
- neigh: fix possible local DoS due to net iface start/stop loop
- rtnetlink: fix module refcount leak in rtnetlink_rcv_msg
- sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu
- virtio_net: fix endian-ness for RSS
- dsa: mv88e6060: prevent crash on an unused port
- fec: fix timer capture timing in `fec_ptp_enable_pps()`
- ocelot: stats: fix races, integer wrapping and reading incorrect
registers (the change of register definitions here accounts for
bulk of the changed LoC in this PR)"
* tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: moxa: MAC address reading, generating, validity checking
tcp: handle pure FIN case correctly
tcp: refactor tcp_read_skb() a bit
tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()
tcp: fix sock skb accounting in tcp_read_skb()
igb: Add lock to avoid data race
dt-bindings: Fix incorrect "the the" corrections
net: genl: fix error path memory leak in policy dumping
stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove()
net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run
net/mlx5e: Allocate flow steering storage during uplink initialization
net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats
net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset
net: mscc: ocelot: make struct ocelot_stat_layout array indexable
net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work
net: mscc: ocelot: turn stats_lock into a spinlock
net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter
net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters
net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters
net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it
...
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.0-2022-08-17:
amdgpu:
- Revert some DML stack changes
- Rounding fixes in KFD allocations
- atombios vram info table parsing fix
- DCN 3.1.4 fixes
- Clockgating fixes for various new IPs
- SMU 13.0.4 fixes
- DCN 3.1.4 FP fixes
- TMDS fixes for YCbCr420 4k modes
- DCN 3.2.x fixes
- USB 4 fixes
- SMU 13.0 fixes
- SMU driver unload memory leak fixes
- Display orientation fix
- Regression fix for generic fbdev conversion
- SDMA 6.x fixes
- SR-IOV fixes
- IH 6.x fixes
- Use after free fix in bo list handling
- Revert pipe1 support
- XGMI hive reset fix
amdkfd:
- Fix potential crach in kfd_create_indirect_link_prop()
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Fix the warning:
drivers/perf/riscv_pmu_legacy.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Fixes: 9b3e150e310e ("RISC-V: Add a simple platform driver for RISC-V legacy perf")
Signed-off-by: Conor Dooley <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|