aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-06-05perf/core: Fix missing wakeup when waiting for context referenceHaifeng Xu1-0/+13
In our production environment, we found many hung tasks which are blocked for more than 18 hours. Their call traces are like this: [346278.191038] __schedule+0x2d8/0x890 [346278.191046] schedule+0x4e/0xb0 [346278.191049] perf_event_free_task+0x220/0x270 [346278.191056] ? init_wait_var_entry+0x50/0x50 [346278.191060] copy_process+0x663/0x18d0 [346278.191068] kernel_clone+0x9d/0x3d0 [346278.191072] __do_sys_clone+0x5d/0x80 [346278.191076] __x64_sys_clone+0x25/0x30 [346278.191079] do_syscall_64+0x5c/0xc0 [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50 [346278.191086] ? do_syscall_64+0x69/0xc0 [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20 [346278.191092] ? irqentry_exit+0x19/0x30 [346278.191095] ? exc_page_fault+0x89/0x160 [346278.191097] ? asm_exc_page_fault+0x8/0x30 [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae The task was waiting for the refcount become to 1, but from the vmcore, we found the refcount has already been 1. It seems that the task didn't get woken up by perf_event_release_kernel() and got stuck forever. The below scenario may cause the problem. Thread A Thread B ... ... perf_event_free_task perf_event_release_kernel ... acquire event->child_mutex ... get_ctx ... release event->child_mutex acquire ctx->mutex ... perf_free_event (acquire/release event->child_mutex) ... release ctx->mutex wait_var_event acquire ctx->mutex acquire event->child_mutex # move existing events to free_list release event->child_mutex release ctx->mutex put_ctx ... ... In this case, all events of the ctx have been freed, so we couldn't find the ctx in free_list and Thread A will miss the wakeup. It's thus necessary to add a wakeup after dropping the reference. Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()") Signed-off-by: Haifeng Xu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Mark Rutland <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2024-06-05Merge branch 'mlx5-fixes'David S. Miller4-0/+19
Tariq Toukan says: ==================== mlx5 core fixes 20240603 This small patchset provides two bug fixes from the team to the mlx5 core driver. Series generated against: commit 33700a0c9b56 ("net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED") ==================== Signed-off-by: David S. Miller <[email protected]>
2024-06-05net/mlx5: Always stop health timer during driver removalShay Drory1-0/+3
Currently, if teardown_hca fails to execute during driver removal, mlx5 does not stop the health timer. Afterwards, mlx5 continue with driver teardown. This may lead to a UAF bug, which results in page fault Oops[1], since the health timer invokes after resources were freed. Hence, stop the health monitor even if teardown_hca fails. [1] mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: cleanup mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup BUG: unable to handle page fault for address: ffffa26487064230 PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 RIP: 0010:ioread32be+0x34/0x60 RSP: 0018:ffffa26480003e58 EFLAGS: 00010292 RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0 RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230 RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8 R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0 R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0 FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? exc_page_fault+0x175/0x180 ? asm_exc_page_fault+0x26/0x30 ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? ioread32be+0x34/0x60 mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] poll_health+0x42/0x230 [mlx5_core] ? __next_timer_interrupt+0xbc/0x110 ? __pfx_poll_health+0x10/0x10 [mlx5_core] call_timer_fn+0x21/0x130 ? __pfx_poll_health+0x10/0x10 [mlx5_core] __run_timers+0x222/0x2c0 run_timer_softirq+0x1d/0x40 __do_softirq+0xc9/0x2c8 __irq_exit_rcu+0xa6/0xc0 sysvec_apic_timer_interrupt+0x72/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:cpuidle_enter_state+0xcc/0x440 ? cpuidle_enter_state+0xbd/0x440 cpuidle_enter+0x2d/0x40 do_idle+0x20d/0x270 cpu_startup_entry+0x2a/0x30 rest_init+0xd0/0xd0 arch_call_rest_init+0xe/0x30 start_kernel+0x709/0xa90 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0x96/0xa0 secondary_startup_64_no_verify+0x18f/0x19b ---[ end trace 0000000000000000 ]--- Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net/mlx5: Stop waiting for PCI if pci channel is offlineMoshe Shemesh3-0/+16
In case pci channel becomes offline the driver should not wait for PCI reads during health dump and recovery flow. The driver has timeout for each of these loops trying to read PCI, so it would fail anyway. However, in case of recovery waiting till timeout may cause the pci error_detected() callback fail to meet pci_dpc_recovered() wait timeout. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Shay Drori <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net: ethernet: mtk_eth_soc: handle dma buffer size soc specificFrank Wunderlich2-36/+77
The mainline MTK ethernet driver suffers long time from rarly but annoying tx queue timeouts. We think that this is caused by fixed dma sizes hardcoded for all SoCs. We suspect this problem arises from a low level of free TX DMADs, the TX Ring alomost full. The transmit timeout is caused by the Tx queue not waking up. The Tx queue stops when the free counter is less than ring->thres, and it will wake up once the free counter is greater than ring->thres. If the CPU is too late to wake up the Tx queues, it may cause a transmit timeout. Therefore, we increased the TX and RX DMADs to improve this error situation. Use the dma-size implementation from SDK in a per SoC manner. In difference to SDK we have no RSS feature yet, so all RX/TX sizes should be raised from 512 to 2048 byte except fqdma on mt7988 to avoid the tx timeout issue. Fixes: 656e705243fd ("net-next: mediatek: add support for MT7623 ethernet") Suggested-by: Daniel Golle <[email protected]> Signed-off-by: Frank Wunderlich <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05arm64/io: add constant-argument checkArnd Bergmann1-20/+16
In some configurations __const_iowrite32_copy() does not get inlined and gcc runs into the BUILD_BUG(): In file included from <command-line>: In function '__const_memcpy_toio_aligned32', inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3, inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20: include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert' 468 | prefix ## suffix(); \ | ^~~~~~ include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert' 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) | ^~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG' 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed") | ^~~~~~~~~~~~~~~~ arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG' 193 | BUILD_BUG(); | ^~~~~~~~~ Move the check for constant arguments into the inline function to ensure it is still constant if the compiler decides against inlining it, and mark them as __always_inline to override the logic that sometimes leads to the compiler not producing the simplified output. Note that either the __always_inline annotation or the check for a constant value are sufficient here, but combining the two looks cleaner as it also avoids the macro. With clang-8 and older, the macro was still needed, but all versions of gcc and clang can reliably perform constant folding here. Fixes: ead79118dae6 ("arm64/io: Provide a WC friendly __iowriteXX_copy()") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-06-05rtnetlink: make the "split" NLM_DONE handling genericJakub Kicinski4-9/+45
Jaroslav reports Dell's OMSA Systems Management Data Engine expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo() and inet_dump_ifaddr(). We already added a similar fix previously in commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again") Instead of modifying all the dump handlers, and making them look different than modern for_each_netdev_dump()-based dump handlers - put the workaround in rtnetlink code. This will also help us move the custom rtnl-locking from af_netlink in the future (in net-next). Note that this change is not touching rtnl_dump_all(). rtnl_dump_all() is different kettle of fish and a potential problem. We now mix families in a single recvmsg(), but NLM_DONE is not coalesced. Tested: ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \ --dump getaddr --json '{"ifa-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \ --dump getlink Fixes: 3e41af90767d ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()") Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()") Reported-by: Jaroslav Pulchart <[email protected]> Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05Merge branch 'tcp-mptcp-close-wait'David S. Miller2-3/+12
Jason Xing says: ==================== tcp/mptcp: count CLOSE-WAIT for CurrEstab Taking CLOSE-WAIT sockets into CurrEstab counters is in accordance with RFC 1213, as suggested by Eric and Neal. v5 Link: https://lore.kernel.org/all/[email protected]/ 1. add more detailed comment (Matthieu) v4 Link: https://lore.kernel.org/all/[email protected]/ 1. correct the Fixes: tag in patch [2/2]. (Eric) Previous discussion Link: https://lore.kernel.org/all/[email protected]/ ==================== Signed-off-by: David S. Miller <[email protected]>
2024-06-05mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTABJason Xing1-2/+7
Like previous patch does in TCP, we need to adhere to RFC 1213: "tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT." So let's consider CLOSE-WAIT sockets. The logic of counting When we increment the counter? a) Only if we change the state to ESTABLISHED. When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK. Fixes: d9cd27b8cd19 ("mptcp: add CurrEstab MIB counter support") Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTABJason Xing1-1/+5
According to RFC 1213, we should also take CLOSE-WAIT sockets into consideration: "tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT." After this, CurrEstab counter will display the total number of ESTABLISHED and CLOSE-WAIT sockets. The logic of counting When we increment the counter? a) if we change the state to ESTABLISHED. b) if we change the state from SYN-RECEIVED to CLOSE-WAIT. When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK. Please note: there are two chances that old state of socket can be changed to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED. So we have to take care of the former case. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attrTao Su1-3/+0
Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn(). Before checking the mismatch of private vs. shared, mmu_invalidate_seq is saved to fault->mmu_seq, which can be used to detect an invalidation related to the gfn occurred, i.e. KVM will not install a mapping in page table if fault->mmu_seq != mmu_invalidate_seq. Currently there is a second snapshot of mmu_invalidate_seq, which may not be same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn attribute may be changed between the two snapshots, but the gfn may be mapped in page table without hindrance. Therefore, drop the second snapshot as it has no obvious benefits. Fixes: f6adeae81f35 ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()") Signed-off-by: Tao Su <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-06-05Merge tag 'kvmarm-fixes-6.10-1' of ↵Paolo Bonzini21-73/+391
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.10, take #1 - Large set of FP/SVE fixes for pKVM, addressing the fallout from the per-CPU data rework and making sure that the host is not involved in the FP/SVE switching any more - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is copletely supported - Fix for the respective priorities of Failed PAC, Illegal Execution state and Instruction Abort exceptions - Fix the handling of AArch32 instruction traps failing their condition code, which was broken by the introduction of ESR_EL2.ISS2 - Allow vpcus running in AArch32 state to be restored in System mode - Fix AArch32 GPR restore that would lose the 64 bit state under some conditions
2024-06-05arm64: armv8_deprecated: Fix warning in isndep cpuhp starting processWei Li1-0/+3
The function run_all_insn_set_hw_mode() is registered as startup callback of 'CPUHP_AP_ARM64_ISNDEP_STARTING', it invokes set_hw_mode() methods of all emulated instructions. As the STARTING callbacks are not expected to fail, if one of the set_hw_mode() fails, e.g. due to el0 mixed-endian is not supported for 'setend', it will report a warning: ``` CPU[2] cannot support the emulation of setend CPU 2 UP state arm64/isndep:starting (136) failed (-22) CPU2: Booted secondary processor 0x0000000002 [0x414fd0c1] ``` To fix it, add a check for INSN_UNAVAILABLE status and skip the process. Signed-off-by: Wei Li <[email protected]> Tested-by: Huisong Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-06-05selftests: hsr: add missing config for CONFIG_BRIDGEHangbin Liu1-0/+1
hsr_redbox.sh test need to create bridge for testing. Add the missing config CONFIG_BRIDGE in config file. Fixes: eafbf0574e05 ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected") Signed-off-by: Hangbin Liu <[email protected]> Tested-by: Simon Horman <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05vxlan: Fix regression when dropping packets due to invalid src addressesDaniel Borkmann1-4/+4
Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") has recently been added to vxlan mainly in the context of source address snooping/learning so that when it is enabled, an entry in the FDB is not being created for an invalid address for the corresponding tunnel endpoint. Before commit f58f45c1e5b9 vxlan was similarly behaving as geneve in that it passed through whichever macs were set in the L2 header. It turns out that this change in behavior breaks setups, for example, Cilium with netkit in L3 mode for Pods as well as tunnel mode has been passing before the change in f58f45c1e5b9 for both vxlan and geneve. After mentioned change it is only passing for geneve as in case of vxlan packets are dropped due to vxlan_set_mac() returning false as source and destination macs are zero which for E/W traffic via tunnel is totally fine. Fix it by only opting into the is_valid_ether_addr() check in vxlan_set_mac() when in fact source address snooping/learning is actually enabled in vxlan. This is done by moving the check into vxlan_snoop(). With this change, the Cilium connectivity test suite passes again for both tunnel flavors. Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") Signed-off-by: Daniel Borkmann <[email protected]> Cc: David Bauer <[email protected]> Cc: Ido Schimmel <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Cc: Martin KaFai Lau <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: David Bauer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net: sched: sch_multiq: fix possible OOB write in multiq_tune()Hangyu Hua1-1/+1
q->bands will be assigned to qopt->bands to execute subsequent code logic after kmalloc. So the old q->bands should not be used in kmalloc. Otherwise, an out-of-bounds write will occur. Fixes: c2999f7fb05b ("net: sched: multiq: don't call qdisc_put() while holding tree lock") Signed-off-by: Hangyu Hua <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05ionic: fix kernel panic in XDP_TX actionTaehee Yoo1-0/+1
In the XDP_TX path, ionic driver sends a packet to the TX path with rx page and corresponding dma address. After tx is done, ionic_tx_clean() frees that page. But RX ring buffer isn't reset to NULL. So, it uses a freed page, which causes kernel panic. BUG: unable to handle page fault for address: ffff8881576c110c PGD 773801067 P4D 773801067 PUD 87f086067 PMD 87efca067 PTE 800ffffea893e060 Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI CPU: 1 PID: 25 Comm: ksoftirqd/1 Not tainted 6.9.0+ #11 Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f Code: 00 53 41 55 41 56 41 57 b8 01 00 00 00 48 8b 5f 08 4c 8b 77 00 4c 89 f7 48 83 c7 0e 48 39 d8 RSP: 0018:ffff888104e6fa28 EFLAGS: 00010283 RAX: 0000000000000002 RBX: ffff8881576c1140 RCX: 0000000000000002 RDX: ffffffffc0051f64 RSI: ffffc90002d33048 RDI: ffff8881576c110e RBP: ffff888104e6fa88 R08: 0000000000000000 R09: ffffed1027a04a23 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881b03a21a8 R13: ffff8881589f800f R14: ffff8881576c1100 R15: 00000001576c1100 FS: 0000000000000000(0000) GS:ffff88881ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8881576c110c CR3: 0000000767a90000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x254/0x790 ? __pfx_page_fault_oops+0x10/0x10 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? search_bpf_extables+0x165/0x260 ? fixup_exception+0x4a/0x970 ? exc_page_fault+0xcb/0xe0 ? asm_exc_page_fault+0x22/0x30 ? 0xffffffffc0051f64 ? bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f ? do_raw_spin_unlock+0x54/0x220 ionic_rx_service+0x11ab/0x3010 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? ionic_tx_clean+0x29b/0xc60 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? __pfx_ionic_tx_clean+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? ionic_tx_cq_service+0x25d/0xa00 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ionic_cq_service+0x69/0x150 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ionic_txrx_napi+0x11a/0x540 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] __napi_poll.constprop.0+0xa0/0x440 net_rx_action+0x7e7/0xc30 ? __pfx_net_rx_action+0x10/0x10 Fixes: 8eeed8373e1c ("ionic: Add XDP_TX support") Signed-off-by: Taehee Yoo <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net: phy: Micrel KSZ8061: fix errata solution not taking effect problemTristram Ha1-1/+41
KSZ8061 needs to write to a MMD register at driver initialization to fix an errata. This worked in 5.0 kernel but not in newer kernels. The issue is the main phylib code no longer resets PHY at the very beginning. Calling phy resuming code later will reset the chip if it is already powered down at the beginning. This wipes out the MMD register write. Solution is to implement a phy resume function for KSZ8061 to take care of this problem. Fixes: 232ba3a51cc2 ("net: phy: Micrel KSZ8061: link failure after cable connect") Signed-off-by: Tristram Ha <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net/smc: avoid overwriting when adjusting sock bufsizesWen Gu1-20/+2
When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf to sysctl_tcp_wmem[1], since this may overwrite the value set by tcp_sndbuf_expand() in TCP connection establishment. And the other setting sk_{snd|rcv}buf to sysctl value in smc_adjust_sock_bufsizes() can also be omitted since the initialization of smc sock and clcsock has set sk_{snd|rcv}buf to smc.sysctl_{w|r}mem or ipv4_sysctl_tcp_{w|r}mem[1]. Fixes: 30c3c4a4497c ("net/smc: Use correct buffer sizes when switching between TCP and SMC") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-by: Gerd Bayer <[email protected]>, too. Signed-off-by: David S. Miller <[email protected]>
2024-06-05octeontx2-af: Always allocate PF entries from low prioriy zoneSubbaraya Sundeep1-11/+22
PF mcam entries has to be at low priority always so that VF can install longest prefix match rules at higher priority. This was taken care currently but when priority allocation wrt reference entry is requested then entries are allocated from mid-zone instead of low priority zone. Fix this and always allocate entries from low priority zone for PFs. Fixes: 7df5b4b260dd ("octeontx2-af: Allocate low priority entries for PF") Signed-off-by: Subbaraya Sundeep <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05efi: Add missing __nocfi annotations to runtime wrappersArd Biesheuvel1-7/+6
The EFI runtime wrappers are a sandbox for calling into EFI runtime services, which are invoked using indirect calls. When running with kCFI enabled, the compiler will require the target of any indirect call to be type annotated. Given that the EFI runtime services prototypes and calling convention are governed by the EFI spec, not the Linux kernel, adding such type annotations for firmware routines is infeasible, and so the compiler must be informed that prototype validation should be omitted. Add the __nocfi annotation at the appropriate places in the EFI runtime wrapper code to achieve this. Note that this currently only affects 32-bit ARM, given that other architectures that support both kCFI and EFI use an asm wrapper to call EFI runtime services, and this hides the indirect call from the compiler. Fixes: 1a4fec49efe5 ("ARM: 9392/2: Support CLANG CFI") Reviewed-by: Linus Walleij <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2024-06-05Revert "xsk: Document ability to redirect to any socket bound to the same umem"Magnus Karlsson1-19/+14
This reverts commit 968595a93669b6b4f6d1fcf80cf2d97956b6868f. Reported-by: Yuval El-Hanany <[email protected]> Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/xdp-newbies/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2024-06-05Revert "xsk: Support redirect to any socket bound to the same umem"Magnus Karlsson1-4/+1
This reverts commit 2863d665ea41282379f108e4da6c8a2366ba66db. This patch introduced a potential kernel crash when multiple napi instances redirect to the same AF_XDP socket. By removing the queue_index check, it is possible for multiple napi instances to access the Rx ring at the same time, which will result in a corrupted ring state which can lead to a crash when flushing the rings in __xsk_flush(). This can happen when the linked list of sockets to flush gets corrupted by concurrent accesses. A quick and small fix is not possible, so let us revert this for now. Reported-by: Yuval El-Hanany <[email protected]> Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/xdp-newbies/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2024-06-05bpf: Set run context for rawtp test_run callbackJiri Olsa1-0/+6
syzbot reported crash when rawtp program executed through the test_run interface calls bpf_get_attach_cookie helper or any other helper that touches task->bpf_ctx pointer. Setting the run context (task->bpf_ctx pointer) for test_run callback. Fixes: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value") Reported-by: [email protected] Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Closes: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9 Link: https://lore.kernel.org/bpf/[email protected]
2024-06-05tpm: Switch to new Intel CPU model definesTony Luck2-2/+2
New CPU #defines encode vendor and family as well as model. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Tony Luck <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]>
2024-06-05tpm_tis: Do *not* flush uninitialized workJan Beulich1-1/+2
tpm_tis_core_init() may fail before tpm_tis_probe_irq_single() is called, in which case tpm_tis_remove() unconditionally calling flush_work() is triggering a warning for .func still being NULL. Cc: [email protected] # v6.5+ Fixes: 481c2d14627d ("tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs") Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]>
2024-06-04Merge tag 'devicetree-fixes-for-6.10-1' of ↵Linus Torvalds6-76/+92
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix regression in 'interrupt-map' handling affecting Apple M1 mini (at least) - Fix binding example warning in stm32 st,mlahb binding - Fix schema error in Allwinner platform binding causing lots of spurious warnings - Add missing MODULE_DESCRIPTION() to DT kunit tests * tag 'devicetree-fixes-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: property: Fix fw_devlink handling of interrupt-map of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw() dt-bindings: arm: stm32: st,mlahb: Drop spurious "reg" property from example dt-bindings: arm: sunxi: Fix incorrect '-' usage of: of_test: add MODULE_DESCRIPTION()
2024-06-04Merge tag 'linux_kselftest-fixes-6.10-rc3' of ↵Linus Torvalds11-15/+47
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Fixes to build warnings in several tests and fixes to ftrace tests" * tag 'linux_kselftest-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/futex: don't pass a const char* to asprintf(3) selftests/futex: don't redefine .PHONY targets (all, clean) selftests/tracing: Fix event filter test to retry up to 10 times selftests/futex: pass _GNU_SOURCE without a value to the compiler selftests/overlayfs: Fix build error on ppc64 selftests/openat2: Fix build warnings on ppc64 selftests: cachestat: Fix build warnings on ppc64 tracing/selftests: Fix kprobe event name test for .isra. functions selftests/ftrace: Update required config selftests/ftrace: Fix to check required event file kselftest/alsa: Ensure _GNU_SOURCE is defined
2024-06-04Merge branch 'efi/next' into efi/urgentArd Biesheuvel2-4/+5
2024-06-04PCI: Revert the cfg_access_lock lockdep mechanismDan Williams5-15/+0
While the experiment did reveal that there are additional places that are missing the lock during secondary bus reset, one of the places that needs to take cfg_access_lock (pci_bus_lock()) is not prepared for lockdep annotation. Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is currently dependent on the fact that the device_lock() is marked lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that annotation, pci_bus_lock() would need to use something like a new pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in the topology, and a hope that the depth of a PCI tree never exceeds the max value for a lockdep subclass. The alternative to ripping out the lockdep coverage would be to deploy a dynamic lock key for every PCI device. Unfortunately, there is evidence that increasing the number of keys that lockdep needs to track to be per-PCI-device is prohibitively expensive for something like the cfg_access_lock. The main motivation for adding the annotation in the first place was to catch unlocked secondary bus resets, not necessarily catch lock ordering problems between cfg_access_lock and other locks. Solve that narrower problem with follow-on patches, and just due to targeted revert for now. Link: https://lore.kernel.org/r/171711746402.1628941.14575335981264103013.stgit@dwillia2-xfh.jf.intel.com Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") Reported-by: Imre Deak <[email protected]> Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@[email protected] Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Hans de Goede <[email protected]> Tested-by: Kalle Valo <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Cc: Jani Saarinen <[email protected]>
2024-06-04drm/xe/pf: Update the LMTT when freeing VF GT configMichal Wajdeczko1-0/+1
The LMTT must be updated whenever we change the VF LMEM configuration. We missed that step when freeing the whole VF GT config, which could result in stale PTE in LMTT or LMTT PT object leaks. Fix that. Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs") Signed-off-by: Michal Wajdeczko <[email protected]> Reviewed-by: Piotr Piórkowski <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit c063cce7df3a765539e2a2d75ab943f334446cce) Signed-off-by: Thomas Hellström <[email protected]>
2024-06-04HID: logitech-hidpp: add missing MODULE_DESCRIPTION() macroJeff Johnson1-0/+1
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hid/hid-logitech-hidpp.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2024-06-04HID: intel-ish-hid: fix endian-conversionArnd Bergmann2-45/+57
The newly added file causes a ton of sparse warnings about the incorrect use of __le32 and similar types: drivers/hid/intel-ish-hid/ishtp/loader.h:41:23: error: invalid bitfield specifier for type restricted __le32. drivers/hid/intel-ish-hid/ishtp/loader.h:42:27: error: invalid bitfield specifier for type restricted __le32. drivers/hid/intel-ish-hid/ishtp/loader.h:43:24: error: invalid bitfield specifier for type restricted __le32. drivers/hid/intel-ish-hid/ishtp/loader.h:44:24: error: invalid bitfield specifier for type restricted __le32. drivers/hid/intel-ish-hid/ishtp/loader.h:45:22: error: invalid bitfield specifier for type restricted __le32. drivers/hid/intel-ish-hid/ishtp/loader.c:172:33: warning: restricted __le32 degrades to integer drivers/hid/intel-ish-hid/ishtp/loader.c:178:50: warning: incorrect type in assignment (different base types) drivers/hid/intel-ish-hid/ishtp/loader.c:178:50: expected restricted __le32 [usertype] length drivers/hid/intel-ish-hid/ishtp/loader.c:178:50: got unsigned long drivers/hid/intel-ish-hid/ishtp/loader.c:179:50: warning: incorrect type in assignment (different base types) drivers/hid/intel-ish-hid/ishtp/loader.c:179:50: expected restricted __le32 [usertype] fw_off drivers/hid/intel-ish-hid/ishtp/loader.c:179:50: got unsigned int [usertype] offset drivers/hid/intel-ish-hid/ishtp/loader.c:180:17: warning: cast from restricted __le32 drivers/hid/intel-ish-hid/ishtp/loader.c:183:24: warning: invalid assignment: += drivers/hid/intel-ish-hid/ishtp/loader.c:183:24: left side has type unsigned int drivers/hid/intel-ish-hid/ishtp/loader.c:183:24: right side has type restricted __le32 Add the necessary conversions and use temporary variables where appropriate to avoid converting back. Fixes: 579a267e4617 ("HID: intel-ish-hid: Implement loading firmware from host feature") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Zhang Lixu <[email protected]> Tested-by: Zhang Lixu <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2024-06-04HID: nintendo: Fix an error handling path in nintendo_hid_probe()Christophe JAILLET1-2/+4
joycon_leds_create() has a ida_alloc() call. So if an error occurs after it, a corresponding ida_free() call is needed, as already done in the .remove function. This is not 100% perfect, because if ida_alloc() fails, then 'ctlr->player_id' will forced to be U32_MAX, and an error will be logged when ida_free() is called. Considering that this can't happen in real life, no special handling is done to handle it. Fixes: 5307de63d71d ("HID: nintendo: use ida for LED player id") Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Silvan Jegen <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2024-06-04HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode()José Expósito1-1/+3
Fix a memory leak on logi_dj_recv_send_report() error path. Fixes: 6f20d3261265 ("HID: logitech-dj: Fix error handling in logi_dj_recv_switch_to_dj_mode()") Signed-off-by: José Expósito <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2024-06-04KVM: arm64: Ensure that SME controls are disabled in protected modeFuad Tabba2-0/+18
KVM (and pKVM) do not support SME guests. Therefore KVM ensures that the host's SME state is flushed and that SME controls for enabling access to ZA storage and for streaming are disabled. pKVM needs to protect against a buggy/malicious host. Ensure that it wouldn't run a guest when protected mode is enabled should any of the SME controls be enabled. Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx formatFuad Tabba6-19/+14
When setting/clearing CPACR bits for EL0 and EL1, use the ELx format of the bits, which covers both. This makes the code clearer, and reduces the chances of accidentally missing a bit. No functional change intended. Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba5-21/+22
Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Eagerly restore host fpsimd/sve state in pKVMFuad Tabba4-5/+93
When running in protected mode we don't want to leak protected guest state to the host, including whether a guest has used fpsimd/sve. Therefore, eagerly restore the host state on guest exit when running in protected mode, which happens only if the guest has used fpsimd/sve. Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba7-0/+124
Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Specialize handling of host fpsimd state on trapFuad Tabba3-1/+13
In subsequent patches, n/vhe will diverge on saving the host fpsimd/sve state when taking a guest fpsimd/sve trap. Add a specialized helper to handle it. No functional change intended. Reviewed-by: Mark Brown <[email protected]> Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helperFuad Tabba4-19/+73
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be toggled in different parts of the code, but the exact bits and their polarity differ between these two formats and the mode (vhe/nvhe/hvhe). To reduce the amount of duplicated code and the chance of getting the wrong bit/polarity or missing a field, abstract the set/clear of CPTR_EL2 bits behind a helper. Since (h)VHE is the way of the future, use the CPACR_EL1 format, which is a subset of the VHE CPTR_EL2, as a reference. No functional change intended. Suggested-by: Oliver Upton <[email protected]> Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_stateFuad Tabba2-3/+4
Since the prototypes for __sve_save_state/__sve_restore_state at hyp were added, the underlying macro has acquired a third parameter for saving/restoring ffr. Fix the prototypes to account for the third parameter, and restore the ffr for the guest since it is saved. Suggested-by: Mark Brown <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04KVM: arm64: Reintroduce __sve_save_stateFuad Tabba2-0/+7
Now that the hypervisor is handling the host sve state in protected mode, it needs to be able to save it. This reverts commit e66425fc9ba3 ("KVM: arm64: Remove unused __sve_save_state"). Reviewed-by: Oliver Upton <[email protected]> Signed-off-by: Fuad Tabba <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-06-04io_uring: fix possible deadlock in io_register_iowq_max_workers()Hagar Hemdan1-0/+4
The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <[email protected]> Signed-off-by: Hagar Hemdan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-06-04io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()Su Hui1-5/+5
Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051, column 3 The expression is an uninitialized value. The computed value will also be garbage. 'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is not fully initialized. Change the order of assignment for 'match' to fix this problem. Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Su Hui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-06-04io_uring/napi: fix timeout calculationJens Axboe1-11/+13
Not quite sure what __io_napi_adjust_timeout() was attemping to do, it's adjusting both the NAPI timeout and the general overall timeout, and calculating a value that is never used. The overall timeout is a super set of the NAPI timeout, and doesn't need adjusting. The only thing we really need to care about is that the NAPI timeout doesn't exceed the overall timeout. If a user asked for a timeout of eg 5 usec and NAPI timeout is 10 usec, then we should not spin for 10 usec. While in there, sanitize the time checking a bit. If we have a negative value in the passed in timeout, discard it. Round up the value as well, so we don't end up with a NAPI timeout for the majority of the wait, with only a tiny sleep value at the end. Hence the only case we need to care about is if the NAPI timeout is larger than the overall timeout. If it is, cap the NAPI timeout at what the overall timeout is. Cc: [email protected] Fixes: 8d0c12a80cde ("io-uring: add napi busy poll support") Reported-by: Lewis Baker <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2024-06-04iommu/amd: Fix Invalid wait context issueVasant Hegde2-42/+28
With commit c4cb23111103 ("iommu/amd: Add support for enable/disable IOPF") we are hitting below issue. This happens because in IOPF enablement path it holds spin lock with irq disable and then tries to take mutex lock. dmesg: ----- [ 0.938739] ============================= [ 0.938740] [ BUG: Invalid wait context ] [ 0.938742] 6.10.0-rc1+ #1 Not tainted [ 0.938745] ----------------------------- [ 0.938746] swapper/0/1 is trying to lock: [ 0.938748] ffffffff8c9f01d8 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x78/0x4a0 [ 0.938767] other info that might help us debug this: [ 0.938768] context-{5:5} [ 0.938769] 7 locks held by swapper/0/1: [ 0.938772] #0: ffff888101a91310 (&group->mutex){+.+.}-{4:4}, at: bus_iommu_probe+0x70/0x160 [ 0.938790] #1: ffff888101d1f1b8 (&domain->lock){....}-{3:3}, at: amd_iommu_attach_device+0xa5/0x700 [ 0.938799] #2: ffff888101cc3d18 (&dev_data->lock){....}-{3:3}, at: amd_iommu_attach_device+0xc5/0x700 [ 0.938806] #3: ffff888100052830 (&iommu->lock){....}-{2:2}, at: amd_iommu_iopf_add_device+0x3f/0xa0 [ 0.938813] #4: ffffffff8945a340 (console_lock){+.+.}-{0:0}, at: _printk+0x48/0x50 [ 0.938822] #5: ffffffff8945a390 (console_srcu){....}-{0:0}, at: console_flush_all+0x58/0x4e0 [ 0.938867] #6: ffffffff82459f80 (console_owner){....}-{0:0}, at: console_flush_all+0x1f0/0x4e0 [ 0.938872] stack backtrace: [ 0.938874] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc1+ #1 [ 0.938877] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.39 04/16/2019 Fix above issue by re-arranging code in attach device path: - move device PASID/IOPF enablement outside lock in AMD IOMMU driver. This is safe as core layer holds group->mutex lock before calling iommu_ops->attach_dev. Reported-by: Borislav Petkov <[email protected]> Reported-by: Mikhail Gavrilov <[email protected]> Reported-by: Chris Bainbridge <[email protected]> Fixes: c4cb23111103 ("iommu/amd: Add support for enable/disable IOPF") Tested-by: Borislav Petkov <[email protected]> Tested-by: Chris Bainbridge <[email protected]> Tested-by: Mikhail Gavrilov <[email protected]> Signed-off-by: Vasant Hegde <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2024-06-04iommu/amd: Check EFR[EPHSup] bit before enabling PPRVasant Hegde1-1/+2
Check for EFR[EPHSup] bit before enabling PPR. This bit must be set to enable PPR. Reported-by: Borislav Petkov <[email protected]> Fixes: c4cb23111103 ("iommu/amd: Add support for enable/disable IOPF") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218900 Tested-by: Borislav Petkov <[email protected]> Tested-by: Jean-Christophe Guillain <[email protected]> Signed-off-by: Vasant Hegde <[email protected]> Reviewed-by: Suravee Suthikulpanit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2024-06-04iommu/amd: Fix workqueue nameVasant Hegde1-2/+1
Workqueue name length is crossing WQ_NAME_LEN limit. Fix it by changing name format. New format : "iopf_queue/amdvi-<iommu-devid>" kernel warning: [ 11.146912] workqueue: name exceeds WQ_NAME_LEN. Truncating to: iopf_queue/amdiommu-0xc002-iopf Reported-by: Borislav Petkov <[email protected]> Fixes: 61928bab9d26 ("iommu/amd: Define per-IOMMU iopf_queue") Signed-off-by: Vasant Hegde <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>