aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-09net: fill in MODULE_DESCRIPTION()s for 6LoWPANBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to IPv6 over Low power Wireless Personal Area Network. Signed-off-by: Breno Leitao <[email protected]> Acked-by: Alexander Aring <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for af_keyBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the PF_KEY socket helpers. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for mpoaBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Multi-Protocol Over ATM (MPOA) driver. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for xfrmBreno Leitao2-0/+2
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the XFRM interface drivers. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09lan966x: Fix crash when adding interface under a lagHoratiu Vultur1-2/+7
There is a crash when adding one of the lan966x interfaces under a lag interface. The issue can be reproduced like this: ip link add name bond0 type bond miimon 100 mode balance-xor ip link set dev eth0 master bond0 The reason is because when adding a interface under the lag it would go through all the ports and try to figure out which other ports are under that lag interface. And the issue is that lan966x can have ports that are NULL pointer as they are not probed. So then iterating over these ports it would just crash as they are NULL pointers. The fix consists in actually checking for NULL pointers before accessing something from the ports. Like we do in other places. Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net/sched: act_mirred: Don't zero blockid when net device is being deletedVictor Nogueira1-2/+0
While testing tdc with parallel tests for mirred to block we caught an intermittent bug. The blockid was being zeroed out when a net device was deleted and, thus, giving us an incorrect blockid value whenever we tried to dump the mirred action. Since we don't increment the block refcount in the control path (and only use the ID), we don't need to zero the blockid field whenever a net device is going down. Fixes: 42f39036cda8 ("net/sched: act_mirred: Allow mirred to block") Signed-off-by: Victor Nogueira <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09Merge branch 'net-openvswitch-limit-the-recursions-from-action-sets'Jakub Kicinski3-31/+102
Aaron Conole says: ==================== net: openvswitch: limit the recursions from action sets Open vSwitch module accepts actions as a list from the netlink socket and then creates a copy which it uses in the action set processing. During processing of the action list on a packet, the module keeps a count of the execution depth and exits processing if the action depth goes too high. However, during netlink processing the recursion depth isn't checked anywhere, and the copy trusts that kernel has large enough stack to accommodate it. The OVS sample action was the original action which could perform this kinds of recursion, and it originally checked that it didn't exceed the sample depth limit. However, when sample became optimized to provide the clone() semantics, the recursion limit was dropped. This series adds a depth limit during the __ovs_nla_copy_actions() call that will ensure we don't exceed the max that the OVS userspace could generate for a clone(). Additionally, this series provides a selftest in 2/2 that can be used to determine if the OVS module is allowing unbounded access. It can be safely omitted where the ovs selftest framework isn't available. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: openvswitch: Add validation for the recursion testAaron Conole2-15/+69
Add a test case into the netlink checks that will show the number of nested action recursions won't exceed 16. Going to 17 on a small clone call isn't enough to exhaust the stack on (most) systems, so it should be safe to run even on systems that don't have the fix applied. Signed-off-by: Aaron Conole <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: openvswitch: limit the number of recursions from action setsAaron Conole1-16/+33
The ovs module allows for some actions to recursively contain an action list for complex scenarios, such as sampling, checking lengths, etc. When these actions are copied into the internal flow table, they are evaluated to validate that such actions make sense, and these calls happen recursively. The ovs-vswitchd userspace won't emit more than 16 recursion levels deep. However, the module has no such limit and will happily accept limits larger than 16 levels nested. Prevent this by tracking the number of recursions happening and manually limiting it to 16 levels nested. The initial implementation of the sample action would track this depth and prevent more than 3 levels of recursion, but this was removed to support the clone use case, rather than limited at the current userspace limit. Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use cases") Signed-off-by: Aaron Conole <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09smb3: clarify mount warningSteve French1-1/+1
When a user tries to use the "sec=krb5p" mount parameter to encrypt data on connection to a server (when authenticating with Kerberos), we indicate that it is not supported, but do not note the equivalent recommended mount parameter ("sec=krb5,seal") which turns on encryption for that mount (and uses Kerberos for auth). Update the warning message. Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-02-09cifs: handle cases where multiple sessions share connectionShyam Prasad N2-1/+6
Based on our implementation of multichannel, it is entirely possible that a server struct may not be found in any channel of an SMB session. In such cases, we should be prepared to move on and search for the server struct in the next session. Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-02-09cifs: change tcon status when need_reconnect is set on itShyam Prasad N3-1/+14
When a tcon is marked for need_reconnect, the intention is to have it reconnected. This change adjusts tcon->status in cifs_tree_connect when need_reconnect is set. Also, this change has a minor correction in resetting need_reconnect on success. It makes sure that it is done with tc_lock held. Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-02-09Merge branch 'selftests-forwarding-various-fixes'Jakub Kicinski3-9/+17
Ido Schimmel says: ==================== selftests: forwarding: Various fixes Fix various problems in the forwarding selftests so that they will pass in the netdev CI instead of being ignored. See commit messages for details. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Fix bridge locked port test flakinessIdo Schimmel1-2/+2
The redirection test case fails in the netdev CI on debug kernels because an FDB entry is learned despite the presence of a tc filter that redirects incoming traffic [1]. I am unable to reproduce the failure locally, but I can see how it can happen given that learning is first enabled and only then the ingress tc filter is configured. On debug kernels the time window between these two operations is longer compared to regular kernels, allowing random packets to be transmitted and trigger learning. Fix by reversing the order and configure the ingress tc filter before enabling learning. [1] [...] # TEST: Locked port MAB redirect [FAIL] # Locked entry created for redirected traffic Fixes: 38c43a1ce758 ("selftests: forwarding: Add test case for traffic redirection from a locked port") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Suppress grep warningsIdo Schimmel1-3/+3
Suppress the following grep warnings: [...] INFO: # Port group entries configuration tests - (*, G) TEST: Common port group entries configuration tests (IPv4 (*, G)) [ OK ] TEST: Common port group entries configuration tests (IPv6 (*, G)) [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv4 (*, G) port group entries configuration tests [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv6 (*, G) port group entries configuration tests [ OK ] [...] They do not fail the test, but do clutter the output. Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Fix bridge MDB test flakinessIdo Schimmel1-2/+6
After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: IPv4 host entries forwarding tests [FAIL] # Packet locally received after flood Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Fix layer 2 miss test flakinessIdo Schimmel1-2/+6
After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: L2 miss - Multicast (IPv4) [FAIL] # Unregistered multicast filter was hit after adding MDB entry Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: net: Fix bridge backup port test flakinessIdo Schimmel1-0/+23
The test toggles the carrier of a bridge port in order to test the bridge backup port feature. Due to the linkwatch delayed work the carrier change is not always reflected fast enough to the bridge driver and packets are not forwarded as the test expects, resulting in failures [1]. Fix by busy waiting on the bridge port state until it changes to the desired state following the carrier change. [1] # Backup port # ----------- [...] # TEST: swp1 carrier off [ OK ] # TEST: No forwarding out of swp1 [FAIL] [ 641.995910] br0: port 1(swp1) entered disabled state # TEST: No forwarding out of vx0 [ OK ] Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Acked-by: Paolo Abeni <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09btrfs: add new unused block groups to the list of unused block groupsFilipe Manana1-0/+31
Space reservations for metadata are, most of the time, pessimistic as we reserve space for worst possible cases - where tree heights are at the maximum possible height (8), we need to COW every extent buffer in a tree path, need to split extent buffers, etc. For data, we generally reserve the exact amount of space we are going to allocate. The exception here is when using compression, in which case we reserve space matching the uncompressed size, as the compression only happens at writeback time and in the worst possible case we need that amount of space in case the data is not compressible. This means that when there's not available space in the corresponding space_info object, we may need to allocate a new block group, and then that block group might not be used after all. In this case the block group is never added to the list of unused block groups and ends up never being deleted - except if we unmount and mount again the fs, as when reading block groups from disk we add unused ones to the list of unused block groups (fs_info->unused_bgs). Otherwise a block group is only added to the list of unused block groups when we deallocate the last extent from it, so if no extent is ever allocated, the block group is kept around forever. This also means that if we have a bunch of tasks reserving space in parallel we can end up allocating many block groups that end up never being used or kept around for too long without being used, which has the potential to result in ENOSPC failures in case for example we over allocate too many metadata block groups and then end up in a state without enough unallocated space to allocate a new data block group. This is more likely to happen with metadata reservations as of kernel 6.7, namely since commit 28270e25c69a ("btrfs: always reserve space for delayed refs when starting transaction"), because we started to always reserve space for delayed references when starting a transaction handle for a non-zero number of items, and also to try to reserve space to fill the gap between the delayed block reserve's reserved space and its size. So to avoid this, when finishing the creation a new block group, add the block group to the list of unused block groups if it's still unused at that time. This way the next time the cleaner kthread runs, it will delete the block group if it's still unused and not needed to satisfy existing space reservations. Reported-by: Ivan Shapovalov <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 6.7+ Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09btrfs: do not delete unused block group if it may be used soonFilipe Manana1-0/+46
Before deleting a block group that is in the list of unused block groups (fs_info->unused_bgs), we check if the block group became used before deleting it, as extents from it may have been allocated after it was added to the list. However even if the block group was not yet used, there may be tasks that have only reserved space and have not yet allocated extents, and they might be relying on the availability of the unused block group in order to allocate extents. The reservation works first by increasing the "bytes_may_use" field of the corresponding space_info object (which may first require flushing delayed items, allocating a new block group, etc), and only later a task does the actual allocation of extents. For metadata we usually don't end up using all reserved space, as we are pessimistic and typically account for the worst cases (need to COW every single node in a path of a tree at maximum possible height, etc). For data we usually reserve the exact amount of space we're going to allocate later, except when using compression where we always reserve space based on the uncompressed size, as compression is only triggered when writeback starts so we don't know in advance how much space we'll actually need, or if the data is compressible. So don't delete an unused block group if the total size of its space_info object minus the block group's size is less then the sum of used space and space that may be used (space_info->bytes_may_use), as that means we have tasks that reserved space and may need to allocate extents from the block group. In this case, besides skipping the deletion, re-add the block group to the list of unused block groups so that it may be reconsidered later, in case the tasks that reserved space end up not needing to allocate extents from it. Allowing the deletion of the block group while we have reserved space, can result in tasks failing to allocate metadata extents (-ENOSPC) while under a transaction handle, resulting in a transaction abort, or failure during writeback for the case of data extents. CC: [email protected] # 6.0+ Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09btrfs: add and use helper to check if block group is usedFilipe Manana2-2/+8
Add a helper function to determine if a block group is being used and make use of it at btrfs_delete_unused_bgs(). This helper will also be used in future code changes. Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09btrfs: don't drop extent_map for free space inode on write errorJosef Bacik1-2/+17
While running the CI for an unrelated change I hit the following panic with generic/648 on btrfs_holes_spacecache. assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1385! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G W 6.8.0-rc2+ #1 RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0 Call Trace: <TASK> extent_write_cache_pages+0x2ac/0x8f0 extent_writepages+0x87/0x110 do_writepages+0xd5/0x1f0 filemap_fdatawrite_wbc+0x63/0x90 __filemap_fdatawrite_range+0x5c/0x80 btrfs_fdatawrite_range+0x1f/0x50 btrfs_write_out_cache+0x507/0x560 btrfs_write_dirty_block_groups+0x32a/0x420 commit_cowonly_roots+0x21b/0x290 btrfs_commit_transaction+0x813/0x1360 btrfs_sync_file+0x51a/0x640 __x64_sys_fdatasync+0x52/0x90 do_syscall_64+0x9c/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 This happens because we fail to write out the free space cache in one instance, come back around and attempt to write it again. However on the second pass through we go to call btrfs_get_extent() on the inode to get the extent mapping. Because this is a new block group, and with the free space inode we always search the commit root to avoid deadlocking with the tree, we find nothing and return a EXTENT_MAP_HOLE for the requested range. This happens because the first time we try to write the space cache out we hit an error, and on an error we drop the extent mapping. This is normal for normal files, but the free space cache inode is special. We always expect the extent map to be correct. Thus the second time through we end up with a bogus extent map. Since we're deprecating this feature, the most straightforward way to fix this is to simply skip dropping the extent map range for this failed range. I shortened the test by using error injection to stress the area to make it easier to reproduce. With this patch in place we no longer panic with my error injection test. CC: [email protected] # 4.14+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09Merge tag 'riscv-for-linus-6.8-rc4' of ↵Linus Torvalds7-6/+89
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - fix missing TLB flush during early boot on SPARSEMEM_VMEMMAP configurations - fixes to correctly implement the break-before-make behavior requried by the ISA for NAPOT mappings - fix a missing TLB flush on intermediate mapping changes - fix build warning about a missing declaration of overflow_stack - fix performace regression related to incorrect tracking of completed batch TLB flushes * tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix arch_tlbbatch_flush() by clearing the batch cpumask riscv: declare overflow_stack as exported from traps.c riscv: Fix arch_hugetlb_migration_supported() for NAPOT riscv: Flush the tlb when a page directory is freed riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled riscv: Fix set_huge_pte_at() for NAPOT mapping riscv: mm: execute local TLB flush after populating vmemmap
2024-02-09Merge tag 'trace-v6.8-rc3' of ↵Linus Torvalds2-38/+47
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix broken direct trampolines being called when another callback is attached the same function. ARM 64 does not support FTRACE_WITH_REGS, and when it added direct trampoline calls from ftrace, it removed the "WITH_REGS" flag from the ftrace_ops for direct trampolines. This broke x86 as x86 requires direct trampolines to have WITH_REGS. This wasn't noticed because direct trampolines work as long as the function it is attached to is not shared with other callbacks (like the function tracer). When there are other callbacks, a helper trampoline is called, to call all the non direct callbacks and when it returns, the direct trampoline is called. For x86, the direct trampoline sets a flag in the regs field to tell the x86 specific code to call the direct trampoline. But this only works if the ftrace_ops had WITH_REGS set. ARM does things differently that does not require this. For now, set WITH_REGS if the arch supports WITH_REGS (which ARM does not), and this makes it work for both ARM64 and x86. - Fix wasted memory in the saved_cmdlines logic. The saved_cmdlines is a cache that maps PIDs to COMMs that tracing can use. Most trace events only save the PID in the event. The saved_cmdlines file lists PIDs to COMMs so that the tracing tools can show an actual name and not just a PID for each event. There's an array of PIDs that map to a small set of saved COMM strings. The array is set to PID_MAX_DEFAULT which is usually set to 32768. When a PID comes in, it will add itself to this array along with the index into the COMM array (note if the system allows more than PID_MAX_DEFAULT, this cache is similar to cache lines as an update of a PID that has the same PID_MAX_DEFAULT bits set will flush out another task with the same matching bits set). A while ago, the size of this cache was changed to be dynamic and the array was moved into a structure and created with kmalloc(). But this new structure had the size of 131104 bytes, or 0x20020 in hex. As kmalloc allocates in powers of two, it was actually allocating 0x40000 bytes (262144) leaving 131040 bytes of wasted memory. The last element of this structure was a pointer to the COMM string array which defaulted to just saving 128 COMMs. By changing the last field of this structure to a variable length string, and just having it round up to fill the allocated memory, the default size of the saved COMM cache is now 8190. This not only uses the wasted space, but actually saves space by removing the extra allocation for the COMM names. * tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix wasted memory in saved_cmdlines logic ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
2024-02-09Merge tag 'probes-fixes-v6.8-rc3' of ↵Linus Torvalds3-17/+22
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - remove unnecessary initial values of kprobes local variables - probe-events parser bug fixes: - calculate the argument size and format string after setting type information from BTF, because BTF can change the size and format string. - show $comm parse error correctly instead of failing silently. * tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Remove unnecessary initial values of variables tracing/probes: Fix to set arg size and fmt after setting type from BTF tracing/probes: Fix to show a parse error for bad type for $comm
2024-02-09Merge tag 'efi-fixes-for-v6.8-1' of ↵Linus Torvalds17-73/+97
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "The only notable change here is the patch that changes the way we deal with spurious errors from the EFI memory attribute protocol. This will be backported to v6.6, and is intended to ensure that we will not paint ourselves into a corner when we tighten this further in order to comply with MS requirements on signed EFI code. Note that this protocol does not currently exist in x86 production systems in the field, only in Microsoft's fork of OVMF, but it will be mandatory for Windows logo certification for x86 PCs in the future. - Tighten ELF relocation checks on the RISC-V EFI stub - Give up if the new EFI memory attributes protocol fails spuriously on x86 - Take care not to place the kernel in the lowest 16 MB of DRAM on x86 - Omit special purpose EFI memory from memblock - Some fixes for the CXL CPER reporting code - Make the PE/COFF layout of mixed-mode capable images comply with a strict interpretation of the spec" * tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section cxl/trace: Remove unnecessary memcpy's cxl/cper: Fix errant CPER prints for CXL events efi: Don't add memblocks for soft-reserved memory efi: runtime: Fix potential overflow of soft-reserved region size efi/libstub: Add one kernel-doc comment x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR x86/efistub: Give up if memory attribute protocol returns an error riscv/efistub: Tighten ELF relocation check riscv/efistub: Ensure GP-relative addressing is not used
2024-02-09Merge tag 'pci-v6.8-fixes-2' of ↵Linus Torvalds1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Fix an unintentional truncation of DWC MSI-X address to 32 bits and update similar MSI code to match (Dan Carpenter) * tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: dwc: Clean up dw_pcie_ep_raise_msi_irq() alignment PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
2024-02-09Merge tag 'hwmon-for-v6.8-rc4' of ↵Linus Torvalds2-20/+29
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - coretemp: Various fixes, and increase number of supported CPU cores - aspeed-pwm-tacho: Add missing mutex protection * tag 'hwmon-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (coretemp) Enlarge per package core count limit hwmon: (coretemp) Fix bogus core_id to attr name mapping hwmon: (coretemp) Fix out-of-bounds memory access hwmon: (aspeed-pwm-tacho) mutex for tach reading
2024-02-09Merge tag 'mmc-v6.8-rc2' of ↵Linus Torvalds2-1/+35
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Allow non-sleeping read-only slot-gpio MMC host: - sdhci-pci-o2micro: Fix a warm reboot BIOS issue" * tag 'mmc-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: slot-gpio: Allow non-sleeping GPIO ro mmc: sdhci-pci-o2micro: Fix a warm reboot issue that disk can't be detected by BIOS
2024-02-09Merge tag 'pmdomain-v6.8-rc1' of ↵Linus Torvalds3-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fixes from Ulf Hansson: "Core: - Move the unused cleanup to a _sync initcall Providers: - mediatek: Fix race conditions at probe/remove with genpd - renesas: r8a77980-sysc: CR7 must be always on" * tag 'pmdomain-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: mediatek: fix race conditions with genpd pmdomain: renesas: r8a77980-sysc: CR7 must be always on pmdomain: core: Move the unused cleanup to a _sync initcall
2024-02-09Merge tag 'gpio-fixes-for-v6.8-rc4' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: - remove the new GPIO device from the global list unconditionally in error path in core GPIOLIB * tag 'gpio-fixes-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: remove GPIO device from the list unconditionally in error path
2024-02-09Merge tag 'drm-fixes-2024-02-09' of git://anongit.freedesktop.org/drm/drmLinus Torvalds57-253/+390
Pull drm fixes from Dave Airlie: "Regular weekly fixes, xe, amdgpu and msm are most of them, with some misc in i915, ivpu and nouveau, scattered but nothing too intense at this point. i915: - gvt: docs fix, uninit var, MAINTAINERS ivpu: - add aborted job status - disable d3 hot delay - mmu fixes nouveau: - fix gsp rpc size request - fix dma buffer leaks - use common code for gsp mem ctor xe: - Fix a loop in an error path - Fix a missing dma-fence reference - Fix a retry path on userptr REMAP - Workaround for a false gcc warning - Fix missing map of the usm batch buffer in the migrate vm. - Fix a memory leak. - Fix a bad assumption of used page size - Fix hitting a BUG() due to zero pages to map. - Remove some leftover async bind queue relics amdgpu: - Misc NULL/bounds check fixes - ODM pipe policy fix - Aborted suspend fixes - JPEG 4.0.5 fix - DCN 3.5 fixes - PSP fix - DP MST fix - Phantom pipe fix - VRAM vendor fix - Clang fix - SR-IOV fix msm: - DPU: - fix for kernel doc warnings and smatch warnings in dpu_encoder - fix for smatch warning in dpu_encoder - fix the bus bandwidth value for SDM670 - DP: - fixes to handle unknown bpc case correctly for DP - fix for MISC0 programming - GPU: - dmabuf vmap fix - a610 UBWC corruption fix (incorrect hbb) - revert a commit that was making GPU recovery unreliable" * tag 'drm-fixes-2024-02-09' of git://anongit.freedesktop.org/drm/drm: (43 commits) drm/xe: Remove TEST_VM_ASYNC_OPS_ERROR drm/xe/vm: don't ignore error when in_kthread drm/xe: Assume large page size if VMA not yet bound drm/xe/display: Fix memleak in display initialization drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool drm/xe: circumvent bogus stringop-overflow warning drm/xe: Pick correct userptr VMA to repin on REMAP op failure drm/xe: Take a reference in xe_exec_queue_last_fence_get() drm/xe: Fix loop in vm_bind_ioctl_ops_unwind drm/amdgpu: Fix HDP flush for VFs on nbio v7.9 drm/amd/display: Implement bounds check for stream encoder creation in DCN301 drm/amd/display: Increase frame-larger-than for all display_mode_vba files drm/amd/display: Clear phantom stream count and plane count drm/amdgpu: Avoid fetching VRAM vendor info drm/amd/display: Disable ODM by default for DCN35 drm/amd/display: Update phantom pipe enable / disable sequence drm/amd/display: Fix MST Null Ptr for RV drm/amdgpu: Fix shared buff copy to user drm/amd/display: Increase eval/entry delay for DCN35 drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend ...
2024-02-09x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6Aleksander Mazur1-1/+1
The kernel built with MCRUSOE is unbootable on Transmeta Crusoe. It shows the following error message: This kernel requires an i686 CPU, but only detected an i586 CPU. Unable to boot - please use a kernel appropriate for your CPU. Remove MCRUSOE from the condition introduced in commit in Fixes, effectively changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which matches the CPU family given by CPUID. [ bp: Massage commit message. ] Fixes: 25d76ac88821 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig") Signed-off-by: Aleksander Mazur <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: H. Peter Anvin <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-09spi: spi-ppc4xx: include missing platform_device.hChristian Lamparter1-0/+1
the driver currently fails to compile on 6.8-rc3 due to: | spi-ppc4xx.c: In function ‘spi_ppc4xx_of_probe’: | @346:36: error: invalid use of undefined type ‘struct platform_device’ | 346 | struct device_node *np = op->dev.of_node; | | ^~ | ... (more similar errors) it was working with 6.7. Looks like it only needed the include and its compiling fine! Signed-off-by: Christian Lamparter <[email protected]> Link: https://lore.kernel.org/r/3eb3f9c4407ba99d1cd275662081e46b9e839173.1707490664.git.chunkeey@gmail.com Signed-off-by: Mark Brown <[email protected]>
2024-02-09tracing: Fix wasted memory in saved_cmdlines logicSteven Rostedt (Google)1-38/+37
While looking at improving the saved_cmdlines cache I found a huge amount of wasted memory that should be used for the cmdlines. The tracing data saves pids during the trace. At sched switch, if a trace occurred, it will save the comm of the task that did the trace. This is saved in a "cache" that maps pids to comms and exposed to user space via the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by default 128 comms. The structure that uses this creates an array to store the pids using PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure to be of the size of 131104 bytes on 64 bit machines. In hex: 131104 = 0x20020, and since the kernel allocates generic memory in powers of two, the kernel would allocate 0x40000 or 262144 bytes to store this structure. That leaves 131040 bytes of wasted space. Worse, the structure points to an allocated array to store the comm names, which is 16 bytes times the amount of names to save (currently 128), which is 2048 bytes. Instead of allocating a separate array, make the structure end with a variable length string and use the extra space for that. This is similar to a recommendation that Linus had made about eventfs_inode names: https://lore.kernel.org/all/[email protected]/ Instead of allocating a separate string array to hold the saved comms, have the structure end with: char saved_cmdlines[]; and round up to the next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN It will use this extra space for the saved_cmdline portion. Now, instead of saving only 128 comms by default, by using this wasted space at the end of the structure it can save over 8000 comms and even saves space by removing the need for allocating the other array. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Vincent Donnefort <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Mete Durlu <[email protected]> Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-02-09of: property: Add in-ports/out-ports support to of_graph_get_port_parent()Saravana Kannan1-1/+3
Similar to the existing "ports" node name, coresight device tree bindings have added "in-ports" and "out-ports" as standard node names for a collection of ports. Add support for these name to of_graph_get_port_parent() so that remote-endpoint parsing can find the correct parent node for these coresight ports too. Signed-off-by: Saravana Kannan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-02-09of: property: Improve finding the supplier of a remote-endpoint propertySaravana Kannan1-1/+11
After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"), remote-endpoint properties created a fwnode link from the consumer device to the supplier endpoint. This is a tiny bit inefficient (not buggy) when trying to create device links or detecting cycles. So, improve this the same way we improved finding the consumer of a remote-endpoint property. Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()") Signed-off-by: Saravana Kannan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-02-09of: property: Improve finding the consumer of a remote-endpoint propertySaravana Kannan1-37/+10
We have a more accurate function to find the right consumer of a remote-endpoint property instead of searching for a parent with compatible string property. So, use that instead. While at it, make the code to find the consumer a bit more flexible and based on the property being parsed. Fixes: f7514a663016 ("of: property: fw_devlink: Add support for remote-endpoint") Signed-off-by: Saravana Kannan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-02-09ftrace: Fix DIRECT_CALLS to use SAVE_REGS by defaultMasami Hiramatsu (Google)1-0/+10
The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there are multiple ftrace_ops at the same function, but since the x86 only support to jump to direct_call from ftrace_regs_caller, when we set the function tracer on the same target function on x86, ftrace-direct does not work as below (this actually works on arm64.) At first, insmod ftrace-direct.ko to put a direct_call on 'wake_up_process()'. # insmod kernel/samples/ftrace/ftrace-direct.ko # less trace ... <idle>-0 [006] ..s1. 564.686958: my_direct_func: waking up rcu_preempt-17 <idle>-0 [007] ..s1. 564.687836: my_direct_func: waking up kcompactd0-63 <idle>-0 [006] ..s1. 564.690926: my_direct_func: waking up rcu_preempt-17 <idle>-0 [006] ..s1. 564.696872: my_direct_func: waking up rcu_preempt-17 <idle>-0 [007] ..s1. 565.191982: my_direct_func: waking up kcompactd0-63 Setup a function filter to the 'wake_up_process' too, and enable it. # cd /sys/kernel/tracing/ # echo wake_up_process > set_ftrace_filter # echo function > current_tracer # less trace ... <idle>-0 [006] ..s3. 686.180972: wake_up_process <-call_timer_fn <idle>-0 [006] ..s3. 686.186919: wake_up_process <-call_timer_fn <idle>-0 [002] ..s3. 686.264049: wake_up_process <-call_timer_fn <idle>-0 [002] d.h6. 686.515216: wake_up_process <-kick_pool <idle>-0 [002] d.h6. 686.691386: wake_up_process <-kick_pool Then, only function tracer is shown on x86. But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag) on the same function, it is shown again. # echo 'p wake_up_process' >> dynamic_events # echo 1 > events/kprobes/p_wake_up_process_0/enable # echo > trace # less trace ... <idle>-0 [006] ..s2. 2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20) <idle>-0 [006] ..s3. 2710.345923: wake_up_process <-call_timer_fn <idle>-0 [006] ..s1. 2710.345928: my_direct_func: waking up rcu_preempt-17 <idle>-0 [006] ..s2. 2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20) <idle>-0 [006] ..s3. 2710.349934: wake_up_process <-call_timer_fn <idle>-0 [006] ..s1. 2710.349937: my_direct_func: waking up rcu_preempt-17 To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of direct_call by default. Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390270842.stgit@devnote2 Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS") Cc: [email protected] Cc: Florent Revest <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> [arm64] Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-02-08selftests: net: add more missing kernel configPaolo Abeni1-1/+5
The reuseport_addr_any.sh is currently skipping DCCP tests and pmtu.sh is skipping all the FOU/GUE related cases: add the missing options. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/38d3ca7f909736c1aef56e6244d67c82a9bba6ff.1707326987.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-08devlink: Fix command annotation documentationParav Pandit1-1/+1
Command example string is not read as command. Fix command annotation. Fixes: a8ce7b26a51e ("devlink: Expose port function commands to control migratable") Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-08bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPYMagnus Karlsson1-3/+2
Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the bonding driver does not support XDP and AF_XDP in zero-copy mode even if the real NIC drivers do. Note that the driver used to report everything as supported before a device was bonded. Instead of just masking out the zero-copy support from this, have the driver report that no XDP feature is supported until a real device is bonded. This seems to be more truthful as it is the real drivers that decide what XDP features are supported. Fixes: cb9e6e584d58 ("bonding: add xdp_features support") Reported-by: Prashant Batra <[email protected]> Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-rLg@mail.gmail.com/T/ Signed-off-by: Magnus Karlsson <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-08net/handshake: Fix handshake_req_destroy_test1Chuck Lever1-1/+4
Recently, handshake_req_destroy_test1 started failing: Expected handshake_req_destroy_test == req, but handshake_req_destroy_test == 0000000000000000 req == 0000000060f99b40 not ok 11 req_destroy works This is because "sock_release(sock)" was replaced with "fput(filp)" to address a memory leak. Note that sock_release() is synchronous but fput() usually delays the final close and clean-up. The delay is not consequential in the other cases that were changed but handshake_req_destroy_test1 is testing that handshake_req_cancel() followed by closing the file actually does call the ->hp_destroy method. Thus the PTR_EQ test at the end has to be sure that the final close is complete before it checks the pointer. We cannot use a completion here because if ->hp_destroy is never called (ie, there is an API bug) then the test will hang. Reported by: Guenter Roeck <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/T/#mac5c6299f86799f1c71776f3a07f9c566c7c3c40 Fixes: 4a0f07d71b04 ("net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()") Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/170724699027.91401.7839730697326806733.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-08net/mlx5: DPLL, Fix possible use after free after delayed work timer triggersJiri Pirko1-1/+1
I managed to hit following use after free warning recently: [ 2169.711665] ================================================================== [ 2169.714009] BUG: KASAN: slab-use-after-free in __run_timers.part.0+0x179/0x4c0 [ 2169.716293] Write of size 8 at addr ffff88812b326a70 by task swapper/4/0 [ 2169.719022] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.8.0-rc2jiri+ #2 [ 2169.720974] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 2169.722457] Call Trace: [ 2169.722756] <IRQ> [ 2169.723024] dump_stack_lvl+0x58/0xb0 [ 2169.723417] print_report+0xc5/0x630 [ 2169.723807] ? __virt_addr_valid+0x126/0x2b0 [ 2169.724268] kasan_report+0xbe/0xf0 [ 2169.724667] ? __run_timers.part.0+0x179/0x4c0 [ 2169.725116] ? __run_timers.part.0+0x179/0x4c0 [ 2169.725570] __run_timers.part.0+0x179/0x4c0 [ 2169.726003] ? call_timer_fn+0x320/0x320 [ 2169.726404] ? lock_downgrade+0x3a0/0x3a0 [ 2169.726820] ? kvm_clock_get_cycles+0x14/0x20 [ 2169.727257] ? ktime_get+0x92/0x150 [ 2169.727630] ? lapic_next_deadline+0x35/0x60 [ 2169.728069] run_timer_softirq+0x40/0x80 [ 2169.728475] __do_softirq+0x1a1/0x509 [ 2169.728866] irq_exit_rcu+0x95/0xc0 [ 2169.729241] sysvec_apic_timer_interrupt+0x6b/0x80 [ 2169.729718] </IRQ> [ 2169.729993] <TASK> [ 2169.730259] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 2169.730755] RIP: 0010:default_idle+0x13/0x20 [ 2169.731190] Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 9a 7f 1f 02 85 c0 7e 07 0f 00 2d cf 69 43 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 c0 93 04 00 [ 2169.732759] RSP: 0018:ffff888100dbfe10 EFLAGS: 00000242 [ 2169.733264] RAX: 0000000000000001 RBX: ffff888100d9c200 RCX: ffffffff8241bd62 [ 2169.733925] RDX: ffffed109a848b15 RSI: 0000000000000004 RDI: ffffffff8127ac55 [ 2169.734566] RBP: 0000000000000004 R08: 0000000000000000 R09: ffffed109a848b14 [ 2169.735200] R10: ffff8884d42458a3 R11: 000000000000ba7e R12: ffffffff83d7d3a0 [ 2169.735835] R13: 1ffff110201b7fc6 R14: 0000000000000000 R15: ffff888100d9c200 [ 2169.736478] ? ct_kernel_exit.constprop.0+0xa2/0xc0 [ 2169.736954] ? do_idle+0x285/0x290 [ 2169.737323] default_idle_call+0x63/0x90 [ 2169.737730] do_idle+0x285/0x290 [ 2169.738089] ? arch_cpu_idle_exit+0x30/0x30 [ 2169.738511] ? mark_held_locks+0x1a/0x80 [ 2169.738917] ? lockdep_hardirqs_on_prepare+0x12e/0x200 [ 2169.739417] cpu_startup_entry+0x30/0x40 [ 2169.739825] start_secondary+0x19a/0x1c0 [ 2169.740229] ? set_cpu_sibling_map+0xbd0/0xbd0 [ 2169.740673] secondary_startup_64_no_verify+0x15d/0x16b [ 2169.741179] </TASK> [ 2169.741686] Allocated by task 1098: [ 2169.742058] kasan_save_stack+0x1c/0x40 [ 2169.742456] kasan_save_track+0x10/0x30 [ 2169.742852] __kasan_kmalloc+0x83/0x90 [ 2169.743246] mlx5_dpll_probe+0xf5/0x3c0 [mlx5_dpll] [ 2169.743730] auxiliary_bus_probe+0x62/0xb0 [ 2169.744148] really_probe+0x127/0x590 [ 2169.744534] __driver_probe_device+0xd2/0x200 [ 2169.744973] device_driver_attach+0x6b/0xf0 [ 2169.745402] bind_store+0x90/0xe0 [ 2169.745761] kernfs_fop_write_iter+0x1df/0x2a0 [ 2169.746210] vfs_write+0x41f/0x790 [ 2169.746579] ksys_write+0xc7/0x160 [ 2169.746947] do_syscall_64+0x6f/0x140 [ 2169.747333] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 2169.748049] Freed by task 1220: [ 2169.748393] kasan_save_stack+0x1c/0x40 [ 2169.748789] kasan_save_track+0x10/0x30 [ 2169.749188] kasan_save_free_info+0x3b/0x50 [ 2169.749621] poison_slab_object+0x106/0x180 [ 2169.750044] __kasan_slab_free+0x14/0x50 [ 2169.750451] kfree+0x118/0x330 [ 2169.750792] mlx5_dpll_remove+0xf5/0x110 [mlx5_dpll] [ 2169.751271] auxiliary_bus_remove+0x2e/0x40 [ 2169.751694] device_release_driver_internal+0x24b/0x2e0 [ 2169.752191] unbind_store+0xa6/0xb0 [ 2169.752563] kernfs_fop_write_iter+0x1df/0x2a0 [ 2169.753004] vfs_write+0x41f/0x790 [ 2169.753381] ksys_write+0xc7/0x160 [ 2169.753750] do_syscall_64+0x6f/0x140 [ 2169.754132] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 2169.754847] Last potentially related work creation: [ 2169.755315] kasan_save_stack+0x1c/0x40 [ 2169.755709] __kasan_record_aux_stack+0x9b/0xf0 [ 2169.756165] __queue_work+0x382/0x8f0 [ 2169.756552] call_timer_fn+0x126/0x320 [ 2169.756941] __run_timers.part.0+0x2ea/0x4c0 [ 2169.757376] run_timer_softirq+0x40/0x80 [ 2169.757782] __do_softirq+0x1a1/0x509 [ 2169.758387] Second to last potentially related work creation: [ 2169.758924] kasan_save_stack+0x1c/0x40 [ 2169.759322] __kasan_record_aux_stack+0x9b/0xf0 [ 2169.759773] __queue_work+0x382/0x8f0 [ 2169.760156] call_timer_fn+0x126/0x320 [ 2169.760550] __run_timers.part.0+0x2ea/0x4c0 [ 2169.760978] run_timer_softirq+0x40/0x80 [ 2169.761381] __do_softirq+0x1a1/0x509 [ 2169.761998] The buggy address belongs to the object at ffff88812b326a00 which belongs to the cache kmalloc-256 of size 256 [ 2169.763061] The buggy address is located 112 bytes inside of freed 256-byte region [ffff88812b326a00, ffff88812b326b00) [ 2169.764346] The buggy address belongs to the physical page: [ 2169.764866] page:000000000f2b1e89 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12b324 [ 2169.765731] head:000000000f2b1e89 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 2169.766484] anon flags: 0x200000000000840(slab|head|node=0|zone=2) [ 2169.767048] page_type: 0xffffffff() [ 2169.767422] raw: 0200000000000840 ffff888100042b40 0000000000000000 dead000000000001 [ 2169.768183] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 2169.768899] page dumped because: kasan: bad access detected [ 2169.769649] Memory state around the buggy address: [ 2169.770116] ffff88812b326900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.770805] ffff88812b326980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.771485] >ffff88812b326a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2169.772173] ^ [ 2169.772787] ffff88812b326a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2169.773477] ffff88812b326b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.774160] ================================================================== [ 2169.774845] ================================================================== I didn't manage to reproduce it. Though the issue seems to be obvious. There is a chance that the mlx5_dpll_remove() calls cancel_delayed_work() when the work runs and manages to re-arm itself. In that case, after delay timer triggers next attempt to queue it, it works with freed memory. Fix this by using cancel_delayed_work_sync() instead which makes sure that work is done when it returns. Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure") Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-08dpll: fix possible deadlock during netlink dump operationJiri Pirko4-24/+6
Recently, I've been hitting following deadlock warning during dpll pin dump: [52804.637962] ====================================================== [52804.638536] WARNING: possible circular locking dependency detected [52804.639111] 6.8.0-rc2jiri+ #1 Not tainted [52804.639529] ------------------------------------------------------ [52804.640104] python3/2984 is trying to acquire lock: [52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780 [52804.641417] but task is already holding lock: [52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20 [52804.642747] which lock already depends on the new lock. [52804.643551] the existing dependency chain (in reverse order) is: [52804.644259] -> #1 (dpll_lock){+.+.}-{3:3}: [52804.644836] lock_acquire+0x174/0x3e0 [52804.645271] __mutex_lock+0x119/0x1150 [52804.645723] dpll_lock_dumpit+0x13/0x20 [52804.646169] genl_start+0x266/0x320 [52804.646578] __netlink_dump_start+0x321/0x450 [52804.647056] genl_family_rcv_msg_dumpit+0x155/0x1e0 [52804.647575] genl_rcv_msg+0x1ed/0x3b0 [52804.648001] netlink_rcv_skb+0xdc/0x210 [52804.648440] genl_rcv+0x24/0x40 [52804.648831] netlink_unicast+0x2f1/0x490 [52804.649290] netlink_sendmsg+0x36d/0x660 [52804.649742] __sock_sendmsg+0x73/0xc0 [52804.650165] __sys_sendto+0x184/0x210 [52804.650597] __x64_sys_sendto+0x72/0x80 [52804.651045] do_syscall_64+0x6f/0x140 [52804.651474] entry_SYSCALL_64_after_hwframe+0x46/0x4e [52804.652001] -> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}: [52804.652650] check_prev_add+0x1ae/0x1280 [52804.653107] __lock_acquire+0x1ed3/0x29a0 [52804.653559] lock_acquire+0x174/0x3e0 [52804.653984] __mutex_lock+0x119/0x1150 [52804.654423] netlink_dump+0xb3/0x780 [52804.654845] __netlink_dump_start+0x389/0x450 [52804.655321] genl_family_rcv_msg_dumpit+0x155/0x1e0 [52804.655842] genl_rcv_msg+0x1ed/0x3b0 [52804.656272] netlink_rcv_skb+0xdc/0x210 [52804.656721] genl_rcv+0x24/0x40 [52804.657119] netlink_unicast+0x2f1/0x490 [52804.657570] netlink_sendmsg+0x36d/0x660 [52804.658022] __sock_sendmsg+0x73/0xc0 [52804.658450] __sys_sendto+0x184/0x210 [52804.658877] __x64_sys_sendto+0x72/0x80 [52804.659322] do_syscall_64+0x6f/0x140 [52804.659752] entry_SYSCALL_64_after_hwframe+0x46/0x4e [52804.660281] other info that might help us debug this: [52804.661077] Possible unsafe locking scenario: [52804.661671] CPU0 CPU1 [52804.662129] ---- ---- [52804.662577] lock(dpll_lock); [52804.662924] lock(nlk_cb_mutex-GENERIC); [52804.663538] lock(dpll_lock); [52804.664073] lock(nlk_cb_mutex-GENERIC); [52804.664490] The issue as follows: __netlink_dump_start() calls control->start(cb) with nlk->cb_mutex held. In control->start(cb) the dpll_lock is taken. Then nlk->cb_mutex is released and taken again in netlink_dump(), while dpll_lock still being held. That leads to ABBA deadlock when another CPU races with the same operation. Fix this by moving dpll_lock taking into dumpit() callback which ensures correct lock taking order. Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions") Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09Merge tag 'drm-msm-fixes-2024-02-07' of ↵Dave Airlie6-20/+22
https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v6.8-rc4 DPU: - fix for kernel doc warnings and smatch warnings in dpu_encoder - fix for smatch warning in dpu_encoder - fix the bus bandwidth value for SDM670 DP: - fixes to handle unknown bpc case correctly for DP. The current code was spilling over into other bits of DP configuration register, had to be fixed to avoid the extra shifts which were causing the spill over - fix for MISC0 programming in DP driver to program the correct colorimetry value GPU: - dmabuf vmap fix - a610 UBWC corruption fix (incorrect hbb) - revert a commit that was making GPU recovery unreliable Signed-off-by: Dave Airlie <[email protected]> From: Rob Clark <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGv+tb1+_cp7ftxcMZbbxE9810rvxeaC50eL=msQ+zkm0g@mail.gmail.com
2024-02-09Merge tag 'amd-drm-fixes-6.8-2024-02-08' of ↵Dave Airlie28-98/+220
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.8-2024-02-08: amdgpu: - Misc NULL/bounds check fixes - ODM pipe policy fix - Aborted suspend fixes - JPEG 4.0.5 fix - DCN 3.5 fixes - PSP fix - DP MST fix - Phantom pipe fix - VRAM vendor fix - Clang fix - SR-IOV fix Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-09Merge tag 'drm-intel-fixes-2024-02-08' of ↵Dave Airlie4-6/+5
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Just includes gvt-fixes-2024-02-05 Signed-off-by: Dave Airlie <[email protected]> From: Joonas Lahtinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-09Merge tag 'drm-xe-fixes-2024-02-08' of ↵Dave Airlie9-65/+57
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix a loop in an error path - Fix a missing dma-fence reference - Fix a retry path on userptr REMAP - Workaround for a false gcc warning - Fix missing map of the usm batch buffer in the migrate vm. - Fix a memory leak. - Fix a bad assumption of used page size - Fix hitting a BUG() due to zero pages to map. - Remove some leftover async bind queue relics Signed-off-by: Dave Airlie <[email protected]> From: Thomas Hellstrom <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/ZcS2LllawGifubsk@fedora
2024-02-09Merge tag 'drm-misc-fixes-2024-02-08' of ↵Dave Airlie9-58/+77
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A null pointer dereference fix for v3d, a TTM pool initialization fix, several fixes for nouveau around register size, DMA buffer leaks and API consistency, a multiple fixes for ivpu around MMU setup, initialization and firmware interactions. Signed-off-by: Dave Airlie <[email protected]> From: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/4wsi2i6kgkqdu7nzp4g7hxasbswnrmc5cakgf5zzvnix53u7lr@4rmp7hwblow3