aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-10Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of ↵Linus Torvalds17-101/+162
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7 issues or aren't considered to be needed in earlier kernel versions" * tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) nilfs2: fix potential bug in end_buffer_async_write mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() MAINTAINERS: Leo Yan has moved mm/zswap: don't return LRU_SKIP if we have dropped lru lock fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super mailmap: switch email address for John Moon mm: zswap: fix objcg use-after-free in entry destruction mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range() arch/arm/mm: fix major fault accounting when retrying under per-VMA lock selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page mm: memcg: optimize parent iteration in memcg_rstat_updated() nilfs2: fix data corruption in dsync block recovery for small block sizes mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get() exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand() getrusage: use sig->stats_lock rather than lock_task_sighand() getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() ...
2024-02-10Merge branch 'tls-fixes'David S. Miller3-82/+66
Jakub Kicinski says: ==================== net: tls: fix some issues with async encryption valis was reporting a race on socket close so I sat down to try to fix it. I used Sabrina's async crypto debug patch to test... and in the process run into some of the same issues, and created very similar fixes :( I didn't realize how many of those patches weren't applied. Once I found Sabrina's code [1] it turned out to be so similar in fact that I added her S-o-b's and Co-develop'eds in a semi-haphazard way. With this series in place all expected tests pass with async crypto. Sabrina had a few more fixes, but I'll leave those to her, things are not crashing anymore. [1] https://lore.kernel.org/netdev/[email protected]/ ==================== Signed-off-by: David S. Miller <[email protected]>
2024-02-10net: tls: fix returned read length with async decryptJakub Kicinski1-1/+0
We double count async, non-zc rx data. The previous fix was lucky because if we fully zc async_copy_bytes is 0 so we add 0. Decrypted already has all the bytes we handled, in all cases. We don't have to adjust anything, delete the erroneous line. Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto") Co-developed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10selftests: tls: use exact comparison in recv_partialJakub Kicinski1-4/+4
This exact case was fail for async crypto and we weren't catching it. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10net: tls: fix use-after-free with partial reads and async decryptSabrina Dubroca1-2/+3
tls_decrypt_sg doesn't take a reference on the pages from clear_skb, so the put_page() in tls_decrypt_done releases them, and we trigger a use-after-free in process_rx_list when we try to read from the partially-read skb. Fixes: fd31f3996af2 ("tls: rx: decrypt into a fresh skb") Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10net: tls: handle backlogging of crypto requestsJakub Kicinski1-0/+22
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our requests to the crypto API, crypto_aead_{encrypt,decrypt} can return -EBUSY instead of -EINPROGRESS in valid situations. For example, when the cryptd queue for AESNI is full (easy to trigger with an artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued to the backlog but still processed. In that case, the async callback will also be called twice: first with err == -EINPROGRESS, which it seems we can just ignore, then with err == 0. Compared to Sabrina's original patch this version uses the new tls_*crypt_async_wait() helpers and converts the EBUSY to EINPROGRESS to avoid having to modify all the error handling paths. The handling is identical. Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator") Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Co-developed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/ Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10tls: fix race between tx work scheduling and socket closeJakub Kicinski1-10/+6
Similarly to previous commit, the submitting thread (recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete(). Reorder scheduling the work before calling complete(). This seems more logical in the first place, as it's the inverse order of what the submitting thread will do. Reported-by: valis <[email protected]> Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10tls: fix race between async notify and socket closeJakub Kicinski2-38/+10
The submitting thread (one which called recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete() so any code past that point risks touching already freed data. Try to avoid the locking and extra flags altogether. Have the main thread hold an extra reference, this way we can depend solely on the atomic ref counter for synchronization. Don't futz with reiniting the completion, either, we are now tightly controlling when completion fires. Reported-by: valis <[email protected]> Fixes: 0cada33241d9 ("net/tls: fix race condition causing kernel panic") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10net: tls: factor out tls_*crypt_async_wait()Jakub Kicinski1-51/+45
Factor out waiting for async encrypt and decrypt to finish. There are already multiple copies and a subsequent fix will need more. No functional changes. Note that crypto_wait_req() returns wait->err Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-10Merge tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linuxLinus Torvalds10-32/+45
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Update a potentially stale firmware attribute (Maurizio) - Fixes for the recent verbose error logging (Keith, Chaitanya) - Protection information payload size fix for passthrough (Francis) - Fix for a queue freezing issue in virtblk (Yi) - blk-iocost underflow fix (Tejun) - blk-wbt task detection fix (Jan) * tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux: virtio-blk: Ensure no requests in virtqueues before deleting vqs. blk-iocost: Fix an UBSAN shift-out-of-bounds warning nvme: use ns->head->pi_size instead of t10_pi_tuple structure size nvme-core: fix comment to reflect right functions nvme: move passthrough logging attribute to head blk-wbt: Fix detection of dirty-throttled tasks nvme-host: fix the updating of the firmware version
2024-02-10Merge tag 'firewire-fixes-6.8-rc4' of ↵Linus Torvalds1-1/+17
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "A change to accelerate the device detection step in some cases. In the self-identification step after bus-reset, all nodes in the same bus broadcast selfID packet including the value of gap count. The value is related to the cable hops between nodes, and used to calculate the subaction gap and the arbitration reset gap. When each node has the different value of the gap count, the asynchronous communication between them is unreliable, since an asynchronous transaction could be interrupted by another asynchronous transaction before completion. The gap count inconsistency can be resolved by several ways; e.g. the transfer of PHY configuration packet and generation of bus-reset. The current implementation of firewire stack can correctly detect the gap count inconsistency, however the recovery action from the inconsistency tends to be delayed after reading configuration ROM of root node. This results in the long time to probe devices in some combinations of hardware. Here the stack is changed to schedule the action as soon as possible" * tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: send bus reset promptly on gap count error
2024-02-10Merge tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds2-2/+7
Pull smb server fixes from Steve French: "Two ksmbd server fixes: - memory leak fix - a minor kernel-doc fix" * tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
2024-02-09Merge tag 'scsi-fixes' of ↵Linus Torvalds4-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small driver fixes and one core fix. The core fix being a fixup to the one in the last pull request which didn't entirely move checking of scsi_host_busy() out from under the host lock" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare() scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd() scsi: lpfc: Use unsigned type for num_sge scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
2024-02-09Merge tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds7-11/+36
Pull smb client fixes from Steve French: - reconnect fix - multichannel channel selection fix - minor mount warning fix - reparse point fix - null pointer check improvement * tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: clarify mount warning cifs: handle cases where multiple sessions share connection cifs: change tcon status when need_reconnect is set on it smb: client: set correct d_type for reparse points under DFS mounts smb3: add missing null server pointer check
2024-02-09Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds10-44/+49
Pull ceph fixes from Ilya Dryomov: "Some fscrypt-related fixups (sparse reads are used only for encrypted files) and two cap handling fixes from Xiubo and Rishabh" * tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client: ceph: always check dir caps asynchronously ceph: prevent use-after-free in encode_cap_msg() ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT libceph: just wait for more data to be available on the socket libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*() libceph: fail sparse-read if the data length doesn't match
2024-02-09Merge tag 'ntfs3_for_6.8' of ↵Linus Torvalds16-247/+381
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 fixes from Konstantin Komarov: "Fixed: - size update for compressed file - some logic errors, overflows - memory leak - some code was refactored Added: - implement super_operations::shutdown Improved: - alternative boot processing - reduced stack usage" * tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits) fs/ntfs3: Slightly simplify ntfs_inode_printk() fs/ntfs3: Add ioctl operation for directories (FITRIM) fs/ntfs3: Fix oob in ntfs_listxattr fs/ntfs3: Fix an NULL dereference bug fs/ntfs3: Update inode->i_size after success write into compressed file fs/ntfs3: Fixed overflow check in mi_enum_attr() fs/ntfs3: Correct function is_rst_area_valid fs/ntfs3: Use i_size_read and i_size_write fs/ntfs3: Prevent generic message "attempt to access beyond end of device" fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache fs/ntfs3: Use kvfree to free memory allocated by kvmalloc fs/ntfs3: Disable ATTR_LIST_ENTRY size check fs/ntfs3: Fix c/mtime typo fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame() fs/ntfs3: Add and fix comments fs/ntfs3: ntfs3_forced_shutdown use int instead of bool fs/ntfs3: Implement super_operations::shutdown fs/ntfs3: Drop suid and sgid bits as a part of fpunch fs/ntfs3: Add file_modified fs/ntfs3: Correct use bh_read ...
2024-02-09work around gcc bugs with 'asm goto' with outputsLinus Torvalds35-77/+96
We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f180345f53 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional"). Then, much later, we ended up removing the workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around. Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case. It looks like there are at least two separate issues that all hit in this area: (a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 which is easy to work around by just adding the 'volatile' by hand. (b) Internal compiler errors: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround. but the problem Sean sees may be a third thing since it involves bad code generation (not an ICE) even with the manually added 'volatile'. but the same old workaround works for this case, even if this feels a bit like voodoo programming and may only be hiding the issue. Reported-and-tested-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Cc: Jakub Jelinek <[email protected]> Cc: Andrew Pinski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-02-09Merge branch 'net-fix-module_description-for-net-p5'Jakub Kicinski29-0/+29
Breno Leitao says: ==================== net: Fix MODULE_DESCRIPTION() for net (p5) There are hundreds of network modules that misses MODULE_DESCRIPTION(), causing a warning when compiling with W=1. Example: WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_cmp.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_nbyte.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_u32.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_meta.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_text.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_canid.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_tunnel.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ipip.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_gre.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/udp_tunnel.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_vti.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ah4.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/esp4.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/xfrm4_tunnel.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/tunnel4.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/xfrm/xfrm_algo.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/xfrm/xfrm_user.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/ah6.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/esp6.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/xfrm6_tunnel.o WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/tunnel6.o This part5 of the patchset focus on the missing net/ module, which are now warning free. v1: https://lore.kernel.org/all/[email protected]/ v2: https://lore.kernel.org/all/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for dsa_loop_bdinfoBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the DSA loopback fixed PHY module. Suggested-by: Florian Fainelli <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Acked-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for ipvtapBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the IP-VLAN based tap driver. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for net/schedBreno Leitao6-0/+6
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the network schedulers. Suggested-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for ipv4 modulesBreno Leitao9-0/+9
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the IPv4 modules. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for ipv6 modulesBreno Leitao7-0/+7
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the IPv6 modules. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for 6LoWPANBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to IPv6 over Low power Wireless Personal Area Network. Signed-off-by: Breno Leitao <[email protected]> Acked-by: Alexander Aring <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for af_keyBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the PF_KEY socket helpers. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for mpoaBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Multi-Protocol Over ATM (MPOA) driver. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: fill in MODULE_DESCRIPTION()s for xfrmBreno Leitao2-0/+2
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the XFRM interface drivers. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09lan966x: Fix crash when adding interface under a lagHoratiu Vultur1-2/+7
There is a crash when adding one of the lan966x interfaces under a lag interface. The issue can be reproduced like this: ip link add name bond0 type bond miimon 100 mode balance-xor ip link set dev eth0 master bond0 The reason is because when adding a interface under the lag it would go through all the ports and try to figure out which other ports are under that lag interface. And the issue is that lan966x can have ports that are NULL pointer as they are not probed. So then iterating over these ports it would just crash as they are NULL pointers. The fix consists in actually checking for NULL pointers before accessing something from the ports. Like we do in other places. Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net/sched: act_mirred: Don't zero blockid when net device is being deletedVictor Nogueira1-2/+0
While testing tdc with parallel tests for mirred to block we caught an intermittent bug. The blockid was being zeroed out when a net device was deleted and, thus, giving us an incorrect blockid value whenever we tried to dump the mirred action. Since we don't increment the block refcount in the control path (and only use the ID), we don't need to zero the blockid field whenever a net device is going down. Fixes: 42f39036cda8 ("net/sched: act_mirred: Allow mirred to block") Signed-off-by: Victor Nogueira <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09Merge branch 'net-openvswitch-limit-the-recursions-from-action-sets'Jakub Kicinski3-31/+102
Aaron Conole says: ==================== net: openvswitch: limit the recursions from action sets Open vSwitch module accepts actions as a list from the netlink socket and then creates a copy which it uses in the action set processing. During processing of the action list on a packet, the module keeps a count of the execution depth and exits processing if the action depth goes too high. However, during netlink processing the recursion depth isn't checked anywhere, and the copy trusts that kernel has large enough stack to accommodate it. The OVS sample action was the original action which could perform this kinds of recursion, and it originally checked that it didn't exceed the sample depth limit. However, when sample became optimized to provide the clone() semantics, the recursion limit was dropped. This series adds a depth limit during the __ovs_nla_copy_actions() call that will ensure we don't exceed the max that the OVS userspace could generate for a clone(). Additionally, this series provides a selftest in 2/2 that can be used to determine if the OVS module is allowing unbounded access. It can be safely omitted where the ovs selftest framework isn't available. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: openvswitch: Add validation for the recursion testAaron Conole2-15/+69
Add a test case into the netlink checks that will show the number of nested action recursions won't exceed 16. Going to 17 on a small clone call isn't enough to exhaust the stack on (most) systems, so it should be safe to run even on systems that don't have the fix applied. Signed-off-by: Aaron Conole <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09net: openvswitch: limit the number of recursions from action setsAaron Conole1-16/+33
The ovs module allows for some actions to recursively contain an action list for complex scenarios, such as sampling, checking lengths, etc. When these actions are copied into the internal flow table, they are evaluated to validate that such actions make sense, and these calls happen recursively. The ovs-vswitchd userspace won't emit more than 16 recursion levels deep. However, the module has no such limit and will happily accept limits larger than 16 levels nested. Prevent this by tracking the number of recursions happening and manually limiting it to 16 levels nested. The initial implementation of the sample action would track this depth and prevent more than 3 levels of recursion, but this was removed to support the clone use case, rather than limited at the current userspace limit. Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use cases") Signed-off-by: Aaron Conole <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09smb3: clarify mount warningSteve French1-1/+1
When a user tries to use the "sec=krb5p" mount parameter to encrypt data on connection to a server (when authenticating with Kerberos), we indicate that it is not supported, but do not note the equivalent recommended mount parameter ("sec=krb5,seal") which turns on encryption for that mount (and uses Kerberos for auth). Update the warning message. Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-02-09cifs: handle cases where multiple sessions share connectionShyam Prasad N2-1/+6
Based on our implementation of multichannel, it is entirely possible that a server struct may not be found in any channel of an SMB session. In such cases, we should be prepared to move on and search for the server struct in the next session. Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-02-09cifs: change tcon status when need_reconnect is set on itShyam Prasad N3-1/+14
When a tcon is marked for need_reconnect, the intention is to have it reconnected. This change adjusts tcon->status in cifs_tree_connect when need_reconnect is set. Also, this change has a minor correction in resetting need_reconnect on success. It makes sure that it is done with tc_lock held. Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-02-09Merge branch 'selftests-forwarding-various-fixes'Jakub Kicinski3-9/+17
Ido Schimmel says: ==================== selftests: forwarding: Various fixes Fix various problems in the forwarding selftests so that they will pass in the netdev CI instead of being ignored. See commit messages for details. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Fix bridge locked port test flakinessIdo Schimmel1-2/+2
The redirection test case fails in the netdev CI on debug kernels because an FDB entry is learned despite the presence of a tc filter that redirects incoming traffic [1]. I am unable to reproduce the failure locally, but I can see how it can happen given that learning is first enabled and only then the ingress tc filter is configured. On debug kernels the time window between these two operations is longer compared to regular kernels, allowing random packets to be transmitted and trigger learning. Fix by reversing the order and configure the ingress tc filter before enabling learning. [1] [...] # TEST: Locked port MAB redirect [FAIL] # Locked entry created for redirected traffic Fixes: 38c43a1ce758 ("selftests: forwarding: Add test case for traffic redirection from a locked port") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Suppress grep warningsIdo Schimmel1-3/+3
Suppress the following grep warnings: [...] INFO: # Port group entries configuration tests - (*, G) TEST: Common port group entries configuration tests (IPv4 (*, G)) [ OK ] TEST: Common port group entries configuration tests (IPv6 (*, G)) [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv4 (*, G) port group entries configuration tests [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv6 (*, G) port group entries configuration tests [ OK ] [...] They do not fail the test, but do clutter the output. Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Fix bridge MDB test flakinessIdo Schimmel1-2/+6
After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: IPv4 host entries forwarding tests [FAIL] # Packet locally received after flood Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: forwarding: Fix layer 2 miss test flakinessIdo Schimmel1-2/+6
After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: L2 miss - Multicast (IPv4) [FAIL] # Unregistered multicast filter was hit after adding MDB entry Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09selftests: net: Fix bridge backup port test flakinessIdo Schimmel1-0/+23
The test toggles the carrier of a bridge port in order to test the bridge backup port feature. Due to the linkwatch delayed work the carrier change is not always reflected fast enough to the bridge driver and packets are not forwarded as the test expects, resulting in failures [1]. Fix by busy waiting on the bridge port state until it changes to the desired state following the carrier change. [1] # Backup port # ----------- [...] # TEST: swp1 carrier off [ OK ] # TEST: No forwarding out of swp1 [FAIL] [ 641.995910] br0: port 1(swp1) entered disabled state # TEST: No forwarding out of vx0 [ OK ] Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Acked-by: Paolo Abeni <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-09btrfs: add new unused block groups to the list of unused block groupsFilipe Manana1-0/+31
Space reservations for metadata are, most of the time, pessimistic as we reserve space for worst possible cases - where tree heights are at the maximum possible height (8), we need to COW every extent buffer in a tree path, need to split extent buffers, etc. For data, we generally reserve the exact amount of space we are going to allocate. The exception here is when using compression, in which case we reserve space matching the uncompressed size, as the compression only happens at writeback time and in the worst possible case we need that amount of space in case the data is not compressible. This means that when there's not available space in the corresponding space_info object, we may need to allocate a new block group, and then that block group might not be used after all. In this case the block group is never added to the list of unused block groups and ends up never being deleted - except if we unmount and mount again the fs, as when reading block groups from disk we add unused ones to the list of unused block groups (fs_info->unused_bgs). Otherwise a block group is only added to the list of unused block groups when we deallocate the last extent from it, so if no extent is ever allocated, the block group is kept around forever. This also means that if we have a bunch of tasks reserving space in parallel we can end up allocating many block groups that end up never being used or kept around for too long without being used, which has the potential to result in ENOSPC failures in case for example we over allocate too many metadata block groups and then end up in a state without enough unallocated space to allocate a new data block group. This is more likely to happen with metadata reservations as of kernel 6.7, namely since commit 28270e25c69a ("btrfs: always reserve space for delayed refs when starting transaction"), because we started to always reserve space for delayed references when starting a transaction handle for a non-zero number of items, and also to try to reserve space to fill the gap between the delayed block reserve's reserved space and its size. So to avoid this, when finishing the creation a new block group, add the block group to the list of unused block groups if it's still unused at that time. This way the next time the cleaner kthread runs, it will delete the block group if it's still unused and not needed to satisfy existing space reservations. Reported-by: Ivan Shapovalov <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 6.7+ Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09btrfs: do not delete unused block group if it may be used soonFilipe Manana1-0/+46
Before deleting a block group that is in the list of unused block groups (fs_info->unused_bgs), we check if the block group became used before deleting it, as extents from it may have been allocated after it was added to the list. However even if the block group was not yet used, there may be tasks that have only reserved space and have not yet allocated extents, and they might be relying on the availability of the unused block group in order to allocate extents. The reservation works first by increasing the "bytes_may_use" field of the corresponding space_info object (which may first require flushing delayed items, allocating a new block group, etc), and only later a task does the actual allocation of extents. For metadata we usually don't end up using all reserved space, as we are pessimistic and typically account for the worst cases (need to COW every single node in a path of a tree at maximum possible height, etc). For data we usually reserve the exact amount of space we're going to allocate later, except when using compression where we always reserve space based on the uncompressed size, as compression is only triggered when writeback starts so we don't know in advance how much space we'll actually need, or if the data is compressible. So don't delete an unused block group if the total size of its space_info object minus the block group's size is less then the sum of used space and space that may be used (space_info->bytes_may_use), as that means we have tasks that reserved space and may need to allocate extents from the block group. In this case, besides skipping the deletion, re-add the block group to the list of unused block groups so that it may be reconsidered later, in case the tasks that reserved space end up not needing to allocate extents from it. Allowing the deletion of the block group while we have reserved space, can result in tasks failing to allocate metadata extents (-ENOSPC) while under a transaction handle, resulting in a transaction abort, or failure during writeback for the case of data extents. CC: [email protected] # 6.0+ Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09btrfs: add and use helper to check if block group is usedFilipe Manana2-2/+8
Add a helper function to determine if a block group is being used and make use of it at btrfs_delete_unused_bgs(). This helper will also be used in future code changes. Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09btrfs: don't drop extent_map for free space inode on write errorJosef Bacik1-2/+17
While running the CI for an unrelated change I hit the following panic with generic/648 on btrfs_holes_spacecache. assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1385! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G W 6.8.0-rc2+ #1 RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0 Call Trace: <TASK> extent_write_cache_pages+0x2ac/0x8f0 extent_writepages+0x87/0x110 do_writepages+0xd5/0x1f0 filemap_fdatawrite_wbc+0x63/0x90 __filemap_fdatawrite_range+0x5c/0x80 btrfs_fdatawrite_range+0x1f/0x50 btrfs_write_out_cache+0x507/0x560 btrfs_write_dirty_block_groups+0x32a/0x420 commit_cowonly_roots+0x21b/0x290 btrfs_commit_transaction+0x813/0x1360 btrfs_sync_file+0x51a/0x640 __x64_sys_fdatasync+0x52/0x90 do_syscall_64+0x9c/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 This happens because we fail to write out the free space cache in one instance, come back around and attempt to write it again. However on the second pass through we go to call btrfs_get_extent() on the inode to get the extent mapping. Because this is a new block group, and with the free space inode we always search the commit root to avoid deadlocking with the tree, we find nothing and return a EXTENT_MAP_HOLE for the requested range. This happens because the first time we try to write the space cache out we hit an error, and on an error we drop the extent mapping. This is normal for normal files, but the free space cache inode is special. We always expect the extent map to be correct. Thus the second time through we end up with a bogus extent map. Since we're deprecating this feature, the most straightforward way to fix this is to simply skip dropping the extent map range for this failed range. I shortened the test by using error injection to stress the area to make it easier to reproduce. With this patch in place we no longer panic with my error injection test. CC: [email protected] # 4.14+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-09Merge tag 'riscv-for-linus-6.8-rc4' of ↵Linus Torvalds7-6/+89
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - fix missing TLB flush during early boot on SPARSEMEM_VMEMMAP configurations - fixes to correctly implement the break-before-make behavior requried by the ISA for NAPOT mappings - fix a missing TLB flush on intermediate mapping changes - fix build warning about a missing declaration of overflow_stack - fix performace regression related to incorrect tracking of completed batch TLB flushes * tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix arch_tlbbatch_flush() by clearing the batch cpumask riscv: declare overflow_stack as exported from traps.c riscv: Fix arch_hugetlb_migration_supported() for NAPOT riscv: Flush the tlb when a page directory is freed riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled riscv: Fix set_huge_pte_at() for NAPOT mapping riscv: mm: execute local TLB flush after populating vmemmap
2024-02-09Merge tag 'trace-v6.8-rc3' of ↵Linus Torvalds2-38/+47
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix broken direct trampolines being called when another callback is attached the same function. ARM 64 does not support FTRACE_WITH_REGS, and when it added direct trampoline calls from ftrace, it removed the "WITH_REGS" flag from the ftrace_ops for direct trampolines. This broke x86 as x86 requires direct trampolines to have WITH_REGS. This wasn't noticed because direct trampolines work as long as the function it is attached to is not shared with other callbacks (like the function tracer). When there are other callbacks, a helper trampoline is called, to call all the non direct callbacks and when it returns, the direct trampoline is called. For x86, the direct trampoline sets a flag in the regs field to tell the x86 specific code to call the direct trampoline. But this only works if the ftrace_ops had WITH_REGS set. ARM does things differently that does not require this. For now, set WITH_REGS if the arch supports WITH_REGS (which ARM does not), and this makes it work for both ARM64 and x86. - Fix wasted memory in the saved_cmdlines logic. The saved_cmdlines is a cache that maps PIDs to COMMs that tracing can use. Most trace events only save the PID in the event. The saved_cmdlines file lists PIDs to COMMs so that the tracing tools can show an actual name and not just a PID for each event. There's an array of PIDs that map to a small set of saved COMM strings. The array is set to PID_MAX_DEFAULT which is usually set to 32768. When a PID comes in, it will add itself to this array along with the index into the COMM array (note if the system allows more than PID_MAX_DEFAULT, this cache is similar to cache lines as an update of a PID that has the same PID_MAX_DEFAULT bits set will flush out another task with the same matching bits set). A while ago, the size of this cache was changed to be dynamic and the array was moved into a structure and created with kmalloc(). But this new structure had the size of 131104 bytes, or 0x20020 in hex. As kmalloc allocates in powers of two, it was actually allocating 0x40000 bytes (262144) leaving 131040 bytes of wasted memory. The last element of this structure was a pointer to the COMM string array which defaulted to just saving 128 COMMs. By changing the last field of this structure to a variable length string, and just having it round up to fill the allocated memory, the default size of the saved COMM cache is now 8190. This not only uses the wasted space, but actually saves space by removing the extra allocation for the COMM names. * tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix wasted memory in saved_cmdlines logic ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
2024-02-09Merge tag 'probes-fixes-v6.8-rc3' of ↵Linus Torvalds3-17/+22
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - remove unnecessary initial values of kprobes local variables - probe-events parser bug fixes: - calculate the argument size and format string after setting type information from BTF, because BTF can change the size and format string. - show $comm parse error correctly instead of failing silently. * tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Remove unnecessary initial values of variables tracing/probes: Fix to set arg size and fmt after setting type from BTF tracing/probes: Fix to show a parse error for bad type for $comm
2024-02-09Merge tag 'efi-fixes-for-v6.8-1' of ↵Linus Torvalds17-73/+97
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "The only notable change here is the patch that changes the way we deal with spurious errors from the EFI memory attribute protocol. This will be backported to v6.6, and is intended to ensure that we will not paint ourselves into a corner when we tighten this further in order to comply with MS requirements on signed EFI code. Note that this protocol does not currently exist in x86 production systems in the field, only in Microsoft's fork of OVMF, but it will be mandatory for Windows logo certification for x86 PCs in the future. - Tighten ELF relocation checks on the RISC-V EFI stub - Give up if the new EFI memory attributes protocol fails spuriously on x86 - Take care not to place the kernel in the lowest 16 MB of DRAM on x86 - Omit special purpose EFI memory from memblock - Some fixes for the CXL CPER reporting code - Make the PE/COFF layout of mixed-mode capable images comply with a strict interpretation of the spec" * tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section cxl/trace: Remove unnecessary memcpy's cxl/cper: Fix errant CPER prints for CXL events efi: Don't add memblocks for soft-reserved memory efi: runtime: Fix potential overflow of soft-reserved region size efi/libstub: Add one kernel-doc comment x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR x86/efistub: Give up if memory attribute protocol returns an error riscv/efistub: Tighten ELF relocation check riscv/efistub: Ensure GP-relative addressing is not used
2024-02-09Merge tag 'pci-v6.8-fixes-2' of ↵Linus Torvalds1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Fix an unintentional truncation of DWC MSI-X address to 32 bits and update similar MSI code to match (Dan Carpenter) * tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: dwc: Clean up dw_pcie_ep_raise_msi_irq() alignment PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()