aboutsummaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)AuthorFilesLines
2024-03-11Merge tag 'vfs-6.9.uuid' of ↵Linus Torvalds1-0/+25
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()
2024-03-11Merge tag 'vfs-6.9.pidfd' of ↵Linus Torvalds2-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pdfd updates from Christian Brauner: - Until now pidfds could only be created for thread-group leaders but not for threads. There was no technical reason for this. We simply had no users that needed support for this. Now we do have users that need support for this. This introduces a new PIDFD_THREAD flag for pidfd_open(). If that flag is set pidfd_open() creates a pidfd that refers to a specific thread. In addition, we now allow clone() and clone3() to be called with CLONE_PIDFD | CLONE_THREAD which wasn't possible before. A pidfd that refers to an individual thread differs from a pidfd that refers to a thread-group leader: (1) Pidfds are pollable. A task may poll a pidfd and get notified when the task has exited. For thread-group leader pidfds the polling task is woken if the thread-group is empty. In other words, if the thread-group leader task exits when there are still threads alive in its thread-group the polling task will not be woken when the thread-group leader exits but rather when the last thread in the thread-group exits. For thread-specific pidfds the polling task is woken if the thread exits. (2) Passing a thread-group leader pidfd to pidfd_send_signal() will generate thread-group directed signals like kill(2) does. Passing a thread-specific pidfd to pidfd_send_signal() will generate thread-specific signals like tgkill(2) does. The default scope of the signal is thus determined by the type of the pidfd. Since use-cases exist where the default scope of the provided pidfd needs to be overriden the following flags are added to pidfd_send_signal(): - PIDFD_SIGNAL_THREAD Send a thread-specific signal. - PIDFD_SIGNAL_THREAD_GROUP Send a thread-group directed signal. - PIDFD_SIGNAL_PROCESS_GROUP Send a process-group directed signal. The scope change will only work if the struct pid is actually used for this scope. For example, in order to send a thread-group directed signal the provided pidfd must be used as a thread-group leader and similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be used as a process group leader. - Move pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds to become useful for the first time. They can now be compared by inode number which are unique for the system lifetime. Instead of stashing struct pid in file->private_data we can now stash it in inode->i_private. This makes it possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. Another side-effect is that file->private_data is now freed up for per-file options for pidfds. Now, each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode and dentry for each struct pid and we reuse that inode and dentry for all pidfds that refer to the same struct pid. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs. The dentry and inode allocation mechanism is moved into generic infrastructure that is now shared between nsfs and pidfs. The path_from_stashed() helper must be provided with a stashing location, an inode number, a mount, and the private data that is supposed to be used and it will provide a path that can be passed to dentry_open(). The helper will try retrieve an existing dentry from the provided stashing location. If a valid dentry is found it is reused. If not a new one is allocated and we try to stash it in the provided location. If this fails we retry until we either find an existing dentry or the newly allocated dentry could be stashed. Subsequent openers of the same namespace or task are then able to reuse it. - Currently it is only possible to get notified when a task has exited, i.e., become a zombie and userspace gets notified with EPOLLIN. We now also support waiting until the task has been reaped, notifying userspace with EPOLLHUP. - Ensure that ESRCH is reported for getfd if a task is exiting instead of the confusing EBADF. - Various smaller cleanups to pidfd functions. * tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits) libfs: improve path_from_stashed() libfs: add stashed_dentry_prune() libfs: improve path_from_stashed() helper pidfs: convert to path_from_stashed() helper nsfs: convert to path_from_stashed() helper libfs: add path_from_stashed() pidfd: add pidfs pidfd: move struct pidfd_fops pidfd: allow to override signal scope in pidfd_send_signal() pidfd: change pidfd_send_signal() to respect PIDFD_THREAD signal: fill in si_code in prepare_kill_siginfo() selftests: add ESRCH tests for pidfd_getfd() pidfd: getfd should always report ESRCH if a task is exiting pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together pidfd: exit: kill the no longer used thread_group_exited() pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN)) pid: kill the obsolete PIDTYPE_PID code in transfer_pid() pidfd: kill the no longer needed do_notify_pidfd() in de_thread() pidfd_poll: report POLLHUP when pid_task() == NULL pidfd: implement PIDFD_THREAD flag for pidfd_open() ...
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-03-11Merge tag 'asoc-v6.9' of ↵Takashi Iwai7-9/+29
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v6.9 This has been quite a small release, there's a lot of driver specific cleanups and minor enhancements but hardly anything on the core and only one new driver. Highlights include: - SoundWire support for AMD ACP 6.3 systems. - Support for reporting version information for AVS firmware. - Support DSPless mode for Intel Soundwire systems. - Support for configuring CS35L56 amplifiers using EFI calibration data. - Log which component is being operated on as part of power management trace events. - Support for Microchip SAM9x7, NXP i.MX95 and Qualcomm WCD939x
2024-03-11Merge tag 'loongarch-kvm-6.9' of ↵Paolo Bonzini6-27/+24
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.9 * Set reserved bits as zero in CPUCFG. * Start SW timer only when vcpu is blocking. * Do not restart SW timer when it is expired. * Remove unnecessary CSR register saving during enter guest.
2024-03-11Merge branch '100GbE' of ↵David S. Miller1-0/+48
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-11Merge branch 'for-next' into for-linusTakashi Iwai1-0/+154
Prep for 6.9 merge. Signed-off-by: Takashi Iwai <[email protected]>
2024-03-09Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵Paolo Bonzini2-0/+3
https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-08net: nexthop: Expose nexthop group HW stats to user spaceIdo Schimmel1-0/+9
Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Kees Cook <[email protected]> # For the __counted_by bits Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Add ability to enable / disable hardware statisticsIdo Schimmel1-0/+3
Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Expose nexthop group stats to user spaceIdo Schimmel1-0/+30
Add netlink support for reading NH group stats. This data is only for statistics of the traffic in the SW datapath. HW nexthop group statistics will be added in the following patches. Emission of the stats is keyed to a new op_stats flag to avoid cluttering the netlink message with stats if the user doesn't need them: NHA_OP_FLAG_DUMP_STATS. Co-developed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Add NHA_OP_FLAGSPetr Machata1-0/+3
In order to add per-nexthop statistics, but still not increase netlink message size for consumers that do not care about them, there needs to be a toggle through which the user indicates their desire to get the statistics. To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to use the attribute for carrying of arbitrary operation-specific flags, i.e. not make it specific for get / dump. Add the new attribute to get and dump policies, but do not actually allow any flags yet -- those will come later as the flags themselves are defined. Add the necessary parsing code. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08Merge drm/drm-next into drm-misc-nextThomas Zimmermann2-0/+9
Backmerging to get the latest fixes from drm-next; specifically the build fix from the patchset at [1]. Also fixes the build by removing an unused variable from rzg2l_du_vsp_atomic_flush(). Signed-off-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/series/130720/ # 1
2024-03-08Merge branches 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'x86/vt-d', ↵Joerg Roedel1-161/+0
'x86/amd' and 'core' into next
2024-03-07netdev: add queue stat for alloc failuresJakub Kicinski1-0/+1
Rx alloc failures are commonly counted by drivers. Support reporting those via netdev-genl queue stats. Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Amritha Nambiar <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-07netdev: add per-queue statisticsJakub Kicinski1-0/+19
The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Amritha Nambiar <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-08Merge tag 'drm-etnaviv-next-2024-03-07' of ↵Dave Airlie1-0/+5
https://git.pengutronix.de/git/lst/linux into drm-next - various code cleanups - enhancements for NPU and MRT support Signed-off-by: Dave Airlie <[email protected]> From: Lucas Stach <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-22/+3
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-07drm/i915/guc: Use context hints for GT frequencyVinay Belgaumkar1-0/+15
Allow user to provide a low latency context hint. When set, KMD sends a hint to GuC which results in special handling for this context. SLPC will ramp the GT frequency aggressively every time it switches to this context. The down freq threshold will also be lower so GuC will ramp down the GT freq for this context more slowly. We also disable waitboost for this context as that will interfere with the strategy. We need to enable the use of SLPC Compute strategy during init, but it will apply only to contexts that set this bit during context creation. Userland can check whether this feature is supported using a new param- I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission enabled platforms as they use SLPC for frequency management. The Mesa usage model for this flag is here - https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint v2: Rename flags as per review suggestions (Rodrigo, Tvrtko). Also, use flag bits in intel_context as it allows finer control for toggling per engine if needed (Tvrtko). v3: Minor review comments (Tvrtko) v4: Update comment (Sushma) Cc: Rodrigo Vivi <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Sushma Venkatesh Reddy <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Acked-by: Ivan Briano <[email protected]> Signed-off-by: Vinay Belgaumkar <[email protected]> Signed-off-by: John Harrison <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-03-07arm64/ptrace: Expose FPMR via ptraceMark Brown1-0/+1
Add a new regset to expose FPMR via ptrace. It is not added to the FPSIMD registers since that structure is exposed elsewhere without any allowance for extension we don't add there. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-06bpf: Introduce may_goto instructionAlexei Starovoitov1-0/+5
Introduce may_goto instruction that from the verifier pov is similar to open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it doesn't iterate any objects. In assembly 'may_goto' is a nop most of the time until bpf runtime has to terminate the program for whatever reason. In the current implementation may_goto has a hidden counter, but other mechanisms can be used. For programs written in C the later patch introduces 'cond_break' macro that combines 'may_goto' with 'break' statement and has similar semantics: cond_break is a nop until bpf runtime has to break out of this loop. It can be used in any normal "for" or "while" loop, like for (i = zero; i < cnt; cond_break, i++) { The verifier recognizes that may_goto is used in the program, reserves additional 8 bytes of stack, initializes them in subprog prologue, and replaces may_goto instruction with: aux_reg = *(u64 *)(fp - 40) if aux_reg == 0 goto pc+off aux_reg -= 1 *(u64 *)(fp - 40) = aux_reg may_goto instruction can be used by LLVM to implement __builtin_memcpy, __builtin_strcmp. may_goto is not a full substitute for bpf_for() macro. bpf_for() doesn't have induction variable that verifiers sees, so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded. But when the code is written as: for (i = 0; i < 100; cond_break, i++) the verifier see 'i' as precise constant zero, hence cond_break (aka may_goto) doesn't help to converge the loop. A static or global variable can be used as a workaround: static int zero = 0; for (i = zero; i < 100; cond_break, i++) // works! may_goto works well with arena pointers that don't need to be bounds checked on access. Load/store from arena returns imprecise unbounded scalar and loops with may_goto pass the verifier. Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn. JCOND stands for conditional pseudo jump. Since goto_or_nop insn was proposed, it may use the same opcode. may_goto vs goto_or_nop can be distinguished by src_reg: code = BPF_JMP | BPF_JCOND src_reg = 0 - may_goto src_reg = 1 - goto_or_nop Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Acked-by: John Fastabend <[email protected]> Tested-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-03-06ethtool: Add GTP RSS hash options to ethtool.hTakeru Hayasaka1-0/+48
This is a patch that enables RSS functionality for GTP packets using ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. GTP generates a flow that includes an ID called TEID to identify the tunnel. This tunnel is created for each UE (User Equipment).By performing RSS based on this flow, it is possible to apply RSS for each communication unit from the UE. Without this, RSS would only be effective within the range of IP addresses. For instance, the PGW can only perform RSS within the IP range of the SGW. Problematic from a load distribution perspective, especially if there's a bias in the terminals connected to a particular base station.This case can be solved by using this patch. Signed-off-by: Takeru Hayasaka <[email protected]> Reviewed-by: Marcin Szycik <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2024-03-06fanotify: Fix misspelling of "writable"Vicki Pfau1-2/+2
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the fanotify header. Signed-off-by: Vicki Pfau <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <[email protected]>
2024-03-06inotify: Fix misspelling of "writable"Vicki Pfau1-2/+2
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the inotify header. Signed-off-by: Vicki Pfau <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <[email protected]>
2024-03-06fuse: Use the high bit of request ID for indicating resend requestsZhao Chen1-1/+12
Some FUSE daemons want to know if the received request is a resend request. The high bit of the fuse request ID is utilized for indicating this, enabling the receiver to perform appropriate handling. The init flag "FUSE_HAS_RESEND" is added to indicate this feature. Signed-off-by: Zhao Chen <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2024-03-06fuse: Introduce a new notification type for resend pending requestsZhao Chen1-0/+2
When a FUSE daemon panics and failover, we aim to minimize the impact on applications by reusing the existing FUSE connection. During this process, another daemon is employed to preserve the FUSE connection's file descriptor. The new started FUSE Daemon will takeover the fd and continue to provide service. However, it is possible for some inflight requests to be lost and never returned. As a result, applications awaiting replies would become stuck forever. To address this, we can resend these pending requests to the new started FUSE daemon. This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which can trigger resending of the pending requests, ensuring they are properly processed again. Signed-off-by: Zhao Chen <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2024-03-06fuse: add support for explicit export disablingJingbo Xu1-0/+3
open_by_handle_at(2) can fail with -ESTALE with a valid handle returned by a previous name_to_handle_at(2) for evicted fuse inodes, which is especially common when entry_valid_timeout is 0, e.g. when the fuse daemon is in "cache=none" mode. The time sequence is like: name_to_handle_at(2) # succeed evict fuse inode open_by_handle_at(2) # fail The root cause is that, with 0 entry_valid_timeout, the dput() called in name_to_handle_at(2) will trigger iput -> evict(), which will send FUSE_FORGET to the daemon. The following open_by_handle_at(2) will send a new FUSE_LOOKUP request upon inode cache miss since the previous inode eviction. Then the fuse daemon may fail the FUSE_LOOKUP request with -ENOENT as the cached metadata of the requested inode has already been cleaned up during the previous FUSE_FORGET. The returned -ENOENT is treated as -ESTALE when open_by_handle_at(2) returns. This confuses the application somehow, as open_by_handle_at(2) fails when the previous name_to_handle_at(2) succeeds. The returned errno is also confusing as the requested file is not deleted and already there. It is reasonable to fail name_to_handle_at(2) early in this case, after which the application can fallback to open(2) to access files. Since this issue typically appears when entry_valid_timeout is 0 which is configured by the fuse daemon, the fuse daemon is the right person to explicitly disable the export when required. Also considering FUSE_EXPORT_SUPPORT actually indicates the support for lookups of "." and "..", and there are existing fuse daemons supporting export without FUSE_EXPORT_SUPPORT set, for compatibility, we add a new INIT flag for such purpose. Reviewed-by: Amir Goldstein <[email protected]> Signed-off-by: Jingbo Xu <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2024-03-05Merge tag 'v6.8-rc7' into gpio/for-nextBartosz Golaszewski6-27/+24
Linux 6.8-rc7
2024-03-05btrfs: qgroup: validate btrfs_qgroup_inherit parameterQu Wenruo1-0/+1
[BUG] Currently btrfs can create subvolume with an invalid qgroup inherit without triggering any error: # mkfs.btrfs -O quota -f $dev # mount $dev $mnt # btrfs subvolume create -i 2/0 $mnt/subv1 # btrfs qgroup show -prce --sync $mnt Qgroupid Referenced Exclusive Path -------- ---------- --------- ---- 0/5 16.00KiB 16.00KiB <toplevel> 0/256 16.00KiB 16.00KiB subv1 [CAUSE] We only do a very basic size check for btrfs_qgroup_inherit structure, but never really verify if the values are correct. Thus in btrfs_qgroup_inherit() function, we have to skip non-existing qgroups, and never return any error. [FIX] Fix the behavior and introduce extra checks: - Introduce early check for btrfs_qgroup_inherit structure Not only the size, but also all the qgroup ids would be verified. And the timing is very early, so we can return error early. This early check is very important for snapshot creation, as snapshot is delayed to transaction commit. - Drop support for btrfs_qgroup_inherit::num_ref_copies and num_excl_copies Those two members are used to specify to copy refr/excl numbers from other qgroups. This would definitely mark qgroup inconsistent, and btrfs-progs has dropped the support for them for a long time. It's time to drop the support for kernel. - Verify the supported btrfs_qgroup_inherit::flags Just in case we want to add extra flags for btrfs_qgroup_inherit. Now above subvolume creation would fail with -ENOENT other than silently ignore the non-existing qgroup. CC: [email protected] # 6.7+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-05drm/nouveau: move more missing UAPI bitsKarol Herbst1-0/+22
Those are already de-facto UAPI, so let's just move it into the uapi header. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Lyude Paul <[email protected]> Reviewed-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-03-05fuse: implement ioctls to manage backing filesAmir Goldstein1-0/+9
FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file descriptor. If the call succeeds, a backing file identifier is returned. A later change will be using this backing file id in a reply to OPEN request with the flag FOPEN_PASSTHROUGH to setup passthrough of file operations on the open FUSE file to the backing file. The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the backing file by its id. This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH flag is still in progress, the open may fail if the backing file is closed before the fuse file was opened. Setting up backing files requires a server with CAP_SYS_ADMIN privileges. For the backing file to be successfully setup, the backing file must implement both read_iter and write_iter file operations. The limitation on the level of filesystem stacking allowed for the backing file is enforced before setting up the backing file. Signed-off-by: Alessio Balsini <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2024-03-03RDMA/hns: Support userspace configuring congestion control algorithm with QP ↵Junxian Huang1-0/+16
granularity Currently, congestion control algorithm is statically configured in FW, and all QPs use the same algorithm(except UD which has a fixed configuration of DCQCN). This is not flexible enough. Support userspace configuring congestion control algorithm with QP granularity while creating QPs. If the algorithm is not specified in userspace, use the default one. Signed-off-by: Junxian Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-03-02Merge tag 'for-netdev' of ↵Jakub Kicinski1-2/+23
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-01Merge tag 'sound-6.8-rc7' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The amount of changes wasn't as small as wished, but all reasonably small fixes. There is a PCM core API change, which is for correcting the behavior change we took in 6.8. The rest are device-specific fixes for ASoC AMD, Qualcomm, Cirrus codecs, HD-audio quirks & co" * tag 'sound-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: amd: yc: Fix non-functional mic on Lenovo 21J2 ASoC: amd: yc: add new YC platform variant (0x63) support ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8 ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol() ALSA: hda/realtek: tas2781: enable subwoofer volume control ALSA: pcm: clarify and fix default msbits value for all formats ASoC: qcom: Fix uninitialized pointer dmactl ALSA: hda/realtek: fix mute/micmute LED For HP mt440 ALSA: Drop leftover snd-rtctimer stuff from Makefile ALSA: ump: Fix the discard error code from snd_ump_legacy_open() ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8) ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron 16 Plus 7630 ALSA: firewire-lib: fix to check cycle continuity
2024-03-01Merge tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds1-20/+1
Pull drm fixes from Dave Airlie: "Bunch of fixes, xe, amdgpu, nouveau and tegra all have a few. Then drm/bridge including some drivers/soc fallout fixes. The biggest thing in here is a new unit test for some buddy allocator fixes, otherwise a misc fbcon, ttm unit test and one msm revert. Seems pretty normal for this stage. buddy: - two allocation fixes + unit test fbcon: - font restore syzkaller fix ttm: - kunit test fix bridge: - fix aux-hpd leaks - fix aux-hpd registration - fix use after free in soc/qcom - fix boot on soc/qcom xe: - A couple of tracepoint updates from Priyanka and Lucas - Make sure BINDs are completed before accepting UNBINDs on LR vms - Don't arbitrarily restrict max number of batched binds - Add uapi for dumpable bos (agreed on IRC) - Remove unused uapi flags and a leftover comment - A couple of fixes related to the execlist backend msm: - DP: Revert a change which was causing a HDP regression amdgpu: - Fix potential buffer overflow - Fix power min cap - Suspend/resume fix - SI PM fix - eDP fix nouveau: - fix a misreported VRAM sizing - fix a regression in suspend/resume due to freeing tegra: - host1x reset fix - only remove existing driver if display is possible" * tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernel: (32 commits) drm/nouveau: keep DMA buffers required for suspend/resume nouveau: report byte usage in VRAM usage drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace drm/xe: Deny unbinds if uapi ufence pending drm/xe: Expose user fence from xe_sync_entry drm/xe: Use pointers in trace events drm/xe/xe_bo_move: Enhance xe_bo_move trace drm/xe/mmio: fix build warning for BAR resize on 32-bit drm/xe: get rid of MAX_BINDS drm/xe: Use vmalloc for array of bind allocation in bind IOCTL drm/xe: Don't support execlists in xe_gt_tlb_invalidation layer drm/xe: Fix execlist splat drm/xe/uapi: Remove unused flags drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over drm/xe: Add uapi for dumpable bos drm/amd/display: Add monitor patch for specific eDP Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD status changes" drm/tests/drm_buddy: add alloc_range_bias test drm/buddy: check range allocation matches alignment drm/buddy: fix range bias ...
2024-03-01pidfd: add pidfsChristian Brauner1-0/+1
This moves pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This has been on my todo for quite a while as it will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows: * statx() on pidfds becomes useful for the first time. * pidfds can be compared simply via statx() and then comparing inode numbers. * pidfds have unique inode numbers for the system lifetime. * struct pid is now stashed in inode->i_private instead of file->private_data. This means it is now possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. * file->private_data is freed up for per-file options for pidfds. * Each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. Even if we were to move to anon_inode_create_getfile() which creates new inodes we'd still be associating the same struct pid with multiple different inodes. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode for each struct pid and we reuse that inode for all pidfds. We use iget_locked() to find that inode again based on the inode number which isn't recycled. We allocate a new dentry for each pidfd that uses the same inode. That is similar to anonymous inodes which reuse the same inode for thousands of dentries. For pidfds we're talking way less than that. There usually won't be a lot of concurrent openers of the same struct pid. They can probably often be counted on two hands. I know that systemd does use separate pidfd for the same struct pid for various complex process tracking issues. So I think with that things actually become way simpler. Especially because we don't have to care about lookup. Dentries and inodes continue to be always deleted. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs which uses a similar stashing mechanism just for namespaces. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-03-01drm/i915: Add missing doc for drm_i915_reset_statsNirmoy Das1-3/+13
Add missing doc for struct drm_i915_reset_stats. Cc: Andi Shyti <[email protected]> Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-03-01drm/panthor: Add uAPIBoris Brezillon1-0/+945
Panthor follows the lead of other recently submitted drivers with ioctls allowing us to support modern Vulkan features, like sparse memory binding: - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET), with the 'exclusive-VM' bit to speed-up BO reservation on job submission - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND ioctl is loosely based on the Xe model, and can handle both asynchronous and synchronous requests - GPU execution context creation/destruction, tiler heap context creation and job submission. Those ioctls reflect how the hardware/scheduler works and are thus driver specific. We also have a way to expose IO regions, such that the usermode driver can directly access specific/well-isolate registers, like the LATEST_FLUSH register used to implement cache-flush reduction. This uAPI intentionally keeps usermode queues out of the scope, which explains why doorbell registers and command stream ring-buffers are not directly exposed to userspace. v6: - Add Maxime's and Heiko's acks v5: - Fix typo - Add Liviu's R-b v4: - Add a VM_GET_STATE ioctl - Fix doc - Expose the CORE_FEATURES register so we can deal with variants in the UMD - Add Steve's R-b v3: - Add the concept of sync-only VM operation - Fix support for 32-bit userspace - Rework drm_panthor_vm_create to pass the user VA size instead of the kernel VA size (suggested by Robin Murphy) - Typo fixes - Explicitly cast enums with top bit set to avoid compiler warnings in -pedantic mode. - Drop property core_group_count as it can be easily calculated by the number of bits set in l2_present. Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Acked-by: Maxime Ripard <[email protected]> Acked-by: Heiko Stuebner <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-03-01Merge tag 'drm-xe-fixes-2024-02-29' of ↵Dave Airlie1-20/+1
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - A couple of tracepoint updates from Priyanka and Lucas. - Make sure BINDs are completed before accepting UNBINDs on LR vms. - Don't arbitrarily restrict max number of batched binds. - Add uapi for dumpable bos (agreed on IRC). - Remove unused uapi flags and a leftover comment. Driver Changes: - A couple of fixes related to the execlist backend. Signed-off-by: Dave Airlie <[email protected]> From: Thomas Hellstrom <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/ZeCBg4MA2hd1oggN@fedora
2024-03-01Merge tag 'drm-intel-gt-next-2024-02-28' of ↵Dave Airlie1-0/+4
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: Fixes: - Add some boring kerneldoc (Tvrtko Ursulin) - Check before removing mm notifier (Nirmoy Signed-off-by: Dave Airlie <[email protected]> From: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/Zd889Wvu/ZKZSK4/@tursulin-desk
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-2/+15
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/[email protected]/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-29bpf: Replace bpf_lpm_trie_key 0-length array with flexible arrayKees Cook1-1/+18
Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13: ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=] 207 | *(__be16 *)&key->data[i]); | ^~~~~~~~~~~~~ ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16' 102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) | ^ ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu' 97 | #define be16_to_cpu __be16_to_cpu | ^~~~~~~~~~~~~ ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu' 206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i] ^ | ^~~~~~~~~~~ In file included from ../include/linux/bpf.h:7: ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data' 82 | __u8 data[0]; /* Arbitrary size */ | ^~~~ And found at run-time under CONFIG_FORTIFY_SOURCE: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49 index 0 is out of range for type '__u8 [*]' Changing struct bpf_lpm_trie_key is difficult since has been used by userspace. For example, in Cilium: struct egress_gw_policy_key { struct bpf_lpm_trie_key lpm_key; __u32 saddr; __u32 daddr; }; While direct references to the "data" member haven't been found, there are static initializers what include the final member. For example, the "{}" here: struct egress_gw_policy_key in_key = { .lpm_key = { 32 + 24, {} }, .saddr = CLIENT_IP, .daddr = EXTERNAL_SVC_IP & 0Xffffff, }; To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpr_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr. Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly. Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options. Reported-by: Mark Rutland <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Gustavo A. R. Silva <[email protected]> Closes: https://paste.debian.net/hidden/ca500597/ Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1] Link: https://lore.kernel.org/bpf/[email protected]
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29drm/xe/uapi: Remove unused flagsFrancois Dugast1-19/+0
Those cases missed in previous uAPI cleanups were mostly accidentally brought in from i915 or created to exercise the possibilities of gpuvm but they are not used by userspace yet, so let's remove them. They can still be brought back later if needed. v2: - Fix XE_VM_FLAG_FAULT_MODE support in xe_lrc.c (Brian Welty) - Leave DRM_XE_VM_BIND_OP_UNMAP_ALL (José Roberto de Souza) - Ensure invalid flag values are rejected (Rodrigo Vivi) v3: Rebase after removal of persistent exec_queues (Francois Dugast) v4: Rodrigo: Rebase after the new dumpable flag. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <[email protected]> Cc: Rodrigo Vivi <[email protected]> Signed-off-by: Francois Dugast <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 84a1ed5e67565b09b8fd22a26754d2897de55ce0) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-29drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left overJosé Roberto de Souza1-1/+0
This is a comment left over of commit d3d767396a02 ("drm/xe/uapi: Remove sync binds"). Fixes: d3d767396a02 ("drm/xe/uapi: Remove sync binds") Reviewed-by: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Signed-off-by: José Roberto de Souza <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit f031c3a7af8ea06790dd0a71872c4f0175084baa) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-29drm/xe: Add uapi for dumpable bosMaarten Lankhorst1-0/+1
Add the flag XE_VM_BIND_FLAG_DUMPABLE to notify devcoredump that this mapping should be dumped. This is not hooked up, but the uapi should be ready before merging. It's likely easier to dump the contents of the bo's at devcoredump readout time, so it's better if the bos will stay unmodified after a hang. The NEEDS_CPU_MAPPING flag is removed as requirement. Signed-off-by: Maarten Lankhorst <[email protected]> Reviewed-by: José Roberto de Souza <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 76a86b58d2b3de31e88acb487ebfa0c3cc7c41d2) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-28ublk: add UBLK_CMD_DEL_DEV_ASYNCMing Lei1-0/+2
The current command UBLK_CMD_DEL_DEV won't return until the device is released, this way looks more reliable, but makes userspace more difficult to implement, especially about orders: unmap command buffer(which holds one ublkc reference), ublkc close, io_uring_file_unregister, ublkb close. Add UBLK_CMD_DEL_DEV_ASYNC so that device deletion won't wait release, then userspace needn't worry about the above order. Actually both loop and nbd is deleted in this async way. Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-02-28uapi: ioam6: API for netlink multicast eventsJustin Iurman1-0/+20
Add new api to support ioam6 events for generic netlink multicast. A first "trace" event is added to the list of ioam6 events, which will represent an IOAM Pre-allocated Trace Option-Type. It provides another solution to share IOAM data with user space. Reviewed-by: David Ahern <[email protected]> Signed-off-by: Justin Iurman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-27uapi: in6: replace temporary label with rfc9486Justin Iurman1-1/+1
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference. Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-27Merge 6.8-rc6 into tty-nextGreg Kroah-Hartman2-1/+14
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>