aboutsummaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)AuthorFilesLines
2023-10-27bridge: add MDB get uAPI attributesIdo Schimmel1-0/+18
Add MDB get attributes that correspond to the MDB set attributes used in RTM_NEWMDB messages. Specifically, add 'MDBA_GET_ENTRY' which will hold a 'struct br_mdb_entry' and 'MDBA_GET_ENTRY_ATTRS' which will hold 'MDBE_ATTR_*' attributes that are used as indexes (source IP and source VNI). An example request will look as follows: [ struct nlmsghdr ] [ struct br_port_msg ] [ MDBA_GET_ENTRY ] struct br_mdb_entry [ MDBA_GET_ENTRY_ATTRS ] [ MDBE_ATTR_SOURCE ] struct in_addr / struct in6_addr [ MDBE_ATTR_SRC_VNI ] u32 Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Add TCP_AO_REPAIRDmitry Safonov1-0/+8
Add TCP_AO_REPAIR setsockopt(), getsockopt(). They let a user to repair TCP-AO ISNs/SNEs. Also let the user hack around when (tp->repair) is on and add ao_info on a socket in any supported state. As SNEs now can be read/written at any moment, use WRITE_ONCE()/READ_ONCE() to set/read them. Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)Dmitry Safonov1-1/+2
Delete becomes very, very fast - almost free, but after setsockopt() syscall returns, the key is still alive until next RCU grace period. Which is fine for listen sockets as userspace needs to be aware of setsockopt(TCP_AO) and accept() race and resolve it with verification by getsockopt() after TCP connection was accepted. The benchmark results (on non-loaded box, worse with more RCU work pending): > ok 33 Worst case delete 16384 keys: min=5ms max=10ms mean=6.93904ms stddev=0.263421 > ok 34 Add a new key 16384 keys: min=1ms max=4ms mean=2.17751ms stddev=0.147564 > ok 35 Remove random-search 16384 keys: min=5ms max=10ms mean=6.50243ms stddev=0.254999 > ok 36 Remove async 16384 keys: min=0ms max=0ms mean=0.0296107ms stddev=0.0172078 Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Add TCP-AO getsockopt()sDmitry Safonov1-14/+49
Introduce getsockopt(TCP_AO_GET_KEYS) that lets a user get TCP-AO keys and their properties from a socket. The user can provide a filter to match the specific key to be dumped or ::get_all = 1 may be used to dump all keys in one syscall. Add another getsockopt(TCP_AO_INFO) for providing per-socket/per-ao_info stats: packet counters, Current_key/RNext_key and flags like ::ao_required and ::accept_icmps. Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Add option for TCP-AO to (not) hash headerDmitry Safonov1-0/+5
Provide setsockopt() key flag that makes TCP-AO exclude hashing TCP header for peers that match the key. This is needed for interraction with middleboxes that may change TCP options, see RFC5925 (9.2). Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Ignore specific ICMPs for TCP-AO connectionsDmitry Safonov2-1/+4
Similarly to IPsec, RFC5925 prescribes: ">> A TCP-AO implementation MUST default to ignore incoming ICMPv4 messages of Type 3 (destination unreachable), Codes 2-4 (protocol unreachable, port unreachable, and fragmentation needed -- ’hard errors’), and ICMPv6 Type 1 (destination unreachable), Code 1 (administratively prohibited) and Code 4 (port unreachable) intended for connections in synchronized states (ESTABLISHED, FIN-WAIT-1, FIN- WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT) that match MKTs." A selftest (later in patch series) verifies that this attack is not possible in this TCP-AO implementation. Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Add TCP-AO segments countersDmitry Safonov2-1/+11
Introduce segment counters that are useful for troubleshooting/debugging as well as for writing tests. Now there are global snmp counters as well as per-socket and per-key. Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Introduce TCP_AO setsockopt()sDmitry Safonov1-0/+46
Add 3 setsockopt()s: 1. TCP_AO_ADD_KEY to add a new Master Key Tuple (MKT) on a socket 2. TCP_AO_DEL_KEY to delete present MKT from a socket 3. TCP_AO_INFO to change flags, Current_key/RNext_key on a TCP-AO sk Userspace has to introduce keys on every socket it wants to use TCP-AO option on, similarly to TCP_MD5SIG/TCP_MD5SIG_EXT. RFC5925 prohibits definition of MKTs that would match the same peer, so do sanity checks on the data provided by userspace. Be as conservative as possible, including refusal of defining MKT on an established connection with no AO, removing the key in-use and etc. (1) and (2) are to be used by userspace key manager to add/remove keys. (3) main purpose is to set RNext_key, which (as prescribed by RFC5925) is the KeyID that will be requested in TCP-AO header from the peer to sign their segments with. At this moment the life of ao_info ends in tcp_v4_destroy_sock(). Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27net/tcp: Add TCP-AO config and structuresDmitry Safonov1-0/+2
Introduce new kernel config option and common structures as well as helpers to be used by TCP-AO code. Co-developed-by: Francesco Ruggeri <[email protected]> Signed-off-by: Francesco Ruggeri <[email protected]> Co-developed-by: Salam Noureddine <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-27tty: n_gsm: add copyright Siemens Mobility GmbHDaniel Starke1-0/+1
More than 1/3 of the n_gsm code has been contributed by us in the last 1.5 years, completing conformance with the standard and stabilizing the driver: - added UI (unnumbered information) frame support - added PN (parameter negotiation) message handling and function support - added optional keep-alive control link supervision via test messages - added TIOCM_OUT1 and TIOCM_OUT2 to allow responder to operate as modem - added TIOCMIWAIT support on virtual ttys - added additional ioctls and parameters to configure the new functions - added overall locking mechanism to avoid data race conditions - added outgoing data flow to decouple physical from virtual tty handling for better performance and to avoid dead-locks - fixed advanced option mode implementation - fixed convergence layer type 2 implementation - fixed handling of CLD (multiplexer close down) messages - fixed broken muxer close down procedure - and many more bug fixes With this most of our initial RFC has been implemented. It gives the driver a quality boost unseen in the decade before. Add a copyright notice to the n_gsm files to highlight this contribution. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Daniel Starke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-10-26Merge tag 'wireless-next-2023-10-26' of ↵Jakub Kicinski1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The third, and most likely the last, features pull request for v6.7. Fixes all over and only few small new features. Major changes: iwlwifi - more Multi-Link Operation (MLO) work ath12k - QCN9274: mesh support ath11k - firmware-2.bin container file format support * tag 'wireless-next-2023-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (155 commits) wifi: ray_cs: Remove unnecessary (void*) conversions Revert "wifi: ath11k: call ath11k_mac_fils_discovery() without condition" wifi: ath12k: Introduce and use ath12k_sta_to_arsta() wifi: ath12k: fix htt mlo-offset event locking wifi: ath12k: fix dfs-radar and temperature event locking wifi: ath11k: fix gtk offload status event locking wifi: ath11k: fix htt pktlog locking wifi: ath11k: fix dfs radar event locking wifi: ath11k: fix temperature event locking wifi: ath12k: rename the sc naming convention to ab wifi: ath12k: rename the wmi_sc naming convention to wmi_ab wifi: ath11k: add firmware-2.bin support wifi: ath11k: qmi: refactor ath11k_qmi_m3_load() wifi: rtw89: cleanup firmware elements parsing wifi: rt2x00: rework MT7620 PA/LNA RF calibration wifi: rt2x00: rework MT7620 channel config function wifi: rt2x00: improve MT7620 register initialization MAINTAINERS: wifi: rt2x00: drop Helmut Schaa wifi: wlcore: main: replace deprecated strncpy with strscpy wifi: wlcore: boot: replace deprecated strncpy with strscpy ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-26Merge tag 'for-netdev' of ↵Jakub Kicinski2-0/+38
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-26 We've added 51 non-merge commits during the last 10 day(s) which contain a total of 75 files changed, 5037 insertions(+), 200 deletions(-). The main changes are: 1) Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF, from Chuyi Zhou. 2) Fix BPF verifier's iterator convergence logic to use exact states comparison for convergence checks, from Eduard Zingerman, Andrii Nakryiko and Alexei Starovoitov. 3) Add BPF programmable net device where bpf_mprog defines the logic of its xmit routine. It can operate in L3 and L2 mode, from Daniel Borkmann and Nikolay Aleksandrov. 4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking for global per-CPU allocator, from Hou Tao. 5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section was going to be present whenever a binary has SHT_GNU_versym section, from Andrii Nakryiko. 6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into atomic_set_release(), from Paul E. McKenney. 7) Add a warning if NAPI callback missed xdp_do_flush() under CONFIG_DEBUG_NET which helps checking if drivers were missing the former, from Sebastian Andrzej Siewior. 8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing a warning under sleepable programs, from Yafang Shao. 9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before checking map_locked, from Song Liu. 10) Make BPF CI linked_list failure test more robust, from Kumar Kartikeya Dwivedi. 11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik. 12) Fix xsk starving when multiple xsk sockets were associated with a single xsk_buff_pool, from Albert Huang. 13) Clarify the signed modulo implementation for the BPF ISA standardization document that it uses truncated division, from Dave Thaler. 14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider signed bounds knowledge, from Andrii Nakryiko. 15) Add an option to XDP selftests to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP programs as capable to use frags, from Larysa Zaremba. 16) Fix bpftool's BTF dumper wrt printing a pointer value and another one to fix struct_ops dump in an array, from Manu Bretelle. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits) netkit: Remove explicit active/peer ptr initialization selftests/bpf: Fix selftests broken by mitigations=off samples/bpf: Allow building with custom bpftool samples/bpf: Fix passing LDFLAGS to libbpf samples/bpf: Allow building with custom CFLAGS/LDFLAGS bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free selftests/bpf: Add selftests for netkit selftests/bpf: Add netlink helper library bpftool: Extend net dump with netkit progs bpftool: Implement link show support for netkit libbpf: Add link-based API for netkit tools: Sync if_link uapi header netkit, bpf: Add bpf programmable net device bpf: Improve JEQ/JNE branch taken logic bpf: Fold smp_mb__before_atomic() into atomic_set_release() bpf: Fix unnecessary -EBUSY from htab_lock_bucket xsk: Avoid starving the xsk further down the list bpf: print full verifier states on infinite loop detection selftests/bpf: test if state loops are detected in a tricky case bpf: correct loop detection for iterators convergence ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-3/+3
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-26landlock: Support network rules with TCP bind and connectKonstantin Meskhidze1-0/+55
Add network rules support in the ruleset management helpers and the landlock_create_ruleset() syscall. Extend user space API to support network actions: * Add new network access rights: LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP. * Add a new network rule type: LANDLOCK_RULE_NET_PORT tied to struct landlock_net_port_attr. The allowed_access field contains the network access rights, and the port field contains the port value according to the controlled protocol. This field can take up to a 64-bit value but the maximum value depends on the related protocol (e.g. 16-bit value for TCP). Network port is in host endianness [1]. * Add a new handled_access_net field to struct landlock_ruleset_attr that contains network access rights. * Increment the Landlock ABI version to 4. Implement socket_bind() and socket_connect() LSM hooks, which enable to control TCP socket binding and connection to specific ports. Expand access_masks_t from u16 to u32 to be able to store network access rights alongside filesystem access rights for rulesets' handled access rights. Access rights are not tied to socket file descriptors but checked at bind() or connect() call time against the caller's Landlock domain. For the filesystem, a file descriptor is a direct access to a file/data. However, for network sockets, we cannot identify for which data or peer a newly created socket will give access to. Indeed, we need to wait for a connect or bind request to identify the use case for this socket. Likewise a directory file descriptor may enable to open another file (i.e. a new data item), but this opening is also restricted by the caller's domain, not the file descriptor's access rights [2]. [1] https://lore.kernel.org/r/[email protected] [2] https://lore.kernel.org/r/[email protected] Signed-off-by: Konstantin Meskhidze <[email protected]> Link: https://lore.kernel.org/r/[email protected] [mic: Extend commit message, fix typo in comments, and specify endianness in the documentation] Co-developed-by: Mickaël Salaün <[email protected]> Signed-off-by: Mickaël Salaün <[email protected]>
2023-10-26Merge tag 'net-6.6-rc8' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter. Most regressions addressed here come from quite old versions, with the exceptions of the iavf one and the WiFi fixes. No known outstanding reports or investigation. Fixes to fixes: - eth: iavf: in iavf_down, disable queues when removing the driver Previous releases - regressions: - sched: act_ct: additional checks for outdated flows - tcp: do not leave an empty skb in write queue - tcp: fix wrong RTO timeout when received SACK reneging - wifi: cfg80211: pass correct pointer to rdev_inform_bss() - eth: i40e: sync next_to_clean and next_to_process for programming status desc - eth: iavf: initialize waitqueues before starting watchdog_task Previous releases - always broken: - eth: r8169: fix data-races - eth: igb: fix potential memory leak in igb_add_ethtool_nfc_entry - eth: r8152: avoid writing garbage to the adapter's registers - eth: gtp: fix fragmentation needed check with gso" * tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) iavf: in iavf_down, disable queues when removing the driver vsock/virtio: initialize the_virtio_vsock before using VQs net: ipv6: fix typo in comments net: ipv4: fix typo in comments net/sched: act_ct: additional checks for outdated flows netfilter: flowtable: GC pushes back packets to classic path i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR gtp: fix fragmentation needed check with gso gtp: uapi: fix GTPA_MAX Fix NULL pointer dereference in cn_filter() sfc: cleanup and reduce netlink error messages net/handshake: fix file ref count in handshake_nl_accept_doit() wifi: mac80211: don't drop all unprotected public action frames wifi: cfg80211: fix assoc response warning on failed links wifi: cfg80211: pass correct pointer to rdev_inform_bss() isdn: mISDN: hfcsusb: Spelling fix in comment tcp: fix wrong RTO timeout when received SACK reneging r8152: Block future register access if register access fails r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en() ...
2023-10-26iommu/vt-d: Disallow read-only mappings to nest parent domainLu Baolu1-1/+11
When remapping hardware is configured by system software in scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled) in first-stage page-table entries even when second-stage mappings indicate that corresponding first-stage page-table is Read-Only. As the result, contents of pages designated by VMM as Read-Only can be modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of address translation process due to DMAs issued by Guest. This disallows read-only mappings in the domain that is supposed to be used as nested parent. Reference from Sapphire Rapids Specification Update [1], errata details, SPR17. Userspace should know this limitation by checking the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO ioctl. [1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-26iommufd: Add data structure for Intel VT-d stage-1 domain allocationYi Liu1-0/+30
This adds IOMMU_HWPT_DATA_VTD_S1 for stage-1 hw_pagetable of Intel VT-d and the corressponding data structure for userspace specified parameter for the domain allocation. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-26iommufd: Add a nested HW pagetable objectNicolin Chen1-2/+29
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce. But it can only allocate a hw_pagetable that associates to a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMUFD_OBJ_HWPT_PAGING type. IOMMU drivers can now support user-managed hw_pagetables, for two-stage translation use cases that require user data input from the user space. Add a new IOMMUFD_OBJ_HWPT_NESTED type with its abort/destroy(). Pair it with a new iommufd_hwpt_nested structure and its to_hwpt_nested() helper. Update the to_hwpt_paging() helper, so a NESTED-type hw_pagetable can be handled in the callers, for example iommufd_hw_pagetable_enforce_rr(). Screen the inputs including the parent PAGING-type hw_pagetable that has a need of a new nest_parent flag in the iommufd_hwpt_paging structure. Extend the IOMMU_HWPT_ALLOC ioctl to accept an IOMMU driver specific data input which is tagged by the enum iommu_hwpt_data_type. Also, update the @pt_id to accept hwpt_id too besides an ioas_id. Then, use them to allocate a hw_pagetable of IOMMUFD_OBJ_HWPT_NESTED type using the iommufd_hw_pagetable_alloc_nested() allocator. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Nicolin Chen <[email protected]> Co-developed-by: Yi Liu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-26ALSA: seq: Replace with __packed attributeTakashi Iwai1-2/+2
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-25ipv6: drop feature RTAX_FEATURE_ALLFRAGYan Zhai1-1/+1
RTAX_FEATURE_ALLFRAG was added before the first git commit: https://www.mail-archive.com/[email protected]/msg03399.html The feature would send packets to the fragmentation path if a box receives a PMTU value with less than 1280 byte. However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such message would be simply discarded. The feature flag is neither supported in iproute2 utility. In theory one can still manipulate it with direct netlink message, but it is not ideal because it was based on obsoleted guidance of RFC-2460 (replaced by RFC-8200). The feature would always test false at the moment, so remove related code or mark them as unused. Signed-off-by: Yan Zhai <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/d78e44dcd9968a252143ffe78460446476a472a1.1698156966.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-25mempolicy: remove confusing MPOL_MF_LAZY dead codeHugh Dickins1-1/+1
v3.8 commit b24f53a0bea3 ("mm: mempolicy: Add MPOL_MF_LAZY") introduced MPOL_MF_LAZY, and included it in the MPOL_MF_VALID flags; but a720094ded8 ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now") immediately removed it from MPOL_MF_VALID flags, pending further review. "This will need to be revisited", but it has not been reinstated. The present state is confusing: there is dead code in mm/mempolicy.c to handle MPOL_MF_LAZY cases which can never occur. Remove that: it can be resurrected later if necessary. But keep the definition of MPOL_MF_LAZY, which must remain in the UAPI, even though it always fails with EINVAL. https://lore.kernel.org/linux-mm/[email protected]/ links to a previous request to remove MPOL_MF_LAZY. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25drm/doc: document DRM_IOCTL_MODE_CREATE_DUMBSimon Ser2-2/+34
The main motivation is to repeat that dumb buffers should not be abused for anything else than basic software rendering with KMS. User-space devs are more likely to look at the IOCTL docs than to actively search for the driver-oriented "Dumb Buffer Objects" section. v2: reference DRM_CAP_DUMB_BUFFER, DRM_CAP_DUMB_PREFERRED_DEPTH and DRM_CAP_DUMB_PREFER_SHADOW (Pekka) Signed-off-by: Simon Ser <[email protected]> Acked-by: Daniel Vetter <[email protected]> Acked-by: Pekka Paalanen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-10-24netkit, bpf: Add bpf programmable net deviceDaniel Borkmann2-0/+38
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the drivers xmit routine and therefore e.g. in case of containers/Pods moving BPF processing closer to the source. One of the goals was that in case of Pod egress traffic, this allows to move BPF programs from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms, for example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides). In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon the device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached. Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated. Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management and the peer will operate in default drop mode, so that no traffic is leaving between the time when a Pod is brought up by the CNI plugin and programs attached by the agent. Additionally, the programs we attach via tcx on the physical devices are using bpf_redirect_peer() for inbound traffic into netkit device, hence the latter is also supporting the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device directly. Also, BIG TCP is supported on netkit device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one. An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series. Co-developed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://github.com/borkmann/iproute2/tree/pr/netkit Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.) Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-24PCI/PME: Use FIELD_GET()Bjorn Helgaas1-0/+1
Use FIELD_GET() to remove dependences on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
2023-10-24PCI/ATS: Use FIELD_GET()Bjorn Helgaas1-0/+1
Use FIELD_GET() to remove dependences on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
2023-10-24PCI/ATS: Show PASID Capability register width in bitmasksBjorn Helgaas1-5/+5
The PASID Capability and Control registers are both 16 bits wide. Use 16-bit wide constants in field names to match the register width. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
2023-10-24net: dsa: Rename IFLA_DSA_MASTER to IFLA_DSA_CONDUITFlorian Fainelli1-1/+3
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT. Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-24uapi: mptcp: use header file generated from YAML specDavide Caratti2-172/+160
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode uapi \ > --spec Documentation/netlink/specs/mptcp.yaml \ > --header -o include/uapi/linux/mptcp_pm.h Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-24net: mptcp: convert netlink from small_ops to opsDavide Caratti1-0/+8
in the current MPTCP control plane, all operations use a netlink attribute of the same type "MPTCP_PM_ATTR". However, add/del/get/flush operations only parse the first element in the message _ the one that describes MPTCP endpoints (that was named MPTCP_PM_ATTR_ADDR and mostly used in ADD_ADDR operations _ probably the similarity of "attr", "addr" and "add" might cause some confusion to human readers). Convert MPTCP from 'small_ops' to 'ops', thus allowing different attributes for each single operation, hopefully makes all this clearer to human readers. - use a separate attribute set for add/del/get/flush address operation, binary compatible with the existing one, to store the endpoint address. MPTCP_PM_ENDPOINT_ADDR is added to the uAPI (with the same value as MPTCP_PM_ATTR_ADDR) for these operations. - convert mptcp_pm_ops[] and add policy files accordingly. this prepares MPTCP control plane to be described as YAML spec. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-24drm/fourcc: Add NV20 and NV30 YUV formatsJonas Karlman1-0/+2
DRM_FORMAT_NV20 and DRM_FORMAT_NV30 formats is the 2x1 and non-subsampled variant of NV15, a 10-bit 2-plane YUV format that has no padding between components. Instead, luminance and chrominance samples are grouped into 4s so that each group is packed into an integer number of bytes: YYYY = UVUV = 4 * 10 bits = 40 bits = 5 bytes The '20' and '30' suffix refers to the optimum effective bits per pixel which is achieved when the total number of luminance samples is a multiple of 4. V2: Added NV30 format Signed-off-by: Jonas Karlman <[email protected]> Reviewed-by: Sandy Huang <[email protected]> Reviewed-by: Christopher Obbard <[email protected]> Tested-by: Christopher Obbard <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-10-24PCI/DPC: Use defines with DPC reason fieldsIlpo Järvinen1-0/+6
Add new defines for DPC reason fields and use them instead of literals. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> [bhelgaas: shorten comments] Signed-off-by: Bjorn Helgaas <[email protected]>
2023-10-24PCI/DPC: Use FIELD_GET()Bjorn Helgaas1-0/+1
Use FIELD_GET() to remove dependencies on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2023-10-24PCI: dwc: Use FIELD_GET/PREP()Ilpo Järvinen1-0/+2
Convert open-coded variants of PCI field access into FIELD_GET/PREP() to make the code easier to understand. Add two missing defines into pci_regs.h. Logically, the Max No-Snoop Latency Register is a separate word sized register in the PCIe spec, but the pre-existing LTR defines in pci_regs.h with dword long values seem to consider the registers together (the same goes for the only user). Thus, follow the custom and make the new values also take both word long LTR registers as a joint dword register. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2023-10-24iommufd: Add a flag to skip clearing of IOPTE dirtyJoao Martins1-1/+14
VFIO has an operation where it unmaps an IOVA while returning a bitmap with the dirty data. In reality the operation doesn't quite query the IO pagetables that the PTE was dirty or not. Instead it marks as dirty on anything that was mapped, and doing so in one syscall. In IOMMUFD the equivalent is done in two operations by querying with GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB flushes given that after clearing dirty bits IOMMU implementations require invalidating their IOTLB, plus another invalidation needed for the UNMAP. To allow dirty bits to be queried faster, add a flag (IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR) that requests to not clear the dirty bits from the PTE (but just reading them), under the expectation that the next operation is the unmap. An alternative is to unmap and just perpectually mark as dirty as that's the same behaviour as today. So here equivalent functionally can be provided with unmap alone, and if real dirty info is required it will amortize the cost while querying. There's still a race against DMA where in theory the unmap of the IOVA (when the guest invalidates the IOTLB via emulated iommu) would race against the VF performing DMA on the same IOVA. As discussed in [0], we are accepting to resolve this race as throwing away the DMA and it doesn't matter if it hit physical DRAM or not, the VM can't tell if we threw it away because the DMA was blocked or because we failed to copy the DRAM. [0] https://lore.kernel.org/linux-iommu/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24iommufd: Add capabilities to IOMMU_GET_HW_INFOJoao Martins1-0/+17
Extend IOMMUFD_CMD_GET_HW_INFO op to query generic iommu capabilities for a given device. Capabilities are IOMMU agnostic and use device_iommu_capable() API passing one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY_TRACKING for now in the out_capabilities field returned back to userspace. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAPJoao Martins1-0/+35
Connect a hw_pagetable to the IOMMU core dirty tracking read_and_clear_dirty iommu domain op. It exposes all of the functionality for the UAPI that read the dirtied IOVAs while clearing the Dirty bits from the PTEs. In doing so, add an IO pagetable API iopt_read_and_clear_dirty_data() that performs the reading of dirty IOPTEs for a given IOVA range and then copying back to userspace bitmap. Underneath it uses the IOMMU domain kernel API which will read the dirty bits, as well as atomically clearing the IOPTE dirty bit and flushing the IOTLB at the end. The IOVA bitmaps usage takes care of the iteration of the bitmaps user pages efficiently and without copies. Within the iterator function we iterate over io-pagetable contigous areas that have been mapped. Contrary to past incantation of a similar interface in VFIO the IOVA range to be scanned is tied in to the bitmap size, thus the application needs to pass a appropriately sized bitmap address taking into account the iova range being passed *and* page size ... as opposed to allowing bitmap-iova != iova. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKINGJoao Martins1-0/+28
Every IOMMU driver should be able to implement the needed iommu domain ops to control dirty tracking. Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically the ability to enable/disable dirty tracking on an IOMMU domain (hw_pagetable id). To that end add an io_pagetable kernel API to toggle dirty tracking: * iopt_set_dirty_tracking(iopt, [domain], state) The intended caller of this is via the hw_pagetable object that is created. Internally it will ensure the leftover dirty state is cleared /right before/ dirty tracking starts. This is also useful for iommu drivers which may decide that dirty tracking is always-enabled at boot without wanting to toggle dynamically via corresponding iommu domain op. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24iommufd: Add a flag to enforce dirty tracking on attachJoao Martins1-0/+3
Throughout IOMMU domain lifetime that wants to use dirty tracking, some guarantees are needed such that any device attached to the iommu_domain supports dirty tracking. The idea is to handle a case where IOMMU in the system are assymetric feature-wise and thus the capability may not be supported for all devices. The enforcement is done by adding a flag into HWPT_ALLOC namely: IOMMU_HWPT_ALLOC_DIRTY_TRACKING .. Passed in HWPT_ALLOC ioctl() flags. The enforcement is done by creating a iommu_domain via domain_alloc_user() and validating the requested flags with what the device IOMMU supports (and failing accordingly) advertised). Advertising the new IOMMU domain feature flag requires that the individual iommu driver capability is supported when a future device attachment happens. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24gtp: uapi: fix GTPA_MAXPablo Neira Ayuso1-1/+1
Subtract one to __GTPA_MAX, otherwise GTPA_MAX is off by 2. Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-23devlink: make devlink_flash_overwrite enum named oneJiri Pirko1-1/+1
Since this enum is going to be used in generated userspace file, name it properly. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-23wifi: nl80211: fix doc typosRandy Dunlap1-2/+2
Correct some typos. Signed-off-by: Randy Dunlap <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Kalle Valo <[email protected]> Cc: [email protected] Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: [email protected] Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-10-23Merge tag 'v6.6-rc7' into sched/core, to pick up fixesIngo Molnar2-5/+11
Pick up recent sched/urgent fixes merged upstream. Signed-off-by: Ingo Molnar <[email protected]>
2023-10-23tcp: add TCPI_OPT_USEC_TSEric Dumazet1-0/+1
Add the ability to report in tcp_info.tcpi_options if a flow is using usec resolution in TCP TS val. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-23tcp: add RTAX_FEATURE_TCP_USEC_TSEric Dumazet1-7/+11
This new dst feature flag will be used to allow TCP to use usec based timestamps instead of msec ones. ip route .... feature tcp_usec_ts Also document that RTAX_FEATURE_SACK and RTAX_FEATURE_TIMESTAMP are unused. RTAX_FEATURE_ALLFRAG is also going away soon. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-23BackMerge tag 'v6.6-rc7' into drm-nextDave Airlie4-6/+21
This is needed to add the msm pr which is based on a higher base. Signed-off-by: Dave Airlie <[email protected]>
2023-10-21iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT descriptionNicolin Chen1-3/+2
The IOMMU_HWPT_ALLOC_NEST_PARENT flag is used to allocate a HWPT. Though a HWPT holds a domain in the core structure, it is still quite confusing to describe it using "domain" in the uAPI kdoc. Correct it to "HWPT". Fixes: 4ff542163397 ("iommufd: Support allocating nested parent domain") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-20net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.Heng Guo1-1/+2
Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1500 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1500 Reproduce: VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0. scapy command: send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1, inter=1.000000) Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- "ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase from 7 to 8. Issue description and patch: IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS ("OutOctets") in ip_finish_output2(). According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but not "OutRequests". "OutRequests" does not include any datagrams counted in "ForwDatagrams". ipSystemStatsOutOctets OBJECT-TYPE DESCRIPTION "The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here. ipSystemStatsOutRequests OBJECT-TYPE DESCRIPTION "The total number of IP datagrams that local IP user- protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipSystemStatsOutForwDatagrams. So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add IPSTATS_MIB_OUTREQUESTS for "OutRequests". Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6. Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0 ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0 ---------------------------------------------------------------------- "ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3. "OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428. Signed-off-by: Heng Guo <[email protected]> Reviewed-by: Kun Song <[email protected]> Reviewed-by: Filip Pudak <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20netlink: add variable-length / auto integersJakub Kicinski1-0/+5
We currently push everyone to use padding to align 64b values in netlink. Un-padded nla_put_u64() doesn't even exist any more. The story behind this possibly start with this thread: https://lore.kernel.org/netdev/[email protected]/ where DaveM was concerned about the alignment of a structure containing 64b stats. If user space tries to access such struct directly: struct some_stats *stats = nla_data(attr); printf("A: %llu", stats->a); lack of alignment may become problematic for some architectures. These days we most often put every single member in a separate attribute, meaning that the code above would use a helper like nla_get_u64(), which can deal with alignment internally. Even for arches which don't have good unaligned access - access aligned to 4B should be pretty efficient. Kernel and well known libraries deal with unaligned input already. Padded 64b is quite space-inefficient (64b + pad means at worst 16B per attr vs 32b which takes 8B). It is also more typing: if (nla_put_u64_pad(rsp, NETDEV_A_SOMETHING_SOMETHING, value, NETDEV_A_SOMETHING_PAD)) Create a new attribute type which will use 32 bits at netlink level if value is small enough (probably most of the time?), and (4B-aligned) 64 bits otherwise. Kernel API is just: if (nla_put_uint(rsp, NETDEV_A_SOMETHING_SOMETHING, value)) Calling this new type "just" sint / uint with no specific size will hopefully also make people more comfortable with using it. Currently telling people "don't use u8, you may need the bits, and netlink will round up to 4B, anyway" is the #1 comment we give to newcomers. In terms of netlink layout it looks like this: 0 4 8 12 16 32b: [nlattr][ u32 ] 64b: [ pad ][nlattr][ u64 ] uint(32) [nlattr][ u32 ] uint(64) [nlattr][ u64 ] Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20Merge tag 'counter-updates-for-6.7a' of ↵Greg Kroah-Hartman1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-next William writes: First set of Counter updates for the 6.7 cycle A minor typographical error is fixed in the description comment block for struct counter_component. * tag 'counter-updates-for-6.7a' of git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter: counter: chrdev: remove a typo in header file comment
2023-10-20Merge tag 'iio-for-6.7a' of ↵Greg Kroah-Hartman1-0/+4
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: IIO: 1st set of new device support, features and cleanup for 6.7 Particularly great to see a resolver driver move out of staging via a massive set of changes. Only took 13 years :) One small patch added then reverted due to a report of test breakage (ashai-kasei,ak8975: Drop deprecated enums.) An immutable branch was used for some hid-senors changes in case there was a need to take them into the HID tree as well. New device support ----------------- adi,hmc425a - Add support for HMC540SLP3E broadband 4-bit digital attenuator. kionix,kx022a - Add support for the kx132-1211 accelerometer. Require significant driver rework to enable this including add a chip type specific structure to deal with the chip differences. - Add support for the kx132acr-lbz accelerometer (subset of the kx022a feature set). lltc,ltc2309 - New driver for this 8 channel ADC. microchip,mcp3911 - Add support for rest of mcp391x family of ADCs (there are various differences beyond simple channel count variation. Series includes some general driver cleanup. microchip,mcp3564 - New driver for MCP3461, MCP3462, MCP3464, MCP3541, MCP3562, MCP3564 and their R variants of 16/24bit ADCs. A few minor fixed followed. rohm,bu1390 - New driver for this pressure sensor. Staging graduation ------------------ adi,ad1210 (after 13 or so years :) - More or less a complete (step-wise) rewrite of this resolver driver to bring it up to date with modern IIO standards. The fault signal handling mapping to event channels was particularly complex and significant part of the changes. Features -------- iio-core - Add chromacity and color temperature channel types. adi,ad7192 - Oversampling ratio control (called fast settling in datasheet). adi,adis16475 - Add core support and then driver support for delta angle and delta velocity channels. These are intended for summation to establish angle and velocity changes over larger timescales. Fix was needed for alignment after the temperature channel. Further fix reduced set of devices for which the buffer support was applicable as seems burst reads don't cover these on all devices. hid-sensors-als - Chromacity and color temperatures support including in amd sfh. stx104 - Add support for counter subsystem to this multipurpose device. ti,twl6030 - Add missing device tree binding description. Clean up and minor fixes. ------------------------ treewide - Drop some unused declarations across IIO. - Make more use of device_get_match_data() instead of OF specific approaches. Similar cleanup to sets of drivers. - Stop platform remove callbacks returning anything by using the temporary remove_new() callback. - Use i2c_get_match_data() to cope nicely with all types of ID table entry. - Use device_get_match_data() for various platform device to cope with more types of firmware. - Convert from enum to pointer in ID tables allowing use of i2c_get_match_data(). - Fix sorting on some ID tables. - Include specific string helper headers rather than simply string_helpers.h docs - Better description of the ordering requirements etc for available_scan_masks. tools - Handle alignment of mixed sizes where the last element isn't the biggest correctly. Seems that doesn't happen often! adi,ad2s1210 - Lots of work from David Lechner on this driver including a few fixes that are going with the rework to avoid slowing that down. adi,ad4310 - Replace deprecated devm_clk_register() adi,ad74413r - Bring the channel function setting inline with the datasheet. adi,ad7192 - Change to FIELD_PREP(), FIELD_GET(). - Calculate f_order from the sinc filter and chop filter states. - Move more per chip config into data in struct ad7192_chip_info - Cleanup unused parameter in channel macros. adi,adf4350 - Make use of devm_* to simplify error handling for many of the setup calls in probe() / tear down in remove() and error paths. Some more work to be done on this one. - Use dev_err_probe() for errors in probe() callback. adi,adf4413 - Typo in function name prefix. adi,adxl345 - Add channel scale to the chip type specific structure and drop using a type field previously used for indirection. asahi,ak8985 - Fix a mismatch introduced when switching from enum->pointers in the match tables. amlogic,meson - Expand error logging during probe. invensense,mpu6050 - Support level-shifter control. Whilst no one is sure exactly what this is doing it is needed for some old boards. - Document mount-matrix dt-binding. mediatek,mt6577 - Use devm_clk_get_enabled() to replace open coded version and move everything over to being device managed. Drop now empty remove() callback. Fix follows to put the drvdata back. - Use dev_err_probe() for error reporting in probe() callback. memsic,mxc4005 - Add of_match_table. microchip,mcp4725 - Move various chip specific data from being looked up by chip ID to data in the chip type specific structure. silicon-labs,si7005 - Add of_match_table and entry in trivial-devices.yaml st,lsm6dsx - Add missing mount-matrix dt binding documentation. st,spear - Use devm_clk_get_enabled() and some other devm calls to move everything over to being device managed. Drop now empty remove() callback. - Use dev_err_probe() to better handled deferred probing and tidy up error reporting in probe() callback. st,stm32-adc - Add a bit of additional checking in probe() to protect against a NULL pointer (no known path to trigger it today). - Replace deprecated strncpy() ti,ads1015 - Allow for edge triggers. - Document interrupt in dt-bindings. * tag 'iio-for-6.7a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (201 commits) iio: Use device_get_match_data() iio: adc: MCP3564: fix warn: unsigned '__x' is never less than zero. dt-bindings: trivial-devices: add silabs,si7005 iio: si7005: Add device tree support drivers: imu: adis16475.c: Remove scan index from delta channels dt-bindings: iio: imu: st,lsm6dsx: add mount-matrix property iio: resolver: ad2s1210: remove of_match_ptr() iio: resolver: ad2s1210: remove DRV_NAME macro iio: resolver: ad2s1210: move out of staging staging: iio: resolver: ad2s1210: simplify code with guard(mutex) staging: iio: resolver: ad2s1210: clear faults after soft reset staging: iio: resolver: ad2s1210: refactor sample toggle staging: iio: resolver: ad2s1210: remove fault attribute staging: iio: resolver: ad2s1210: add label attribute support staging: iio: resolver: ad2s1210: add register/fault support summary staging: iio: resolver: ad2s1210: implement fault events iio: event: add optional event label support staging: iio: resolver: ad2s1210: rename DOS reset min/max attrs staging: iio: resolver: ad2s1210: convert DOS mismatch threshold to event attr staging: iio: resolver: ad2s1210: convert DOS overrange threshold to event attr ...