aboutsummaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)AuthorFilesLines
2020-12-11dmaengine: idxd: add IAX configuration support in the IDXD driverDave Jiang1-0/+79
Add support to allow configuration of Intel Analytics Accelerator (IAX) in addition to the Intel Data Streaming Accelerator (DSA). The IAX hardware has the same configuration interface as DSA. The main difference is the type of operations it performs. We can support the DSA and IAX devices on the same driver with some tweaks. IAX has a 64B completion record that needs to be 64B aligned, as opposed to a 32B completion record that is 32B aligned for DSA. IAX also does not support token management. Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/160564555488.1834439.4261958859935360473.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <[email protected]>
2020-12-11nl80211: add common API to configure SAR power limitationsCarl Huang1-0/+105
NL80211_CMD_SET_SAR_SPECS is added to configure SAR from user space. NL80211_ATTR_SAR_SPEC is used to pass the SAR power specification when used with NL80211_CMD_SET_SAR_SPECS. Wireless driver needs to register SAR type, supported frequency ranges to wiphy, so user space can query it. The index in frequency range is used to specify which sub band the power limitation applies to. The SAR type is for compatibility, so later other SAR mechanism can be implemented without breaking the user space SAR applications. Normal process is user space queries the SAR capability, and gets the index of supported frequency ranges and associates the power limitation with this index and sends to kernel. Here is an example of message send to kernel: 8c 00 00 00 08 00 01 00 00 00 00 00 38 00 2b 81 08 00 01 00 00 00 00 00 2c 00 02 80 14 00 00 80 08 00 02 00 00 00 00 00 08 00 01 00 38 00 00 00 14 00 01 80 08 00 02 00 01 00 00 00 08 00 01 00 48 00 00 00 NL80211_CMD_SET_SAR_SPECS: 0x8c NL80211_ATTR_WIPHY: 0x01(phy idx is 0) NL80211_ATTR_SAR_SPEC: 0x812b (NLA_NESTED) NL80211_SAR_ATTR_TYPE: 0x00 (NL80211_SAR_TYPE_POWER) NL80211_SAR_ATTR_SPECS: 0x8002 (NLA_NESTED) freq range 0 power: 0x38 in 0.25dbm unit (14dbm) freq range 1 power: 0x48 in 0.25dbm unit (18dbm) Signed-off-by: Carl Huang <[email protected]> Reviewed-by: Brian Norris <[email protected]> Reviewed-by: Abhishek Kumar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [minor edits, NLA parse cleanups] Signed-off-by: Johannes Berg <[email protected]>
2020-12-11cfg80211: support immediate reconnect request hintJohannes Berg1-0/+6
There are cases where it's necessary to disconnect, but an immediate reconnection is desired. Support a hint to userspace that this is the case, by including a new attribute in the deauth or disassoc event. Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20201206145305.58d33941fb9d.I0e7168c205c7949529c8e3b86f3c9b12c01a7017@changeid Signed-off-by: Johannes Berg <[email protected]>
2020-12-11cfg80211: include block-tx flag in channel switch started eventJohannes Berg1-1/+2
In the NL80211_CMD_CH_SWITCH_STARTED_NOTIFY event, include the NL80211_ATTR_CH_SWITCH_BLOCK_TX flag attribute if block-tx was requested by the AP. Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20201129172929.8953ef22cc64.Ifee9cab337a4369938545920ba5590559e91327a@changeid Signed-off-by: Johannes Berg <[email protected]>
2020-12-11rfkill: add a reason to the HW rfkill stateEmmanuel Grumbach1-1/+15
The WLAN device may exist yet not be usable. This can happen when the WLAN device is controllable by both the host and some platform internal component. We need some arbritration that is vendor specific, but when the device is not available for the host, we need to reflect this state towards the user space. Add a reason field to the rfkill object (and event) so that userspace can know why the device is in rfkill: because some other platform component currently owns the device, or because the actual hw rfkill signal is asserted. Capable userspace can now determine the reason for the rfkill and possibly do some negotiation on a side band channel using a proprietary protocol to gain ownership on the device in case the device is owned by some other component. When the host gains ownership on the device, the kernel can remove the RFKILL_HARD_BLOCK_NOT_OWNER reason and the hw rfkill state will be off. Then, the userspace can bring the device up and start normal operation. The rfkill_event structure is enlarged to include the additional byte, it is now 9 bytes long. Old user space will ask to read only 8 bytes so that the kernel can know not to feed them with more data. When the user space writes 8 bytes, new kernels will just read what is present in the file descriptor. This new byte is read only from the userspace standpoint anyway. If a new user space uses an old kernel, it'll ask to read 9 bytes but will get only 8, and it'll know that it didn't get the new state. When it'll write 9 bytes, the kernel will again ignore this new byte which is read only from the userspace standpoint. Signed-off-by: Emmanuel Grumbach <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-12-10ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctlsTom Parkin1-0/+2
This new ioctl pair allows two ppp channels to be bridged together: frames arriving in one channel are transmitted in the other channel and vice versa. The practical use for this is primarily to support the L2TP Access Concentrator use-case. The end-user session is presented as a ppp channel (typically PPPoE, although it could be e.g. PPPoA, or even PPP over a serial link) and is switched into a PPPoL2TP session for transmission to the LNS. At the LNS the PPP session is terminated in the ISP's network. When a PPP channel is bridged to another it takes a reference on the other's struct ppp_file. This reference is dropped when the channels are unbridged, which can occur either explicitly on userspace calling the PPPIOCUNBRIDGECHAN ioctl, or implicitly when either channel in the bridge is unregistered. In order to implement the channel bridge, struct channel is extended with a new field, 'bridge', which points to the other struct channel making up the bridge. This pointer is RCU protected to avoid adding another lock to the data path. To guard against concurrent writes to the pointer, the existing struct channel lock 'upl' coverage is extended rather than adding a new lock. The 'upl' lock is used to protect the existing unit pointer. Since the bridge effectively replaces the unit (they're mutually exclusive for a channel) it makes coding easier to use the same lock to cover them both. Signed-off-by: Tom Parkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-10can: isotp: add SF_BROADCAST support for functional addressingOliver Hartkopp1-1/+1
When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags the CAN_ISOTP socket is switched into functional addressing mode, where only single frame (SF) protocol data units can be send on the specified CAN interface and the given tp.tx_id after bind(). In opposite to normal and extended addressing this socket does not register a CAN-ID for reception which would be needed for a 1-to-1 ISOTP connection with a segmented bi-directional data transfer. Sending SFs on this socket is therefore a TX-only 'broadcast' operation. Signed-off-by: Oliver Hartkopp <[email protected]> Signed-off-by: Thomas Wagner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2020-12-09io_uring: add timeout updatePavel Begunkov1-0/+1
Support timeout updates through IORING_OP_TIMEOUT_REMOVE with passed in IORING_TIMEOUT_UPDATE. Updates doesn't support offset timeout mode. Oirignal timeout.off will be ignored as well. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: remove now unused 'ret' variable] Signed-off-by: Jens Axboe <[email protected]>
2020-12-09io_uring: add timeout support for io_uring_enter()Hao Xu1-0/+9
Now users who want to get woken when waiting for events should submit a timeout command first. It is not safe for applications that split SQ and CQ handling between two threads, such as mysql. Users should synchronize the two threads explicitly to protect SQ and that will impact the performance. This patch adds support for timeout to existing io_uring_enter(). To avoid overloading arguments, it introduces a new parameter structure which contains sigmask and timeout. I have tested the workloads with one thread submiting nop requests while the other reaping the cqe with timeout. It shows 1.8~2x faster when the iodepth is 16. Signed-off-by: Jiufei Xue <[email protected]> Signed-off-by: Hao Xu <[email protected]> [axboe: various cleanups/fixes, and name change to SIG_IS_DATA] Signed-off-by: Jens Axboe <[email protected]>
2020-12-09io_uring: add support for IORING_OP_UNLINKATJens Axboe1-0/+2
IORING_OP_UNLINKAT behaves like unlinkat(2) and takes the same flags and arguments. Signed-off-by: Jens Axboe <[email protected]>
2020-12-09io_uring: add support for IORING_OP_RENAMEATJens Axboe1-0/+2
IORING_OP_RENAMEAT behaves like renameat2(), and takes the same flags etc. Signed-off-by: Jens Axboe <[email protected]>
2020-12-09io_uring: allow non-fixed files with SQPOLLJens Axboe1-0/+1
The restriction of needing fixed files for SQPOLL is problematic, and prevents/inhibits several valid uses cases. With the referenced files_struct that we have now, it's trivially supportable. Treat ->files like we do the mm for the SQPOLL thread - grab a reference to it (and assign it), and drop it when we're done. This feature is exposed as IORING_FEAT_SQPOLL_NONFIXED. Signed-off-by: Jens Axboe <[email protected]>
2020-12-09btrfs: calculate inline extent buffer page size based on page sizeQu Wenruo1-1/+2
Btrfs only support 64K as maximum node size, thus for 4K page system, we would have at most 16 pages for one extent buffer. For a system using 64K page size, we would really have just one page. While we always use 16 pages for extent_buffer::pages, this means for systems using 64K pages, we are wasting memory for 15 page pointers which will never be used. Calculate the array size based on page size and the node size maximum. - for systems using 4K page size, it will stay 16 pages - for systems using 64K page size, it will be 1 page Move the definition of BTRFS_MAX_METADATA_BLOCKSIZE to btrfs_tree.h, to avoid circular inclusion of ctree.h. Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-12-09binder: add flag to clear buffer on txn completeTodd Kjos1-0/+1
Add a per-transaction flag to indicate that the buffer must be cleared when the transaction is complete to prevent copies of sensitive data from being preserved in memory. Signed-off-by: Todd Kjos <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-08btrfs: introduce ZONED feature flagNaohiro Aota1-0/+1
This patch introduces the ZONED incompat flag. The flag indicates that the volume management will satisfy the constraints imposed by host-managed zoned block devices (aligned chunk allocation, append-only updates, reset zone after filled). As the zoned support will happen incrementally due to enhancing some core infrastructure like super block writes, tree-log, raid support, the feature will appear in sysfs only on debug builds. It will be enabled once the support is feature complete and applications can reliably check whether zoned support is present or not. Reviewed-by: Anand Jain <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-12-07media: doc: pixfmt-rgb: Clarify naming scheme for RGB formatsLaurent Pinchart1-1/+3
The naming scheme for the RGB pixel formats has been developed organically, and isn't consistent between formats using less than 8 bits per pixels (mostly stored in 1 or 2 bytes per pixel, except for RGB666 that uses 4 bytes per pixel) and formats with 8 bits per pixel (stored in 3 or 4 bytes). For the latter category, the names use a components order convention that is the opposite of the first category, and the opposite of DRM pixel formats. This has led to lots of confusion in the past, and would really benefit from being explained more precisely. Do so, which also prepares for the addition of additional RGB pixels formats. Signed-off-by: Laurent Pinchart <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-07media: videodev2.h: Move HM12 format to YUV semi-planar sectionLaurent Pinchart1-1/+1
V4L2_PIX_FMT_HM12 is a YUV semi-planar macro-block format. Move it from the packed YUV formats section where it was misplaced to the YUV semi-planar formats section. Signed-off-by: Laurent Pinchart <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-07media: videodev2.h: Move HI240 format to vendor-specific sectionLaurent Pinchart1-1/+1
V4L2_PIX_FMT_HI240 is a 8-bit dithered RGB format specific to BTTV. Move it from the packed YUV formats section where it was misplaced to the vendor-specific formats section. Signed-off-by: Laurent Pinchart <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-07media: videodev2.h: Remove unneeded comment about 4CC valueLaurent Pinchart1-6/+0
The V4L2_PIX_FMT_BGRA444 format has a comment that explains why its 4CC value is GA12. This explains the development history and isn't of much interest to readers, it should have been part of a commit message instead. Drop the comment, anyone interested in history can turn to git. Signed-off-by: Laurent Pinchart <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-07Merge 5.10-rc7 into tty-nextGreg Kroah-Hartman3-3/+10
We want the tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-05Merge tag 'batadv-next-pullrequest-20201204' of ↵Jakub Kicinski1-0/+26
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update include for min/max helpers, by Sven Eckelmann - add infrastructure and netlink functions for routing algo selection, by Sven Eckelmann (2 patches) - drop deprecated debugfs and sysfs support and obsoleted functionality, by Sven Eckelmann (3 patches) - drop unused include in fragmentation.c, by Simon Wunderlich * tag 'batadv-next-pullrequest-20201204' of git://git.open-mesh.org/linux-merge: batman-adv: Drop unused soft-interface.h include in fragmentation.c batman-adv: Drop legacy code for auto deleting mesh interfaces batman-adv: Drop deprecated debugfs support batman-adv: Drop deprecated sysfs support batman-adv: Allow selection of routing algorithm over rtnetlink batman-adv: Prepare infrastructure for newlink settings batman-adv: Add new include for min/max helpers batman-adv: Start new development cycle ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-05gpiolib: cdev: allow edge event timestamps to be configured as REALTIMEKent Gibson1-3/+9
Using CLOCK_REALTIME as the source for event timestamps is crucial for some specific applications, particularly those requiring timetamps relative to a PTP clock, so provide an option to switch the event timestamp source from the default CLOCK_MONOTONIC to CLOCK_REALTIME. Note that CLOCK_REALTIME was the default source clock for GPIO until Linux 5.7 when it was changed to CLOCK_MONOTONIC due to issues with the shifting of the realtime clock. Providing this option maintains the CLOCK_MONOTONIC as the default, while also providing a path forward for those dependent on the pre-5.7 behaviour. Suggested-by: Jack Winch <[email protected]> Signed-off-by: Kent Gibson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2020-12-04net-zerocopy: Defer vm zap unless actually needed.Arjun Roy1-0/+2
Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be triggered from userspace with madvise(MADV_DONTNEED). If userspace is configured to call this before reusing a segment, or if there was nothing mapped at this virtual address to begin with, we can avoid calling zap_page_range() under the socket lock. That said, if userspace does not do that, then we are still responsible for calling zap_page_range(). This patch adds a flag that the user can use to hint to the kernel that a zap is not required. If the flag is not set, or if an older user application does not have a flags field at all, then the kernel calls zap_page_range as before. Also, if the flag is set but a zap is still required, the kernel performs that zap as necessary. Thus incorrectly indicating that a zap can be avoided does not change the correctness of operation. It also increases the batchsize for vm_insert_pages and prefetches the page struct for the batch since we're about to bump the refcount. An alternative mechanism could be to not have a flag, assume by default a zap is not needed, and fall back to zapping if needed. However, this would harm performance for older applications for which a zap is necessary, and thus we implement it with an explicit flag so newer applications can opt in. When using RPC-style traffic with medium sized (tens of KB) RPCs, this change yields an efficency improvement of about 30% for QPS/CPU usage. Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.Arjun Roy1-0/+2
When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will do so. This elides a recvmsg() call for received traffic that isn't exactly page-aligned in size. This was tested with RPC-style traffic of arbitrary sizes. Normally, each received message required at least one getsockopt() call, and one recvmsg() call for the remaining unaligned data. With this change, almost all of the recvmsg() calls are eliminated, leading to a savings of about 25%-50% in number of system calls for RPC-style workloads. Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04bpf: Add a bpf_sock_from_file helperFlorent Revest1-0/+9
While eBPF programs can check whether a file is a socket by file->f_op == &socket_file_ops, they cannot convert the void private_data pointer to a struct socket BTF pointer. In order to do this a new helper wrapping sock_from_file is added. This is useful to tracing programs but also other program types inheriting this set of helpers such as iterators or LSM programs. Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: KP Singh <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-04seg6: add support for the SRv6 End.DT4 behaviorAndrea Mayer1-0/+1
SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenants environments. It decapsulates the received packets and it performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e: $ sysctl -wq net.vrf.strict_mode=1. At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing the support of SRv6 End.DT4 behavior. The SRv6 End.DT4 behavior can be instantiated using a command similar to the following: $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0 We introduce the "vrftable" extension in iproute2 in a following patch. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6 Signed-off-by: Andrea Mayer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04PCI/ERR: Bind RCEC devices to the Root Port driverQiuxu Zhuo1-0/+7
If a Root Complex Integrated Endpoint (RCiEP) is implemented, it may signal errors through a Root Complex Event Collector (RCEC). Each RCiEP must be associated with no more than one RCEC. For an RCEC (which is technically not a Bridge), error messages "received" from associated RCiEPs must be enabled for "transmission" in order to cause a System Error via the Root Control register or (when the Advanced Error Reporting Capability is present) reporting via the Root Error Command register and logging in the Root Error Status register and Error Source Identification register. Given the commonality with Root Ports and the need to also support AER and PME services for RCECs, extend the Root Port driver to support RCEC devices by adding the RCEC Class ID to the driver structure. Co-developed-by: Sean V Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Jonathan Cameron <[email protected]> # non-native/no RCEC Signed-off-by: Sean V Kelley <[email protected]> Signed-off-by: Qiuxu Zhuo <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
2020-12-04Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-1/+44
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04fs, close_range: add flag CLOSE_RANGE_CLOEXECGiuseppe Scrivano1-0/+3
When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't immediately close the files but it sets the close-on-exec bit. It is useful for e.g. container runtimes that usually install a seccomp profile "as late as possible" before execv'ing the container process itself. The container runtime could either do: 1 2 - install_seccomp_profile(); - close_range(MIN_FD, MAX_INT, 0); - close_range(MIN_FD, MAX_INT, 0); - install_seccomp_profile(); - execve(...); - execve(...); Both alternative have some disadvantages. In the first variant the seccomp_profile cannot block the close_range syscall, as well as opendir/read/close/... for the fallback on older kernels. In the second variant, close_range() can be used only on the fds that are not going to be needed by the runtime anymore, and it must be potentially called multiple times to account for the different ranges that must be closed. Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues. The runtime is able to use the existing open fds, the seccomp profile can block close_range() and the syscalls used for its fallback. Signed-off-by: Giuseppe Scrivano <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2020-12-04batman-adv: Allow selection of routing algorithm over rtnetlinkSven Eckelmann1-0/+6
A batadv net_device is associated to a B.A.T.M.A.N. routing algorithm. This algorithm has to be selected before the interface is initialized and cannot be changed after that. The only way to select this algorithm was a module parameter which specifies the default algorithm used during the creation of the net_device. This module parameter is writeable over /sys/module/batman_adv/parameters/routing_algo and thus allows switching of the routing algorithm: 1. change routing_algo parameter 2. create new batadv net_device But this is not race free because another process can be scheduled between 1 + 2 and in that time frame change the routing_algo parameter again. It is much cleaner to directly provide this information inside the rtnetlink's RTM_NEWLINK message. The two processes would be (in regards of the creation parameter of their batadv interfaces) be isolated. This also eases the integration of batadv devices inside tools like network-manager or systemd-networkd which are not expecting to operate on /sys before a new net_device is created. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-12-04batman-adv: Prepare infrastructure for newlink settingsSven Eckelmann1-0/+20
The batadv generic netlink family can be used to retrieve the current state and set various configuration settings. But there are also settings which must be set before the actual interface is created. The rtnetlink already uses IFLA_INFO_DATA to allow net_device families to transfer such configurations. The minimal required functionality for this is now available for the batadv rtnl_link_ops. Also a new IFLA class of attributes will be attached to it because rtnetlink only allows 51 different attributes but batadv_nl_attrs already contains 62 attributes. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-12-03bpf: Allow to specify kernel module BTFs when attaching BPF programsAndrii Nakryiko1-1/+6
Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+6
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03vfio-ccw: Wire in the request callbackEric Farman1-0/+1
The device is being unplugged, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-12-03uapi: fix statx attribute value overlap for DAX & MOUNT_ROOTEric Sandeen1-3/+6
STATX_ATTR_MOUNT_ROOT and STATX_ATTR_DAX got merged with the same value, so one of them needs fixing. Move STATX_ATTR_DAX. While we're in here, clarify the value-matching scheme for some of the attributes, and explain why the value for DAX does not match. Fixes: 80340fe3605c ("statx: add mount_root") Fixes: 712b2698e4c0 ("fs/stat: Define DAX statx attribute") Link: https://lore.kernel.org/linux-fsdevel/[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ Reported-by: David Howells <[email protected]> Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: David Howells <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: <[email protected]> # 5.8 Signed-off-by: Linus Torvalds <[email protected]>
2020-12-03macvlan: Support for high multicast packet rateThomas Karlsson1-0/+2
Background: Broadcast and multicast packages are enqueued for later processing. This queue was previously hardcoded to 1000. This proved insufficient for handling very high packet rates. This resulted in packet drops for multicast. While at the same time unicast worked fine. The change: This patch make the queue length adjustable to accommodate for environments with very high multicast packet rate. But still keeps the default value of 1000 unless specified. The queue length is specified as a request per macvlan using the IFLA_MACVLAN_BC_QUEUE_LEN parameter. The actual used queue length will then be the maximum of any macvlan connected to the same port. The actual used queue length for the port can be retrieved (read only) by the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification. This will be followed up by a patch to iproute2 in order to adjust the parameter from userspace. Signed-off-by: Thomas Karlsson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03media: vicodec: mark the stateless FWHT API as stableHans Verkuil2-0/+73
The FWHT stateless 'uAPI' was staging and marked explicitly in the V4L2 specification that it will change and is unstable. Note that these control IDs were never exported as a public API, they were only defined in kernel-local headers (fwht-ctrls.h). Now, the FWHT stateless controls is ready to be part of the stable uAPI. While not too late: - Rename V4L2_CID_MPEG_VIDEO_FWHT_PARAMS to V4L2_CID_STATELESS_FWHT_PARAMS. - Move the contents of fwht-ctrls.h to v4l2-controls.h. - Move the public parts of drivers/media/test-drivers/vicodec/codec-fwht.h to v4l2-controls.h. - Add V4L2_CTRL_TYPE_FWHT_PARAMS control initialization and validation. - Add p_fwht_params to struct v4l2_ext_control. Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-03media: uapi: move H264 stateless controls out of stagingEzequiel Garcia2-0/+387
The H.264 stateless 'uAPI' was staging and marked explicitly in the V4L2 specification that it will change and is unstable. Note that these control IDs were never exported as a public API, they were only defined in kernel-local headers (h264-ctrls.h). Now, the H264 stateless controls is ready to be part of the stable uAPI. While not too late, let's rename them and re-number their control IDs, moving them to the newly created stateless control class, and updating all the drivers accordingly. Signed-off-by: Ezequiel Garcia <[email protected]> Tested-by: Jernej Skrabec <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-03media: uapi: Move the H264 stateless control types out of stagingEzequiel Garcia1-0/+7
Move the H264 stateless control types out of staging, and re-number them to avoid any confusion. Signed-off-by: Ezequiel Garcia <[email protected]> Tested-by: Jernej Skrabec <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-03media: uapi: Move parsed H264 pixel format out of stagingEzequiel Garcia1-0/+1
Since we are ready to stabilize the H264 stateless API, start by first moving the parsed H264 pixel format. Signed-off-by: Ezequiel Garcia <[email protected]> Tested-by: Jernej Skrabec <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-03media: controls: Add the stateless codec control classEzequiel Garcia1-0/+7
Add a new control class to hold the stateless codecs controls that are ready to be moved out of staging. Signed-off-by: Ezequiel Garcia <[email protected]> Tested-by: Jernej Skrabec <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-03media: Rename stateful codec control macrosEzequiel Garcia1-200/+209
For historical reasons, stateful codec controls are named as {}_MPEG_{}. While we can't at this point sanely change all control IDs (such as V4L2_CID_MPEG_VIDEO_VP8_FRAME_HEADER), we can least change the more meaningful macros such as classes macros. Signed-off-by: Ezequiel Garcia <[email protected]> Tested-by: Jernej Skrabec <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-03f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILEDaeho Jeong1-0/+2
Added two ioctl to decompress/compress explicitly the compression enabled file in "compress_mode=user" mount option. Using these two ioctls, the users can make a control of compression and decompression of their files. Signed-off-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2020-12-02f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctlDaeho Jeong1-0/+2
Added a new F2FS_IOC_SET_COMPRESS_OPTION ioctl to change file compression option of a file. struct f2fs_comp_option { u8 algorithm; => compression algorithm => 0:lzo, 1:lz4, 2:zstd, 3:lzorle u8 log_cluster_size; => log scale cluster size => 2 ~ 8 }; struct f2fs_comp_option option; option.algorithm = 1; option.log_cluster_size = 7; ioctl(fd, F2FS_IOC_SET_COMPRESS_OPTION, &option); Signed-off-by: Daeho Jeong <[email protected]> [Chao Yu: remove f2fs_is_compress_algorithm_valid()] Signed-off-by: Jaegeuk Kim <[email protected]>
2020-12-02kernel: Implement selective syscall userspace redirectionGabriel Krisman Bertazi1-0/+5
Introduce a mechanism to quickly disable/enable syscall handling for a specific process and redirect to userspace via SIGSYS. This is useful for processes with parts that require syscall redirection and parts that don't, but who need to perform this boundary crossing really fast, without paying the cost of a system call to reconfigure syscall handling on each boundary transition. This is particularly important for Windows games running over Wine. The proposed interface looks like this: prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <off>, <length>, [selector]) The range [<offset>,<offset>+<length>) is a part of the process memory map that is allowed to by-pass the redirection code and dispatch syscalls directly, such that in fast paths a process doesn't need to disable the trap nor the kernel has to check the selector. This is essential to return from SIGSYS to a blocked area without triggering another SIGSYS from rt_sigreturn. selector is an optional pointer to a char-sized userspace memory region that has a key switch for the mechanism. This key switch is set to either PR_SYS_DISPATCH_ON, PR_SYS_DISPATCH_OFF to enable and disable the redirection without calling the kernel. The feature is meant to be set per-thread and it is disabled on fork/clone/execv. Internally, this doesn't add overhead to the syscall hot path, and it requires very little per-architecture support. I avoided using seccomp, even though it duplicates some functionality, due to previous feedback that maybe it shouldn't mix with seccomp since it is not a security mechanism. And obviously, this should never be considered a security mechanism, since any part of the program can by-pass it by using the syscall dispatcher. For the sysinfo benchmark, which measures the overhead added to executing a native syscall that doesn't require interception, the overhead using only the direct dispatcher region to issue syscalls is pretty much irrelevant. The overhead of using the selector goes around 40ns for a native (unredirected) syscall in my system, and it is (as expected) dominated by the supervisor-mode user-address access. In fact, with SMAP off, the overhead is consistently less than 5ns on my test box. Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-01net/smc: Add support for obtaining SMCR device listGuvenc Gulce1-2/+11
Deliver SMCR device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <[email protected]> Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-01net/smc: Add support for obtaining SMCD device listGuvenc Gulce1-0/+28
Deliver SMCD device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <[email protected]> Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-01net/smc: Add SMC-D Linkgroup diagnostic supportGuvenc Gulce1-0/+27
Deliver SMCD Linkgroup information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <[email protected]> Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-01net/smc: Introduce SMCR get link commandGuvenc Gulce1-0/+18
Introduce get link command which loops through all available links of all available link groups. It uses the SMC-R linkgroup list as entry point, not the socket list, which makes linkgroup diagnosis possible, in case linkgroup does not contain active connections anymore. Signed-off-by: Guvenc Gulce <[email protected]> Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-01net/smc: Introduce SMCR get linkgroup commandGuvenc Gulce1-0/+15
Introduce get linkgroup command which loops through all available SMCR linkgroups. It uses the SMC-R linkgroup list as entry point, not the socket list, which makes linkgroup diagnosis possible, in case linkgroup does not contain active connections anymore. Signed-off-by: Guvenc Gulce <[email protected]> Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>