aboutsummaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)AuthorFilesLines
2018-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-12/+169
Minor conflict, a CHECK was placed into an if() statement in net-next, whilst a newline was added to that CHECK call in 'net'. Thanks to Daniel for the merge resolution. Signed-off-by: David S. Miller <[email protected]>
2018-05-07cfg80211: average ack rssi support for data framesBalaji Pothunoori1-0/+7
Average ack rssi will be given to userspace via NL80211 interface if firmware is capable. Userspace tool ‘iw’ can process this information and give the output as one of the fields in ‘iw dev wlanX station dump’. Example output : localhost ~ #iw dev wlan-5000mhz station dump Station 34:f3:9a:aa:3b:29 (on wlan-5000mhz) inactive time: 5370 ms rx bytes: 85321 rx packets: 576 tx bytes: 14225 tx packets: 71 tx retries: 0 tx failed: 2 beacon loss: 0 rx drop misc: 0 signal: -54 dBm signal avg: -53 dBm tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2 rx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2 avg ack signal: -56 dBm authorized: yes authenticated: yes associated: yes preamble: short WMM/WME: yes MFP: no TDLS peer: no DTIM period: 2 beacon interval:100 short preamble: yes short slot time:yes connected time: 203 seconds Main use case is to measure the signal strength of a connected station to AP. Data packet transmit rates and bandwidth used by station can vary a lot even if the station is at fixed location, especially if the rates used are multi stream(2stream, 3stream) rates with different bandwidth(20/40/80 Mhz). These multi stream rates are sensitive and station can use different transmit power for each of the rate and bandwidth combinations. RSSI measured from these RX packets on AP will be not stable and can vary a lot with in a short time. Whereas 802.11 ack frames from station are sent relatively at a constant rate (6/12/24 Mbps) with constant bandwidth(20 Mhz). So average rssi of the ack packets is good and more accurate. Signed-off-by: Balaji Pothunoori <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-05-07nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump commandHaim Dreyfuss1-0/+28
This will serve userspace entity to maintain its regulatory limitation. More specifcally APs can use this data to calculate the WMM IE when building: beacons, probe responses, assoc responses etc... Signed-off-by: Haim Dreyfuss <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller7-92/+174
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, more relevant updates in this batch are: 1) Add Maglev support to IPVS. Moreover, store lastest server weight in IPVS since this is needed by maglev, patches from from Inju Song. 2) Preparation works to add iptables flowtable support, patches from Felix Fietkau. 3) Hand over flows back to conntrack slow path in case of TCP RST/FIN packet is seen via new teardown state, also from Felix. 4) Add support for extended netlink error reporting for nf_tables. 5) Support for larger timeouts that 23 days in nf_tables, patch from Florian Westphal. 6) Always set an upper limit to dynamic sets, also from Florian. 7) Allow number generator to make map lookups, from Laura Garcia. 8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat. 9) Extend ip6tables SRH match to support previous, next and last SID, from Ahmed Abdelsalam. 10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez. 11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot. 12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables, from Taehee Yoo. 13) Unify meta bridge with core nft_meta, then make nft_meta built-in. Make rt and exthdr built-in too, again from Florian. 14) Missing initialization of tbl->entries in IPVS, from Cong Wang. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-05-07netfilter: ctnetlink: export nf_conntrack_maxFlorent Fourcot1-0/+1
IPCTNL_MSG_CT_GET_STATS netlink command allow to monitor current number of conntrack entries. However, if one wants to compare it with the maximum (and detect exhaustion), the only solution is currently to read sysctl value. This patch add nf_conntrack_max value in netlink message, and simplify monitoring for application built on netlink API. Signed-off-by: Florent Fourcot <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-05-07netfilter: extract Passive OS fingerprint infrastructure from xt_osfFernando Fernandez Mancera2-89/+107
Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for nf_tables support. Signed-off-by: Fernando Fernandez Mancera <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-05-06netfilter: nf_tables: Provide NFT_{RT,CT}_MAX for userspacePhil Sutter1-0/+4
These macros allow conveniently declaring arrays which use NFT_{RT,CT}_* values as indexes. Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-05-06netfilter: ip6t_srh: extend SRH matching for previous, next and last SIDAhmed Abdelsalam1-2/+41
IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed by SR encapsulated packet. Each SID is encoded as an IPv6 prefix. When a Firewall receives an SR encapsulated packet, it should be able to identify which node previously processed the packet (previous SID), which node is going to process the packet next (next SID), and which node is the last to process the packet (last SID) which represent the final destination of the packet in case of inline SR mode. An example use-case of using these features could be SID list that includes two firewalls. When the second firewall receives a packet, it can check whether the packet has been processed by the first firewall or not. Based on that check, it decides to apply all rules, apply just subset of the rules, or totally skip all rules and forward the packet to the next SID. This patch extends SRH match to support matching previous SID, next SID, and last SID. Signed-off-by: Ahmed Abdelsalam <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-05-06netfilter: nft_numgen: add map lookups for numgen statementsLaura Garcia Liebana1-0/+4
This patch includes a new attribute in the numgen structure to allow the lookup of an element based on the number generator as a key. For this purpose, different ops have been included to extend the current numgen inc functions. Currently, only supported for numgen incremental operations, but it will be supported for random in a follow-up patch. Signed-off-by: Laura Garcia Liebana <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-05-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds3-3/+3
Pull rdma fixes from Doug Ledford: "This is our first pull request of the rc cycle. It's not that it's been overly quiet, we were just waiting on a few things before sending this off. For instance, the 6 patch series from Intel for the hfi1 driver had actually been pulled in on Tuesday for a Wednesday pull request, only to have Jason notice something I missed, so we held off for some testing, and then on Thursday had to respin the series because the very first patch needed a minor fix (unnecessary cast is all). There is a sizable hns patch series in here, as well as a reasonably largish hfi1 patch series, then all of the lines of uapi updates are just the change to the new official Linux-OpenIB SPDX tag (a bunch of our files had what amounts to a BSD-2-Clause + MIT Warranty statement as their license as a result of the initial code submission years ago, and the SPDX folks decided it was unique enough to warrant a unique tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4 and core/cache/cma updates to round out the bunch. None of it was overly large by itself, but in the 2 1/2 weeks we've been collecting patches, it has added up :-/. As best I can tell, it's been through 0day (I got a notice about my last for-next push, but not for my for-rc push, but Jason seems to think that failure messages are prioritized and success messages not so much). It's also been through linux-next. And yes, we did notice in the context portion of the CMA query gid fix patch that there is a dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage and remove it anywhere we can. Summary: - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off) - SPDX license tag cleanups (new tag Linux-OpenIB) - RoCE GID fixes related to default GIDs - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch), mlx4, mlx5, and hfi1 (medium batch)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits) RDMA/cma: Do not query GID during QP state transition to RTR IB/mlx4: Fix integer overflow when calculating optimal MTT size IB/hfi1: Fix memory leak in exception path in get_irq_affinity() IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used IB/hfi1: Fix loss of BECN with AHG IB/hfi1 Use correct type for num_user_context IB/hfi1: Fix handling of FECN marked multicast packet IB/core: Make ib_mad_client_id atomic iw_cxgb4: Atomically flush per QP HW CQEs IB/uverbs: Fix kernel crash during MR deregistration flow IB/uverbs: Prevent reregistration of DM_MR to regular MR RDMA/mlx4: Add missed RSS hash inner header flag RDMA/hns: Fix a couple misspellings RDMA/hns: Submit bad wr RDMA/hns: Update assignment method for owner field of send wqe RDMA/hns: Adjust the order of cleanup hem table RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set RDMA/hns: Remove some unnecessary attr_mask judgement RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set ...
2018-05-05seccomp: Add filter flag to opt-out of SSB mitigationKees Cook1-2/+3
If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2018-05-05prctl: Add force disable speculationThomas Gleixner1-0/+1
For certain use cases it is desired to enforce mitigations so they cannot be undone afterwards. That's important for loader stubs which want to prevent a child from disabling the mitigation again. Will also be used for seccomp(). The extra state preserving of the prctl state for SSB is a preparatory step for EBPF dymanic speculation control. Signed-off-by: Thomas Gleixner <[email protected]>
2018-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-19/+22
Overlapping changes in selftests Makefile. Signed-off-by: David S. Miller <[email protected]>
2018-05-03bpf: add skb_load_bytes_relative helperDaniel Borkmann1-1/+32
This adds a small BPF helper similar to bpf_skb_load_bytes() that is able to load relative to mac/net header offset from the skb's linear data. Compared to bpf_skb_load_bytes(), it takes a fifth argument namely start_header, which is either BPF_HDR_START_MAC or BPF_HDR_START_NET. This allows for a more flexible alternative compared to LD_ABS/LD_IND with negative offset. It's enabled for tc BPF programs as well as sock filter program types where it's mainly useful in reuseport programs to ease access to lower header data. Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000698.html Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: statistics supportMagnus Karlsson1-0/+7
In this commit, a new getsockopt is added: XDP_STATISTICS. This is used to obtain stats from the sockets. v2: getsockopt now returns size of stats structure. Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: add Tx queue setup and mmap supportMagnus Karlsson1-0/+2
Another setsockopt (XDP_TX_QUEUE) is added to let the process allocate a queue, where the user process can pass frames to be transmitted by the kernel. The mmapping of the queue is done using the XDP_PGOFF_TX_QUEUE offset. Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: add umem completion queue support and mmapMagnus Karlsson1-0/+2
Here, we add another setsockopt for registered user memory (umem) called XDP_UMEM_COMPLETION_QUEUE. Using this socket option, the process can ask the kernel to allocate a queue (ring buffer) and also mmap it (XDP_UMEM_PGOFF_COMPLETION_QUEUE) into the process. The queue is used to explicitly pass ownership of umem frames from the kernel to user process. This will be used by the TX path to tell user space that a certain frame has been transmitted and user space can use it for something else, if it wishes. Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAPBjörn Töpel1-0/+1
The xskmap is yet another BPF map, very much inspired by dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application adds AF_XDP sockets into the map, and by using the bpf_redirect_map helper, an XDP program can redirect XDP frames to an AF_XDP socket. Note that a socket that is bound to certain ifindex/queue index will *only* accept XDP frames from that netdev/queue index. If an XDP program tries to redirect from a netdev/queue index other than what the socket is bound to, the frame will not be received on the socket. A socket can reside in multiple maps. v3: Fixed race and simplified code. v2: Removed one indirection in map lookup. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: add support for bind for RxMagnus Karlsson1-0/+11
Here, the bind syscall is added. Binding an AF_XDP socket, means associating the socket to an umem, a netdev and a queue index. This can be done in two ways. The first way, creating a "socket from scratch". Create the umem using the XDP_UMEM_REG setsockopt and an associated fill queue with XDP_UMEM_FILL_QUEUE. Create the Rx queue using the XDP_RX_QUEUE setsockopt. Call bind passing ifindex and queue index ("channel" in ethtool speak). The second way to bind a socket, is simply skipping the umem/netdev/queue index, and passing another already setup AF_XDP socket. The new socket will then have the same umem/netdev/queue index as the parent so it will share the same umem. You must also set the flags field in the socket address to XDP_SHARED_UMEM. v2: Use PTR_ERR instead of passing error variable explicitly. Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: add Rx queue setup and mmap supportBjörn Töpel1-0/+16
Another setsockopt (XDP_RX_QUEUE) is added to let the process allocate a queue, where the kernel can pass completed Rx frames from the kernel to user process. The mmapping of the queue is done using the XDP_PGOFF_RX_QUEUE offset. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: add umem fill queue support and mmapMagnus Karlsson1-0/+15
Here, we add another setsockopt for registered user memory (umem) called XDP_UMEM_FILL_QUEUE. Using this socket option, the process can ask the kernel to allocate a queue (ring buffer) and also mmap it (XDP_UMEM_PGOFF_FILL_QUEUE) into the process. The queue is used to explicitly pass ownership of umem frames from the user process to the kernel. These frames will in a later patch be filled in with Rx packet data by the kernel. v2: Fixed potential crash in xsk_mmap. Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03xsk: add user memory registration support sockoptBjörn Töpel1-0/+34
In this commit the base structure of the AF_XDP address family is set up. Further, we introduce the abilty register a window of user memory to the kernel via the XDP_UMEM_REG setsockopt syscall. The memory window is viewed by an AF_XDP socket as a set of equally large frames. After a user memory registration all frames are "owned" by the user application, and not the kernel. v2: More robust checks on umem creation and unaccount on error. Call set_page_dirty_lock on cleanup. Simplified xdp_umem_reg. Co-authored-by: Magnus Karlsson <[email protected]> Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-05-03prctl: Add speculation control prctlsThomas Gleixner1-0/+11
Add two new prctls to control aspects of speculation related vulnerabilites and their mitigations to provide finer grained control over performance impacting mitigations. PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature which is selected with arg2 of prctl(2). The return value uses bit 0-2 with the following meaning: Bit Define Description 0 PR_SPEC_PRCTL Mitigation can be controlled per task by PR_SET_SPECULATION_CTRL 1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is disabled 2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is enabled If all bits are 0 the CPU is not affected by the speculation misfeature. If PR_SPEC_PRCTL is set, then the per task control of the mitigation is available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation misfeature will fail. PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which is selected by arg2 of prctl(2) per task. arg3 is used to hand in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE. The common return values are: EINVAL prctl is not implemented by the architecture or the unused prctl() arguments are not 0 ENODEV arg2 is selecting a not supported speculation misfeature PR_SET_SPECULATION_CTRL has these additional return values: ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE ENXIO prctl control of the selected speculation misfeature is disabled The first supported controlable speculation misfeature is PR_SPEC_STORE_BYPASS. Add the define so this can be shared between architectures. Based on an initial patch from Tim Chen and mostly rewritten. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-02aio: implement io_pgeteventsChristoph Hellwig1-0/+6
This is the io_getevents equivalent of ppoll/pselect and allows to properly mix signals and aio completions (especially with IOCB_CMD_POLL) and atomically executes the following sequence: sigset_t origmask; pthread_sigmask(SIG_SETMASK, &sigmask, &origmask); ret = io_getevents(ctx, min_nr, nr, events, timeout); pthread_sigmask(SIG_SETMASK, &origmask, NULL); Note that unlike many other signal related calls we do not pass a sigmask size, as that would get us to 7 arguments, which aren't easily supported by the syscall infrastructure. It seems a lot less painful to just add a new syscall variant in the unlikely case we're going to increase the sigset size. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]>
2018-05-02Merge branch 'timers/urgent' into timers/coreThomas Gleixner6-22/+40
Pick up urgent fixes to apply dependent cleanup patch
2018-05-01tcp: send in-queue bytes in cmsg upon readSoheil Hassas Yeganeh1-0/+3
Applications with many concurrent connections, high variance in receive queue length and tight memory bounds cannot allocate worst-case buffer size to drain sockets. Knowing the size of receive queue length, applications can optimize how they allocate buffers to read from the socket. The number of bytes pending on the socket is directly available through ioctl(FIONREAD/SIOCINQ) and can be approximated using getsockopt(MEMINFO) (rmem_alloc includes skb overheads in addition to application data). But, both of these options add an extra syscall per recvmsg. Moreover, ioctl(FIONREAD/SIOCINQ) takes the socket lock. Add the TCP_INQ socket option to TCP. When this socket option is set, recvmsg() relays the number of bytes available on the socket for reading to the application via the TCP_CM_INQ control message. Calculate the number of bytes after releasing the socket lock to include the processed backlog, if any. To avoid an extra branch in the hot path of recvmsg() for this new control message, move all cmsg processing inside an existing branch for processing receive timestamps. Since the socket lock is not held when calculating the size of receive queue, TCP_INQ is a hint. For example, it can overestimate the queue size by one byte, if FIN is received. With this method, applications can start reading from the socket using a small buffer, and then use larger buffers based on the remaining data when needed. V3 change-log: As suggested by David Miller, added loads with barrier to check whether we have multiple threads calling recvmsg in parallel. When that happens we lock the socket to calculate inq. V4 change-log: Removed inline from a static function. Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Neal Cardwell <[email protected]> Suggested-by: David Miller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-01connector: add parent pid and tgid to coredump and exit eventsStefan Strogin1-0/+4
The intention is to get notified of process failures as soon as possible, before a possible core dumping (which could be very long) (e.g. in some process-manager). Coredump and exit process events are perfect for such use cases (see 2b5faa4c553f "connector: Added coredumping event to the process connector"). The problem is that for now the process-manager cannot know the parent of a dying process using connectors. This could be useful if the process-manager should monitor for failures only children of certain parents, so we could filter the coredump and exit events by parent process and/or thread ID. Add parent pid and tgid to coredump and exit process connectors event data. Signed-off-by: Stefan Strogin <[email protected]> Acked-by: Evgeniy Polyakov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-30Merge 4.17-rc3 into usb-nextGreg Kroah-Hartman6-22/+40
This resolves the merge issue with drivers/usb/core/hcd.c Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-04-30bpf: fix formatting for bpf_get_stack() helper docQuentin Monnet1-27/+27
Fix formatting (indent) for bpf_get_stack() helper documentation, so that the doc is rendered correctly with the Python script. Fixes: c195651e565a ("bpf: add bpf_get_stack helper") Cc: Yonghong Song <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-30bpf: fix formatting for bpf_perf_event_read() helper docQuentin Monnet1-4/+4
Some edits brought to the last iteration of BPF helper functions documentation introduced an error with RST formatting. As a result, most of one paragraph is rendered in bold text when only the name of a helper should be. Fix it, and fix formatting of another function name in the same paragraph. Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)") Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-29tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receiveEric Dumazet1-0/+8
When adding tcp mmap() implementation, I forgot that socket lock had to be taken before current->mm->mmap_sem. syzbot eventually caught the bug. Since we can not lock the socket in tcp mmap() handler we have to split the operation in two phases. 1) mmap() on a tcp socket simply reserves VMA space, and nothing else. This operation does not involve any TCP locking. 2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements the transfert of pages from skbs to one VMA. This operation only uses down_read(&current->mm->mmap_sem) after holding TCP lock, thus solving the lockdep issue. This new implementation was suggested by Andy Lutomirski with great details. Benefits are : - Better scalability, in case multiple threads reuse VMAS (without mmap()/munmap() calls) since mmap_sem wont be write locked. - Better error recovery. The previous mmap() model had to provide the expected size of the mapping. If for some reason one part could not be mapped (partial MSS), the whole operation had to be aborted. With the tcp_zerocopy_receive struct, kernel can report how many bytes were successfuly mapped, and how many bytes should be read to skip the problematic sequence. - No more memory allocation to hold an array of page pointers. 16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/ - skbs are freed while mmap_sem has been released Following patch makes the change in tcp_mmap tool to demonstrate one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Cc: [email protected] Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-29Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes from the timer departement: - Fix a long standing issue in the NOHZ tick code which causes RB tree corruption, delayed timers and other malfunctions. The cause for this is code which modifies the expiry time of an enqueued hrtimer. - Revert the CLOCK_MONOTONIC/CLOCK_BOOTTIME unification due to regression reports. Seems userspace _is_ relying on the documented behaviour despite our hope that it wont" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME tick/sched: Do not mess with an enqueued hrtimer
2018-04-29bpf: Fix helpers ctx struct types in uapi docAndrey Ignatov1-6/+6
Helpers may operate on two types of ctx structures: user visible ones (e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones (e.g. `struct bpf_sock_ops_kern`) in kernel implementation. UAPI documentation must refer to only user visible structures. The patch replaces references to `_kern` structures in BPF helpers description by corresponding user visible structures. Signed-off-by: Andrey Ignatov <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-04-29bpf: add bpf_get_stack helperYonghong Song1-2/+40
Currently, stackmap and bpf_get_stackid helper are provided for bpf program to get the stack trace. This approach has a limitation though. If two stack traces have the same hash, only one will get stored in the stackmap table, so some stack traces are missing from user perspective. This patch implements a new helper, bpf_get_stack, will send stack traces directly to bpf program. The bpf program is able to see all stack traces, and then can do in-kernel processing or send stack traces to user space through shared map or bpf_perf_event_output. Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-04-27rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+7
Pull KVM fixes from Radim Krčmář: "ARM: - PSCI selection API, a leftover from 4.16 (for stable) - Kick vcpu on active interrupt affinity change - Plug a VMID allocation race on oversubscribed systems - Silence debug messages - Update Christoffer's email address (linaro -> arm) x86: - Expose userspace-relevant bits of a newly added feature - Fix TLB flushing on VMX with VPID, but without EPT" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use arm/arm64: KVM: Add PSCI version selection API KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration arm64: KVM: Demote SVE and LORegion warnings to debug only MAINTAINERS: Update e-mail address for Christoffer Dall KVM: arm/arm64: Close VMID generation race
2018-04-27PCI: Add PCI_EXP_LNKCTL2_TLS* macrosFrederick Lawler1-0/+5
The Link Control 2 register is missing macros for Target Link Speeds. Add those in. Signed-off-by: Frederick Lawler <[email protected]> [bhelgaas: use "GT" instead of "GB"] Signed-off-by: Bjorn Helgaas <[email protected]>
2018-04-27x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPIKarimAllah Ahmed1-0/+7
Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of capabilities. Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: KarimAllah Ahmed <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2018-04-27Merge tag 'staging-4.17-rc3' of ↵Linus Torvalds1-18/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging fixes from Greg KH: "Here are two staging driver fixups for 4.17-rc3. The first is the remaining stragglers of the irda code removal that you pointed out during the merge window. The second is a fix for the wilc1000 driver due to a patch that got merged in 4.17-rc1. Both of these have been in linux-next for a while with no reported issues" * tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: wilc1000: fix NULL pointer exception in host_int_parse_assoc_resp_info() staging: irda: remove remaining remants of irda code removal
2018-04-27tipc: introduce ioctl for fetching node identityJon Maloy1-4/+8
After the introduction of a 128-bit node identity it may be difficult for a user to correlate between this identity and the generated node hash address. We now try to make this easier by introducing a new ioctl() call for fetching a node identity by using the hash value as key. This will be particularly useful when we extend some of the commands in the 'tipc' tool, but we also expect regular user applications to need this feature. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-393/+1399
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-04-27 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add extensive BPF helper description into include/uapi/linux/bpf.h and a new script bpf_helpers_doc.py which allows for generating a man page out of it. Thus, every helper in BPF now comes with proper function signature, detailed description and return code explanation, from Quentin. 2) Migrate the BPF collect metadata tunnel tests from BPF samples over to the BPF selftests and further extend them with v6 vxlan, geneve and ipip tests, simplify the ipip tests, improve documentation and convert to bpf_ntoh*() / bpf_hton*() api, from William. 3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only access stack and packet memory. Extend this to allow such helpers to also use map values, which enabled use cases where value from a first lookup can be directly used as a key for a second lookup, from Paul. 4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in order to retrieve XFRM state information containing SPI, peer address and reqid values, from Eyal. 5) Various optimizations in nfp driver's BPF JIT in order to turn ADD and SUB instructions with negative immediate into the opposite operation with a positive immediate such that nfp can better fit small immediates into instructions. Savings in instruction count up to 4% have been observed, from Jakub. 6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info and add support for dumping this through bpftool, from Jiri. 7) Move the BPF sockmap samples over into BPF selftests instead since sockmap was rather a series of tests than sample anyway and this way this can be run from automated bots, from John. 8) Follow-up fix for bpf_adjust_tail() helper in order to make it work with generic XDP, from Nikita. 9) Some follow-up cleanups to BTF, namely, removing unused defines from BTF uapi header and renaming 'name' struct btf_* members into name_off to make it more clear they are offsets into string section, from Martin. 10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since not run directly but invoked from test_sock_addr.sh, from Yonghong. 11) Remove redundant ret assignment in sample BPF loader, from Wang. 12) Add couple of missing files to BPF selftest's gitignore, from Anders. There are two trivial merge conflicts while pulling: 1) Remove samples/sockmap/Makefile since all sockmap tests have been moved to selftests. 2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the file since git should ignore all of them. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-04-26signal/signalfd: Add support for SIGSYSEric W. Biederman1-1/+5
I don't know why signalfd has never grown support for SIGSYS but grow it now. This corrects an oversight and removes a need for a default in the switch statement. Allowing gcc to warn when future members are added to the enum siginfo_layout, and signalfd does not handle them. Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-04-27netfilter: Fix handling simultaneous open in TCP conntrackJozsef Kadlecsik1-0/+3
Dominique Martinet reported a TCP hang problem when simultaneous open was used. The problem is that the tcp_conntracks state table is not smart enough to handle the case. The state table could be fixed by introducing a new state, but that would require more lines of code compared to this patch, due to the required backward compatibility with ctnetlink. Signed-off-by: Jozsef Kadlecsik <[email protected]> Reported-by: Dominique Martinet <[email protected]> Tested-by: Dominique Martinet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (65-66)Quentin Monnet1-0/+30
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helper from Nikita: - bpf_xdp_adjust_tail() Helper from Eyal: - bpf_skb_get_xfrm_state() v4: - New patch (helpers did not exist yet for previous versions). Cc: Nikita V. Shirokov <[email protected]> Cc: Eyal Birger <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (58-64)Quentin Monnet1-0/+147
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by John: - bpf_redirect_map() - bpf_sk_redirect_map() - bpf_sock_map_update() - bpf_msg_redirect_map() - bpf_msg_apply_bytes() - bpf_msg_cork_bytes() - bpf_msg_pull_data() v4: - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED", "his" to "this". Also add a paragraph on performance improvement over bpf_redirect() helper. v3: - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag. - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag. - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented for generic XDP but supported on native XDP. - bpf_msg_pull_data(): Clarify comment about invalidated verifier checks. Cc: Jesper Dangaard Brouer <[email protected]> Cc: John Fastabend <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (51-57)Quentin Monnet1-0/+180
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helpers from Lawrence: - bpf_setsockopt() - bpf_getsockopt() - bpf_sock_ops_cb_flags_set() Helpers from Yonghong: - bpf_perf_event_read_value() - bpf_perf_prog_read_value() Helper from Josef: - bpf_override_return() Helper from Andrey: - bpf_bind() v4: - bpf_perf_event_read_value(): State that this helper should be preferred over bpf_perf_event_read(). v3: - bpf_perf_event_read_value(): Fix time of selection for perf event type in description. Remove occurences of "cores" to avoid confusion with "CPU". - bpf_bind(): Remove last paragraph of description, which was off topic. Cc: Lawrence Brakmo <[email protected]> Cc: Yonghong Song <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Andrey Ignatov <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Acked-by: Yonghong Song <[email protected]> [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()] Acked-by: Andrey Ignatov <[email protected]> [for bpf_bind()] Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (42-50)Quentin Monnet1-0/+172
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helper from Kaixu: - bpf_perf_event_read() Helpers from Martin: - bpf_skb_under_cgroup() - bpf_xdp_adjust_head() Helpers from Sargun: - bpf_probe_write_user() - bpf_current_task_under_cgroup() Helper from Thomas: - bpf_skb_change_head() Helper from Gianluca: - bpf_probe_read_str() Helpers from Chenbo: - bpf_get_socket_cookie() - bpf_get_socket_uid() v4: - bpf_perf_event_read(): State that bpf_perf_event_read_value() should be preferred over this helper. - bpf_skb_change_head(): Clarify comment about invalidated verifier checks. - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier checks. - bpf_probe_write_user(): Add that dst must be a valid user space address. - bpf_get_socket_cookie(): Improve description by making clearer that the cockie belongs to the socket, and state that it remains stable for the life of the socket. v3: - bpf_perf_event_read(): Fix time of selection for perf event type in description. Remove occurences of "cores" to avoid confusion with "CPU". Cc: Martin KaFai Lau <[email protected]> Cc: Sargun Dhillon <[email protected]> Cc: Thomas Graf <[email protected]> Cc: Gianluca Borello <[email protected]> Cc: Chenbo Feng <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()] Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (33-41)Quentin Monnet1-0/+164
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Daniel: - bpf_get_hash_recalc() - bpf_skb_change_tail() - bpf_skb_pull_data() - bpf_csum_update() - bpf_set_hash_invalid() - bpf_get_numa_node_id() - bpf_set_hash() - bpf_skb_adjust_room() - bpf_xdp_adjust_meta() v4: - bpf_skb_change_tail(): Clarify comment about invalidated verifier checks. - bpf_skb_pull_data(): Clarify the motivation for using this helper or bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for *skb*. Clarify comment about invalidated verifier checks. - bpf_csum_update(): Fix description of checksum (entire packet, not IP checksum). Fix a typo: "header" instead of "helper". - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc(). - bpf_get_numa_node_id(): State that the helper is not restricted to programs attached to sockets. - bpf_skb_adjust_room(): Clarify comment about invalidated verifier checks. - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier checks. Cc: Daniel Borkmann <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (23-32)Quentin Monnet1-0/+197
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Daniel: - bpf_get_prandom_u32() - bpf_get_smp_processor_id() - bpf_get_cgroup_classid() - bpf_get_route_realm() - bpf_skb_load_bytes() - bpf_csum_diff() - bpf_skb_get_tunnel_opt() - bpf_skb_set_tunnel_opt() - bpf_skb_change_proto() - bpf_skb_change_type() v4: - bpf_get_prandom_u32(): Warn that the prng is not cryptographically secure. - bpf_get_smp_processor_id(): Fix a typo (case). - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper being limited to cgroup v1, and to egress path. - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid(). Add a note about usage with TC and advantage of clsact. Fix a typo in return value ("sdb" instead of "skb"). - bpf_skb_load_bytes(): Make explicit loading large data loads it to the eBPF stack. - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to bpf_l3|l4_csum_replace(). - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect metadata" mode, and example of this with Geneve. - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt() description. - bpf_skb_change_proto(): Mention that the main use case is NAT64. Clarify comment about invalidated verifier checks. v3: - bpf_get_prandom_u32(): Fix helper name :(. Add description, including a note on the internal random state. - bpf_get_smp_processor_id(): Add description, including a note on the processor id remaining stable during program run. - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is required to use the helper. Add a reference to related documentation. State that placing a task in net_cls controller disables cgroup-bpf. - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is required to use this helper. - bpf_skb_load_bytes(): Fix comment on current use cases for the helper. Cc: Daniel Borkmann <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (12-22)Quentin Monnet1-0/+254
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Alexei: - bpf_get_current_pid_tgid() - bpf_get_current_uid_gid() - bpf_get_current_comm() - bpf_skb_vlan_push() - bpf_skb_vlan_pop() - bpf_skb_get_tunnel_key() - bpf_skb_set_tunnel_key() - bpf_redirect() - bpf_perf_event_output() - bpf_get_stackid() - bpf_get_current_task() v4: - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add note on bpf_redirect_map() providing better performance. Replace "Save for" with "Except for". - bpf_skb_vlan_push(): Clarify comment about invalidated verifier checks. - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier checks. - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata" mode, and example tunneling protocols with which it can be used. - bpf_skb_set_tunnel_key(): Add a reference to the description of bpf_skb_get_tunnel_key(). - bpf_perf_event_output(): Specify that, and for what purpose, the helper can be used with programs attached to TC and XDP. v3: - bpf_skb_get_tunnel_key(): Change and improve description and example. - bpf_redirect(): Improve description of BPF_F_INGRESS flag. - bpf_perf_event_output(): Fix first sentence of description. Delete wrong statement on context being evaluated as a struct pt_reg. Remove the long yet incomplete example. - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being configurable. Cc: Alexei Starovoitov <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-27bpf: add documentation for eBPF helpers (01-11)Quentin Monnet1-0/+230
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Alexei: - bpf_map_lookup_elem() - bpf_map_update_elem() - bpf_map_delete_elem() - bpf_probe_read() - bpf_ktime_get_ns() - bpf_trace_printk() - bpf_skb_store_bytes() - bpf_l3_csum_replace() - bpf_l4_csum_replace() - bpf_tail_call() - bpf_clone_redirect() v4: - bpf_map_lookup_elem(): Add "const" qualifier for key. - bpf_map_update_elem(): Add "const" qualifier for key and value. - bpf_map_lookup_elem(): Add "const" qualifier for key. - bpf_skb_store_bytes(): Clarify comment about invalidated verifier checks. - bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note about bpf_csum_diff(). - bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a note about bpf_csum_diff(). - bpf_tail_call(): Bring minor edits to description. - bpf_clone_redirect(): Add a note about the relation with bpf_redirect(). Also clarify comment about invalidated verifier checks. v3: - bpf_map_lookup_elem(): Fix description of restrictions for flags related to the existence of the entry. - bpf_trace_printk(): State that trace_pipe can be configured. Fix return value in case an unknown format specifier is met. Add a note on kernel log notice when the helper is used. Edit example. - bpf_tail_call(): Improve comment on stack inheritance. - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag. Cc: Alexei Starovoitov <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>