aboutsummaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)AuthorFilesLines
2018-03-19IB/uverbs: Move to new headers and make naming consistentMatan Barak4-84/+161
Use macros to make names consistent in ioctl() uAPI: The ioctl() uAPI works with object-method hierarchy. The method part also states which handler should be executed when this method is called from user-space. Therefore, we need to tie method, method's id, method's handler and the object owning this method together. Previously, this was done through explicit developer chosen names. This makes grepping the code harder. Changing the method's name, method's handler and object's name to be automatically generated based on the ids. The headers are split in a way so they be included and used by user-space. One header strictly contains structures that are used directly by user-space applications, where another header is used for internal library (i.e. libibverbs) to form the ioctl() commands. Other header simply contains the required general command structure. Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Matan Barak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-19bpf: sk_msg program helper bpf_sk_msg_pull_dataJohn Fastabend1-1/+2
Currently, if a bpf sk msg program is run the program can only parse data that the (start,end) pointers already consumed. For sendmsg hooks this is likely the first scatterlist element. For sendpage this will be the range (0,0) because the data is shared with userspace and by default we want to avoid allowing userspace to modify data while (or after) BPF verdict is being decided. To support pulling in additional bytes for parsing use a new helper bpf_sk_msg_pull(start, end, flags) which works similar to cls tc logic. This helper will attempt to point the data start pointer at 'start' bytes offest into msg and data end pointer at 'end' bytes offset into message. After basic sanity checks to ensure 'start' <= 'end' and 'end' <= msg_length there are a few cases we need to handle. First the sendmsg hook has already copied the data from userspace and has exclusive access to it. Therefor, it is not necessesary to copy the data. However, it may be required. After finding the scatterlist element with 'start' offset byte in it there are two cases. One the range (start,end) is entirely contained in the sg element and is already linear. All that is needed is to update the data pointers, no allocate/copy is needed. The other case is (start, end) crosses sg element boundaries. In this case we allocate a block of size 'end - start' and copy the data to linearize it. Next sendpage hook has not copied any data in initial state so that data pointers are (0,0). In this case we handle it similar to the above sendmsg case except the allocation/copy must always happen. Then when sending the data we have possibly three memory regions that need to be sent, (0, start - 1), (start, end), and (end + 1, msg_length). This is required to ensure any writes by the BPF program are correctly transmitted. Lastly this operation will invalidate any previous data checks so BPF programs will have to revalidate pointers after making this BPF call. Signed-off-by: John Fastabend <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-19bpf: sockmap, add msg_cork_bytes() helperJohn Fastabend1-1/+2
In the case where we need a specific number of bytes before a verdict can be assigned, even if the data spans multiple sendmsg or sendfile calls. The BPF program may use msg_cork_bytes(). The extreme case is a user can call sendmsg repeatedly with 1-byte msg segments. Obviously, this is bad for performance but is still valid. If the BPF program needs N bytes to validate a header it can use msg_cork_bytes to specify N bytes and the BPF program will not be called again until N bytes have been accumulated. The infrastructure will attempt to coalesce data if possible so in many cases (most my use cases at least) the data will be in a single scatterlist element with data pointers pointing to start/end of the element. However, this is dependent on available memory so is not guaranteed. So BPF programs must validate data pointer ranges, but this is the case anyways to convince the verifier the accesses are valid. Signed-off-by: John Fastabend <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-19bpf: sockmap, add bpf_msg_apply_bytes() helperJohn Fastabend1-1/+2
A single sendmsg or sendfile system call can contain multiple logical messages that a BPF program may want to read and apply a verdict. But, without an apply_bytes helper any verdict on the data applies to all bytes in the sendmsg/sendfile. Alternatively, a BPF program may only care to read the first N bytes of a msg. If the payload is large say MB or even GB setting up and calling the BPF program repeatedly for all bytes, even though the verdict is already known, creates unnecessary overhead. To allow BPF programs to control how many bytes a given verdict applies to we implement a bpf_msg_apply_bytes() helper. When called from within a BPF program this sets a counter, internal to the BPF infrastructure, that applies the last verdict to the next N bytes. If the N is smaller than the current data being processed from a sendmsg/sendfile call, the first N bytes will be sent and the BPF program will be re-run with start_data pointing to the N+1 byte. If N is larger than the current data being processed the BPF verdict will be applied to multiple sendmsg/sendfile calls until N bytes are consumed. Note1 if a socket closes with apply_bytes counter non-zero this is not a problem because data is not being buffered for N bytes and is sent as its received. Signed-off-by: John Fastabend <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-19bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX dataJohn Fastabend1-1/+21
This implements a BPF ULP layer to allow policy enforcement and monitoring at the socket layer. In order to support this a new program type BPF_PROG_TYPE_SK_MSG is used to run the policy at the sendmsg/sendpage hook. To attach the policy to sockets a sockmap is used with a new program attach type BPF_SK_MSG_VERDICT. Similar to previous sockmap usages when a sock is added to a sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT program type attached then the BPF ULP layer is created on the socket and the attached BPF_PROG_TYPE_SK_MSG program is run for every msg in sendmsg case and page/offset in sendpage case. BPF_PROG_TYPE_SK_MSG Semantics/API: BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and SK_DROP. Returning SK_DROP free's the copied data in the sendmsg case and in the sendpage case leaves the data untouched. Both cases return -EACESS to the user. Returning SK_PASS will allow the msg to be sent. In the sendmsg case data is copied into kernel space buffers before running the BPF program. The kernel space buffers are stored in a scatterlist object where each element is a kernel memory buffer. Some effort is made to coalesce data from the sendmsg call here. For example a sendmsg call with many one byte iov entries will likely be pushed into a single entry. The BPF program is run with data pointers (start/end) pointing to the first sg element. In the sendpage case data is not copied. We opt not to copy the data by default here, because the BPF infrastructure does not know what bytes will be needed nor when they will be needed. So copying all bytes may be wasteful. Because of this the initial start/end data pointers are (0,0). Meaning no data can be read or written. This avoids reading data that may be modified by the user. A new helper is added later in this series if reading and writing the data is needed. The helper call will do a copy by default so that the page is exclusively owned by the BPF call. The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg in the sendmsg() case and the entire page/offset in the sendpage case. This avoids ambiguity on how to handle mixed return codes in the sendmsg case. Again a helper is added later in the series if a verdict needs to apply to multiple system calls and/or only a subpart of the currently being processed message. The helper msg_redirect_map() can be used to select the socket to send the data on. This is used similar to existing redirect use cases. This allows policy to redirect msgs. Pseudo code simple example: The basic logic to attach a program to a socket is as follows, // load the programs bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); // lookup the sockmap bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); // get fd for sockmap map_fd_msg = bpf_map__fd(bpf_map_msg); // attach program to sockmap bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); Adding sockets to the map is done in the normal way, // Add a socket 'fd' to sockmap at location 'i' bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY); After the above any socket attached to "my_sock_map", in this case 'fd', will run the BPF msg verdict program (msg_prog) on every sendmsg and sendpage system call. For a complete example see BPF selftests or sockmap samples. Implementation notes: It seemed the simplest, to me at least, to use a refcnt to ensure psock is not lost across the sendmsg copy into the sg, the bpf program running on the data in sg_data, and the final pass to the TCP stack. Some performance testing may show a better method to do this and avoid the refcnt cost, but for now use the simpler method. Another item that will come after basic support is in place is supporting MSG_MORE flag. At the moment we call sendpages even if the MSG_MORE flag is set. An enhancement would be to collect the pages into a larger scatterlist and pass down the stack. Notice that bpf_tcp_sendmsg() could support this with some additional state saved across sendmsg calls. I built the code to support this without having to do refactoring work. Other features TBD include ZEROCOPY and the TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series shortly. Future work could improve size limits on the scatterlist rings used here. Currently, we use MAX_SKB_FRAGS simply because this was being used already in the TLS case. Future work could extend the kernel sk APIs to tune this depending on workload. This is a trade-off between memory usage and throughput performance. Signed-off-by: John Fastabend <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-19Merge tag 'v4.16-rc6' into perf/core, to pick up fixesIngo Molnar2-2/+19
Signed-off-by: Ingo Molnar <[email protected]>
2018-03-19IB/mlx5: Packet packing enhancement for RAW QPBodong Wang1-1/+18
Enable RAW QP to be able to configure burst control by modify_qp. By using burst control with rate limiting, user can achieve best performance and accuracy. The burst control information is passed by user through udata. This patch also reports burst control capability for mlx5 related hardwares, burst control is only marked as supported when both packet_pacing_burst_bound and packet_pacing_typical_size are supported. Signed-off-by: Bodong Wang <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-19RDMA/bnxt: Fix structure layout for bnxt_re_pd_respJason Gunthorpe1-1/+6
What is going on here is a bit subtle, in the kernel there is no problem because the struct is copied using copy_from_user, so it can safely have an 8 byte alignment, however in userspace it must be constructed by concatenation with the ib_uverbs_alloc_pd_resp struct. This is due to the required memory layout to execute the command. Since ibv_uverbs_alloc_pd_resp is only 4 bytes long, this causes misalignment, and the user space will experience an unexpected padding. Currently it works around this via pointer maths. Make everything more robust by having the compiler reduce the alignment of the struct to 4. The userspace has assertions to ensure this works properly in all situations. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-19ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unitKirill Marinushkin1-2/+2
Currently, the offsets in the UAC2 processing unit descriptor are calculated incorrectly. It causes an issue when connecting the device which provides such a feature: ~~~~ [84126.724420] usb 1-1.3.1: invalid Processing Unit descriptor (id 18) ~~~~ After this patch is applied, the UAC2 processing unit inits w/o this error. Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0") Signed-off-by: Kirill Marinushkin <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2018-03-19y2038: Introduce struct __kernel_old_timevalArnd Bergmann1-0/+12
Dealing with 'struct timeval' users in the y2038 series is a bit tricky: We have two definitions of timeval that are visible to user space, one comes from glibc (or some other C library), the other comes from linux/time.h. The kernel copy is what we want to be used for a number of structures defined by the kernel itself, e.g. elf_prstatus (used it core dumps), sysinfo and rusage (used in system calls). These generally tend to be used for passing time intervals rather than absolute (epoch-based) times, so they do not suffer from the y2038 overflow. Some of them could be changed to use 64-bit timestamps by creating new system calls, others like the core files cannot easily be changed. An application using these interfaces likely also uses gettimeofday() or other interfaces that use absolute times, and pass 'struct timeval' pointers directly into kernel interfaces, so glibc must redefine their timeval based on a 64-bit time_t when they introduce their y2038-safe interfaces. The only reasonable way forward I see is to remove the 'timeval' definion from the kernel's uapi headers, and change the interfaces that we do not want to (or cannot) duplicate for 64-bit times to use a new __kernel_old_timeval definition instead. This type should be avoided for all new interfaces (those can use 64-bit nanoseconds, or the 64-bit version of timespec instead), and should be used with great care when converting existing interfaces from timeval, to be sure they don't suffer from the y2038 overflow, and only with consensus for the particular user that using __kernel_old_timeval is better than moving to a 64-bit based interface. The structure name is intentionally chosen to not conflict with user space types, and to be ugly enough to discourage its use. Note that ioctl based interfaces that pass a bare 'timeval' pointer cannot change to '__kernel_old_timeval' because the user space source code refers to 'timeval' instead, and we don't want to modify the user space sources if possible. However, any application that relies on a structure to contain an embedded 'timeval' (e.g. by passing a pointer to the member into a function call that expects a timeval pointer) is broken when that structure gets converted to __kernel_old_timeval. I don't see any way around that, and we have to rely on the compiler to produce a warning or compile failure that will alert users when they recompile their sources against a new libc. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: John Stultz <[email protected]> Cc: Al Viro <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-03-19Merge 4.16-rc6 into staging-nextGreg Kroah-Hartman7-4/+59
We want the staging fixes in here as well to handle merge/test issues. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-03-18signals, sparc: Add signal codes for ADI violationsKhalid Aziz1-1/+4
SPARC M7 processor introduces a new feature - Application Data Integrity (ADI). ADI allows MMU to catch rogue accesses to memory. When a rogue access occurs, MMU blocks the access and raises an exception. In response to the exception, kernel sends the offending task a SIGSEGV with si_code that indicates the nature of exception. This patch adds three new signal codes specific to ADI feature: 1. ADI is not enabled for the address and task attempted to access memory using ADI 2. Task attempted to access memory using wrong ADI tag and caused a deferred exception. 3. Task attempted to access memory using wrong ADI tag and caused a precise exception. Signed-off-by: Khalid Aziz <[email protected]> Cc: Khalid Aziz <[email protected]> Reviewed-by: Anthony Yznaga <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-17tipc: obsolete TIPC_ZONE_SCOPEJon Maloy1-48/+54
Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all aspects handled the same way, both on the publishing node and on the receiving nodes. Despite previous ambitions to the contrary, this is never going to change, so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we remain compatible with users and remote nodes still using ZONE_SCOPE. Furthermore, the non-formalized scope value 0 has always been permitted for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE. We now permit it even as binding scope, but for compatibility reasons we choose to not change the value of TIPC_CLUSTER_SCOPE. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-17block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>Bart Van Assche1-0/+2
It happens often while I'm preparing a patch for a block driver that I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT available for this driver? Do I have to introduce definitions of these constants before I can use these constants? To avoid this confusion, move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the <linux/blkdev.h> header file such that these become available for all block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h header file conditional to avoid that including that header file after <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE redefinition. Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have not been removed from uapi header files nor from NAND drivers in which these constants are used for another purpose than converting block layer offsets and sizes into a number of sectors. Cc: David S. Miller <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: Dan Williams <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-03-16KVM: X86: Provide a capability to disable MWAIT interceptsWanpeng Li1-1/+1
Allowing a guest to execute MWAIT without interception enables a guest to put a (physical) CPU into a power saving state, where it takes longer to return from than what may be desired by the host. Don't give a guest that power over a host by default. (Especially, since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.) Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Jan H. Schönherr <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-03-16tcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATSYousuk Seung1-0/+1
This patch adds TCP_NLA_SND_SSTHRESH stat into SCM_TIMESTAMPING_OPT_STATS that reports tcp_sock.snd_ssthresh. Signed-off-by: Yousuk Seung <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Priyaranjan Jha <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-16net: Fix vlan untag for bridge and vlan_dev with reorder_hdr offToshiaki Makita1-0/+1
When we have a bridge with vlan_filtering on and a vlan device on top of it, packets would be corrupted in skb_vlan_untag() called from br_dev_xmit(). The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(), which makes use of skb->mac_len. In this function mac_len is meant for handling rx path with vlan devices with reorder_header disabled, but in tx path mac_len is typically 0 and cannot be used, which is the problem in this case. The current code even does not properly handle rx path (skb_vlan_untag() called from __netif_receive_skb_core()) with reorder_header off actually. In rx path single tag case, it works as follows: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+------+---- | ETH | VLAN | ETH | | ADDRS | TPID | TCI | TYPE | +-------------------+-------------+------+---- <-------- mac_len ---------> <-------------> to be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <-------- mac_len ---------> This is ok, but in rx double tag case, it corrupts packets: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+-------------+------+---- | ETH | VLAN | VLAN | ETH | | ADDRS | TPID | TCI | TPID | TCI | TYPE | +-------------------+-------------+-------------+------+---- <--------------- mac_len ----------------> <-------------> should be removed <---------------------------> actually will be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <--------------- mac_len ----------------> So, two of vlan tags are both removed while only inner one should be removed and mac_header (and mac_len) is broken. skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2), so use skb->data and skb->mac_header to calculate the right offset. Reported-by: Brandon Carpenter <[email protected]> Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off") Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-15IB/mlx4: Report TSO capabilitiesYishai Hadas1-0/+9
Report to the user area the TSO device capabilities, it includes the max_tso size and the QP types that support it. The TSO is applicable only when when of the ports is ETH and the device supports it. uresp logic around rss_caps is updated to fix a till-now harmless bug computing the length of the structure to copy. The code did not handle the implicit padding before rss_caps correctly. This is necessay to copy tss_caps successfully. Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15RDMA/cxgb4: Use structs to describe the uABI instead of opencodingJason Gunthorpe1-0/+5
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct. Reviewed-by: Steve Wise <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15RDMA/hns: Use structs to describe the uABI instead of opencodingJason Gunthorpe1-0/+5
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15RDMA/i40iw: Move uapi header to include/uapiJason Gunthorpe1-0/+107
All of these defines are part of the uABI for the driver, this header duplicates providers/i40iw/i40iw-abi.h in rdma-core. Acked-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15RDMA/mlx4: Move flag constants to uapi headerJason Gunthorpe1-0/+8
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps) and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to userspace and form part of the uAPI. Move them to the uapi header where they belong. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15RDMA/rxe: Use structs to describe the uABI instead of opencodingJason Gunthorpe1-0/+22
Open coding pointer math is not acceptable for describing the uABI in RDMA. Provide structs for all the cases. The udata is casted to the struct as close to the verbs entry point as possible for maximum clarity. Function signatures and so forth are revised to allow for this. Tested-by: Yuval Shaia <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15RDMA/hns: Fix cqn type and init respYixian Liu1-2/+1
This patch changes the type of cqn from u32 to u64 to keep userspace and kernel consistent, initializes resp both for cq and qp to zeros, and also changes the condition judgment of outlen considering future caps extension. Suggested-by: Jason Gunthorpe <[email protected]> Fixes: e088a685eae9 (hns: Support rq record doorbell for the user space) Fixes: 9b44703d0a21 (hns: Support cq record doorbell for the user space) Signed-off-by: Yixian Liu <[email protected]> Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Wei Hu (Xavier) <[email protected]> Signed-off-by: Shaobo Xu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-03-15drm/amdkfd: Add ioctls for GPUVM memory managementFelix Kuehling1-1/+96
v2: * Fix error handling after kfd_bind_process_to_device in kfd_ioctl_map_memory_to_gpu v3: * Add ioctl to acquire VM from a DRM FD v4: * Return number of successful map/unmap operations in failure cases * Facilitate partial retry after failed map/unmap * Added comments with parameter descriptions to new APIs * Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15drm/amdkfd: Remove limit on number of GPUsFelix Kuehling1-3/+24
Currently the number of GPUs is limited by aperture placement options available on GFX7 and GFX8 hardware. This limitation is not necessary. Scratch and LDS represent per-work-item and per-work-group storage respectively. Different work-items and work-groups use the same virtual address to access their own data. Work running on different GPUs is by definition in different work-groups (different dispatches, in fact). That means the same virtual addresses can be used for these apertures on different GPUs. Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the artificial limitation on the number of GPUs that can be supported. The new ioctl allows user mode to query the number of GPUs to allocate enough memory for all GPUs to be reported. This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2018-03-15signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptionsDave Martin1-1/+2
Some architectures cannot always report accurately what kind of floating-point exception triggered a floating-point exception trap. This can occur with fp exceptions occurring on lanes in a vector instruction on arm64 for example. Rather than have every architecture come up with its own way of describing such a condition, this patch adds a common FPE_FLTUNK si_code value to report that an fp exception caused a trap but we cannot be certain which kind of fp exception it was. Signed-off-by: Dave Martin <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2018-03-15uapi: remove telephony headersBaruch Siach2-984/+0
ixjuser.h includes the telephony.h header. Other than that no kernel code uses any of these headers. The last user of the ixjuser.h header has been removed in commit 7326446c728 (Staging: remove telephony drivers), more than 5 years ago. Signed-off-by: Baruch Siach <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-03-15bpf: extend stackmap to save binary_build_id+offset instead of addressSong Liu1-0/+22
Currently, bpf stackmap store address for each entry in the call trace. To map these addresses to user space files, it is necessary to maintain the mapping from these virtual address to symbols in the binary. Usually, the user space profiler (such as perf) has to scan /proc/pid/maps at the beginning of profiling, and monitor mmap2() calls afterwards. Given the cost of maintaining the address map, this solution is not practical for system wide profiling that is always on. This patch tries to solve this problem with a variation of stackmap. This variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing addresses, the variation stores ELF file build_id + offset. Build ID is a 20-byte unique identifier for ELF files. The following command shows the Build ID of /bin/bash: [user@]$ readelf -n /bin/bash ... Build ID: XXXXXXXXXX ... With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID for each entry in the call trace, and translate it into the following struct: struct bpf_stack_build_id_offset { __s32 status; unsigned char build_id[BPF_BUILD_ID_SIZE]; union { __u64 offset; __u64 ip; }; }; The search of build_id is limited to the first page of the file, and this page should be in page cache. Otherwise, we fallback to store ip for this entry (ip field in struct bpf_stack_build_id_offset). This requires the build_id to be stored in the first page. A quick survey of binary and dynamic library files in a few different systems shows that almost all binary and dynamic library files have build_id in the first page. Build_id is only meaningful for user stack. If a kernel stack is added to a stackmap with BPF_F_STACK_BUILD_ID, it will automatically fallback to only store ip (status == BPF_STACK_BUILD_ID_IP). Similarly, if build_id lookup failed for some reason, it will also fallback to store ip. User space can access struct bpf_stack_build_id_offset with bpf syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to maintain mapping from build id to binary files. This mostly static mapping is much easier to maintain than per process address maps. Note: Stackmap with build_id only works in non-nmi context at this time. This is because we need to take mm->mmap_sem for find_vma(). If this changes, we would like to allow build_id lookup in nmi context. Signed-off-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-14drm/amdgpu: query vram type from atombiosHawking Zhang1-0/+1
The vram type for dGPU is stored in umc_info while sys mem type for APU is stored in integratedsysteminfo Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-03-14sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENTXin Long1-0/+1
This patch is to add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT, as described in section 6.1.8 of RFC6458. SCTP_AUTH_NO_AUTH: This report indicates that the peer does not support SCTP authentication as defined in [RFC4895]. Note that the implementation is quite similar as that of SCTP_ADAPTATION_INDICATION. Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-14sctp: add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENTXin Long1-1/+5
This patch is to add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT, as described in section 6.1.8 of RFC6458. SCTP_AUTH_FREE_KEY: This report indicates that the SCTP implementation will no longer use the key identifier specified in auth_keynumber. After deactivating a key, it would never be used again, which means it's refcnt can't be held/increased by new chunks. But there may be some chunks in out queue still using it. So only when refcnt is 1, which means no chunk in outqueue is using/holding this key either, this EVENT would be sent. When users receive this notification, they could do DEL_KEY sockopt to remove this shkey, and also tell the peer that this key won't be used in any chunk thoroughly from now on, then the peer can remove it as well safely. Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-14sctp: add sockopt SCTP_AUTH_DEACTIVATE_KEYXin Long1-0/+1
This patch is to add sockopt SCTP_AUTH_DEACTIVATE_KEY, as described in section 8.3.4 of RFC6458. This set option indicates that the application will no longer send user messages using the indicated key identifier. Note that RFC requires that only deactivated keys that are no longer used by an association can be deleted, but for the backward compatibility, it is not to check deactivated when deleting or replacing one sh_key. Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-14sctp: add support for SCTP AUTH Information for sendmsgXin Long1-1/+13
This patch is to add support for SCTP AUTH Information for sendmsg, as described in section 5.3.8 of RFC6458. With this option, you can provide shared key identifier used for sending the user message. It's also a necessary send info for sctp_sendv. Note that it reuses sinfo->sinfo_tsn to indicate if this option is set and sinfo->sinfo_ssn to save the shkey ID which can be 0. Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-14staging: irda: remove the irda network stack and driversGreg Kroah-Hartman1-252/+0
No one has publicly stepped up to maintain this broken codebase for devices that no one uses anymore, so let's just drop the whole thing. If someone really wants/needs it, we can revert this and they can fix the code up to work properly. Cc: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-03-14batman-adv: add multicast flags netlink supportLinus Lüssing1-0/+62
Dump the list of multicast flags entries via the netlink socket. Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-03-14batman-adv: add DAT cache netlink supportLinus Lüssing1-0/+20
Dump the list of DAT cache entries via the netlink socket. Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-03-14Merge tag 'drm-intel-next-2018-03-08' of ↵Dave Airlie1-3/+105
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - Query uAPI interface (used for GPU topology information currently) * Mesa: https://patchwork.freedesktop.org/series/38795/ Driver Changes: - Increase PSR2 size for CNL (DK) - Avoid retraining LSPCON link unnecessarily (Ville) - Decrease request signaling latency (Chris) - GuC error capture fix (Daniele) * tag 'drm-intel-next-2018-03-08' of git://anongit.freedesktop.org/drm/drm-intel: (127 commits) drm/i915: Update DRIVER_DATE to 20180308 drm/i915: add schedule out notification of preempted but completed request drm/i915: expose rcs topology through query uAPI drm/i915: add query uAPI drm/i915: add rcs topology to error state drm/i915/debugfs: add rcs topology entry drm/i915/debugfs: reuse max slice/subslices already stored in sseu drm/i915: store all subslice masks drm/i915/guc: work around gcc-4.4.4 union initializer issue drm/i915/cnl: Add Wa_2201832410 drm/i915/icl: Gen11 forcewake support drm/i915/icl: Add Indirect Context Offset for Gen11 drm/i915/icl: Enhanced execution list support drm/i915/icl: new context descriptor support drm/i915/icl: Correctly initialize the Gen11 engines drm/i915: Assert that the request is indeed complete when signaled from irq drm/i915: Handle changing enable_fbc parameter at runtime better. drm/i915: Track whether the DP link is trained or not drm/i915: Nuke intel_dp->channel_eq_status drm/i915: Move SST DP link retraining into the ->post_hotplug() hook ...
2018-03-14Merge tag 'drm-amdkfd-next-2018-03-11' of ↵Dave Airlie1-4/+4
git://people.freedesktop.org/~gabbayo/linux into drm-next Major points for this pull request: - Add dGPU support for amdkfd initialization code and queue handling. It's not complete support since the GPUVM part is missing (the under debate stuff). - Enable PCIe atomics for dGPU if present - Various adjustments to the amdgpu<-->amdkfd interface for dGPUs - Refactor IOMMUv2 code to allow loading amdkfd without IOMMUv2 in the system - Add HSA process eviction code in case of system memory pressure - Various fixes and small changes * tag 'drm-amdkfd-next-2018-03-11' of git://people.freedesktop.org/~gabbayo/linux: (24 commits) uapi: Fix type used in ioctl parameter structures drm/amdkfd: Implement KFD process eviction/restore drm/amdkfd: Add GPUVM virtual address space to PDD drm/amdkfd: Remove unaligned memory access drm/amdkfd: Centralize IOMMUv2 code and make it conditional drm/amdgpu: Add submit IB function for KFD drm/amdgpu: Add GPUVM memory management functions for KFD drm/amdgpu: add amdgpu_sync_clone drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support drm/amdgpu: Add KFD eviction fence drm/amdgpu: Remove unused kfd2kgd interface drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid drm/amdgpu: Fix header file dependencies drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem drm/amdgpu: remove useless BUG_ONs drm/amdgpu: Enable KFD initialization on dGPUs drm/amdkfd: Add dGPU device IDs and device info drm/amdkfd: Add dGPU support to kernel_queue_init drm/amdkfd: Add dGPU support to the MQD manager drm/amdkfd: Add dGPU support to the device queue manager ...
2018-03-14Merge tag 'drm-misc-next-2018-03-09-3' of ↵Dave Airlie1-3/+6
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 4.17: UAPI Changes: plane: Add color encoding/range properties (Jyri) nouveau: Replace iturbt_709 property with color_encoding property (Ville) Core Changes: atomic: Move plane clipping into plane check helper (Ville) property: Multiple new property checks/verification (Ville) Driver Changes: rockchip: Fixes & improvements for rk3399/chromebook plus (various) sun4i: Add H3/H5 HDMI support (Jernej) i915: Add support for limited/full-range ycbcr toggling (Ville) pl111: Add bandwidth checking/limiting (Linus) Cc: Jernej Skrabec <[email protected]> Cc: Jyri Sarha <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Linus Walleij <[email protected]> * tag 'drm-misc-next-2018-03-09-3' of git://anongit.freedesktop.org/drm/drm-misc: (85 commits) drm/rockchip: Don't use atomic constructs for psr drm/rockchip: analogix_dp: set psr activate/deactivate when enable/disable bridge drm/rockchip: dw_hdmi: Move HDMI vpll clock enable to bind() drm/rockchip: inno_hdmi: reorder clk_disable_unprepare call in unbind drm/rockchip: inno_hdmi: Fix error handling path. drm/rockchip: dw-mipi-dsi: Fix connector and encoder cleanup. drm/nouveau: Replace the iturbt_709 prop with the standard COLOR_ENCODING prop drm/pl111: Use max memory bandwidth for resolution drm/bridge: sii902x: Retry status read after DDI I2C drm/pl111: Handle the RealView variant separately drm/pl111: Make the default BPP a per-variant variable drm: simple_kms_helper: Fix .mode_valid() documentation bridge: Elaborate a bit on dumb VGA bridges in Kconfig drm/atomic: Add new reverse iterator over all plane state (V2) drm: Reject bad property flag combinations drm: Make property flags u32 drm/uapi: Deprecate DRM_MODE_PROP_PENDING drm: WARN when trying to add enum value > 63 to a bitmask property drm: WARN when trying add enum values to non-enum/bitmask properties drm: Reject replacing property enum values ...
2018-03-13RDMA/hns: Support cq record doorbell for the user spaceYixian Liu1-0/+7
This patch updates to support cq record doorbell for the user space. Signed-off-by: Yixian Liu <[email protected]> Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Wei Hu (Xavier) <[email protected]> Signed-off-by: Shaobo Xu <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-03-13RDMA/hns: Support rq record doorbell for the user spaceYixian Liu1-0/+5
This patch adds interfaces and definitions to support the rq record doorbell for the user space. Signed-off-by: Yixian Liu <[email protected]> Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Wei Hu (Xavier) <[email protected]> Signed-off-by: Shaobo Xu <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-03-13perf/core: Implement fast breakpoint modification via _IOC_MODIFY_ATTRIBUTESMilind Chabbi1-11/+12
Problem and motivation: Once a breakpoint perf event (PERF_TYPE_BREAKPOINT) is created, there is no flexibility to change the breakpoint type (bp_type), breakpoint address (bp_addr), or breakpoint length (bp_len). The only option is to close the perf event and configure a new breakpoint event. This inflexibility has a significant performance overhead. For example, sampling-based, lightweight performance profilers (and also concurrency bug detection tools), monitor different addresses for a short duration using PERF_TYPE_BREAKPOINT and change the address (bp_addr) to another address or change the kind of breakpoint (bp_type) from "write" to a "read" or vice-versa or change the length (bp_len) of the address being monitored. The cost of these modifications is prohibitive since it involves unmapping the circular buffer associated with the perf event, closing the perf event, opening another perf event and mmaping another circular buffer. Solution: The new ioctl flag for perf events, PERF_EVENT_IOC_MODIFY_ATTRIBUTES, introduced in this patch takes a pointer to a struct perf_event_attr as an argument to update an old breakpoint event with new address, type, and size. This facility allows retaining a previous mmaped perf events ring buffer and avoids having to close and reopen another perf event. This patch supports only changing PERF_TYPE_BREAKPOINT event type; future implementations can extend this feature. The patch replicates some of its functionality of modify_user_hw_breakpoint() in kernel/events/hw_breakpoint.c. modify_user_hw_breakpoint cannot be called directly since perf_event_ctx_lock() is already held in _perf_ioctl(). Evidence: Experiments show that the baseline (not able to modify an already created breakpoint) costs an order of magnitude (~10x) more than the suggested optimization (having the ability to dynamically modifying a configured breakpoint via ioctl). When the breakpoints typically do not trap, the speedup due to the suggested optimization is ~10x; even when the breakpoints always trap, the speedup is ~4x due to the suggested optimization. Testing: tests posted at https://github.com/linux-contrib/perf_event_modify_bp demonstrate the performance significance of this patch. Tests also check the functional correctness of the patch. Signed-off-by: Milind Chabbi <[email protected]> [ Using modify_user_hw_breakpoint_check function. ] [ Reformated PERF_EVENT_IOC_*, so the values are all in one column. ] Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-03-13timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clockThomas Gleixner1-0/+1
The planned change to unify the behaviour of the MONOTONIC and BOOTTIME clocks vs. suspend removes the ability to retrieve the active non-suspended time of a system. Provide a new CLOCK_MONOTONIC_ACTIVE clock which returns the active non-suspended time of the system via clock_gettime(). This preserves the old behaviour of CLOCK_MONOTONIC before the BOOTTIME/MONOTONIC unification. This new clock also allows applications to detect programmatically that the MONOTONIC and BOOTTIME clocks are identical. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: John Stultz <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kevin Easton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salyzyn <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-03-09serial: 8250: Add Nuvoton NPCM UARTJoel Stanley1-0/+3
The Nuvoton UART is almost compatible with the 8250 driver when probed via the 8250_of driver, however it requires some extra configuration at startup. Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Joel Stanley <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-03-09Merge tag 'powerpc-4.16-5' of ↵Linus Torvalds1-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One notable fix to properly advertise our support for a new firmware feature, caused by two series conflicting semantically but not textually. There's a new ioctl for the new ocxl driver, which is not a fix, but needed to complete the userspace API and good to have before the driver is in a released kernel. Finally three minor selftest fixes, and a fix for intermittent build failures for some obscure platforms, caused by a missing make dependency. Thanks to: Alastair D'Silva, Bharata B Rao, Guenter Roeck" * tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Fix vector5 in ibm architecture vector table ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL ocxl: Add get_metadata IOCTL to share OCXL information to userspace selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable selftests/powerpc: Fix missing clean of pmu/lib.o powerpc/boot: Fix random libfdt related build errors selftests/powerpc: Skip tm-trap if transactional memory is not enabled
2018-03-09drm/etnaviv: add more minor features fieldsLucas Stach1-0/+6
Newer GPU cores added yet more feature bits. Make room for them and let userspace query them. Signed-off-by: Lucas Stach <[email protected]>
2018-03-08net: ethtool: extend RXNFC API to support RSS spreading of filter matchesEdward Cree1-6/+26
We use a two-step process to configure a filter with RSS spreading. First, the RSS context is allocated and configured using ETHTOOL_SRSSH; this returns an identifier (rss_context) which can then be passed to subsequent invocations of ETHTOOL_SRXCLSRLINS to specify that the offset from the RSS indirection table lookup should be added to the queue number (ring_cookie) when delivering the packet. Drivers for devices which can only use the indirection table entry directly (not add it to a base queue number) should reject rule insertions combining RSS with a nonzero ring_cookie. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-08RDMA/nldev: provide detailed PD informationSteve Wise1-0/+7
Implement the RDMA nldev netlink interface for dumping detailed PD information. Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-03-08RDMA/nldev: provide detailed MR informationSteve Wise1-0/+9
Implement the RDMA nldev netlink interface for dumping detailed MR information. Signed-off-by: Steve Wise <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>