| Age | Commit message (Collapse) | Author | Files | Lines |
|
Use macros to make names consistent in ioctl() uAPI:
The ioctl() uAPI works with object-method hierarchy. The method part
also states which handler should be executed when this method is called
from user-space. Therefore, we need to tie method, method's id, method's
handler and the object owning this method together.
Previously, this was done through explicit developer chosen names.
This makes grepping the code harder. Changing the method's name,
method's handler and object's name to be automatically generated based
on the ids.
The headers are split in a way so they be included and used by
user-space. One header strictly contains structures that are used
directly by user-space applications, where another header is used for
internal library (i.e. libibverbs) to form the ioctl() commands.
Other header simply contains the required general command structure.
Reviewed-by: Yishai Hadas <[email protected]>
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Currently, if a bpf sk msg program is run the program
can only parse data that the (start,end) pointers already
consumed. For sendmsg hooks this is likely the first
scatterlist element. For sendpage this will be the range
(0,0) because the data is shared with userspace and by
default we want to avoid allowing userspace to modify
data while (or after) BPF verdict is being decided.
To support pulling in additional bytes for parsing use
a new helper bpf_sk_msg_pull(start, end, flags) which
works similar to cls tc logic. This helper will attempt
to point the data start pointer at 'start' bytes offest
into msg and data end pointer at 'end' bytes offset into
message.
After basic sanity checks to ensure 'start' <= 'end' and
'end' <= msg_length there are a few cases we need to
handle.
First the sendmsg hook has already copied the data from
userspace and has exclusive access to it. Therefor, it
is not necessesary to copy the data. However, it may
be required. After finding the scatterlist element with
'start' offset byte in it there are two cases. One the
range (start,end) is entirely contained in the sg element
and is already linear. All that is needed is to update the
data pointers, no allocate/copy is needed. The other case
is (start, end) crosses sg element boundaries. In this
case we allocate a block of size 'end - start' and copy
the data to linearize it.
Next sendpage hook has not copied any data in initial
state so that data pointers are (0,0). In this case we
handle it similar to the above sendmsg case except the
allocation/copy must always happen. Then when sending
the data we have possibly three memory regions that
need to be sent, (0, start - 1), (start, end), and
(end + 1, msg_length). This is required to ensure any
writes by the BPF program are correctly transmitted.
Lastly this operation will invalidate any previous
data checks so BPF programs will have to revalidate
pointers after making this BPF call.
Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
In the case where we need a specific number of bytes before a
verdict can be assigned, even if the data spans multiple sendmsg
or sendfile calls. The BPF program may use msg_cork_bytes().
The extreme case is a user can call sendmsg repeatedly with
1-byte msg segments. Obviously, this is bad for performance but
is still valid. If the BPF program needs N bytes to validate
a header it can use msg_cork_bytes to specify N bytes and the
BPF program will not be called again until N bytes have been
accumulated. The infrastructure will attempt to coalesce data
if possible so in many cases (most my use cases at least) the
data will be in a single scatterlist element with data pointers
pointing to start/end of the element. However, this is dependent
on available memory so is not guaranteed. So BPF programs must
validate data pointer ranges, but this is the case anyways to
convince the verifier the accesses are valid.
Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
A single sendmsg or sendfile system call can contain multiple logical
messages that a BPF program may want to read and apply a verdict. But,
without an apply_bytes helper any verdict on the data applies to all
bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
care to read the first N bytes of a msg. If the payload is large say
MB or even GB setting up and calling the BPF program repeatedly for
all bytes, even though the verdict is already known, creates
unnecessary overhead.
To allow BPF programs to control how many bytes a given verdict
applies to we implement a bpf_msg_apply_bytes() helper. When called
from within a BPF program this sets a counter, internal to the
BPF infrastructure, that applies the last verdict to the next N
bytes. If the N is smaller than the current data being processed
from a sendmsg/sendfile call, the first N bytes will be sent and
the BPF program will be re-run with start_data pointing to the N+1
byte. If N is larger than the current data being processed the
BPF verdict will be applied to multiple sendmsg/sendfile calls
until N bytes are consumed.
Note1 if a socket closes with apply_bytes counter non-zero this
is not a problem because data is not being buffered for N bytes
and is sent as its received.
Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
This implements a BPF ULP layer to allow policy enforcement and
monitoring at the socket layer. In order to support this a new
program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
the sendmsg/sendpage hook. To attach the policy to sockets a
sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.
Similar to previous sockmap usages when a sock is added to a
sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT
program type attached then the BPF ULP layer is created on the
socket and the attached BPF_PROG_TYPE_SK_MSG program is run for
every msg in sendmsg case and page/offset in sendpage case.
BPF_PROG_TYPE_SK_MSG Semantics/API:
BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and
SK_DROP. Returning SK_DROP free's the copied data in the sendmsg
case and in the sendpage case leaves the data untouched. Both cases
return -EACESS to the user. Returning SK_PASS will allow the msg to
be sent.
In the sendmsg case data is copied into kernel space buffers before
running the BPF program. The kernel space buffers are stored in a
scatterlist object where each element is a kernel memory buffer.
Some effort is made to coalesce data from the sendmsg call here.
For example a sendmsg call with many one byte iov entries will
likely be pushed into a single entry. The BPF program is run with
data pointers (start/end) pointing to the first sg element.
In the sendpage case data is not copied. We opt not to copy the
data by default here, because the BPF infrastructure does not
know what bytes will be needed nor when they will be needed. So
copying all bytes may be wasteful. Because of this the initial
start/end data pointers are (0,0). Meaning no data can be read or
written. This avoids reading data that may be modified by the
user. A new helper is added later in this series if reading and
writing the data is needed. The helper call will do a copy by
default so that the page is exclusively owned by the BPF call.
The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg
in the sendmsg() case and the entire page/offset in the sendpage case.
This avoids ambiguity on how to handle mixed return codes in the
sendmsg case. Again a helper is added later in the series if
a verdict needs to apply to multiple system calls and/or only
a subpart of the currently being processed message.
The helper msg_redirect_map() can be used to select the socket to
send the data on. This is used similar to existing redirect use
cases. This allows policy to redirect msgs.
Pseudo code simple example:
The basic logic to attach a program to a socket is as follows,
// load the programs
bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
&obj, &msg_prog);
// lookup the sockmap
bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");
// get fd for sockmap
map_fd_msg = bpf_map__fd(bpf_map_msg);
// attach program to sockmap
bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);
Adding sockets to the map is done in the normal way,
// Add a socket 'fd' to sockmap at location 'i'
bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY);
After the above any socket attached to "my_sock_map", in this case
'fd', will run the BPF msg verdict program (msg_prog) on every
sendmsg and sendpage system call.
For a complete example see BPF selftests or sockmap samples.
Implementation notes:
It seemed the simplest, to me at least, to use a refcnt to ensure
psock is not lost across the sendmsg copy into the sg, the bpf program
running on the data in sg_data, and the final pass to the TCP stack.
Some performance testing may show a better method to do this and avoid
the refcnt cost, but for now use the simpler method.
Another item that will come after basic support is in place is
supporting MSG_MORE flag. At the moment we call sendpages even if
the MSG_MORE flag is set. An enhancement would be to collect the
pages into a larger scatterlist and pass down the stack. Notice that
bpf_tcp_sendmsg() could support this with some additional state saved
across sendmsg calls. I built the code to support this without having
to do refactoring work. Other features TBD include ZEROCOPY and the
TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series
shortly.
Future work could improve size limits on the scatterlist rings used
here. Currently, we use MAX_SKB_FRAGS simply because this was being
used already in the TLS case. Future work could extend the kernel sk
APIs to tune this depending on workload. This is a trade-off
between memory usage and throughput performance.
Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Enable RAW QP to be able to configure burst control by modify_qp. By
using burst control with rate limiting, user can achieve best
performance and accuracy. The burst control information is passed by
user through udata.
This patch also reports burst control capability for mlx5 related
hardwares, burst control is only marked as supported when both
packet_pacing_burst_bound and packet_pacing_typical_size are
supported.
Signed-off-by: Bodong Wang <[email protected]>
Reviewed-by: Daniel Jurgens <[email protected]>
Reviewed-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
What is going on here is a bit subtle, in the kernel there is no
problem because the struct is copied using copy_from_user, so it
can safely have an 8 byte alignment, however in userspace it must
be constructed by concatenation with the ib_uverbs_alloc_pd_resp
struct. This is due to the required memory layout to execute the
command.
Since ibv_uverbs_alloc_pd_resp is only 4 bytes long, this causes
misalignment, and the user space will experience an unexpected padding.
Currently it works around this via pointer maths.
Make everything more robust by having the compiler reduce the alignment
of the struct to 4. The userspace has assertions to ensure this
works properly in all situations.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Currently, the offsets in the UAC2 processing unit descriptor are
calculated incorrectly. It causes an issue when connecting the device which
provides such a feature:
~~~~
[84126.724420] usb 1-1.3.1: invalid Processing Unit descriptor (id 18)
~~~~
After this patch is applied, the UAC2 processing unit inits w/o this error.
Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Signed-off-by: Kirill Marinushkin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Dealing with 'struct timeval' users in the y2038 series is a bit tricky:
We have two definitions of timeval that are visible to user space,
one comes from glibc (or some other C library), the other comes from
linux/time.h. The kernel copy is what we want to be used for a number of
structures defined by the kernel itself, e.g. elf_prstatus (used it core
dumps), sysinfo and rusage (used in system calls). These generally tend
to be used for passing time intervals rather than absolute (epoch-based)
times, so they do not suffer from the y2038 overflow. Some of them
could be changed to use 64-bit timestamps by creating new system calls,
others like the core files cannot easily be changed.
An application using these interfaces likely also uses gettimeofday()
or other interfaces that use absolute times, and pass 'struct timeval'
pointers directly into kernel interfaces, so glibc must redefine their
timeval based on a 64-bit time_t when they introduce their y2038-safe
interfaces.
The only reasonable way forward I see is to remove the 'timeval'
definion from the kernel's uapi headers, and change the interfaces that
we do not want to (or cannot) duplicate for 64-bit times to use a new
__kernel_old_timeval definition instead. This type should be avoided
for all new interfaces (those can use 64-bit nanoseconds, or the 64-bit
version of timespec instead), and should be used with great care when
converting existing interfaces from timeval, to be sure they don't suffer
from the y2038 overflow, and only with consensus for the particular user
that using __kernel_old_timeval is better than moving to a 64-bit based
interface. The structure name is intentionally chosen to not conflict
with user space types, and to be ugly enough to discourage its use.
Note that ioctl based interfaces that pass a bare 'timeval' pointer
cannot change to '__kernel_old_timeval' because the user space source
code refers to 'timeval' instead, and we don't want to modify the user
space sources if possible. However, any application that relies on a
structure to contain an embedded 'timeval' (e.g. by passing a pointer
to the member into a function call that expects a timeval pointer) is
broken when that structure gets converted to __kernel_old_timeval. I
don't see any way around that, and we have to rely on the compiler to
produce a warning or compile failure that will alert users when they
recompile their sources against a new libc.
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Al Viro <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
We want the staging fixes in here as well to handle merge/test issues.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
SPARC M7 processor introduces a new feature - Application Data
Integrity (ADI). ADI allows MMU to catch rogue accesses to memory.
When a rogue access occurs, MMU blocks the access and raises an
exception. In response to the exception, kernel sends the offending
task a SIGSEGV with si_code that indicates the nature of exception.
This patch adds three new signal codes specific to ADI feature:
1. ADI is not enabled for the address and task attempted to access
memory using ADI
2. Task attempted to access memory using wrong ADI tag and caused
a deferred exception.
3. Task attempted to access memory using wrong ADI tag and caused
a precise exception.
Signed-off-by: Khalid Aziz <[email protected]>
Cc: Khalid Aziz <[email protected]>
Reviewed-by: Anthony Yznaga <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
aspects handled the same way, both on the publishing node and on the
receiving nodes.
Despite previous ambitions to the contrary, this is never going to change,
so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related
macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt
using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we
remain compatible with users and remote nodes still using ZONE_SCOPE.
Furthermore, the non-formalized scope value 0 has always been permitted
for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
We now permit it even as binding scope, but for compatibility reasons we
choose to not change the value of TIPC_CLUSTER_SCOPE.
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional to avoid that including that header file after
<linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
redefinition.
Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.
Cc: David S. Miller <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Allowing a guest to execute MWAIT without interception enables a guest
to put a (physical) CPU into a power saving state, where it takes
longer to return from than what may be desired by the host.
Don't give a guest that power over a host by default. (Especially,
since nothing prevents a guest from using MWAIT even when it is not
advertised via CPUID.)
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Jan H. Schönherr <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This patch adds TCP_NLA_SND_SSTHRESH stat into SCM_TIMESTAMPING_OPT_STATS
that reports tcp_sock.snd_ssthresh.
Signed-off-by: Yousuk Seung <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Priyaranjan Jha <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When we have a bridge with vlan_filtering on and a vlan device on top of
it, packets would be corrupted in skb_vlan_untag() called from
br_dev_xmit().
The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
which makes use of skb->mac_len. In this function mac_len is meant for
handling rx path with vlan devices with reorder_header disabled, but in
tx path mac_len is typically 0 and cannot be used, which is the problem
in this case.
The current code even does not properly handle rx path (skb_vlan_untag()
called from __netif_receive_skb_core()) with reorder_header off actually.
In rx path single tag case, it works as follows:
- Before skb_reorder_vlan_header()
mac_header data
v v
+-------------------+-------------+------+----
| ETH | VLAN | ETH |
| ADDRS | TPID | TCI | TYPE |
+-------------------+-------------+------+----
<-------- mac_len --------->
<------------->
to be removed
- After skb_reorder_vlan_header()
mac_header data
v v
+-------------------+------+----
| ETH | ETH |
| ADDRS | TYPE |
+-------------------+------+----
<-------- mac_len --------->
This is ok, but in rx double tag case, it corrupts packets:
- Before skb_reorder_vlan_header()
mac_header data
v v
+-------------------+-------------+-------------+------+----
| ETH | VLAN | VLAN | ETH |
| ADDRS | TPID | TCI | TPID | TCI | TYPE |
+-------------------+-------------+-------------+------+----
<--------------- mac_len ---------------->
<------------->
should be removed
<--------------------------->
actually will be removed
- After skb_reorder_vlan_header()
mac_header data
v v
+-------------------+------+----
| ETH | ETH |
| ADDRS | TYPE |
+-------------------+------+----
<--------------- mac_len ---------------->
So, two of vlan tags are both removed while only inner one should be
removed and mac_header (and mac_len) is broken.
skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
so use skb->data and skb->mac_header to calculate the right offset.
Reported-by: Brandon Carpenter <[email protected]>
Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Report to the user area the TSO device capabilities, it includes the
max_tso size and the QP types that support it.
The TSO is applicable only when when of the ports is ETH and the device
supports it.
uresp logic around rss_caps is updated to fix a till-now harmless bug
computing the length of the structure to copy. The code did not handle the
implicit padding before rss_caps correctly. This is necessay to copy
tss_caps successfully.
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.
Reviewed-by: Steve Wise <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
All of these defines are part of the uABI for the driver, this
header duplicates providers/i40iw/i40iw-abi.h in rdma-core.
Acked-by: Shiraz Saleem <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps)
and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via
mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to
userspace and form part of the uAPI.
Move them to the uapi header where they belong.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Open coding pointer math is not acceptable for describing the uABI in
RDMA. Provide structs for all the cases.
The udata is casted to the struct as close to the verbs entry point
as possible for maximum clarity. Function signatures and so forth
are revised to allow for this.
Tested-by: Yuval Shaia <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This patch changes the type of cqn from u32 to u64 to keep
userspace and kernel consistent, initializes resp both for
cq and qp to zeros, and also changes the condition judgment
of outlen considering future caps extension.
Suggested-by: Jason Gunthorpe <[email protected]>
Fixes: e088a685eae9 (hns: Support rq record doorbell for the user space)
Fixes: 9b44703d0a21 (hns: Support cq record doorbell for the user space)
Signed-off-by: Yixian Liu <[email protected]>
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Wei Hu (Xavier) <[email protected]>
Signed-off-by: Shaobo Xu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
v2:
* Fix error handling after kfd_bind_process_to_device in
kfd_ioctl_map_memory_to_gpu
v3:
* Add ioctl to acquire VM from a DRM FD
v4:
* Return number of successful map/unmap operations in failure cases
* Facilitate partial retry after failed map/unmap
* Added comments with parameter descriptions to new APIs
* Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Currently the number of GPUs is limited by aperture placement options
available on GFX7 and GFX8 hardware. This limitation is not necessary.
Scratch and LDS represent per-work-item and per-work-group storage
respectively. Different work-items and work-groups use the same virtual
address to access their own data. Work running on different GPUs is by
definition in different work-groups (different dispatches, in fact).
That means the same virtual addresses can be used for these apertures
on different GPUs.
Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the
artificial limitation on the number of GPUs that can be supported. The
new ioctl allows user mode to query the number of GPUs to allocate
enough memory for all GPUs to be reported.
This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES.
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Some architectures cannot always report accurately what kind of
floating-point exception triggered a floating-point exception trap.
This can occur with fp exceptions occurring on lanes in a vector
instruction on arm64 for example.
Rather than have every architecture come up with its own way of
describing such a condition, this patch adds a common FPE_FLTUNK
si_code value to report that an fp exception caused a trap but we
cannot be certain which kind of fp exception it was.
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
ixjuser.h includes the telephony.h header. Other than that no kernel
code uses any of these headers. The last user of the ixjuser.h header
has been removed in commit 7326446c728 (Staging: remove telephony
drivers), more than 5 years ago.
Signed-off-by: Baruch Siach <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Currently, bpf stackmap store address for each entry in the call trace.
To map these addresses to user space files, it is necessary to maintain
the mapping from these virtual address to symbols in the binary. Usually,
the user space profiler (such as perf) has to scan /proc/pid/maps at the
beginning of profiling, and monitor mmap2() calls afterwards. Given the
cost of maintaining the address map, this solution is not practical for
system wide profiling that is always on.
This patch tries to solve this problem with a variation of stackmap. This
variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing
addresses, the variation stores ELF file build_id + offset.
Build ID is a 20-byte unique identifier for ELF files. The following
command shows the Build ID of /bin/bash:
[user@]$ readelf -n /bin/bash
...
Build ID: XXXXXXXXXX
...
With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID
for each entry in the call trace, and translate it into the following
struct:
struct bpf_stack_build_id_offset {
__s32 status;
unsigned char build_id[BPF_BUILD_ID_SIZE];
union {
__u64 offset;
__u64 ip;
};
};
The search of build_id is limited to the first page of the file, and this
page should be in page cache. Otherwise, we fallback to store ip for this
entry (ip field in struct bpf_stack_build_id_offset). This requires the
build_id to be stored in the first page. A quick survey of binary and
dynamic library files in a few different systems shows that almost all
binary and dynamic library files have build_id in the first page.
Build_id is only meaningful for user stack. If a kernel stack is added to
a stackmap with BPF_F_STACK_BUILD_ID, it will automatically fallback to
only store ip (status == BPF_STACK_BUILD_ID_IP). Similarly, if build_id
lookup failed for some reason, it will also fallback to store ip.
User space can access struct bpf_stack_build_id_offset with bpf
syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to
maintain mapping from build id to binary files. This mostly static
mapping is much easier to maintain than per process address maps.
Note: Stackmap with build_id only works in non-nmi context at this time.
This is because we need to take mm->mmap_sem for find_vma(). If this
changes, we would like to allow build_id lookup in nmi context.
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
The vram type for dGPU is stored in umc_info while sys mem type
for APU is stored in integratedsysteminfo
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch is to add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT,
as described in section 6.1.8 of RFC6458.
SCTP_AUTH_NO_AUTH: This report indicates that the peer does not
support SCTP authentication as defined in [RFC4895].
Note that the implementation is quite similar as that of
SCTP_ADAPTATION_INDICATION.
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch is to add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT,
as described in section 6.1.8 of RFC6458.
SCTP_AUTH_FREE_KEY: This report indicates that the SCTP
implementation will no longer use the key identifier specified
in auth_keynumber.
After deactivating a key, it would never be used again, which means
it's refcnt can't be held/increased by new chunks. But there may be
some chunks in out queue still using it. So only when refcnt is 1,
which means no chunk in outqueue is using/holding this key either,
this EVENT would be sent.
When users receive this notification, they could do DEL_KEY sockopt to
remove this shkey, and also tell the peer that this key won't be used
in any chunk thoroughly from now on, then the peer can remove it as
well safely.
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch is to add sockopt SCTP_AUTH_DEACTIVATE_KEY, as described in
section 8.3.4 of RFC6458.
This set option indicates that the application will no longer send user
messages using the indicated key identifier.
Note that RFC requires that only deactivated keys that are no longer used
by an association can be deleted, but for the backward compatibility, it
is not to check deactivated when deleting or replacing one sh_key.
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch is to add support for SCTP AUTH Information for sendmsg,
as described in section 5.3.8 of RFC6458.
With this option, you can provide shared key identifier used for
sending the user message.
It's also a necessary send info for sctp_sendv.
Note that it reuses sinfo->sinfo_tsn to indicate if this option is
set and sinfo->sinfo_ssn to save the shkey ID which can be 0.
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
No one has publicly stepped up to maintain this broken codebase for
devices that no one uses anymore, so let's just drop the whole thing.
If someone really wants/needs it, we can revert this and they can fix
the code up to work properly.
Cc: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Dump the list of multicast flags entries via the netlink socket.
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
|
|
Dump the list of DAT cache entries via the netlink socket.
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-next
UAPI Changes:
- Query uAPI interface (used for GPU topology information currently)
* Mesa: https://patchwork.freedesktop.org/series/38795/
Driver Changes:
- Increase PSR2 size for CNL (DK)
- Avoid retraining LSPCON link unnecessarily (Ville)
- Decrease request signaling latency (Chris)
- GuC error capture fix (Daniele)
* tag 'drm-intel-next-2018-03-08' of git://anongit.freedesktop.org/drm/drm-intel: (127 commits)
drm/i915: Update DRIVER_DATE to 20180308
drm/i915: add schedule out notification of preempted but completed request
drm/i915: expose rcs topology through query uAPI
drm/i915: add query uAPI
drm/i915: add rcs topology to error state
drm/i915/debugfs: add rcs topology entry
drm/i915/debugfs: reuse max slice/subslices already stored in sseu
drm/i915: store all subslice masks
drm/i915/guc: work around gcc-4.4.4 union initializer issue
drm/i915/cnl: Add Wa_2201832410
drm/i915/icl: Gen11 forcewake support
drm/i915/icl: Add Indirect Context Offset for Gen11
drm/i915/icl: Enhanced execution list support
drm/i915/icl: new context descriptor support
drm/i915/icl: Correctly initialize the Gen11 engines
drm/i915: Assert that the request is indeed complete when signaled from irq
drm/i915: Handle changing enable_fbc parameter at runtime better.
drm/i915: Track whether the DP link is trained or not
drm/i915: Nuke intel_dp->channel_eq_status
drm/i915: Move SST DP link retraining into the ->post_hotplug() hook
...
|
|
git://people.freedesktop.org/~gabbayo/linux into drm-next
Major points for this pull request:
- Add dGPU support for amdkfd initialization code and queue handling. It's
not complete support since the GPUVM part is missing (the under debate stuff).
- Enable PCIe atomics for dGPU if present
- Various adjustments to the amdgpu<-->amdkfd interface for dGPUs
- Refactor IOMMUv2 code to allow loading amdkfd without IOMMUv2 in the system
- Add HSA process eviction code in case of system memory pressure
- Various fixes and small changes
* tag 'drm-amdkfd-next-2018-03-11' of git://people.freedesktop.org/~gabbayo/linux: (24 commits)
uapi: Fix type used in ioctl parameter structures
drm/amdkfd: Implement KFD process eviction/restore
drm/amdkfd: Add GPUVM virtual address space to PDD
drm/amdkfd: Remove unaligned memory access
drm/amdkfd: Centralize IOMMUv2 code and make it conditional
drm/amdgpu: Add submit IB function for KFD
drm/amdgpu: Add GPUVM memory management functions for KFD
drm/amdgpu: add amdgpu_sync_clone
drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
drm/amdgpu: Add KFD eviction fence
drm/amdgpu: Remove unused kfd2kgd interface
drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid
drm/amdgpu: Fix header file dependencies
drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem
drm/amdgpu: remove useless BUG_ONs
drm/amdgpu: Enable KFD initialization on dGPUs
drm/amdkfd: Add dGPU device IDs and device info
drm/amdkfd: Add dGPU support to kernel_queue_init
drm/amdkfd: Add dGPU support to the MQD manager
drm/amdkfd: Add dGPU support to the device queue manager
...
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 4.17:
UAPI Changes:
plane: Add color encoding/range properties (Jyri)
nouveau: Replace iturbt_709 property with color_encoding property (Ville)
Core Changes:
atomic: Move plane clipping into plane check helper (Ville)
property: Multiple new property checks/verification (Ville)
Driver Changes:
rockchip: Fixes & improvements for rk3399/chromebook plus (various)
sun4i: Add H3/H5 HDMI support (Jernej)
i915: Add support for limited/full-range ycbcr toggling (Ville)
pl111: Add bandwidth checking/limiting (Linus)
Cc: Jernej Skrabec <[email protected]>
Cc: Jyri Sarha <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Linus Walleij <[email protected]>
* tag 'drm-misc-next-2018-03-09-3' of git://anongit.freedesktop.org/drm/drm-misc: (85 commits)
drm/rockchip: Don't use atomic constructs for psr
drm/rockchip: analogix_dp: set psr activate/deactivate when enable/disable bridge
drm/rockchip: dw_hdmi: Move HDMI vpll clock enable to bind()
drm/rockchip: inno_hdmi: reorder clk_disable_unprepare call in unbind
drm/rockchip: inno_hdmi: Fix error handling path.
drm/rockchip: dw-mipi-dsi: Fix connector and encoder cleanup.
drm/nouveau: Replace the iturbt_709 prop with the standard COLOR_ENCODING prop
drm/pl111: Use max memory bandwidth for resolution
drm/bridge: sii902x: Retry status read after DDI I2C
drm/pl111: Handle the RealView variant separately
drm/pl111: Make the default BPP a per-variant variable
drm: simple_kms_helper: Fix .mode_valid() documentation
bridge: Elaborate a bit on dumb VGA bridges in Kconfig
drm/atomic: Add new reverse iterator over all plane state (V2)
drm: Reject bad property flag combinations
drm: Make property flags u32
drm/uapi: Deprecate DRM_MODE_PROP_PENDING
drm: WARN when trying to add enum value > 63 to a bitmask property
drm: WARN when trying add enum values to non-enum/bitmask properties
drm: Reject replacing property enum values
...
|
|
This patch updates to support cq record doorbell for
the user space.
Signed-off-by: Yixian Liu <[email protected]>
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Wei Hu (Xavier) <[email protected]>
Signed-off-by: Shaobo Xu <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch adds interfaces and definitions to support the rq record
doorbell for the user space.
Signed-off-by: Yixian Liu <[email protected]>
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Wei Hu (Xavier) <[email protected]>
Signed-off-by: Shaobo Xu <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Problem and motivation: Once a breakpoint perf event (PERF_TYPE_BREAKPOINT)
is created, there is no flexibility to change the breakpoint type
(bp_type), breakpoint address (bp_addr), or breakpoint length (bp_len). The
only option is to close the perf event and configure a new breakpoint
event. This inflexibility has a significant performance overhead. For
example, sampling-based, lightweight performance profilers (and also
concurrency bug detection tools), monitor different addresses for a short
duration using PERF_TYPE_BREAKPOINT and change the address (bp_addr) to
another address or change the kind of breakpoint (bp_type) from "write" to
a "read" or vice-versa or change the length (bp_len) of the address being
monitored. The cost of these modifications is prohibitive since it involves
unmapping the circular buffer associated with the perf event, closing the
perf event, opening another perf event and mmaping another circular buffer.
Solution: The new ioctl flag for perf events,
PERF_EVENT_IOC_MODIFY_ATTRIBUTES, introduced in this patch takes a pointer
to a struct perf_event_attr as an argument to update an old breakpoint
event with new address, type, and size. This facility allows retaining a
previous mmaped perf events ring buffer and avoids having to close and
reopen another perf event.
This patch supports only changing PERF_TYPE_BREAKPOINT event type; future
implementations can extend this feature. The patch replicates some of its
functionality of modify_user_hw_breakpoint() in
kernel/events/hw_breakpoint.c. modify_user_hw_breakpoint cannot be called
directly since perf_event_ctx_lock() is already held in _perf_ioctl().
Evidence: Experiments show that the baseline (not able to modify an already
created breakpoint) costs an order of magnitude (~10x) more than the
suggested optimization (having the ability to dynamically modifying a
configured breakpoint via ioctl). When the breakpoints typically do not
trap, the speedup due to the suggested optimization is ~10x; even when the
breakpoints always trap, the speedup is ~4x due to the suggested
optimization.
Testing: tests posted at
https://github.com/linux-contrib/perf_event_modify_bp demonstrate the
performance significance of this patch. Tests also check the functional
correctness of the patch.
Signed-off-by: Milind Chabbi <[email protected]>
[ Using modify_user_hw_breakpoint_check function. ]
[ Reformated PERF_EVENT_IOC_*, so the values are all in one column. ]
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The planned change to unify the behaviour of the MONOTONIC and BOOTTIME
clocks vs. suspend removes the ability to retrieve the active
non-suspended time of a system.
Provide a new CLOCK_MONOTONIC_ACTIVE clock which returns the active
non-suspended time of the system via clock_gettime().
This preserves the old behaviour of CLOCK_MONOTONIC before the
BOOTTIME/MONOTONIC unification.
This new clock also allows applications to detect programmatically that
the MONOTONIC and BOOTTIME clocks are identical.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kevin Easton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Salyzyn <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The Nuvoton UART is almost compatible with the 8250 driver when probed
via the 8250_of driver, however it requires some extra configuration
at startup.
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Joel Stanley <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One notable fix to properly advertise our support for a new firmware
feature, caused by two series conflicting semantically but not
textually.
There's a new ioctl for the new ocxl driver, which is not a fix, but
needed to complete the userspace API and good to have before the
driver is in a released kernel.
Finally three minor selftest fixes, and a fix for intermittent build
failures for some obscure platforms, caused by a missing make
dependency.
Thanks to: Alastair D'Silva, Bharata B Rao, Guenter Roeck"
* tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix vector5 in ibm architecture vector table
ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL
ocxl: Add get_metadata IOCTL to share OCXL information to userspace
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
selftests/powerpc: Fix missing clean of pmu/lib.o
powerpc/boot: Fix random libfdt related build errors
selftests/powerpc: Skip tm-trap if transactional memory is not enabled
|
|
Newer GPU cores added yet more feature bits. Make room for them and
let userspace query them.
Signed-off-by: Lucas Stach <[email protected]>
|
|
We use a two-step process to configure a filter with RSS spreading. First,
the RSS context is allocated and configured using ETHTOOL_SRSSH; this
returns an identifier (rss_context) which can then be passed to subsequent
invocations of ETHTOOL_SRXCLSRLINS to specify that the offset from the RSS
indirection table lookup should be added to the queue number (ring_cookie)
when delivering the packet. Drivers for devices which can only use the
indirection table entry directly (not add it to a base queue number)
should reject rule insertions combining RSS with a nonzero ring_cookie.
Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Implement the RDMA nldev netlink interface for dumping detailed PD
information.
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Implement the RDMA nldev netlink interface for dumping detailed
MR information.
Signed-off-by: Steve Wise <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|