aboutsummaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)AuthorFilesLines
2020-07-24Merge branch 'io_uring-5.8' into for-5.9/io_uringJens Axboe1-0/+1
Merge in io_uring-5.8 fixes, as changes/cleanups to how we do locked mem accounting require a fixup, and only one of the spots are noticed by git as the other merges cleanly. The flags fix from io_uring-5.8 also causes a merge conflict, the leak fix for recvmsg, the double poll fix, and the link failure locking fix. * io_uring-5.8: io_uring: fix lockup in io_fail_links() io_uring: fix ->work corruption with poll_add io_uring: missed req_init_async() for IOSQE_ASYNC io_uring: always allow drain/link/hardlink/async sqe flags io_uring: ensure double poll additions work with both request types io_uring: fix recvmsg memory leak with buffer selection io_uring: fix not initialised work->flags io_uring: fix missing msg_name assignment io_uring: account user memory freed when exit has been queued io_uring: fix memleak in io_sqe_files_register() io_uring: fix memleak in __io_sqe_files_update() io_uring: export cq overflow status to userspace Signed-off-by: Jens Axboe <[email protected]>
2020-07-24Merge v5.8-rc6 into drm-nextDave Airlie10-27/+39
I've got a silent conflict + two trees based on fixes to merge. Fixes a silent merge with amdgpu Signed-off-by: Dave Airlie <[email protected]>
2020-07-23hyperv: hyperv.h: drop a duplicated wordRandy Dunlap1-1/+1
Drop the repeated word "the" in a comment. Signed-off-by: Randy Dunlap <[email protected]> Cc: "K. Y. Srinivasan" <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Wei Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2020-07-23android: binder.h: drop a duplicated wordRandy Dunlap1-1/+1
Drop the repeated word "the" in a comment. Cc: Arve Hjønnevåg <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Martijn Coenen <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: [email protected] Acked-by: Christian Brauner <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-07-23Merge tag 'fpga-for-5.9' of ↵Greg Kroah-Hartman1-0/+82
git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga into char-misc-next Moritz writes: FPGA Manager changes for 5.9-rc1 Here is the (slightly larger than usual) patch set for the 5.9-rc1 merge window. DFL: - Xu's changes add support for AFU interrupt handling and puts them to use for error handling. - Xu's other change also adds another device-id for the Intel FPGA PAC N3000. - John's change converts from using get_user_pages() to pin_user_pages(). - Gustavo's patch cleans up some of the allocation by using struct_size(). Xilinx: - Luca's changes clean up the xilinx-spi and xilinx-slave-serial drivers and updates the comments and dt-bindings to reflect the fact it also supports 7 series devices. Core: - Tom cleaned up the fpga-bridge / fpga-mgr core by removing some dead-stores. All patches have been reviewed on the mailing list, and have been in the last few linux-next releases (as part of my for-next branch) without issues. Signed-off-by: Moritz Fischer <[email protected]> * tag 'fpga-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga: fpga: dfl: pci: add device id for Intel FPGA PAC N3000 Documentation: fpga: dfl: add descriptions for interrupt related interfaces. fpga: dfl: afu: add AFU interrupt support fpga: dfl: fme: add interrupt support for global error reporting fpga: dfl: afu: add interrupt support for port error reporting fpga: dfl: introduce interrupt trigger setting API fpga: dfl: pci: add irq info for feature devices enumeration fpga: dfl: parse interrupt info for feature devices on enumeration fpga manager: xilinx-spi: check INIT_B pin during write_init dt-bindings: fpga: xilinx-slave-serial: add optional INIT_B GPIO fpga: Fix dead store in fpga-bridge.c fpga: Fix dead store fpga-mgr.c fpga: dfl: Use struct_size() in kzalloc() fpga manager: xilinx-spi: remove unneeded, mistyped variables fpga manager: xilinx-spi: valid for the 7 Series too dt-bindings: fpga: xilinx-slave-serial: valid for the 7 Series too fpga: dfl: afu: convert get_user_pages() --> pin_user_pages()
2020-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-3/+94
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-07-21 The following pull-request contains BPF updates for your *net-next* tree. We've added 46 non-merge commits during the last 6 day(s) which contain a total of 68 files changed, 4929 insertions(+), 526 deletions(-). The main changes are: 1) Run BPF program on socket lookup, from Jakub. 2) Introduce cpumap, from Lorenzo. 3) s390 JIT fixes, from Ilya. 4) teach riscv JIT to emit compressed insns, from Luke. 5) use build time computed BTF ids in bpf iter, from Yonghong. ==================== Purely independent overlapping changes in both filter.h and xdp.h Signed-off-by: David S. Miller <[email protected]>
2020-07-21Merge branch 'for-linus' into nextDmitry Torokhov1-1/+2
Sync up with 'for-linus' branch to resolve conflict in Elan touchpad driver.
2020-07-21raid: md_p.h: drop duplicated word in a commentRandy Dunlap1-1/+1
Drop the doubled word "the" in a comment. Signed-off-by: Randy Dunlap <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Signed-off-by: Song Liu <[email protected]>
2020-07-21bareudp: Reverted support to enable & disable rx metadata collectionMartin Varghese1-1/+0
The commit fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection") breaks the the original(5.7) default behavior of bareudp module to collect RX metadadata at the receive. It was added to avoid the crash at the kernel neighbour subsytem when packet with metadata from bareudp is processed. But it is no more needed as the commit 394de110a733 ("net: Added pointer check for dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash. Fixes: fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection") Signed-off-by: Martin Varghese <[email protected]> Acked-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21audit: report audit wait metric in audit status replyMax Englander1-7/+11
In environments where the preservation of audit events and predictable usage of system memory are prioritized, admins may use a combination of --backlog_wait_time and -b options at the risk of degraded performance resulting from backlog waiting. In some cases, this risk may be preferred to lost events or unbounded memory usage. Ideally, this risk can be mitigated by making adjustments when backlog waiting is detected. However, detection can be difficult using the currently available metrics. For example, an admin attempting to debug degraded performance may falsely believe a full backlog indicates backlog waiting. It may turn out the backlog frequently fills up but drains quickly. To make it easier to reliably track degraded performance to backlog waiting, this patch makes the following changes: Add a new field backlog_wait_time_total to the audit status reply. Initialize this field to zero. Add to this field the total time spent by the current task on scheduled timeouts while the backlog limit is exceeded. Reset field to zero upon request via AUDIT_SET. Tested on Ubuntu 18.04 using complementary changes to the audit-userspace and audit-testsuite: - https://github.com/linux-audit/audit-userspace/pull/134 - https://github.com/linux-audit/audit-testsuite/pull/97 Signed-off-by: Max Englander <[email protected]> Signed-off-by: Paul Moore <[email protected]>
2020-07-20perf: Add perf_event_mmap_page::cap_user_time_short ABIPeter Zijlstra1-3/+20
In order to support short clock counters, provide an ABI extension. As a whole: u64 time, delta, cyc = read_cycle_counter(); + if (cap_user_time_short) + cyc = time_cycle + ((cyc - time_cycle) & time_mask); delta = mul_u64_u32_shr(cyc, time_mult, time_shift); if (cap_user_time_zero) time = time_zero + delta; delta += time_offset; Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-07-20Merge v5.8-rc6 into char-misc-nextGreg Kroah-Hartman4-21/+27
We need the char/misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-07-20Merge 5.8-rc6 into usb-nextGreg Kroah-Hartman5-23/+29
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-07-20Merge 5.8-rc6 into tty-nextGreg Kroah-Hartman10-27/+39
We need the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-07-19ptp: introduce a phase offset in the periodic output requestVladimir Oltean1-2/+17
Some PHCs like the ocelot/felix switch cannot emit generic periodic output, but just PPS (pulse per second) signals, which: - don't start from arbitrary absolute times, but are rather phase-aligned to the beginning of [the closest next] second. - have an optional phase offset relative to that beginning of the second. For those, it was initially established that they should reject any other absolute time for the PTP_PEROUT_REQUEST than 0.000000000 [1]. But when it actually came to writing an application [2] that makes use of this functionality, we realized that we can't really deal generically with PHCs that support absolute start time, and with PHCs that don't, without an explicit interface. Namely, in an ideal world, PHC drivers would ensure that the "perout.start" value written to hardware will result in a functional output. This means that if the PTP time has become in the past of this PHC's current time, it should be automatically fast-forwarded by the driver into a close enough future time that is known to work (note: this is necessary only if the hardware doesn't do this fast-forward by itself). But we don't really know what is the status for PHC drivers in use today, so in the general sense, user space would be risking to have a non-functional periodic output if it simply asked for a start time of 0.000000000. So let's introduce a flag for this type of reduced-functionality hardware, named PTP_PEROUT_PHASE. The start time is just "soon", the only thing we know for sure about this signal is that its rising edge events, Rn, occur at: Rn = perout.phase + n * perout.period The "phase" in the periodic output structure is simply an alias to the "start" time, since both cannot logically be specified at the same time. Therefore, the binary layout of the structure is not affected. [1]: https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ [2]: https://www.mail-archive.com/[email protected]/msg04142.html Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19ptp: add ability to configure duty cycle for periodic outputVladimir Oltean1-3/+14
There are external event timestampers (PHCs with support for PTP_EXTTS_REQUEST) that timestamp both event edges. When those edges are very close (such as in the case of a short pulse), there is a chance that the collected timestamp might be of the rising, or of the falling edge, we never know. There are also PHCs capable of generating periodic output with a configurable duty cycle. This is good news, because we can space the rising and falling edge out enough in time, that the risks to overrun the 1-entry timestamp FIFO of the extts PHC are lower (example: the perout PHC can be configured for a period of 1 second, and an "on" time of 0.5 seconds, resulting in a duty cycle of 50%). A flag is introduced for signaling that an on time is present in the perout request structure, for preserving compatibility. Logically speaking, the duty cycle cannot exceed 100% and the PTP core checks for this. PHC drivers that don't support this flag emit a periodic output of an unspecified duty cycle, same as before. The duty cycle is encoded as an "on" time, similar to the "start" and "period" times, and reuses the reserved space while preserving overall binary layout. Pahole reported before: struct ptp_perout_request { struct ptp_clock_time start; /* 0 16 */ struct ptp_clock_time period; /* 16 16 */ unsigned int index; /* 32 4 */ unsigned int flags; /* 36 4 */ unsigned int rsv[4]; /* 40 16 */ /* size: 56, cachelines: 1, members: 5 */ /* last cacheline: 56 bytes */ }; And now: struct ptp_perout_request { struct ptp_clock_time start; /* 0 16 */ struct ptp_clock_time period; /* 16 16 */ unsigned int index; /* 32 4 */ unsigned int flags; /* 36 4 */ union { struct ptp_clock_time on; /* 40 16 */ unsigned int rsv[4]; /* 40 16 */ }; /* 40 16 */ /* size: 56, cachelines: 1, members: 5 */ /* last cacheline: 56 bytes */ }; Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19icmp: support rfc 4884Willem de Bruijn3-1/+36
Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an extension struct if present. ICMP messages may include an extension structure after the original datagram. RFC 4884 standardized this behavior. It stores the offset in words to the extension header in u8 icmphdr.un.reserved[1]. The field is valid only for ICMP types destination unreachable, time exceeded and parameter problem, if length is at least 128 bytes and entire packet does not exceed 576 bytes. Return the offset to the start of the extension struct when reading an ICMP error from the error queue, if it matches the above constraints. Do not return the raw u8 field. Return the offset from the start of the user buffer, in bytes. The kernel does not return the network and transport headers, so subtract those. Also validate the headers. Return the offset regardless of validation, as an invalid extension must still not be misinterpreted as part of the original datagram. Note that !invalid does not imply valid. If the extension version does not match, no validation can take place, for instance. For backward compatibility, make this optional, set by setsockopt SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c For forward compatibility, reserve only setsockopt value 1, leaving other bits for additional icmp extensions. Changes v1->v2: - convert word offset to byte offset from start of user buffer - return in ee_data as u8 may be insufficient - define extension struct and object header structs - return len only if constraints met - if returning len, also validate Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19net: phy: add USXGMII link partner ability constantsMichael Walle1-0/+26
The constants are taken from the USXGMII Singleport Copper Interface specification. The naming are based on the SGMII ones, but with an MDIO_ prefix. Signed-off-by: Michael Walle <[email protected]> Reviewed-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19capabilities: Introduce CAP_CHECKPOINT_RESTOREAdrian Reber1-1/+8
This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating checkpoint/restore for non-root users. Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has been asked numerous times if it is possible to checkpoint/restore a process as non-root. The answer usually was: 'almost'. The main blocker to restore a process as non-root was to control the PID of the restored process. This feature available via the clone3 system call, or via /proc/sys/kernel/ns_last_pid is unfortunately guarded by CAP_SYS_ADMIN. In the past two years, requests for non-root checkpoint/restore have increased due to the following use cases: * Checkpoint/Restore in an HPC environment in combination with a resource manager distributing jobs where users are always running as non-root. There is a desire to provide a way to checkpoint and restore long running jobs. * Container migration as non-root * We have been in contact with JVM developers who are integrating CRIU into a Java VM to decrease the startup time. These checkpoint/restore applications are not meant to be running with CAP_SYS_ADMIN. We have seen the following workarounds: * Use a setuid wrapper around CRIU: See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c * Use a setuid helper that writes to ns_last_pid. Unfortunately, this helper delegation technique is impossible to use with clone3, and is thus prone to races. See https://github.com/twosigma/set_ns_last_pid * Cycle through PIDs with fork() until the desired PID is reached: This has been demonstrated to work with cycling rates of 100,000 PIDs/s See https://github.com/twosigma/set_ns_last_pid * Patch out the CAP_SYS_ADMIN check from the kernel * Run the desired application in a new user and PID namespace to provide a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited use in typical container environments (e.g., Kubernetes) as /proc is typically protected with read-only layers (e.g., /proc/sys) for hardening purposes. Read-only layers prevent additional /proc mounts (due to proc's SB_I_USERNS_VISIBLE property), making the use of new PID namespaces limited as certain applications need access to /proc matching their PID namespace. The introduced capability allows to: * Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable for the corresponding PID namespace via ns_last_pid/clone3. * Open files in /proc/pid/map_files when the current user is CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for recovering files that are unreachable via the file system such as deleted files, or memfd files. See corresponding selftest for an example with clone3(). Signed-off-by: Adrian Reber <[email protected]> Signed-off-by: Nicolas Viennot <[email protected]> Reviewed-by: Serge Hallyn <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2020-07-19media: Add V4L2_TYPE_IS_CAPTURE helperEzequiel Garcia1-0/+2
It's all too easy to get confused by the V4L2_TYPE_IS_OUTPUT macro, when it's used as !V4L2_TYPE_IS_OUTPUT. Reduce the risk of confusion with macro to explicitly check for the CAPTURE queue type case. This change does not affect functionality, and it's only intended to make the code more readable. Suggested-by: Nicolas Dufresne <[email protected]> Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> [[email protected]: checkpatch: align with parenthesis] Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-07-17bpf: Introduce SK_LOOKUP program type with a dedicated attach pointJakub Sitnicki1-0/+77
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer when looking up a listening socket for a new connection request for connection oriented protocols, or when looking up an unconnected socket for a packet for connection-less protocols. When called, SK_LOOKUP BPF program can select a socket that will receive the packet. This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are: (1) steer packets destined to an IP range, on fixed port to a socket 192.0.2.0/24, port 80 -> NGINX socket (2) steer packets destined to an IP address, on any port to a socket 198.51.100.1, any port -> L7 proxy socket In its run-time context program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. Context can be further extended to include ingress interface identifier. To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...) helper to record the selection. Transport layer then uses the selected socket as a result of socket lookup. In its basic form, SK_LOOKUP acts as a filter and hence must return either SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should look for a socket to receive the packet, or use the one selected by the program if available, while SK_DROP informs the transport layer that the lookup should fail. This patch only enables the user to attach an SK_LOOKUP program to a network namespace. Subsequent patches hook it up to run on local delivery path in ipv4 and ipv6 stacks. Suggested-by: Marek Majkowski <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-17tcp: add SNMP counter for no. of duplicate segments reported by DSACKPriyaranjan Jha1-0/+1
There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv, which are incremented depending on whether the DSACKed range is below the cumulative ACK sequence number or not. Unfortunately, these both implicitly assume each DSACK covers only one segment. This makes these counters unusable for estimating spurious retransmit rates, or real/non-spurious loss rate. This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks the estimated number of duplicate segments based on: (DSACKed sequence range) / MSS. This counter is usable for estimating spurious retransmit rates, or real/non-spurious loss rate. Signed-off-by: Priyaranjan Jha <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-16bpf: Drop duplicated words in uapi helper commentsRandy Dunlap1-3/+3
Drop doubled words "will" and "attach". Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-16Merge tag 'char-misc-5.8-rc6' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc into master Pull char/misc fixes from Greg KH: "Here are number of small char/misc driver fixes for 5.8-rc6 Not that many complex fixes here, just a number of small fixes for reported issues, and some new device ids. Nothing fancy. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits) virtio: virtio_console: add missing MODULE_DEVICE_TABLE() for rproc serial intel_th: Fix a NULL dereference when hub driver is not loaded intel_th: pci: Add Emmitsburg PCH support intel_th: pci: Add Tiger Lake PCH-H support intel_th: pci: Add Jasper Lake CPU support virt: vbox: Fix guest capabilities mask check virt: vbox: Fix VBGL_IOCTL_VMMDEV_REQUEST_BIG and _LOG req numbers to match upstream uio_pdrv_genirq: fix use without device tree and no interrupt uio_pdrv_genirq: Remove warning when irq is not specified coresight: etmv4: Fix CPU power management setup in probe() function coresight: cti: Fix error handling in probe Revert "zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()" mei: bus: don't clean driver pointer misc: atmel-ssc: lock with mutex instead of spinlock phy: sun4i-usb: fix dereference of pointer phy0 before it is null checked phy: rockchip: Fix return value of inno_dsidphy_probe() phy: ti: j721e-wiz: Constify structs phy: ti: am654-serdes: Constify regmap_config phy: intel: fix enum type mismatch warning phy: intel: Fix compilation error on FIELD_PREP usage ...
2020-07-16bpf: cpumap: Add the possibility to attach an eBPF program to cpumapLorenzo Bianconi1-0/+5
Introduce the capability to attach an eBPF program to cpumap entries. The idea behind this feature is to add the possibility to define on which CPU run the eBPF program if the underlying hw does not support RSS. Current supported verdicts are XDP_DROP and XDP_PASS. This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu sample available in the kernel tree to identify possible performance regressions. Results show there are no observable differences in packet-per-second: $./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1 rx: 354.8 Kpps rx: 356.0 Kpps rx: 356.8 Kpps rx: 356.3 Kpps rx: 356.6 Kpps rx: 356.6 Kpps rx: 356.7 Kpps rx: 355.8 Kpps rx: 356.8 Kpps rx: 356.8 Kpps Co-developed-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
2020-07-16cpumap: Formalize map value as a named structLorenzo Bianconi1-0/+9
As it has been already done for devmap, introduce 'struct bpf_cpumap_val' to formalize the expected values that can be passed in for a CPUMAP. Update cpumap code to use the struct. Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org
2020-07-15net: caif: drop duplicate words in commentsRandy Dunlap1-1/+1
Drop doubled words "or" and "the" in several comments. Signed-off-by: Randy Dunlap <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-07-15Merge tag 'dmaengine-fix-5.8-rc6' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine into master Pull dmaengine fixes from Vinod Koul: - update dmaengine tree location to kernel.org - dmatest fix for completing threads - driver fixes for k3dma, fsl-dma, idxd, ,tegra, and few other drivers * tag 'dmaengine-fix-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (21 commits) dmaengine: ioat setting ioat timeout as module parameter dmaengine: fsl-edma: fix wrong tcd endianness for big-endian cpu dmaengine: dmatest: stop completed threads when running without set channel dmaengine: fsl-edma-common: correct DSIZE_32BYTE dmaengine: dw: Initialize channel before each transfer dmaengine: idxd: fix misc interrupt handler thread unmasking dmaengine: idxd: cleanup workqueue config after disabling dmaengine: tegra210-adma: Fix runtime PM imbalance on error dmaengine: mcf-edma: Fix NULL pointer exception in mcf_edma_tx_handler dmaengine: fsl-edma: Fix NULL pointer exception in fsl_edma_tx_handler dmaengine: fsl-edma: Add lockdep assert for exported function dmaengine: idxd: fix hw descriptor fields for delta record dmaengine: ti: k3-udma: add missing put_device() call in of_xudma_dev_get() dmaengine: sh: usb-dmac: set tx_result parameters dmaengine: ti: k3-udma: Fix delayed_work usage for tx drain workaround dmaengine: idxd: fix cdev locking for open and release dmaengine: imx-sdma: Fix: Remove 'always true' comparison MAINTAINERS: switch dmaengine tree to kernel.org dmaengine: ti: k3-udma: Fix the running channel handling in alloc_chan_resources dmaengine: ti: k3-udma: Fix cleanup code for alloc_chan_resources ...
2020-07-15include/uapi/linux: Update KFD ioctl versionAmber Lin1-1/+5
Bump KFD ioctl after adding SMI events support Signed-off-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-07-15drm/amdkfd: Provide SMI events watchAmber Lin1-1/+15
When the compute is malfunctioning or performance drops, the system admin will use SMI (System Management Interface) tool to monitor/diagnostic what went wrong. This patch provides an event watch interface for the user space to register devices and subscribe events they are interested. After registered, the user can use annoymous file descriptor's poll function with wait-time specified and wait for events to happen. Once an event happens, the user can use read() to retrieve information related to the event. VM fault event is done in this patch. v2: - remove UNREGISTER and add event ENABLE/DISABLE - correct kfifo usage - move event message API to kfd_ioctl.h v3: send the event msg in text than in binary v4: support multiple clients v5: move events enablement from ioctl to fd write v6: sparse fix Signed-off-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-07-14seccomp: Introduce addfd ioctl to seccomp user notifierSargun Dhillon1-0/+22
The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over an fd. It is often used in settings where a supervising task emulates syscalls on behalf of a supervised task in userspace, either to further restrict the supervisee's syscall abilities or to circumvent kernel enforced restrictions the supervisor deems safe to lift (e.g. actually performing a mount(2) for an unprivileged container). While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall, only a certain subset of syscalls could be correctly emulated. Over the last few development cycles, the set of syscalls which can't be emulated has been reduced due to the addition of pidfd_getfd(2). With this we are now able to, for example, intercept syscalls that require the supervisor to operate on file descriptors of the supervisee such as connect(2). However, syscalls that cause new file descriptors to be installed can not currently be correctly emulated since there is no way for the supervisor to inject file descriptors into the supervisee. This patch adds a new addfd ioctl to remove this restriction by allowing the supervisor to install file descriptors into the intercepted task. By implementing this feature via seccomp the supervisor effectively instructs the supervisee to install a set of file descriptors into its own file descriptor table during the intercepted syscall. This way it is possible to intercept syscalls such as open() or accept(), and install (or replace, like dup2(2)) the supervisor's resulting fd into the supervisee. One replacement use-case would be to redirect the stdout and stderr of a supervisee into log file descriptors opened by the supervisor. The ioctl handling is based on the discussions[1] of how Extensible Arguments should interact with ioctls. Instead of building size into the addfd structure, make it a function of the ioctl command (which is how sizes are normally passed to ioctls). To support forward and backward compatibility, just mask out the direction and size, and match everything. The size (and any future direction) checks are done along with copy_struct_from_user() logic. As a note, the seccomp_notif_addfd structure is laid out based on 8-byte alignment without requiring packing as there have been packing issues with uapi highlighted before[2][3]. Although we could overload the newfd field and use -1 to indicate that it is not to be used, doing so requires changing the size of the fd field, and introduces struct packing complexity. [1]: https://lore.kernel.org/lkml/[email protected]/ [2]: https://lore.kernel.org/lkml/[email protected]/ [3]: https://lore.kernel.org/lkml/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: Jann Horn <[email protected]> Cc: Robert Sesek <[email protected]> Cc: Chris Palmer <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Suggested-by: Matt Denton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sargun Dhillon <[email protected]> Reviewed-by: Will Drewry <[email protected]> Co-developed-by: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2020-07-14net: bridge: Add port attribute IFLA_BRPORT_MRP_IN_OPENHoratiu Vultur1-0/+1
This patch adds a new port attribute, IFLA_BRPORT_MRP_IN_OPEN, which allows to notify the userspace when the node lost the contiuity of MRP_InTest frames. Signed-off-by: Horatiu Vultur <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-14bridge: uapi: mrp: Extend MRP_INFO attributes for interconnect statusHoratiu Vultur1-0/+5
Extend the existing MRP_INFO to return status of MRP interconnect. In case there is no MRP interconnect on the node then the role will be disabled so the other attributes can be ignored. Signed-off-by: Horatiu Vultur <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-14bridge: uapi: mrp: Extend MRP attributes for MRP interconnectHoratiu Vultur2-0/+91
Extend the existing MRP netlink attributes to allow to configure MRP Interconnect: IFLA_BRIDGE_MRP_IN_ROLE - the parameter type is br_mrp_in_role which contains the interconnect id, the ring id, the interconnect role(MIM or MIC) and the port ifindex that represents the interconnect port. IFLA_BRIDGE_MRP_IN_STATE - the parameter type is br_mrp_in_state which contains the interconnect id and the interconnect state. IFLA_BRIDGE_MRP_IN_TEST - the parameter type is br_mrp_start_in_test which contains the interconnect id, the interval at which to send MRP_InTest frames, how many test frames can be missed before declaring the interconnect ring open and the period which represents for how long to send MRP_InTest frames. Signed-off-by: Horatiu Vultur <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-13Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "A few quirks for the Elan touchpad driver, another Thinkpad is being switched over from PS/2 to native RMI4 interface, and we gave a brand new SW_MACHINE_COVER switch definition" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - add more hardware ID for Lenovo laptops Input: i8042 - add Lenovo XiaoXin Air 12 to i8042 nomux list Revert "Input: elants_i2c - report resolution information for touch major" Input: elan_i2c - only increment wakeup count on touch Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen ARM: dts: n900: remove mmc1 card detect gpio Input: add `SW_MACHINE_COVER`
2020-07-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller3-1/+16
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-07-13 The following pull-request contains BPF updates for your *net-next* tree. We've added 36 non-merge commits during the last 7 day(s) which contain a total of 62 files changed, 2242 insertions(+), 468 deletions(-). The main changes are: 1) Avoid trace_printk warning banner by switching bpf_trace_printk to use its own tracing event, from Alan. 2) Better libbpf support on older kernels, from Andrii. 3) Additional AF_XDP stats, from Ciara. 4) build time resolution of BTF IDs, from Jiri. 5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-13atm: Replace HTTP links with HTTPS onesAlexander A. Klimov1-1/+1
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-13xsk: Add xdp statistics to xsk_diagCiara Loftus1-0/+11
Add xdp statistics to the information dumped through the xsk_diag interface Signed-off-by: Ciara Loftus <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-13xsk: Add new statisticsCiara Loftus1-1/+4
It can be useful for the user to know the reason behind a dropped packet. Introduce new counters which track drops on the receive path caused by: 1. rx ring being full 2. fill ring being empty Also, on the tx path introduce a counter which tracks the number of times we attempt pull from the tx ring when it is empty. Signed-off-by: Ciara Loftus <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-13NFSv4.2: add client side xattr caching.Frank van der Linden1-0/+1
Implement client side caching for NFSv4.2 extended attributes. The cache is a per-inode hashtable, with name/value entries. There is one special entry for the listxattr cache. NFS inodes have a pointer to a cache structure. The cache structure is allocated on demand, freed when the cache is invalidated. Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE) are collected by a separate shrinker, and freed more aggressively than others. Signed-off-by: Frank van der Linden <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2020-07-13nfs,nfsd: NFSv4.2 extended attribute protocol definitionsFrank van der Linden1-0/+3
Add definitions for the new operations, errors and flags as defined in RFC 8276 (File System Extended Attributes in NFSv4). Signed-off-by: Frank van der Linden <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2020-07-12gpio: uapi: fix misplaced comment lineKent Gibson1-1/+1
The second line of the description for event_type is before the first. Move it to after the first line. Signed-off-by: Kent Gibson <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2020-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller4-21/+24
All conflicts seemed rather trivial, with some guidance from Saeed Mameed on the tc_ct.c one. Signed-off-by: David S. Miller <[email protected]>
2020-07-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-20/+21
Pull networking fixes from David Miller: 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking BPF programs, from Maciej Żenczykowski. 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu Mariappan. 3) Slay memory leak in nl80211 bss color attribute parsing code, from Luca Coelho. 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin. 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric Dumazet. 6) xsk code dips too deeply into DMA mapping implementation internals. Add dma_need_sync and use it. From Christoph Hellwig 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko. 8) Check for disallowed attributes when loading flow dissector BPF programs. From Lorenz Bauer. 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from Jason A. Donenfeld. 10) Don't advertise checksum offload on ipa devices that don't support it. From Alex Elder. 11) Resolve several issues in TCP MD5 signature support. Missing memory barriers, bogus options emitted when using syncookies, and failure to allow md5 key changes in established states. All from Eric Dumazet. 12) Fix interface leak in hsr code, from Taehee Yoo. 13) VF reset fixes in hns3 driver, from Huazhong Tan. 14) Make loopback work again with ipv6 anycast, from David Ahern. 15) Fix TX starvation under high load in fec driver, from Tobias Waldekranz. 16) MLD2 payload lengths not checked properly in bridge multicast code, from Linus Lüssing. 17) Packet scheduler code that wants to find the inner protocol currently only works for one level of VLAN encapsulation. Allow Q-in-Q situations to work properly here, from Toke Høiland-Jørgensen. 18) Fix route leak in l2tp, from Xin Long. 19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport support and various protocols. From Martin KaFai Lau. 20) Fix socket cgroup v2 reference counting in some situations, from Cong Wang. 21) Cure memory leak in mlx5 connection tracking offload support, from Eli Britstein. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits) mlxsw: pci: Fix use-after-free in case of failed devlink reload mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON() net: macb: fix call to pm_runtime in the suspend/resume functions net: macb: fix macb_suspend() by removing call to netif_carrier_off() net: macb: fix macb_get/set_wol() when moving to phylink net: macb: mark device wake capable when "magic-packet" property present net: macb: fix wakeup test in runtime suspend/resume routines bnxt_en: fix NULL dereference in case SR-IOV configuration fails libbpf: Fix libbpf hashmap on (I)LP32 architectures net/mlx5e: CT: Fix memory leak in cleanup net/mlx5e: Fix port buffers cell size value net/mlx5e: Fix 50G per lane indication net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash net/mlx5e: Fix VXLAN configuration restore after function reload net/mlx5e: Fix usage of rcu-protected pointer net/mxl5e: Verify that rpriv is not NULL net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode net/mlx5: Fix eeprom support for SFP module cgroup: Fix sock_cgroup_data on big-endian. selftests: bpf: Fix detach from sockmap tests ...
2020-07-10seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALIDKees Cook1-1/+2
When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong direction flag set. While this isn't a big deal as nothing currently enforces these bits in the kernel, it should be defined correctly. Fix the define and provide support for the old command until it is no longer needed for backward compatibility. Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Signed-off-by: Kees Cook <[email protected]>
2020-07-10KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR supportMohammed Gamal1-0/+2
This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which allows userspace to query if the underlying architecture would support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly (e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X) The complications in this patch are due to unexpected (but documented) behaviour we see with NPF vmexit handling in AMD processor. If SVM is modified to add guest physical address checks in the NPF and guest #PF paths, we see the followning error multiple times in the 'access' test in kvm-unit-tests: test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001 Dump mapping: address: 0x123400000000 ------L4: 24c3027 ------L3: 24c4027 ------L2: 24c5021 ------L1: 1002000021 This is because the PTE's accessed bit is set by the CPU hardware before the NPF vmexit. This is handled completely by hardware and cannot be fixed in software. Therefore, availability of the new capability depends on a boolean variable allow_smaller_maxphyaddr which is set individually by VMX and SVM init routines. On VMX it's always set to true, on SVM it's only set to true when NPT is not enabled. CC: Tom Lendacky <[email protected]> CC: Babu Moger <[email protected]> Signed-off-by: Mohammed Gamal <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-10ethtool: add tunnel info interfaceJakub Kicinski2-0/+57
Add an interface to report offloaded UDP ports via ethtool netlink. Now that core takes care of tracking which UDP tunnel ports the NICs are aware of we can quite easily export this information out to user space. The responsibility of writing the netlink dumps is split between ethtool code and udp_tunnel_nic.c - since udp_tunnel module may not always be loaded, yet we should always report the capabilities of the NIC. $ ethtool --show-tunnels eth0 Tunnel information for eth0: UDP port table 0: Size: 4 Types: vxlan No entries UDP port table 1: Size: 4 Types: geneve, vxlan-gpe Entries (1): port 1230, vxlan-gpe v4: - back to v2, build fix is now directly in udp_tunnel.h v3: - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET not set. v2: - fix string set count, - reorder enums in the uAPI, - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset in docs and comments. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-10Merge tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull io_uring fixes from Jens Axboe: - Fix memleak for error path in registered files (Yang) - Export CQ overflow state in flags, necessary to fix a case where liburing doesn't know if it needs to enter the kernel (Xiaoguang) - Fix for a regression in when user memory is accounted freed, causing issues with back-to-back ring exit + init if the ulimit -l setting is very tight. * tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block: io_uring: account user memory freed when exit has been queued io_uring: fix memleak in io_sqe_files_register() io_uring: fix memleak in __io_sqe_files_update() io_uring: export cq overflow status to userspace
2020-07-10char: raw: do not leak CONFIG_MAX_RAW_DEVS to userspaceMasahiro Yamada1-2/+0
include/uapi/linux/raw.h leaks CONFIG_MAX_RAW_DEVS to userspace. Userspace programs cannot use MAX_RAW_MINORS since CONFIG_MAX_RAW_DEVS is not available anyway. Remove the MAX_RAW_MINORS definition from the exported header, and use CONFIG_MAX_RAW_DEVS in drivers/char/raw.c While I was here, I converted printk(KERN_WARNING ...) to pr_warn(...) and stretched the warning message. Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-07-10virt: vbox: Add a few new vmmdev request types to the userspace whitelistHans de Goede1-0/+3
Upstream VirtualBox has defined and is using a few new request types for vmmdev requests passed through /dev/vboxguest to the hypervisor. Add the defines for these to vbox_vmmdev_types.h and add add them to the whitelists of vmmdev requests which userspace is allowed to make. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1789545 Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>