path: root/include/linux
Age | Commit message | Author | Files | Lines
2019-11-18 | mmc: core: Fix size overflow for mmc partitions | Bradley Bolen | 1 | -1/+1
With large eMMC cards, it is possible to create general purpose partitions that are bigger than 4GB. The size member of the mmc_part struct is only an unsigned int which overflows for gp partitions larger than 4GB. Change this to a u64 to handle the overflow. Signed-off-by: Bradley Bolen <[email protected]> Signed-off-by: Ulf Hansson <[email protected]>
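The overflow described above can be illustrated with a small sketch. The helper names and the 512-byte sector unit are assumptions made for illustration, not the actual mmc core code; `uint32_t` stands in for the 32-bit `unsigned int` size member and `uint64_t` for the `u64` replacement.

```c
#include <stdint.h>

/* Hypothetical helper: byte size kept in a 32-bit field (pre-fix behaviour). */
static uint32_t part_size_u32(uint64_t sectors)
{
    /* The product is truncated to 32 bits, so anything >= 4 GiB wraps. */
    return (uint32_t)(sectors * 512u);
}

/* Hypothetical helper: byte size kept in a 64-bit field (post-fix behaviour). */
static uint64_t part_size_u64(uint64_t sectors)
{
    return sectors * 512u;
}
```

For a 5 GiB partition (10485760 sectors) the 32-bit variant reports 1 GiB, while the 64-bit variant keeps the full 5368709120 bytes.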
2019-11-17 | Merge tag 'iommu-fixes-v5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu | Linus Torvalds | 1 | -2/+4
Pull iommu fixes from Joerg Roedel:
 - Fix for Intel IOMMU to correct invalidation commands when in SVA mode.
 - Update MAINTAINERS entry for Intel IOMMU
* tag 'iommu-fixes-v5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros
  MAINTAINERS: Update for INTEL IOMMU (VT-d) entry
2019-11-17 | libnvdimm: Move nd_device_attribute_group to device_type | Dan Williams | 1 | -1/+0
A 'struct device_type' instance can carry default attributes for the device. Use this facility to remove the export of nd_device_attribute_group and put the responsibility on the core rather than leaf implementations to define this attribute. For regions this creates a new nd_region_attribute_groups[] added to the per-region device-type instances. Cc: Ira Weiny <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Oliver O'Halloran" <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Link: https://lore.kernel.org/r/157309901138.1582359.12909354140826530394.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2019-11-16 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | David S. Miller | 6 | -26/+19
Lots of overlapping changes and parallel additions, stuff like that. Signed-off-by: David S. Miller <[email protected]>
2019-11-16 | percpu-refcount: Use normal instead of RCU-sched | Sebastian Andrzej Siewior | 1 | -8/+8
This is a revert of commit a4244454df129 ("percpu-refcount: use RCU-sched insted of normal RCU") which claims the only reason for using RCU-sched is "rcu_read_[un]lock() … are slightly more expensive than preempt_disable/enable()" and "As the RCU critical sections are extremely short, using sched-RCU shouldn't have any latency implications." The problem with using RCU-sched here is that it disables preemption and the release callback (called from percpu_ref_put_many()) must not acquire any sleeping locks like spinlock_t. This breaks PREEMPT_RT because some of the users acquire spinlock_t locks in their callbacks. Using rcu_read_lock() on PREEMPTION=n kernels is not any different compared to rcu_read_lock_sched(). On PREEMPTION=y kernels there are already performance issues due to additional preemption points. Looking at the code, the rcu_read_lock() is just an increment and unlock is almost just a decrement unless there is something special to do. Both are functions while disabling preemption is inlined. Doing a small benchmark, the minimal amount of time required was mostly the same. The average time required was higher due to the higher MAX value (which could be preemption). With DEBUG_PREEMPT=y it is rcu_read_lock_sched() that takes a little longer due to the additional debug code. Convert back to normal RCU. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Dennis Zhou <[email protected]>
2019-11-17 | crypto: ablkcipher - remove deprecated and unused ablkcipher support | Ard Biesheuvel | 1 | -435/+0
Now that all users of the deprecated ablkcipher interface have been moved to the skcipher interface, ablkcipher is no longer used and can be removed. Reviewed-by: Eric Biggers <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2019-11-16 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 1 | -0/+1
Pull networking fixes from David Miller: 1) Fix memory leak in xfrm_state code, from Steffen Klassert. 2) Fix races between devlink reload operations and device setup/cleanup, from Jiri Pirko. 3) Null deref in NFC code, from Stephan Gerhold. 4) Refcount fixes in SMC, from Ursula Braun. 5) Memory leak in slcan open error paths, from Jouni Hogander. 6) Fix ETS bandwidth validation in hns3, from Yonglong Liu. 7) Info leak on short USB request answers in ax88172a driver, from Oliver Neukum. 8) Release mem region properly in ep93xx_eth, from Chuhong Yuan. 9) PTP config timestamp flags validation, from Richard Cochran. 10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer. 11) Missing free_netdev() in gemini driver, from Chuhong Yuan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits) ipmr: Fix skb headroom in ipmr_get_route(). net: hns3: cleanup of stray struct hns3_link_mode_mapping net/smc: fix fastopen for non-blocking connect() rds: ib: update WR sizes when bringing up connection net: gemini: add missed free_netdev net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid seg6: fix skb transport_header after decap_and_validate() seg6: fix srh pointer in get_srh() net: stmmac: Use the correct style for SPDX License Identifier octeontx2-af: Use the correct style for SPDX License Identifier ptp: Extend the test program to check the external time stamp flags. mlx5: Reject requests to enable time stamping on both edges. igb: Reject requests that fail to enable time stamping on both edges. dp83640: Reject requests to enable time stamping on both edges. mv88e6xxx: Reject requests to enable time stamping on both edges. ptp: Introduce strict checking of external time stamp options. renesas: reject unsupported external timestamp flags mlx5: reject unsupported external timestamp flags igb: reject unsupported external timestamp flags dp83640: reject unsupported external timestamp flags ...
2019-11-16 | usb: typec: tcpm: Remove tcpc_config configuration mechanism | Hans de Goede | 1 | -41/+0
All configuration can and should be done through fwnodes instead of through the tcpc_config struct and there are no existing users left of struct tcpc_config, so let's remove it. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Heikki Krogerus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-11-16 | pinctrl/msm: Setup GPIO chip in hierarchy | Lina Iyer | 1 | -0/+13
Some GPIOs are marked as wakeup capable and are routed to another interrupt controller that is in an always-on domain and can detect interrupts even when most of the SoC is powered off. The wakeup interrupt controller wakes up the GIC and replays the interrupt at the GIC. Set up the TLMM irqchip in hierarchy with the wakeup interrupt controller and ensure the wakeup GPIOs are handled correctly. Co-developed-by: Maulik Shah <[email protected]> Signed-off-by: Lina Iyer <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] ---- Changes in v2: - Address review comments - Fix Co-developed-by tag Changes in v1: - Address minor review comments - Remove redundant call to set irq handler - Move irq_domain_qcom_handle_wakeup() to this patch Changes in RFC v2: - Rebase on top of GPIO hierarchy support in linux-next - Set the chained irq handler for summary line
2019-11-16 | irqchip/qcom-pdc: Add irqdomain for wakeup capable GPIOs | Lina Iyer | 1 | -0/+21
Introduce a new domain for wakeup capable GPIOs. The domain can be requested using the bus token DOMAIN_BUS_WAKEUP. In the following patches, we will specify PDC as the wakeup-parent for the TLMM GPIO irqchip. Requesting a wakeup GPIO will setup the GPIO and the corresponding PDC interrupt as its parent. Co-developed-by: Stephen Boyd <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Lina Iyer <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-11-16 | genirq: Introduce irq_chip_get/set_parent_state calls | Maulik Shah | 1 | -0/+6
On certain QTI chipsets some GPIOs are direct-connect interrupts to the GIC to be used as regular interrupt lines. When the GPIOs are not used for interrupt generation the interrupt line is disabled. But disabling the interrupt at GIC does not prevent the interrupt from being reported as pending at GIC_ISPEND. Later, when drivers call enable_irq() on the interrupt, an unwanted interrupt occurs. Introduce get and set methods for irqchip's parent to clear its pending irq state. This then can be invoked by the GPIO interrupt controller on the parents in its hierarchy to clear the interrupt before enabling the interrupt. Signed-off-by: Maulik Shah <[email protected]> Signed-off-by: Lina Iyer <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] [updated commit text and minor code fixes]
2019-11-16 | irqdomain: Add bus token DOMAIN_BUS_WAKEUP | Lina Iyer | 1 | -0/+1
A single controller can handle normal interrupts and wake-up interrupts independently, with a different numbering space. It is thus crucial to allow the driver for such a controller to discriminate between the two. A simple way to do so is to tag the wake-up irqdomain with a "bus token" that indicates the wake-up domain. This slightly abuses the notion of bus, but also radically simplifies the design of such a driver. Between two evils, we choose the least damaging. Suggested-by: Stephen Boyd <[email protected]> Signed-off-by: Lina Iyer <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-11-15 | mm/memory_hotplug: fix try_offline_node() | David Hildenbrand | 1 | -0/+1
try_offline_node() is pretty much broken right now:
- The node span is updated when onlining memory, not when adding it. We ignore memory that was never onlined. Bad.
- We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily trigger a kernel panic. Bad for memory that is offline but also bad for subsection hotadd with ZONE_DEVICE, whereby the memmap of the first PFN of a section might contain garbage.
- Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk all pageblocks (or at least subsections) within present sections. However, we don't have a way to identify whether a memmap that is not online was initialized (relevant for ZONE_DEVICE). This makes things more complicated.

Luckily, we can piggyback on the node span and the nid stored in memory blocks. Currently, the node span is grown when calling move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when removing memory, before calling try_offline_node(). Sysfs links are created via link_mem_sections(), e.g., during boot or when adding memory.

If the node still spans memory or if any memory block belongs to the nid, we don't set the node offline. As memory blocks that span multiple nodes cannot get offlined, the nid stored in memory blocks is reliable enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span when removing ZONE_DEVICE memory to fix similar issues (access of garbage memmaps) - until we have a reliable way to identify whether these memmaps were properly initialized. This implies later, that once a node had ZONE_DEVICE memory, we won't be able to set a node offline - which should be acceptable.

Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") memory that is added is not associated with a zone/node (memmap not initialized). The introducing commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node") already missed that we could have multiple nodes for a section and that the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less NUMA node. The node is properly onlined when adding the DIMMs. When removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
: BUG: kernel NULL pointer dereference, address: 0000000000000000
: ...
: Call Trace:
: remove_memory_block_devices+0x81/0xc0
: try_remove_memory+0xb4/0x130
: __remove_memory+0xa/0x20
: acpi_memory_device_remove+0x84/0x100
: acpi_bus_trim+0x57/0x90
: acpi_bus_trim+0x2e/0x90
: acpi_device_hotplug+0x2b2/0x4d0
: acpi_hotplug_work_fn+0x1a/0x30
: process_one_work+0x171/0x380
: worker_thread+0x49/0x3f0
: kthread+0xf8/0x130
: ret_from_fork+0x35/0x40

[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
Signed-off-by: David Hildenbrand <[email protected]>
Tested-by: Masayoshi Mizuma <[email protected]>
Cc: Tang Chen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Nayna Jain <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2019-11-15 | fork: extend clone3() to support setting a PID | Adrian Reber | 3 | -1/+7
The main motivation to add set_tid to clone3() is CRIU.

To restore a process with the same PID/TID CRIU currently uses /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to ns_last_pid and then (quickly) does a clone(). This works most of the time, but it is racy. It is also slow as it requires multiple syscalls.

Extending clone3() to support *set_tid makes it possible to restore a process using CRIU without accessing /proc/sys/kernel/ns_last_pid and race free (as long as the desired PID/TID is available).

This clone3() extension places the same restrictions (CAP_SYS_ADMIN) on clone3() with *set_tid as they are currently in place for ns_last_pid.

The original version of this change was using a single value for set_tid. At the 2019 LPC, after presenting set_tid, it was, however, decided to change set_tid to an array to enable setting the PID of a process in multiple PID namespaces at the same time. If a process is created in a PID namespace it is possible to influence the PID inside and outside of the PID namespace. Details also in the corresponding selftest.

To create a process with the following PIDs:

      PID NS level         Requested PID
        0 (host)              31496
        1                        42
        2                         1

For that example the two newly introduced parameters to struct clone_args (set_tid and set_tid_size) would need to be:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid[2] = 31496;
  set_tid_size = 3;

If only the PIDs of the two innermost nested PID namespaces should be defined it would look like this:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid_size = 2;

The PID of the newly created process would then be the next available free PID in the PID namespace level 0 (host) and 42 in the PID namespace at level 1 and the PID of the process in the innermost PID namespace would be 1.

The set_tid array is used to specify the PID of a process starting from the innermost nested PID namespaces up to set_tid_size PID namespaces. set_tid_size cannot be larger than the current PID namespace level.
Signed-off-by: Adrian Reber <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Dmitry Safonov <[email protected]> Acked-by: Andrei Vagin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
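The innermost-first ordering of set_tid is easy to get backwards, so a small sketch may help. The helper `fill_set_tid()` is hypothetical and only builds the array; it does not call clone3(), which would need CAP_SYS_ADMIN and a kernel with this change.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: by_level[i] is the requested PID at PID namespace
 * level i (level 0 is the host). Only the `count` innermost levels are
 * requested. Fills set_tid[] innermost-first and returns the value to use
 * for set_tid_size, or 0 if more levels are requested than exist.
 */
static size_t fill_set_tid(const pid_t *by_level, size_t levels,
                           size_t count, pid_t *set_tid)
{
    if (count > levels)
        return 0;
    for (size_t i = 0; i < count; i++)
        set_tid[i] = by_level[levels - 1 - i];
    return count;
}
```

With by_level = {31496, 42, 1} and count = 3 this yields {1, 42, 31496}, matching the first example in the commit message; with count = 2 it yields {1, 42}.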
2019-11-15 | bpf: Support attaching tracing BPF program to other BPF programs | Alexei Starovoitov | 2 | -0/+2
Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type including their subprograms. This feature allows snooping on input and output packets in XDP, TC programs including their return values. In order to do that the verifier needs to track types not only of vmlinux, but types of other BPF programs as well. The verifier also needs to translate uapi/linux/bpf.h types used by networking programs into kernel internal BTF types used by FENTRY/FEXIT BPF programs. In some cases LLVM optimizations can remove arguments from BPF subprograms without adjusting BTF info that LLVM backend knows. When BTF info disagrees with actual types that the verifier sees the BPF trampoline has to fall back to being conservative and treat all arguments as u64. The FENTRY/FEXIT program can still attach to such subprograms, but it won't be able to recognize pointer types like 'struct sk_buff *' and it won't be able to pass them to bpf_skb_output() for dumping packets to user space. The FENTRY/FEXIT program would need to use bpf_probe_read_kernel() instead. The BPF_PROG_LOAD command is extended with attach_prog_fd field. When it's set to zero the attach_btf_id is one of vmlinux BTF type ids. When attach_prog_fd points to previously loaded BPF program the attach_btf_id is BTF type id of main function or one of its subprograms. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2019-11-15 | bpf: Compare BTF types of functions arguments with actual types | Alexei Starovoitov | 2 | -0/+9
Make the verifier check that BTF types of function arguments match actual types passed into top-level BPF program and into BPF-to-BPF calls. If types match such BPF programs and sub-programs will have full support of BPF trampoline. If types mismatch the trampoline has to be conservative. It has to save/restore five program arguments and assume 64-bit scalars. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2019-11-15 | bpf: Annotate context types | Alexei Starovoitov | 2 | -27/+62
Annotate BPF program context types with program-side type and kernel-side type. This type information is used by the verifier. btf_get_prog_ctx_type() is used in the later patches to verify that BTF type of ctx in BPF program matches to kernel expected ctx type. For example, the XDP program type is: BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff) That means that XDP program should be written as: int xdp_prog(struct xdp_md *ctx) { ... } Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2019-11-15 | bpf: Fix race in btf_resolve_helper_id() | Alexei Starovoitov | 1 | -2/+3
btf_resolve_helper_id() caching logic is a bit racy, since under root the verifier can verify several programs in parallel. Fix it with READ/WRITE_ONCE. Fix the type as well, since error is also recorded. Fixes: a7658e1a4164 ("bpf: Check types of arguments passed into helpers") Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2019-11-15 | bpf: Introduce BPF trampoline | Alexei Starovoitov | 1 | -0/+105
Introduce BPF trampoline concept to allow kernel code to call into BPF programs with practically zero overhead. The trampoline generation logic is architecture dependent. It's converting native calling convention into BPF calling convention. BPF ISA is 64-bit (even on 32-bit architectures). The registers R1 to R5 are used to pass arguments into BPF functions. The main BPF program accepts only single argument "ctx" in R1. Whereas CPU native calling convention is different. x86-64 is passing first 6 arguments in registers and the rest on the stack. x86-32 is passing first 3 arguments in registers. sparc64 is passing first 6 in registers. And so on.

The trampolines between BPF and kernel already exist. BPF_CALL_x macros in include/linux/filter.h statically compile trampolines from BPF into kernel helpers. They convert up to five u64 arguments into kernel C pointers and integers. On 64-bit architectures these BPF_to_kernel trampolines are nops. On 32-bit architecture they're meaningful.

The opposite job kernel_to_BPF trampolines is done by CAST_TO_U64 macros and __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert kernel function arguments into array of u64s that BPF program consumes via R1=ctx pointer.

This patch set is doing the same job as __bpf_trace_##call() static trampolines, but dynamically for any kernel function. There are ~22k global kernel functions that are attachable via nop at function entry. The function arguments and types are described in BTF. The job of btf_distill_func_proto() function is to extract useful information from BTF into "function model" that architecture dependent trampoline generators will use to generate assembly code to cast kernel function arguments into array of u64s. For example the kernel function eth_type_trans has two pointers. They will be cast to u64 and stored into stack of generated trampoline. The pointer to that stack space will be passed into BPF program in R1. On x86-64 such generated trampoline will consume 16 bytes of stack and two stores of %rdi and %rsi into stack. The verifier will make sure that only two u64 are accessed read-only by BPF program. The verifier will also recognize the precise type of the pointers being accessed and will not allow typecasting of the pointer to a different type within BPF program.

The tracing use case in the datacenter demonstrated that certain key kernel functions (like tcp_retransmit_skb) have 2 or more kprobes that are always active. Other functions have both kprobe and kretprobe. So it is essential to keep both kernel code and BPF programs executing at maximum speed. Hence generated BPF trampoline is re-generated every time new program is attached or detached to maintain maximum performance.

To avoid the high cost of retpoline the attached BPF programs are called directly. __bpf_prog_enter/exit() are used to support per-program execution stats. In the future this logic will be optimized further by adding support for bpf_stats_enabled_key inside generated assembly code. Introduction of preemptible and sleepable BPF programs will completely remove the need to call __bpf_prog_enter/exit().

Detach of a BPF program from the trampoline should not fail. To avoid memory allocation in detach path the half of the page is used as a reserve and flipped after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly which is enough for BPF tracing use cases. This limit can be increased in the future.

BPF_TRACE_FENTRY programs have access to raw kernel function arguments while BPF_TRACE_FEXIT programs have access to kernel return value as well. Often kprobe BPF program remembers function arguments in a map while kretprobe fetches arguments from a map and analyzes them together with return value. BPF_TRACE_FEXIT accelerates this typical use case.

Recursion prevention for kprobe BPF programs is done via per-cpu bpf_prog_active counter. In practice that turned out to be a mistake. It caused programs to randomly skip execution. The tracing tools missed results they were looking for. Hence BPF trampoline doesn't provide builtin recursion prevention. It's a job of BPF program itself and will be addressed in the follow up patches.

BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases in the future. For example to remove retpoline cost from XDP programs.

Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
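The argument marshalling described for eth_type_trans (two pointers cast to u64, stored on the trampoline stack, and passed as ctx) can be modelled in plain C. This is only a hand-written analogue under stated assumptions: the real trampoline is generated machine code, and `trampoline_2ptr`, `prog_fn`, and the two stand-in programs are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for a BPF program entry point: one argument, the ctx pointer. */
typedef uint64_t (*prog_fn)(const uint64_t *ctx);

/* Hypothetical programs that read one u64 slot each from ctx. */
static uint64_t first_arg_prog(const uint64_t *ctx)  { return ctx[0]; }
static uint64_t second_arg_prog(const uint64_t *ctx) { return ctx[1]; }

/*
 * Analogue of a trampoline for a kernel function taking two pointers:
 * both arguments are cast to u64 and stored in a 16-byte stack array,
 * and the address of that array is handed to the program as ctx.
 */
static uint64_t trampoline_2ptr(prog_fn prog, void *arg0, void *arg1)
{
    uint64_t args[2];

    args[0] = (uint64_t)(uintptr_t)arg0;
    args[1] = (uint64_t)(uintptr_t)arg1;
    return prog(args);
}
```

The program sees only an array of u64 slots; recovering the precise pointer types is the verifier's job, using BTF as the commit message explains.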
2019-11-15 | bpf: Add bpf_arch_text_poke() helper | Alexei Starovoitov | 1 | -0/+8
Add bpf_arch_text_poke() helper that is used by BPF trampoline logic to patch nops/calls in kernel text into calls into BPF trampoline and to patch calls/nops inside BPF programs too. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2019-11-15 | bpf: Support doubleword alignment in bpf_jit_binary_alloc | Ilya Leoshkevich | 1 | -2/+4
Currently passing alignment greater than 4 to bpf_jit_binary_alloc does not work: in such cases it silently aligns only to 4 bytes. On s390, in order to load a constant from memory in a large (>512k) BPF program, one must use lgrl instruction, whose memory operand must be aligned on an 8-byte boundary. This patch makes it possible to request 8-byte alignment from bpf_jit_binary_alloc, and also makes it issue a warning when an unsupported alignment is requested. Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
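Why 4-byte alignment is not enough for an operand that must sit on an 8-byte boundary can be shown with two generic helpers. These are illustrative only and are not taken from bpf_jit_binary_alloc(); the addresses are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Round addr down to the previous multiple of align (power of two). */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}
```

An address such as 0x1004 satisfies a 4-byte request but not an 8-byte one, so code that silently caps the alignment at 4 can hand back a buffer that an instruction like lgrl cannot use.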
2019-11-15 | i2c: remove helpers for ref-counting clients | Wolfram Sang | 1 | -3/+0
There are no in-tree users of these helpers anymore, and there shouldn't be any. Most use cases went away once the driver model started to refcount for us. There have been users like the media subsystem, but they all switched to better refcounting methods meanwhile. Media did this in 2008. Last user (IPMI) left in 2018. Remove this cruft. Signed-off-by: Wolfram Sang <[email protected]> Reviewed-by: Niklas Söderlund <[email protected]> Reviewed-by: Jean Delvare <[email protected]> Tested-by: Luca Ceresoli <[email protected]> Reviewed-by: Luca Ceresoli <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]>
2019-11-15 | new helper: lookup_positive_unlocked() | Al Viro | 1 | -0/+1
Most of the callers of lookup_one_len_unlocked() treat negatives as ERR_PTR(-ENOENT). Provide a helper that would do just that. Note that a pinned positive dentry remains positive - its ->d_inode is stable, etc.; a pinned _negative_ dentry can become positive at any point as long as you are not holding its parent at least shared. So using lookup_one_len_unlocked() needs to be careful; lookup_positive_unlocked() is safer and that's what the callers end up open-coding anyway. Signed-off-by: Al Viro <[email protected]>
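The "treat a negative result as ERR_PTR(-ENOENT)" pattern can be sketched in userspace. Everything here is a simplified model with hypothetical names: `struct dentry_model` has only an inode pointer, the error-pointer helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, and reference counting is left out entirely.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified stand-in for a dentry: negative means no inode attached. */
struct dentry_model { void *inode; };

/* Userspace imitations of the kernel's error-pointer convention. */
static void *err_ptr(long err)      { return (void *)(intptr_t)err; }
static long ptr_err(const void *p)  { return (long)(intptr_t)p; }
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Pass errors and positive entries through; map negatives to -ENOENT. */
static struct dentry_model *positive_or_enoent(struct dentry_model *d)
{
    if (!is_err(d) && d->inode == NULL)
        return err_ptr(-ENOENT);
    return d;
}
```

A caller then needs a single is_err() check instead of an error check followed by a separate negativity check.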
2019-11-15 | fs/namei.c: pull positivity check into follow_managed() | Al Viro | 1 | -0/+5
There are 4 callers; two proceed to check if result is positive and fail with ENOENT if it isn't; one (in handle_lookup_down()) is guaranteed to yield positive and one (in lookup_fast()) is _preceded_ by positivity check. However, follow_managed() on a negative dentry is a (fairly cheap) no-op on anything other than autofs. And negative autofs dentries are never hashed, so lookup_fast() is not going to run into one of those. Moreover, successful follow_managed() on a _positive_ dentry never yields a negative one (and we significantly rely upon that in callers of lookup_fast()). In other words, we can easily transpose the positivity check and the call of follow_managed() in lookup_fast(). And that allows folding the positivity check *into* follow_managed(), simplifying life for the code downstream of its calls. Signed-off-by: Al Viro <[email protected]>
2019-11-15 | pipe: Allow pipes to have kernel-reserved slots | David Howells | 1 | -1/+5
Split pipe->ring_size into two numbers: (1) pipe->ring_size - indicates the hard size of the pipe ring. (2) pipe->max_usage - indicates the maximum number of pipe ring slots that userspace orchestrated events can fill. This allows for a pipe that is both writable by the general kernel notification facility and by userspace, allowing plenty of ring space for notifications to be added whilst preventing userspace from being able to pin too much unswappable kernel space. Signed-off-by: David Howells <[email protected]>
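The split between the hard ring size and the userspace limit can be modelled as follows. This is a sketch with hypothetical names (`struct pipe_model` and its helpers), not the kernel's pipe code; head and tail are free-running unsigned counters, which is an assumption of the model.

```c
#include <stdbool.h>

/* Minimal model of the two limits described above. */
struct pipe_model {
    unsigned int head;       /* next slot to be written */
    unsigned int tail;       /* next slot to be read */
    unsigned int ring_size;  /* hard size of the ring */
    unsigned int max_usage;  /* slots userspace writes may occupy */
};

/* Occupied slots; unsigned subtraction stays correct across wrap-around. */
static unsigned int occupancy(const struct pipe_model *p)
{
    return p->head - p->tail;
}

/* Userspace writes stop at max_usage. */
static bool user_can_write(const struct pipe_model *p)
{
    return occupancy(p) < p->max_usage;
}

/* Kernel notifications may use the slots above max_usage as well. */
static bool kernel_can_post(const struct pipe_model *p)
{
    return occupancy(p) < p->ring_size;
}
```

With ring_size = 16 and max_usage = 8, a pipe holding 8 buffers refuses further userspace writes but still has 8 slots left for kernel notifications.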
2019-11-15 | jbd2: make jbd2_handle_buffer_credits() handle reserved handles | Jan Kara | 1 | -2/+6
The helper jbd2_handle_buffer_credits() doesn't correctly handle reserved handles which can lead to crashes. Fix it by making the lookup of the journal pointer work for reserved handles as well. Fixes: a9a8344ee171 ("ext4, jbd2: Provide accessor function for handle credits") Reported-by: Eric Biggers <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2019-11-15 | y2038: move itimer reset into itimer.c | Arnd Bergmann | 1 | -4/+5
Preparing for a change to the itimer internals, stop using the do_setitimer() symbol and instead use a new higher-level interface. The do_getitimer()/do_setitimer functions can now be made static, allowing the compiler to potentially produce better object code. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2019-11-15 | y2038: itimer: compat handling to itimer.c | Arnd Bergmann | 1 | -11/+4
The structure is only used in one place, moving it there simplifies the interface and helps with later changes to this code. Rename it to match the other time32 structures in the process. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2019-11-15 | y2038: time: avoid timespec usage in settimeofday() | Arnd Bergmann | 1 | -1/+1
The compat_get_timeval() and timeval_valid() interfaces are deprecated and getting removed along with the definition of struct timeval itself. Change the two implementations of the settimeofday() system call to open-code these helpers and completely avoid references to timeval. The timeval_valid() call is not needed any more here, only a check to avoid overflowing tv_nsec during the multiplication, as there is another range check in do_sys_settimeofday64(). Tested-by: [email protected] Signed-off-by: Arnd Bergmann <[email protected]>
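The range check before the microsecond-to-nanosecond multiplication can be sketched like this. `struct ts64` and `usec_to_ts64()` are hypothetical names, and the exact bound used here (rejecting anything outside 0..999999) is an assumption of the sketch; the kernel's open-coded check may draw the line slightly differently.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for a 64-bit timespec. */
struct ts64 {
    int64_t tv_sec;
    long    tv_nsec;
};

/*
 * Convert seconds plus microseconds to seconds plus nanoseconds.
 * Checking the range first keeps usec * 1000 below 10^9, so the
 * multiplication cannot overflow even where long is 32 bits wide.
 */
static int usec_to_ts64(int64_t sec, long usec, struct ts64 *out)
{
    if (usec < 0 || usec >= 1000000L)
        return -EINVAL;
    out->tv_sec = sec;
    out->tv_nsec = usec * 1000L;
    return 0;
}
```

Doing the check before multiplying is the point: once an out-of-range value has been multiplied it may already have wrapped into something that looks valid.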
2019-11-15 | y2038: make ns_to_compat_timeval use __kernel_old_timeval | Arnd Bergmann | 1 | -2/+2
This gets us one step closer to removing 'struct timeval' from the kernel. We still keep __kernel_old_timeval for interfaces that we cannot fix otherwise, and ns_to_compat_timeval() is provably safe for interfaces that are legitimate users of __kernel_old_timeval on native kernels, so this is an obvious change. Signed-off-by: Arnd Bergmann <[email protected]>
2019-11-15 | y2038: socket: use __kernel_old_timespec instead of timespec | Arnd Bergmann | 1 | -2/+5
The 'timespec' type definition and helpers like ktime_to_timespec() or timespec64_to_timespec() should no longer be used in the kernel so we can remove them and avoid introducing y2038 issues in new code. Change the socket code that needs to pass a timespec to user space for backward compatibility to use __kernel_old_timespec instead. This type has the same layout but a more clearly defined name. Slightly reformat tcp_recv_timestamp() for consistency after the removal of timespec64_to_timespec(). Acked-by: Deepa Dinamani <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2019-11-15 | y2038: syscalls: change remaining timeval to __kernel_old_timeval | Arnd Bergmann | 1 | -5/+5
All of the remaining syscalls that pass a timeval (gettimeofday, utime, futimesat) can trivially be changed to pass a __kernel_old_timeval instead, which has a compatible layout, but avoids ambiguity with the timeval type in user space. Acked-by: Christian Brauner <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2019-11-15y2038: uapi: change __kernel_time_t to __kernel_old_time_tArnd Bergmann3-4/+4
This is mainly a patch for clarification, and to let us remove the time_t definition from the kernel to prevent new users from creeping in that might not be y2038-safe. All remaining uses of 'time_t' or '__kernel_time_t' are part of the user API that cannot be changed, but they either have a replacement or do not suffer from the y2038 overflow. Acked-by: Deepa Dinamani <[email protected]> Acked-by: Christian Brauner <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2019-11-15KVM: x86: deliver KVM IOAPIC scan request to target vCPUsNitesh Narayan Lal1-0/+2
In IOAPIC fixed delivery mode, instead of flushing the scan requests to all vCPUs, we should only send the requests to the vCPUs specified in the destination field. This patch introduces the kvm_get_dest_vcpus_mask() API, which retrieves an array of target vCPUs by using kvm_apic_map_get_dest_lapic() and then sets the bit for each one in a bitmap, based on its vcpus_idx. If that lookup fails, kvm_get_dest_vcpus_mask() finds the target vCPUs by traversing all available vCPUs and then sets their bits in the bitmap. If the previous request for the same redirection table entry targeted different vCPUs, the bits corresponding to those vCPUs are also set. This is done to keep ioapic_handled_vectors synchronized. The bitmap is eventually passed on to kvm_make_vcpus_request_mask() to generate a masked request only for the target vCPUs. This reduces the latency overhead on isolated vCPUs caused by the IPI needed to process KVM_REQ_IOAPIC_SCAN. Suggested-by: Marcelo Tosatti <[email protected]> Signed-off-by: Nitesh Narayan Lal <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
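The bitmap construction described above can be sketched in plain C. This is an illustration under stated assumptions rather than the KVM implementation: all names ending in _sketch are hypothetical, and the destination lists are passed in as arrays of vCPU indexes instead of being looked up through the APIC map.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Set one bit in a multi-word bitmap. */
static void bitmap_set_bit_sketch(unsigned long *bitmap, unsigned int bit)
{
    bitmap[bit / BITS_PER_WORD_SKETCH] |= 1UL << (bit % BITS_PER_WORD_SKETCH);
}

/* Return 1 if the bit is set, 0 otherwise. */
static int bitmap_test_bit_sketch(const unsigned long *bitmap, unsigned int bit)
{
    return (int)((bitmap[bit / BITS_PER_WORD_SKETCH] >>
                  (bit % BITS_PER_WORD_SKETCH)) & 1UL);
}

/*
 * Build the mask of vCPUs that should receive the scan request:
 * the destinations of the current request plus the destinations of
 * the previous request for the same redirection table entry.
 * The caller provides a zeroed bitmap large enough for every index.
 */
static void build_dest_mask_sketch(unsigned long *mask,
                                   const unsigned int *cur, size_t n_cur,
                                   const unsigned int *prev, size_t n_prev)
{
    for (size_t i = 0; i < n_cur; i++)
        bitmap_set_bit_sketch(mask, cur[i]);
    for (size_t i = 0; i < n_prev; i++)
        bitmap_set_bit_sketch(mask, prev[i]);
}
```

Only the vCPUs whose bits are set would then be sent the request, which is what keeps unrelated (for example isolated) vCPUs from being interrupted.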
2019-11-15KVM: remember position in kvm->vcpus arrayRadim Krčmář1-8/+3
Fetching the index of a vcpu in the kvm->vcpus array by traversing the entire array every time is costly. This patch remembers the position of each vcpu in the kvm->vcpus array by storing it in vcpus_idx in the kvm_vcpu structure. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Nitesh Narayan Lal <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
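A minimal user-space sketch of the idea, assuming simplified stand-in types (vm_sketch, vcpu_sketch and the helper names are hypothetical, not the KVM definitions): the index is recorded once when the vCPU is added, so later lookups no longer need to scan the array.

```c
#include <stddef.h>

#define MAX_VCPUS_SKETCH 8

/* Hypothetical, simplified stand-ins for struct kvm_vcpu and struct kvm. */
struct vcpu_sketch {
    int vcpu_id;
    int vcpus_idx;   /* position in vm_sketch.vcpus[], set on insertion */
};

struct vm_sketch {
    struct vcpu_sketch *vcpus[MAX_VCPUS_SKETCH];
    int online_vcpus;
};

/* Add a vCPU and remember its position. Returns the index, or -1 if full. */
static int vm_add_vcpu_sketch(struct vm_sketch *vm, struct vcpu_sketch *vcpu)
{
    if (vm->online_vcpus >= MAX_VCPUS_SKETCH)
        return -1;
    vcpu->vcpus_idx = vm->online_vcpus;
    vm->vcpus[vm->online_vcpus++] = vcpu;
    return vcpu->vcpus_idx;
}

/* Old approach: O(n) scan of the array on every lookup. */
static int vcpu_index_by_scan_sketch(const struct vm_sketch *vm,
                                     const struct vcpu_sketch *vcpu)
{
    for (int i = 0; i < vm->online_vcpus; i++)
        if (vm->vcpus[i] == vcpu)
            return i;
    return -1;
}

/* New approach: O(1) read of the cached position. */
static int vcpu_index_cached_sketch(const struct vcpu_sketch *vcpu)
{
    return vcpu->vcpus_idx;
}
```

The trade-off is one extra int per vCPU in exchange for constant-time lookups; the cached value stays valid in this sketch because entries are never removed or reordered.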
2019-11-15perf/core: Provide a kernel-internal interface to pause perf_eventLike Xu1-0/+5
Export perf_event_pause() as an external accessor for kernel users (such as KVM) that need to both disable a perf_event and read its count while taking perf_event_ctx_lock only once. The count can optionally be reset as well. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Like Xu <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-15perf/core: Provide a kernel-internal interface to recalibrate event periodLike Xu1-0/+5
Currently, perf_event_period() is used by user tools via ioctl. Following the naming convention, export perf_event_period() for kernel users (such as KVM) that may need to recalibrate the event period for their assigned counter according to their requirements. perf_event_period() is an external accessor, just like perf_event_{en,dis}able(), and should thus use perf_event_ctx_lock(). Suggested-by: Kan Liang <[email protected]> Signed-off-by: Like Xu <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-15x86/hyperv: Initialize clockevents earlier in CPU onliningMichael Kelley1-0/+1
Hyper-V has historically initialized stimer-based clockevents late in the process of onlining a CPU because clockevents depend on stimer interrupts. In the original Hyper-V design, stimer interrupts generate a VMbus message, so the VMbus machinery must be running first, and VMbus can't be initialized until relatively late. On x86/64, LAPIC timer based clockevents are used during early initialization before VMbus and stimer-based clockevents are ready, and again during CPU offlining after the stimer clockevents have been shut down. Unfortunately, this design creates problems when offlining CPUs for hibernation or other purposes. stimer-based clockevents are shut down relatively early in the offlining process, so clockevents_unbind_device() must be used to fall back to the LAPIC-based clockevents for the remainder of the offlining process. Furthermore, the late initialization and early shutdown of stimer-based clockevents doesn't work well on ARM64 since there is no other timer like the LAPIC to fall back to, so CPU onlining and offlining don't work properly. Fix this by recognizing that stimer Direct Mode is the normal path for newer versions of Hyper-V on x86/64, and the only path on other architectures. With stimer Direct Mode, stimer interrupts don't require any VMbus machinery. stimer clockevents can be initialized and shut down consistent with how it is done for other clockevent devices. While the old VMbus-based stimer interrupts must still be supported for backward compatibility on x86, that mode of operation can be treated as legacy. So add a new Hyper-V stimer entry in the CPU hotplug state list, and use that new state when in Direct Mode. Update the Hyper-V clocksource driver to allocate and initialize stimer clockevents earlier during boot. Update Hyper-V initialization and the VMbus driver to use this new design. As a result, the LAPIC timer is no longer used during boot or CPU onlining/offlining and clockevents_unbind_device() is not called. 
But retain the old design as a legacy implementation for older versions of Hyper-V that don't support Direct Mode. Signed-off-by: Michael Kelley <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Dexuan Cui <[email protected]> Reviewed-by: Dexuan Cui <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-11-15mmc: sdio: fix wl1251 vendor idH. Nikolaus Schaller1-0/+2
v4.11-rc1 introduced a patch series that rearranged the sdio quirks into a header file. Unfortunately, that series forgot to handle SDIO_VENDOR_ID_TI differently for wl1251 and wl1271, with the result that although the wl1251 was found on the sdio bus, the firmware no longer loaded and no interface was registered. This patch defines separate constants to be used by sdio quirks and drivers. Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <[email protected]> Cc: <[email protected]> # v4.11+ Signed-off-by: Ulf Hansson <[email protected]>
2019-11-15mmc: host: omap-hsmmc: remove init_card pdata callback from pdataH. Nikolaus Schaller1-3/+0
Now that we have removed the last user (pandora_wl1251_init_card) of this callback, we can remove it from the hsmmc code. Suggested-by: Ulf Hansson <[email protected]> Signed-off-by: H. Nikolaus Schaller <[email protected]> Signed-off-by: Ulf Hansson <[email protected]>
2019-11-14ftrace: Add modify_ftrace_direct()Steven Rostedt (VMware)1-0/+6
Add a new function modify_ftrace_direct() that will allow a user to update an existing direct caller to a new trampoline, without missing hits due to unregistering one and then adding another. Link: https://lore.kernel.org/r/[email protected] Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-11-14libnvdimm: Trivial comment fixIra Weiny1-1/+1
Signed-off-by: Ira Weiny <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2019-11-14vsock/vmci: register vmci_transport only when VMCI guest/host are activeStefano Garzarella1-0/+2
To allow other transports to be loaded with vmci_transport, we register the vmci_transport as G2H or H2G only when a VMCI guest or host is active. To do that, this patch adds a callback registered in the vmci driver that will be called when the host or guest becomes active. This callback will register the vmci_transport in the VSOCK core. Cc: Jorgen Hansen <[email protected]> Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-14vsock: handle buffer_size sockopts in the coreStefano Garzarella1-14/+1
virtio_transport and vmci_transport handle the buffer_size sockopts in a very similar way. In order to support multiple transports, this patch moves this handling into the core, which allows the user to change the options even if the socket is not yet assigned to any transport. This patch also adds the '.notify_buffer_size' callback in the 'struct virtio_transport' in order to inform the transport when the buffer_size is changed by the user. The callback is also useful for limiting the 'buffer_size' requested (e.g. for virtio transports). Acked-by: Dexuan Cui <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Jorgen Hansen <[email protected]> Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
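The division of labour described above (the core owns the buffer size, the transport is only notified and may lower the value) can be sketched as follows. This is a user-space illustration, not the vsock code: every type and function name ending in _sketch is hypothetical, and the 64 KiB cap is an arbitrary example value.

```c
#include <stddef.h>
#include <stdint.h>

struct vsock_sketch;

/* Hypothetical transport ops: only the notification hook is modelled. */
struct transport_sketch {
    /* Optional. May lower *val to whatever the transport supports. */
    void (*notify_buffer_size)(struct vsock_sketch *vsk, uint64_t *val);
};

/* Hypothetical socket state kept by the core, independent of transport. */
struct vsock_sketch {
    uint64_t buffer_size;
    uint64_t buffer_min_size;
    uint64_t buffer_max_size;
    const struct transport_sketch *transport;   /* NULL until assigned */
};

/*
 * Core-side handler: clamp the requested size to [min, max], let the
 * transport (if any) observe or limit it, then store the result.
 * This works even when no transport has been assigned yet.
 */
static void vsock_update_buffer_size_sketch(struct vsock_sketch *vsk,
                                            uint64_t val)
{
    if (val > vsk->buffer_max_size)
        val = vsk->buffer_max_size;
    if (val < vsk->buffer_min_size)
        val = vsk->buffer_min_size;

    if (val != vsk->buffer_size && vsk->transport &&
        vsk->transport->notify_buffer_size)
        vsk->transport->notify_buffer_size(vsk, &val);

    vsk->buffer_size = val;
}

/* Example transport hook that refuses buffers larger than 64 KiB. */
static void capped_notify_sketch(struct vsock_sketch *vsk, uint64_t *val)
{
    (void)vsk;
    if (*val > 65536)
        *val = 65536;
}
```

Passing the value by pointer is what lets a transport limit the request without the core needing to know any transport-specific maximum.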
2019-11-14vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()Stefano Garzarella1-1/+2
We are going to add a 'struct vsock_sock *' parameter to virtio_transport_get_ops(). In some cases, like in virtio_transport_reset_no_sock(), we don't have any socket assigned to the packet received, so we can't use virtio_transport_get_ops(). In order to allow virtio_transport_reset_no_sock() to use the '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport', we add a 'struct virtio_transport *' parameter to it and to its caller, virtio_transport_recv_pkt(). We moved the 'vhost_transport' and 'virtio_transport' definitions so that their addresses can be passed to virtio_transport_recv_pkt(). Reviewed-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-14vsock: remove include/linux/vm_sockets.h fileStefano Garzarella1-13/+0
This header file now only includes "uapi/linux/vm_sockets.h". We can include it directly when needed. Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Jorgen Hansen <[email protected]> Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-14vsock: remove vm_sockets_get_local_cid()Stefano Garzarella1-2/+0
vm_sockets_get_local_cid() is only used in virtio_transport_common.c. We can replace it by calling virtio_transport_get_ops() and using the get_local_cid() callback registered by the transport. Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Jorgen Hansen <[email protected]> Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-14tracing: Use seq_buf_hex_dump() to dump buffersPiotr Maziarz2-0/+9
Without this, buffers can only be printed with the __print_array macro, which has no formatting options and can be hard to read. The alternative is to mimic the formatting capability with multiple trace event calls, one per row, which hurts performance and gives each row a different timestamp. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Piotr Maziarz <[email protected]> Signed-off-by: Cezary Rojewski <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-11-14seq_buf: Add printing formatted hex dumpsPiotr Maziarz1-0/+3
The provided function is an analogue of print_hex_dump(). Implementing it in seq_buf allows it to be used for multiple purposes (e.g. for tracing) and therefore prevents code duplication in every layer that uses seq_buf. print_hex_dump() is an essential part of logging data to dmesg. Adding a similar capability for other purposes is beneficial to all users.
Example usage:
seq_buf_hex_dump(seq, "", DUMP_PREFIX_OFFSET, 16, 4, buf, ARRAY_SIZE(buf), true);
Example output:
00000000: 00000000 ffffff10 ffffff32 ffff3210 ........2....2..
00000010: ffff3210 83d00437 c0700000 00000000 .2..7.....p.....
00000020: 02010004 0000000f 0000000f 00004002 .............@..
00000030: 00000fff 00000000 ........
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Piotr Maziarz <[email protected]> Signed-off-by: Cezary Rojewski <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
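To illustrate the general technique of hex-dumping into a bounded buffer instead of printing directly, here is a small user-space sketch. It is not seq_buf_hex_dump(): struct seq_sketch and both functions are hypothetical, the dump is byte-wise (no group size, so the output does not depend on host endianness), and the prefix is always the offset.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounded output buffer, loosely modelled on the idea of seq_buf. */
struct seq_sketch {
    char *buf;
    size_t size;   /* capacity, including the terminating NUL */
    size_t len;    /* bytes used, excluding the terminating NUL */
};

/* Append formatted text; on overflow the buffer is filled and stays full. */
static void seq_sketch_printf(struct seq_sketch *s, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (s->len + 1 >= s->size)
        return;                          /* no room left */
    va_start(ap, fmt);
    n = vsnprintf(s->buf + s->len, s->size - s->len, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if ((size_t)n >= s->size - s->len)
        s->len = s->size - 1;            /* output was truncated */
    else
        s->len += (size_t)n;
}

/* Dump data as rows of "offset: hex bytes  ascii", rowsize bytes per row. */
static void seq_sketch_hex_dump(struct seq_sketch *s, const unsigned char *data,
                                size_t len, size_t rowsize, int ascii)
{
    if (rowsize == 0)
        return;
    for (size_t off = 0; off < len; off += rowsize) {
        size_t n = (len - off < rowsize) ? len - off : rowsize;

        seq_sketch_printf(s, "%08zx:", off);
        for (size_t i = 0; i < n; i++)
            seq_sketch_printf(s, " %02x", data[off + i]);
        if (ascii) {
            /* Pad a short final row so the ASCII column lines up. */
            for (size_t i = n; i < rowsize; i++)
                seq_sketch_printf(s, "   ");
            seq_sketch_printf(s, "  ");
            for (size_t i = 0; i < n; i++) {
                unsigned char c = data[off + i];
                seq_sketch_printf(s, "%c", isprint(c) ? c : '.');
            }
        }
        seq_sketch_printf(s, "\n");
    }
}
```

Because the formatter writes into a caller-supplied buffer, the same routine could back a trace event, a debugfs file, or a log line, which is the code-sharing argument the commit message makes.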
2019-11-14lib/bsearch: Use generic type for comparator functionAndy Shevchenko1-1/+1
The comparator function type, cmp_func_t, is defined in types.h. Use it in bsearch() so that the corresponding comment in the code makes more sense. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
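A user-space sketch of a binary search that takes a shared comparator typedef. The names cmp_func_sketch_t, bsearch_sketch and cmp_int_sketch are hypothetical; the typedef is assumed to have the same shape as the kernel's cmp_func_t, that is, a function taking two const void pointers and returning an int.

```c
#include <stddef.h>

/* Assumed to mirror the shape of the kernel's cmp_func_t. */
typedef int (*cmp_func_sketch_t)(const void *a, const void *b);

/*
 * Binary search over a sorted array of num elements of the given size.
 * cmp(key, elt) must return <0, 0 or >0, like the comparator of the
 * C library's bsearch(). Returns the matching element or NULL.
 */
static void *bsearch_sketch(const void *key, const void *base, size_t num,
                            size_t size, cmp_func_sketch_t cmp)
{
    while (num > 0) {
        const char *pivot = (const char *)base + (num >> 1) * size;
        int result = cmp(key, pivot);

        if (result == 0)
            return (void *)pivot;
        if (result > 0) {
            /* Continue in the upper half, excluding the pivot. */
            base = pivot + size;
            num--;
        }
        num >>= 1;
    }
    return NULL;
}

/* Example comparator for int keys; avoids overflow from subtraction. */
static int cmp_int_sketch(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}
```

Using one named typedef for the comparator means the search routine and any sort routine can share the same callback without casts at the call site.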