2019-12-06  KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region  (Jia He; 1 file, -9/+0)

In kvm_arch_prepare_memory_region(), arm KVM regards the memory region as writable if the flag has no KVM_MEM_READONLY, and the VMA as read-only if !VM_WRITE. But there is a common usage pattern for setting up a KVM memory region, e.g. on the QEMU side (note the PROT_NONE flag):

1. mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);   -- memory_region_init_ram_ptr()
2. re-mmap the above area with read/write permissions.

This pattern is used in the virtio-fs QEMU code, which hasn't been upstreamed yet [1], but it doesn't seem like something we can forbid. Without this patch, it causes an EPERM during kvm_set_memory_region() and a QEMU boot crash.

As told by Ard: "the underlying assumption is incorrect, i.e., that the value of vm_flags at this point in time defines how the VMA is used during its lifetime. There may be other cases where a VMA is created with VM_READ vm_flags that are changed to VM_READ|VM_WRITE later, and we are currently rejecting this use case as well."

[1] https://gitlab.com/virtio-fs/qemu/blob/5a356e/hw/virtio/vhost-user-fs.c#L488

Suggested-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Jia He <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
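As a rough userspace illustration of the pattern described above (a minimal sketch, not the actual QEMU code; the size and error handling are placeholders):

    #include <sys/mman.h>

    int map_guest_ram_sketch(void)
    {
            size_t size = 16 * 1024 * 1024;

            /* Step 1: reserve the area with no access rights. */
            void *buf = mmap(NULL, size, PROT_NONE,
                             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
            if (buf == MAP_FAILED)
                    return -1;

            /*
             * Step 2: later, map the same range again with read/write
             * access. The VMA's vm_flags change after the memslot was
             * registered, which is why KVM cannot trust the flags it
             * saw at registration time.
             */
            buf = mmap(buf, size, PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
            if (buf == MAP_FAILED)
                    return -1;

            return 0;
    }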
2019-12-06  KVM: arm64: Don't log IMP DEF sysreg traps  (Mark Rutland; 1 file, -0/+8)

We don't intend to support IMPLEMENTATION DEFINED system registers, but have to trap them (and emulate them as UNDEFINED). These traps aren't interesting to the system administrator or to the KVM developers, so let's not bother logging when we do so.

Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2019-12-06  KVM: arm64: Sanely ratelimit sysreg messages  (Mark Rutland; 2 files, -8/+21)

Currently kvm_pr_unimpl() is ratelimited, so print_sys_reg_instr() won't spam the console. However, some of its callers try to print contextual information with kvm_err(), which is not ratelimited. This means that in some cases the context may be printed without the sysreg encoding, which isn't all that useful.

Let's ensure that both are consistently printed together and ratelimited, by refactoring print_sys_reg_instr() so that callers can provide it with an arbitrary format string.

Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
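A rough kernel-context sketch of the shape of such a refactoring (not necessarily the exact patch; the helper name and message layout are illustrative):

    static void print_sys_reg_msg(const struct sys_reg_params *p,
                                  char *fmt, ...)
    {
            va_list va;

            va_start(va, fmt);
            /*
             * %pV prints the caller-supplied message, so one ratelimited
             * line carries both the context and the sysreg encoding.
             */
            kvm_pr_unimpl("%pV { Op0(%2u), Op1(%2u), CRn(%2u), CRm(%2u), Op2(%2u) }\n",
                          &(struct va_format){ fmt, &va },
                          p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
            va_end(va);
    }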
2019-12-06  KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()  (Miaohe Lin; 1 file, -15/+4)

Use the wrapper functions lock_all_vcpus()/unlock_all_vcpus() in kvm_vgic_create() to remove duplicated code dealing with locking and unlocking all vcpus in a VM.

Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2019-12-06  KVM: arm/arm64: vgic: Fix potential double free of dist->spis in __kvm_vgic_destroy()  (Miaohe Lin; 1 file, -0/+1)

In kvm_vgic_dist_init(), called from kvm_vgic_map_resources(), if dist->vgic_model is invalid, dist->spis is freed without setting dist->spis = NULL. Then, in the vgic-v2 resource clean-up path, __kvm_vgic_destroy() is called to free the allocated resources, and dist->spis is freed again in the clean-up chain because we forgot to set dist->spis = NULL in the kvm_vgic_dist_init() failure path. So a double free would happen.

Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
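The fix follows the usual pattern of clearing the pointer on the error path so a later kfree() becomes a harmless no-op; a minimal sketch of the idea (the surrounding error handling is abbreviated and the function name is illustrative):

    static int dist_init_error_path_sketch(struct vgic_dist *dist)
    {
            /* error path: release what we allocated ... */
            kfree(dist->spis);
            /*
             * ... and clear the pointer so __kvm_vgic_destroy()'s later
             * kfree(dist->spis) cannot free it a second time.
             */
            dist->spis = NULL;
            return -EINVAL;
    }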
2019-12-06  KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()  (Miaohe Lin; 1 file, -2/+2)

As the dummy arg is not really needed, there's no need to pass NULL when calling cpu_init_hyp_mode(). So clean it up.

Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug")
Reviewed-by: Steven Price <[email protected]>
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2019-11-08  Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/next  (Marc Zyngier; 16 files, -63/+97)
2019-11-08  KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injection  (Marc Zyngier; 3 files, -6/+11)

Just like we do for WFE trapping, it can be useful to turn off WFI trapping when the physical CPU is not oversubscribed (that is, the vcpu is the only runnable process on this CPU) *and* we're using direct injection of interrupts. The conditions are reevaluated on each vcpu_load(), ensuring that we don't switch to this mode on a busy system.

On a GICv4 system, this has the effect of reducing the generation of doorbell interrupts to zero when the right conditions are met, which is a huge improvement over the current situation (where the doorbells are screaming if the CPU ever hits a blocking WFI).

Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Zenghui Yu <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
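The gist of the check described above can be sketched roughly as follows (an illustration only, not the actual patch; the helper name and the vcpu_has_vlpis() test are assumptions):

    static void vcpu_update_wfx_traps_sketch(struct kvm_vcpu *vcpu)
    {
            /* Trap WFE/WFI by default. */
            vcpu->arch.hcr_el2 |= HCR_TWE | HCR_TWI;

            /* Stop trapping WFE when we're alone on this physical CPU. */
            if (single_task_running())
                    vcpu->arch.hcr_el2 &= ~HCR_TWE;

            /*
             * Additionally stop trapping WFI only if this vcpu also
             * receives directly injected interrupts (VLPIs).
             * vcpu_has_vlpis() is a hypothetical helper here.
             */
            if (single_task_running() && vcpu_has_vlpis(vcpu))
                    vcpu->arch.hcr_el2 &= ~HCR_TWI;
    }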
2019-11-08  KVM: vgic-v4: Track the number of VLPIs per vcpu  (Marc Zyngier; 4 files, -0/+8)

In order to find out whether a vcpu is likely to be the target of VLPIs (and to further optimize the way we deal with those), let's track the number of VLPIs a vcpu can receive. This gets implemented with an atomic variable that gets incremented or decremented on map, unmap and move of a VLPI.

Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Zenghui Yu <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
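A minimal sketch of such a counter (the field and function names here are illustrative, not necessarily those used by the patch):

    /* per-vcpu state, e.g. embedded in the vgic vcpu structure */
    struct vlpi_tracking {
            atomic_t vlpi_count;
    };

    static void vlpi_map(struct vlpi_tracking *v)   { atomic_inc(&v->vlpi_count); }
    static void vlpi_unmap(struct vlpi_tracking *v) { atomic_dec(&v->vlpi_count); }

    /* on move, decrement on the old vcpu and increment on the new one */
    static void vlpi_move(struct vlpi_tracking *from, struct vlpi_tracking *to)
    {
            atomic_dec(&from->vlpi_count);
            atomic_inc(&to->vlpi_count);
    }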
2019-11-07  KVM: arm/arm64: Let the timer expire in hardirq context on RT  (Thomas Gleixner; 1 file, -4/+4)

The timers are canceled from a preempt notifier which is invoked with preemption disabled, which is not allowed on PREEMPT_RT. The timer callback is short, so it can be invoked in hard-IRQ context on -RT. Let the timer expire in hard-IRQ context even on -RT.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Tested-by: Julien Grall <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
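In practice this kind of change amounts to requesting a hard-interrupt expiry mode when setting up the hrtimers; a hedged sketch (the timer and callback names are placeholders, not the actual arch timer fields touched by the patch):

    #include <linux/hrtimer.h>

    static struct hrtimer bg_timer;

    static enum hrtimer_restart bg_timer_expire(struct hrtimer *hrt)
    {
            /* short, hardirq-safe work only */
            return HRTIMER_NORESTART;
    }

    static void timer_setup_sketch(void)
    {
            /*
             * _HARD: the callback runs in hard-IRQ context, which keeps
             * the expiry/cancel paths valid on PREEMPT_RT as well.
             */
            hrtimer_init(&bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
            bg_timer.function = bg_timer_expire;
    }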
2019-10-29  KVM: arm/arm64: vgic: Don't rely on the wrong pending table  (Zenghui Yu; 1 file, -3/+3)

It's possible that two LPIs are located in the same "byte_offset" but target two different vcpus, where their pending status is indicated by two different pending tables. In such a scenario, using the last_byte_offset optimization will lead KVM to rely on the wrong pending table entry. Let us use last_ptr instead, which can be treated as a byte index into a pending table and also can be vcpu specific.

Fixes: 280771252c1b ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES")
Cc: [email protected]
Signed-off-by: Zenghui Yu <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Acked-by: Eric Auger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2019-10-29  KVM: arm/arm64: vgic: Fix some comments typo  (Zenghui Yu; 3 files, -3/+3)

Fix various comments, including wrong function names, grammar mistakes and specification references.

Signed-off-by: Zenghui Yu <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2019-10-29  KVM: arm/arm64: vgic: Remove the declaration of kvm_send_userspace_msi()  (Zenghui Yu; 1 file, -2/+0)

The callsite of kvm_send_userspace_msi() is currently arch agnostic. There seems to be no reason to keep an extra declaration of it in arm_vgic.h (we already have one in include/linux/kvm_host.h). Remove it.

Signed-off-by: Zenghui Yu <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2019-10-28  KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put  (Marc Zyngier; 8 files, -42/+48)

When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal.

To address this, let's move things around a bit:

- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"

Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state.

Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled.

Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
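A hedged sketch of how the residency calls line up with the scheduling hooks described above (heavily abbreviated: the real hooks do more than the GICv4 handling shown here):

    /* vcpu scheduled in: make the VPE resident again */
    void kvm_vgic_load(struct kvm_vcpu *vcpu)
    {
            vgic_v4_load(vcpu);             /* other vgic state handling omitted */
    }

    /* vcpu scheduled out: leave residency; no doorbell needed */
    void kvm_vgic_put(struct kvm_vcpu *vcpu)
    {
            vgic_v4_put(vcpu, false);
    }

    /* blocking on WFI: leave residency and request a doorbell so the
     * vcpu gets kicked when a VLPI targets it */
    void kvm_vgic_vcpu_blocking(struct kvm_vcpu *vcpu)
    {
            vgic_v4_put(vcpu, true);
    }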
2019-10-28  KVM: arm64: Don't set HCR_EL2.TVM when S2FWB is supported  (Christoffer Dall; 2 files, -3/+12)

On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page tables to override the memory attributes of memory accesses, regardless of the stage 1 page table configuration, and also when the stage 1 MMU is turned off. This results in all memory accesses to RAM being cacheable, including during early boot of the guest.

On CPUs without this feature, memory accesses were non-cacheable during boot until the guest turned on the stage 1 MMU, and we had to detect when the guest turned on the MMU, such that we could invalidate all cache entries and ensure a consistent view of memory with the MMU turned on. When the guest turned on the caches, we would call stage2_flush_vm() from kvm_toggle_cache().

However, stage2_flush_vm() walks all the stage 2 tables, and calls __kvm_flush_dcache_pte, which on a system with S2FWB does ... absolutely nothing.

We can avoid that whole song and dance, and simply not set TVM when creating a VM on a system that has S2FWB.

Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
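The heart of the change is a feature-gated choice of the HCR_EL2 trap bits when a vcpu is set up; a rough sketch of the idea (not the literal hunk, and the helper name is illustrative):

    static unsigned long vcpu_hcr_flags_sketch(void)
    {
            /* baseline HCR_EL2 value; includes HCR_TVM */
            unsigned long hcr = HCR_GUEST_FLAGS;

            /*
             * With S2FWB, stage 2 forces cacheable attributes even with
             * the guest MMU off, so trapping the VM-control registers to
             * track cache enablement serves no purpose.
             */
            if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
                    hcr &= ~HCR_TVM;

            return hcr;
    }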
2019-10-28  KVM: arm/arm64: Show halt poll counters in debugfs  (Christian Borntraeger; 2 files, -0/+8)

ARM/ARM64 has the counters halt_successful_poll, halt_attempted_poll, halt_poll_invalid, and halt_wakeup but never exposed those in debugfs.

Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
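On arm, as on other architectures, these counters get exported by listing them in the debugfs stat table; a hedged sketch of what that looks like (other entries and the exact file are omitted):

    #define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }

    struct kvm_stats_debugfs_item debugfs_entries[] = {
            VCPU_STAT(halt_successful_poll),
            VCPU_STAT(halt_attempted_poll),
            VCPU_STAT(halt_poll_invalid),
            VCPU_STAT(halt_wakeup),
            { NULL }
    };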
2019-10-24  Merge remote-tracking branch 'kvmarm/kvm-arm64/stolen-time' into kvmarm-master/next  (Marc Zyngier; 36 files, -207/+774)
2019-10-24  KVM: arm64: Select TASK_DELAY_ACCT+TASKSTATS rather than SCHEDSTATS  (Steven Price; 1 file, -1/+4)

SCHEDSTATS requires DEBUG_KERNEL (and PROC_FS) and therefore isn't a good choice for enabling the scheduling statistics required for stolen time. Instead match the x86 configuration and select TASK_DELAY_ACCT and TASKSTATS. This adds the dependencies of NET && MULTIUSER for arm64 KVM.

Suggested-by: Marc Zyngier <[email protected]>
Fixes: 8564d6372a7d ("KVM: arm64: Support stolen time reporting via shared structure")
Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
2019-10-21  arm64: Retrieve stolen time as paravirtualized guest  (Steven Price; 5 files, -4/+155)

Enable paravirtualization features when running under a hypervisor supporting the PV_TIME_ST hypercall.

For each (v)CPU, we ask the hypervisor for the location of a shared page which the hypervisor will use to report stolen time to us. We set pv_time_ops to the stolen time function, which simply reads the stolen value from the shared page for a VCPU. We guarantee single-copy atomicity using READ_ONCE, which means we can also read the stolen time for a VCPU other than the currently running one while it is potentially being updated by the hypervisor.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
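A hedged sketch of the guest-side read described above (the per-CPU variable and structure/field names here are illustrative approximations of the PV-time shared page layout):

    struct stolen_time_region {
            __le32 revision;
            __le32 attributes;
            __le64 stolen_time;     /* nanoseconds, updated by the hypervisor */
    };

    static DEFINE_PER_CPU(struct stolen_time_region *, stolen_region);

    static u64 pv_steal_clock_sketch(int cpu)
    {
            struct stolen_time_region *reg = per_cpu(stolen_region, cpu);

            if (!reg)
                    return 0;

            /*
             * READ_ONCE gives a single-copy-atomic 64-bit read, safe even
             * while the hypervisor is concurrently updating the field.
             */
            return le64_to_cpu(READ_ONCE(reg->stolen_time));
    }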
2019-10-21  arm/arm64: Make use of the SMCCC 1.1 wrapper  (Steven Price; 2 files, -60/+34)

Rather than directly choosing which function to use based on psci_ops.conduit, use the new arm_smccc_1_1 wrapper instead. In some cases we still need to do some operations based on the conduit, but the code duplication is removed.

No functional change.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
2019-10-21  arm/arm64: Provide a wrapper for SMCCC 1.1 calls  (Steven Price; 1 file, -0/+45)

SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI conduit. Rather than coding this in every call site, provide a macro which uses the correct instruction. The macro also handles the case where no conduit is configured/available, returning a "not supported" error in res, along with returning the conduit used for the call.

This allows us to remove some duplicated code and will be useful later when adding paravirtualized time hypervisor calls.

Signed-off-by: Steven Price <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
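A hedged sketch of the conduit-selection idea behind such a wrapper (simplified; the real macro also reports the conduit it used and fills res with a "not supported" code when there is none):

    /* simplified: pick HVC or SMC based on the detected conduit */
    #define smccc_1_1_call_sketch(...)                              \
            do {                                                    \
                    switch (arm_smccc_1_1_get_conduit()) {          \
                    case SMCCC_CONDUIT_HVC:                         \
                            arm_smccc_1_1_hvc(__VA_ARGS__);         \
                            break;                                  \
                    case SMCCC_CONDUIT_SMC:                         \
                            arm_smccc_1_1_smc(__VA_ARGS__);         \
                            break;                                  \
                    default:                                        \
                            /* no conduit: caller sees an error in res */ \
                            break;                                  \
                    }                                               \
            } while (0)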
2019-10-21  KVM: arm64: Provide VCPU attributes for stolen time  (Steven Price; 5 files, -0/+79)

Allow user space to inform the KVM host where in the physical memory map the paravirtualized time structures should be located.

User space can set an attribute on the VCPU providing the IPA base address of the stolen time structure for that VCPU. This must be repeated for every VCPU in the VM.

The address is given in terms of the physical address visible to the guest and must be 64 byte aligned.

The guest will discover the address via a hypercall.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
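From the VMM side, setting such a per-VCPU attribute would look roughly like this (a sketch, assuming the PV-time attribute group/ID names introduced by this series; error handling elided):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdint.h>

    static int set_pvtime_ipa(int vcpu_fd, uint64_t ipa /* 64-byte aligned guest PA */)
    {
            struct kvm_device_attr attr = {
                    .group = KVM_ARM_VCPU_PVTIME_CTRL,
                    .attr  = KVM_ARM_VCPU_PVTIME_IPA,
                    .addr  = (uint64_t)&ipa,
            };

            /* repeat for every VCPU in the VM */
            return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
    }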
2019-10-21  KVM: Allow kvm_device_ops to be const  (Steven Price; 2 files, -5/+5)

Currently a kvm_device_ops structure cannot be const without triggering compiler warnings. However the structure doesn't need to be written to and, by marking it const, it can be read-only in memory. Add some more const keywords to allow this.

Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
2019-10-21  KVM: arm64: Support stolen time reporting via shared structure  (Steven Price; 7 files, -0/+111)

Implement the service call for configuring a shared structure between a VCPU and the hypervisor in which the hypervisor can write the time stolen from the VCPU's execution time by other tasks on the host.

User space allocates memory which is placed at an IPA also chosen by user space. The hypervisor then updates the shared structure using kvm_put_guest() to ensure single copy atomicity of the 64-bit value reporting the stolen time in nanoseconds.

Whenever stolen time is enabled by the guest, the stolen time counter is reset.

The stolen time itself is retrieved from the sched_info structure maintained by the Linux scheduler code. We enable SCHEDSTATS when selecting KVM Kconfig to ensure this value is meaningful.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
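A hedged sketch of the host-side update described above (simplified; the structure/field names and the exact accumulation of run_delay are approximations):

    static void update_steal_time_sketch(struct kvm_vcpu *vcpu, gpa_t base)
    {
            /* 'base' is the IPA chosen by user space for this VCPU */
            u64 steal = READ_ONCE(current->sched_info.run_delay);

            /*
             * 64-bit, single-copy-atomic write into the shared structure;
             * 'stolen_time' is the 64-bit field in the shared page layout.
             */
            kvm_put_guest(vcpu->kvm,
                          base + offsetof(struct pvclock_vcpu_stolen_time, stolen_time),
                          cpu_to_le64(steal), u64);
    }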
2019-10-21  KVM: Implement kvm_put_guest()  (Steven Price; 1 file, -0/+22)

kvm_put_guest() is analogous to put_user() - it writes a single value to the guest physical address. The implementation is built upon put_user() and so it has the same single copy atomic properties.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
2019-10-21  KVM: arm64: Implement PV_TIME_FEATURES call  (Steven Price; 7 files, -1/+67)

This provides a mechanism for querying which paravirtualized time features are available in this hypervisor.

Also add the header file which defines the ABI for the paravirtualized time features we're about to add.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
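From the guest's point of view, the query described above is a plain SMCCC 1.1 call; a hedged sketch (error handling abbreviated, helper name illustrative):

    static bool has_pv_steal_clock_sketch(void)
    {
            struct arm_smccc_res res;

            /* ask whether the stolen-time feature is implemented */
            arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_FEATURES,
                                 ARM_SMCCC_HV_PV_TIME_ST, &res);

            return res.a0 == SMCCC_RET_SUCCESS;
    }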
2019-10-21  KVM: arm/arm64: Factor out hypercall handling from PSCI code  (Christoffer Dall; 9 files, -87/+112)

We currently intertwine the KVM PSCI implementation with the general dispatch of hypercall handling, which makes perfect sense because PSCI is the only category of hypercalls we support. However, as we are about to support additional hypercalls, factor out this functionality into a separate hypercall handler file.

Signed-off-by: Christoffer Dall <[email protected]>
[[email protected]: rebased]
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
2019-10-21  KVM: arm64: Document PV-time interface  (Steven Price; 2 files, -0/+94)

Introduce a paravirtualization interface for KVM/arm64 based on the "Arm Paravirtualized Time for Arm-Based Systems" specification DEN 0057A.

This only adds the details about "Stolen Time" as the details of "Live Physical Time" have not been fully agreed.

User space can specify a reserved area of memory for the guest and inform KVM to populate the memory with information on time that the host kernel has stolen from the guest.

A hypercall interface is provided for the guest to interrogate the hypervisor's support for this interface and the location of the shared memory structures.

Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
2019-10-21  Merge remote-tracking branch 'arm64/for-next/smccc-conduit-cleanup' into kvm-arm64/stolen-time  (Marc Zyngier; 8 files, -60/+57)
2019-10-21  KVM: arm/arm64: Allow user injection of external data aborts  (Christoffer Dall; 8 files, -5/+49)

In some scenarios, such as a buggy guest or an incorrect configuration of the VMM and firmware description data, userspace will detect a memory access to a portion of the IPA space which is not mapped to any MMIO region.

In such cases, the appropriate action is to inject an external abort to the guest. The kernel already has functionality to inject an external abort, but we need to wire up a signal from user space that lets user space tell the kernel to do this.

It turns out we already have the set-event functionality which we can perfectly reuse for this.

Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
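From the VMM side, reusing the vcpu events interface as described would look roughly like this (a sketch, assuming the ext_dabt_pending event field introduced for this purpose; error handling elided):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <string.h>

    static int inject_external_data_abort(int vcpu_fd)
    {
            struct kvm_vcpu_events events;

            memset(&events, 0, sizeof(events));
            /* ask KVM to inject an external data abort into the guest */
            events.exception.ext_dabt_pending = 1;

            return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
    }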
2019-10-21  KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace  (Christoffer Dall; 9 files, -1/+96)

For a long time, if a guest accessed memory outside of a memslot using any of the load/store instructions in the architecture which don't supply decoding information in the ESR_EL2 (the ISV bit is not set), the kernel would print the following message and terminate the VM as a result of returning -ENOSYS to userspace:

    load/store instruction decoding not implemented

The reason behind this message is that KVM assumes that all accesses outside a memslot are MMIO accesses which should be handled by userspace, and we originally expected to eventually implement some sort of decoding of load/store instructions where the ISV bit was not set.

However, it turns out that many of the instructions which don't provide decoding information on abort are not safe to use for MMIO accesses, and the remaining few that would potentially make sense to use on MMIO accesses, such as those with register writeback, are not used in practice. It also turns out that fetching an instruction from guest memory can be a pretty horrible affair, involving stopping all CPUs on SMP systems, handling multiple corner cases of address translation in software, and more. It doesn't appear likely that we'll ever implement this in the kernel.

What is much more common is that a user has misconfigured his/her guest and is actually not accessing an MMIO region, but just hitting some random hole in the IPA space. In this scenario, the error message above is almost misleading and has led to a great deal of confusion over the years.

It is, nevertheless, ABI to userspace, and we therefore need to introduce a new capability that userspace explicitly enables to change behavior. This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) which does exactly that, and introduces a new exit reason to report the event to userspace. User space can then emulate an exception to the guest, restart the guest, suspend the guest, or take any other appropriate action as per the policy of the running system.

Reported-by: Heinrich Schuchardt <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
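A hedged sketch of how a VMM might opt in and handle the new exit (the exit payload field names follow this series; checks and policy are simplified):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdio.h>

    static int enable_nisv_to_user(int vm_fd)
    {
            struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_NISV_TO_USER };

            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }

    static void handle_exit(struct kvm_run *run)
    {
            if (run->exit_reason == KVM_EXIT_ARM_NISV) {
                    /* no instruction syndrome: decide per VMM policy */
                    fprintf(stderr, "non-ISV abort at IPA 0x%llx (ESR ISS 0x%llx)\n",
                            (unsigned long long)run->arm_nisv.fault_ipa,
                            (unsigned long long)run->arm_nisv.esr_iss);
                    /* e.g. inject an external abort, or stop the guest */
            }
    }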
2019-10-14  firmware: arm_sdei: use common SMCCC_CONDUIT_*  (Mark Rutland; 3 files, -13/+8)

Now that we have common definitions for SMCCC conduits, move the SDEI code over to them, and remove the SDEI-specific definitions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Acked-by: James Morse <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
2019-10-14  firmware/psci: use common SMCCC_CONDUIT_*  (Mark Rutland; 2 files, -23/+11)

Now that we have common SMCCC_CONDUIT_* definitions, migrate the PSCI code over to them, and kill off the old PSCI_CONDUIT_* definitions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
2019-10-14  arm: spectre-v2: use arm_smccc_1_1_get_conduit()  (Mark Rutland; 1 file, -7/+3)

Now that we have arm_smccc_1_1_get_conduit(), we can hide the PSCI implementation details from the arm spectre-v2 code, so let's do so.

As arm_smccc_1_1_get_conduit() implicitly checks that the SMCCC version is at least SMCCC_VERSION_1_1, we no longer need to check this explicitly where switch statements have a default case.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
2019-10-14  arm64: errata: use arm_smccc_1_1_get_conduit()  (Mark Rutland; 1 file, -25/+12)

Now that we have arm_smccc_1_1_get_conduit(), we can hide the PSCI implementation details from the arm64 cpu errata code, so let's do so.

As arm_smccc_1_1_get_conduit() implicitly checks that the SMCCC version is at least SMCCC_VERSION_1_1, we no longer need to check this explicitly where switch statements have a default case, e.g. in has_ssbd_mitigation().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Lorenzo Pieralisi <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
2019-10-14  arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()  (Mark Rutland; 2 files, -0/+31)

SMCCC callers are currently amassing a collection of enums for the SMCCC conduit, and are having to dig into the PSCI driver's internals in order to figure out what to do.

Let's clean this up, with common SMCCC_CONDUIT_* definitions, and an arm_smccc_1_1_get_conduit() helper that abstracts the PSCI driver's internal state.

We can kill off the PSCI_CONDUIT_* definitions once we've migrated users over to the new interface.

Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
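A hedged sketch of what such a helper looks like (close in spirit to the real one, but simplified):

    enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
    {
            /* SMCCC 1.1 is required for the fast-call convention */
            if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
                    return SMCCC_CONDUIT_NONE;

            switch (psci_ops.conduit) {
            case PSCI_CONDUIT_SMC:
                    return SMCCC_CONDUIT_SMC;
            case PSCI_CONDUIT_HVC:
                    return SMCCC_CONDUIT_HVC;
            default:
                    return SMCCC_CONDUIT_NONE;
            }
    }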
2019-10-13  Linux 5.4-rc3  (Linus Torvalds; 1 file, -1/+1)
2019-10-13  Merge tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds; 15 files, -132/+223)

Pull tracing fixes from Steven Rostedt:
 "A few tracing fixes:

  - Remove lockdown from tracefs itself and moved it to the trace directory. Have the open functions there do the lockdown checks.

  - Fix a few races with opening an instance file and the instance being deleted (Discovered during the lockdown updates). Kept separate from the clean up code such that they can be backported to stable easier.

  - Clean up and consolidated the checks done when opening a trace file, as there were multiple checks that need to be done, and it did not make sense having them done in each open instance.

  - Fix a regression in the record mcount code.

  - Small hw_lat detector tracer fixes.

  - A trace_pipe read fix due to not initializing trace_seq"

* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
  tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
  tracing/hwlat: Report total time spent in all NMIs during the sample
  recordmcount: Fix nop_mcount() function
  tracing: Do not create tracefs files if tracefs lockdown is in effect
  tracing: Add locked_down checks to the open calls of files created for tracefs
  tracing: Add tracing_check_open_get_tr()
  tracing: Have trace events system open call tracing_open_generic_tr()
  tracing: Get trace_array reference for available_tracers files
  ftrace: Get a reference counter for the trace_array on filter files
  tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
2019-10-13  Merge tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging  (Linus Torvalds; 5 files, -9/+47)

Pull hwmon fixes from Guenter Roeck:

 - Update/fix inspur-ipsps1 and k10temp Documentation

 - Fix nct7904 driver

 - Fix HWMON_P_MIN_ALARM mask in hwmon core

* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: docs: Extend inspur-ipsps1 title underline
  hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
  docs: hwmon: Include 'inspur-ipsps1.rst' into docs
  hwmon: Fix HWMON_P_MIN_ALARM mask
  hwmon: (k10temp) Update documentation and add temp2_input info
  hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
2019-10-13  Merge tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux  (Linus Torvalds; 2 files, -4/+3)

Pull MTD fixes from Richard Weinberger:
 "Two fixes for MTD:

  - spi-nor: Fix for a regression in write_sr()

  - rawnand: Regression fix for the au1550nd driver"

* tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: au1550nd: Fix au_read_buf16() prototype
  mtd: spi-nor: Fix direction of the write_sr() transfer
2019-10-13  Merge tag 'for-linus-20191012' of git://git.kernel.dk/linux-block  (Linus Torvalds; 1 file, -17/+20)

Pull io_uring fix from Jens Axboe:
 "Single small fix for a regression in the sequence logic for linked commands"

* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
  io_uring: fix sequence logic for timeout requests
2019-10-12  tracing: Initialize iter->seq after zeroing in tracing_read_pipe()  (Petr Mladek; 1 file, -0/+1)

A customer reported the following softlockup:

    [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
    [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
    [899688.160002] RIP: 0010:up_write+0x1a/0x30
    [899688.160002] Kernel panic - not syncing: softlockup: hung tasks
    [899688.160002] RIP: 0010:up_write+0x1a/0x30
    [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
    [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
    [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
    [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
    [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
    [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
    [899688.160002]  tracing_read_pipe+0x336/0x3c0
    [899688.160002]  __vfs_read+0x26/0x140
    [899688.160002]  vfs_read+0x87/0x130
    [899688.160002]  SyS_read+0x42/0x90
    [899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(). There is no loop there, so it must be looping in the caller tracing_read_pipe() via the "waitagain" label.

Crashdump analysis uncovered that iter->seq was completely zeroed at this point, including iter->seq.seq.size. It means that print_trace_line() was never able to print anything and there was no forward progress.

The culprit seems to be this code:

    /* reset all but tr, trace, and overruns */
    memset(&iter->seq, 0,
           sizeof(struct trace_iterator) -
           offsetof(struct trace_iterator, seq));

It was added by commit 53d0aa773053ab182877 ("ftrace: add logic to record overruns") in v2.6.27-rc1. At that time iter->seq looked like:

    struct trace_seq {
            unsigned char buffer[PAGE_SIZE];
            unsigned int len;
    };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without zeroing.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
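The "reinitialize after zeroing" fix can be sketched as a one-line addition after the existing memset, using the existing trace_seq_init() helper that resets the seq buffer bookkeeping:

    /* reset all but tr, trace, and overruns */
    memset(&iter->seq, 0,
           sizeof(struct trace_iterator) -
           offsetof(struct trace_iterator, seq));
    /* re-establish seq.size etc. so print_trace_line() can make progress */
    trace_seq_init(&iter->seq);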
2019-10-12  tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency  (Srivatsa S. Bhat (VMware); 1 file, -0/+2)

max_latency is intended to record the maximum ever observed hardware latency, which may occur in either part of the loop (inner/outer). So we need to also consider the outer-loop sample when updating max_latency.

Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu
Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: [email protected]
Signed-off-by: Srivatsa S. Bhat (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-10-12  tracing/hwlat: Report total time spent in all NMIs during the sample  (Srivatsa S. Bhat (VMware); 1 file, -1/+1)

nmi_total_ts is supposed to record the total time spent in *all* NMIs that occur on the given CPU during the (active portion of the) sampling window. However, the code seems to be overwriting this variable for each NMI, thereby only recording the time spent in the most recent NMI. Fix it by accumulating the duration instead.

Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu
Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: [email protected]
Signed-off-by: Srivatsa S. Bhat (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
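The fix amounts to turning an assignment into an accumulation in the NMI callback; a hedged sketch of the before/after:

    /* before: each NMI overwrites the running total */
    nmi_total_ts = time_get() - nmi_ts_start;

    /* after: each NMI adds its duration to the running total */
    nmi_total_ts += time_get() - nmi_ts_start;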
2019-10-12  recordmcount: Fix nop_mcount() function  (Steven Rostedt (VMware); 1 file, -4/+1)

The removal of the longjmp code in recordmcount.c mistakenly made a negative return of make_nop() exit nop_mcount(). It should not exit the routine, but instead just not process that part of the code. By exiting with an error code, it would cause the recordmcount update to fail for some files, which would fail the build if ftrace function tracing was enabled.

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Uwe Kleine-König <[email protected]>
Tested-by: Uwe Kleine-König <[email protected]>
Fixes: 3f1df12019f3 ("recordmcount: Rewrite error/success handling")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-10-12  tracing: Do not create tracefs files if tracefs lockdown is in effect  (Steven Rostedt (VMware); 1 file, -0/+4)

If, on boot up, lockdown is activated for tracefs, don't even bother creating the files. This can also prevent instances from being created if lockdown is in effect.

Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
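A hedged sketch of what such a gate looks like at file-creation time (placement and message are illustrative; security_locked_down()/LOCKDOWN_TRACEFS are the generic lockdown hooks):

    struct dentry *tracing_init_dentry(void)
    {
            struct trace_array *tr = &global_trace;

            /* tracefs is locked down: don't create any files at all */
            if (security_locked_down(LOCKDOWN_TRACEFS)) {
                    pr_warn("Lockdown is enabled, skipping tracefs file creation\n");
                    return ERR_PTR(-EPERM);
            }

            /* ... normal tracefs setup continues here ... */
            return tr->dir;
    }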
2019-10-12  tracing: Add locked_down checks to the open calls of files created for tracefs  (Steven Rostedt (VMware); 10 files, -4/+98)

Added various checks on open tracefs calls to see if tracefs is in lockdown mode, and if so, to return -EPERM.

Note, the event format files (which are basically standard on all machines) as well as the enabled_functions file (which shows what is currently being traced) are not locked down. Perhaps they should be, but it seems counterintuitive to lock down information that helps you know if the system has been modified.

Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-10-12  tracing: Add tracing_check_open_get_tr()  (Steven Rostedt (VMware); 6 files, -60/+81)

Currently, most files in the tracefs directory test if tracing_disabled is set, and if so return -ENODEV. The tracing_disabled flag is set when tracing is found to be broken. Originally it was done in case the ring buffer was found to be corrupted, and we wanted to prevent reading it from crashing the kernel. But it's also set if a tracing selftest fails on boot. It's a one-way switch. That is, once it is triggered, tracing is disabled until reboot.

As most tracefs files can also be used by instances in the tracefs directory, they need to be handled carefully. Each instance has a trace_array associated with it, and when the instance is removed, the trace_array is freed. But if an instance is opened with a reference to the trace_array, then it requires looking up the trace_array to get its ref counter (as there could be a race between it being deleted and the open itself). Once it is found, a reference is added to prevent the instance from being removed (and the trace_array associated with it freed).

Combine the two checks (tracing_disabled and trace_array_get()) into a single helper function. This will also make it easier to add lockdown to tracefs later.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
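A hedged sketch of the combined helper described above (the real one may differ in details, e.g. the lockdown check that gets added later):

    int tracing_check_open_get_tr(struct trace_array *tr)
    {
            /* tracing was disabled (broken ring buffer, failed selftest, ...) */
            if (tracing_disabled)
                    return -ENODEV;

            /* pin the instance so it can't be freed while the file is open */
            if (tr && trace_array_get(tr) < 0)
                    return -ENODEV;

            return 0;
    }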
2019-10-12  tracing: Have trace events system open call tracing_open_generic_tr()  (Steven Rostedt (VMware); 3 files, -15/+5)

Instead of having the trace events system open call open code the taking of the trace_array descriptor (with trace_array_get()) and then calling trace_open_generic(), have it use tracing_open_generic_tr(), which does the combination of the two. This requires making tracing_open_generic_tr() global.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-10-12  tracing: Get trace_array reference for available_tracers files  (Steven Rostedt (VMware); 1 file, -2/+15)

As instances may have different tracers available, we need to look at the trace_array descriptor that shows the list of the available tracers for the instance. But there's a race between opening the file and an admin deleting the instance. The trace_array_get() needs to be called before accessing the trace_array.

Cc: [email protected]
Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>