aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2018-10-23Merge tag 'gpio-v4.20-1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v4.20 series: Core changes: - A patch series from Hans Verkuil to make it possible to enable/disable IRQs on a GPIO line at runtime and drive GPIO lines as output without having to put/get them from scratch. The irqchip callbacks have been improved so that they can use only the fastpatch callbacks to enable/disable irqs like any normal irqchip, especially the gpiod_lock_as_irq() has been improved to be callable in fastpath context. A bunch of rework had to be done to achieve this but it is a big win since I never liked to restrict this to slowpath. The only call requireing slowpath was try_module_get() and this is kept at the .request_resources() slowpath callback. In the GPIO CEC driver this is a big win sine a single line is used for both outgoing and incoming traffic, and this needs to use IRQs for incoming traffic while actively driving the line for outgoing traffic. - Janusz Krzysztofik improved the GPIO array API to pass a "cookie" (struct gpio_array) and a bitmap for setting or getting multiple GPIO lines at once. This improvement orginated in a specific need to speed up an OMAP1 driver and has led to a much better API and real performance gains when the state of the array can be used to bypass a lot of checks and code when we want things to go really fast. The previous code would minimize the number of calls down to the driver callbacks assuming the CPU speed was orders of magnitude faster than the I/O latency, but this assumption was wrong on several platforms: what we needed to do was to profile and improve the speed on the hot path of the array functions and this change is now completed. - Clean out the painful and hard to grasp BNF experiments from the device tree bindings. Future approaches are looking into using JSON schema for this purpose. (Rob Herring is floating a patch series.) New drivers: - The RCAR driver now supports r8a774a1 (RZ/G2M). - Synopsys GPIO via CREGs driver. Major improvements: - Modernization of the EP93xx driver to use irqdomain and other contemporary concepts. - The ingenic driver has been merged into the Ingenic pin control driver and removed from the GPIO subsystem. - Debounce support in the ftgpio010 driver" * tag 'gpio-v4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (116 commits) gpio: Clarify kerneldoc on gpiochip_set_chained_irqchip() gpio: Remove unused 'irqchip' argument to gpiochip_set_cascaded_irqchip() gpio: Drop parent irq assignment during cascade setup mmc: pwrseq_simple: Fix incorrect handling of GPIO bitmap gpio: fix SNPS_CREG kconfig dependency warning gpiolib: Initialize gdev field before is used gpio: fix kernel-doc after devres.c file rename gpio: fix doc string for devm_gpiochip_add_data() to not talk about irq_chip gpio: syscon: Fix possible NULL ptr usage gpiolib: Show correct direction from the beginning pinctrl: msm: Use init_valid_mask exported function gpiolib: Add init_valid_mask exported function GPIO: add single-register GPIO via CREG driver dt-bindings: Document the Synopsys GPIO via CREG bindings gpio: mockup: use device properties instead of platform_data gpio: Slightly more helpful debugfs gpio: omap: Remove set but not used variable 'dev' gpio: omap: drop omap_gpio_list Accept partial 'gpio-line-names' property. gpio: omap: get rid of the conditional PM runtime calls ...
2018-10-23Merge tag 'regulator-v5.0' of ↵Linus Torvalds1-3/+14
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The biggest chunk of the regulator changes for this release outside of the new drivers is the conversion of the fixed regulator to use the GPIO descriptor API, there's a small addition to the GPIO API plus a bunch of updates to board files to implement it. This is some really welcome work from Linus Walleij that's had a bunch of review and has been sitting in -next for a while so I'm fairly happy there's no major issues. - Helpers for overlapping linear ranges. - Display opmode and consumer requested load in the regualtor_summary file in debugfs, plus a fix there. - Support for the fun and entertaining power off mechanism that the pfuze100 hardware implements. - Conversion of the fixed regulator API to use GPIO descriptors, including pulling in a bunch of patches to a bunch of board files. - New drivers for Cirrus Logic Lochnagar, Qualcomm PMS405, Rohm BD71847, ST PMIC1, and TI LM363x devices" * tag 'regulator-v5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (36 commits) regulator: lochnagar: Use a consisent comment style for SPDX header regulator: bd718x7: Remove struct bd718xx_pmic regulator: Fetch enable gpiods nonexclusive regulator/gpio: Allow nonexclusive GPIO access regulator: lochnagar: Add support for the Cirrus Logic Lochnagar regulator: stpmic1: Return REGULATOR_MODE_INVALID for invalid mode regulator: stpmic1: add stpmic1 regulator driver dt-bindings: regulator: document stpmic1 pmic regulators regulator: axp20x: Mark expected switch fall-throughs regulator: bd718xx: fix build warning on x86_64 regulator: fixed: Default enable high on DT regulators regulator: bd718xx: rename bd71837 to 718xx regulator: bd718XX use pickable ranges regulator/mfd: bd718xx: rename bd71837/bd71847 common instances regulator: Support regulators where voltage ranges are selectable mfd: dt bindings: add BD71847 device-tree binding documentation regulator: dt bindings: add BD71847 device-tree binding documentation regulator/mfd: Support ROHM BD71847 power management IC regulator: da905{2,5}: Remove unnecessary array check regulator: qcom: Add PMS405 regulators ...
2018-10-22Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-3/+3
Pull dma mapping updates from Christoph Hellwig: "First batch of dma-mapping changes for 4.20. There will be a second PR as some big changes were only applied just before the end of the merge window, and I want to give them a few more days in linux-next. Summary: - mostly more consolidation of the direct mapping code, including converting over hexagon, and merging the coherent and non-coherent code into a single dma_map_ops instance (me) - cleanups for the dma_configure/dma_unconfigure callchains (me) - better handling of dma_masks in odd setups (me, Alexander Duyck) - better debugging of passing vmalloc address to the DMA API (Stephen Boyd) - CMA command line parsing fix (He Zhe)" * tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping: (27 commits) dma-direct: respect DMA_ATTR_NO_WARN dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN dma-direct: document the zone selection logic dma-debug: Check for drivers mapping invalid addresses in dma_map_single() dma-direct: fix return value of dma_direct_supported dma-mapping: move dma_default_get_required_mask under ifdef dma-direct: always allow dma mask <= physiscal memory size dma-direct: implement complete bus_dma_mask handling dma-direct: refine dma_direct_alloc zone selection dma-direct: add an explicit dma_direct_get_required_mask dma-mapping: make the get_required_mask method available unconditionally unicore32: remove swiotlb support Revert "dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops" dma-mapping: support non-coherent devices in dma_common_get_sgtable dma-mapping: consolidate the dma mmap implementations dma-mapping: merge direct and noncoherent ops dma-mapping: move the dma_coherent flag to struct device MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration dma-mapping: fix panic caused by passing empty cma command line argument ...
2018-10-22Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-blockLinus Torvalds6-12/+6
Pull block layer updates from Jens Axboe: "This is the main pull request for block changes for 4.20. This contains: - Series enabling runtime PM for blk-mq (Bart). - Two pull requests from Christoph for NVMe, with items such as; - Better AEN tracking - Multipath improvements - RDMA fixes - Rework of FC for target removal - Fixes for issues identified by static checkers - Fabric cleanups, as prep for TCP transport - Various cleanups and bug fixes - Block merging cleanups (Christoph) - Conversion of drivers to generic DMA mapping API (Christoph) - Series fixing ref count issues with blkcg (Dennis) - Series improving BFQ heuristics (Paolo, et al) - Series improving heuristics for the Kyber IO scheduler (Omar) - Removal of dangerous bio_rewind_iter() API (Ming) - Apply single queue IPI redirection logic to blk-mq (Ming) - Set of fixes and improvements for bcache (Coly et al) - Series closing a hotplug race with sysfs group attributes (Hannes) - Set of patches for lightnvm: - pblk trace support (Hans) - SPDX license header update (Javier) - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0 specs behind a common core interface. (Javier, Matias) - Enable pblk to use a common interface to retrieve chunk metadata (Matias) - Bug fixes (Various) - Set of fixes and updates to the blk IO latency target (Josef) - blk-mq queue number updates fixes (Jianchao) - Convert a bunch of drivers from the old legacy IO interface to blk-mq. This will conclude with the removal of the legacy IO interface itself in 4.21, with the rest of the drivers (me, Omar) - Removal of the DAC960 driver. The SCSI tree will introduce two replacement drivers for this (Hannes)" * tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits) block: setup bounce bio_sets properly blkcg: reassociate bios when make_request() is called recursively blkcg: fix edge case for blk_get_rl() under memory pressure nvme-fabrics: move controller options matching to fabrics nvme-rdma: always have a valid trsvcid mtip32xx: fully switch to the generic DMA API rsxx: switch to the generic DMA API umem: switch to the generic DMA API sx8: switch to the generic DMA API sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code skd: switch to the generic DMA API ubd: remove use of blk_rq_map_sg nvme-pci: remove duplicate check drivers/block: Remove DAC960 driver nvme-pci: fix hot removal during error handling nvmet-fcloop: suppress a compiler warning nvme-core: make implicit seed truncation explicit nvmet-fc: fix kernel-doc headers nvme-fc: rework the request initialization code nvme-fc: introduce struct nvme_fcp_op_w_sgl ...
2018-10-22x86/stackprotector: Remove the call to boot_init_stack_canary() from ↵Christophe Leroy1-0/+2
cpu_startup_entry() The following commit: d7880812b359 ("idle: Add the stack canary init to cpu_startup_entry()") ... added an x86 specific boot_init_stack_canary() call to the generic cpu_startup_entry() as a temporary hack, with the intention to remove the #ifdef CONFIG_X86 later. More than 5 years later let's finally realize that plan! :-) While implementing stack protector support for PowerPC, we found that calling boot_init_stack_canary() is also needed for PowerPC which uses per task (TLS) stack canary like the X86. However, calling boot_init_stack_canary() would break architectures using a global stack canary (ARM, SH, MIPS and XTENSA). Instead of modifying the #ifdef CONFIG_X86 to an even messier: #if defined(CONFIG_X86) || defined(CONFIG_PPC) PowerPC implemented the call to boot_init_stack_canary() in the function calling cpu_startup_entry(). Let's try the same cleanup on the x86 side as well. On x86 we have two functions calling cpu_startup_entry(): - start_secondary() - cpu_bringup_and_idle() start_secondary() already calls boot_init_stack_canary(), so it's good, and this patch adds the call to boot_init_stack_canary() in cpu_bringup_and_idle(). I.e. now x86 catches up to the rest of the world and the ugly init sequence in init/main.c can be removed from cpu_startup_entry(). As a final benefit we can also remove the <linux/stackprotector.h> dependency from <linux/sched.h>. [ mingo: Improved the changelog a bit, added language explaining x86 borkage and sched.h change. ] Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-22kprobes/x86: Use preempt_enable() in optimized_callback()Masami Hiramatsu1-1/+1
The following commit: a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”) removed local_irq_save/restore() from optimized_callback(), the handler might be interrupted by the rescheduling interrupt and might be rescheduled - so we must not use the preempt_enable_no_resched() macro. Use preempt_enable() instead, to not lose preemption events. [ mingo: Improved the changelog. ] Reported-by: Nadav Amit <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”) Link: http://lkml.kernel.org/r/154002887331.7627.10194920925792947001.stgit@devbox Signed-off-by: Ingo Molnar <[email protected]>
2018-10-21Merge branch 'regulator-4.20' into regulator-nextMark Brown1-3/+14
2018-10-21x86/mm: Kill stray kernel fault handling commentDave Hansen1-1/+0
I originally had matching user and kernel comments, but the kernel one got improved. Some errant conflict resolution kicked the commment somewhere wrong. Kill it. Reported-by: Eric W. Biederman <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Jann Horn <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: aa37c51b94 ("x86/mm: Break out user address space handling") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-20Merge branch 'pci/host-vmd'Bjorn Helgaas1-9/+3
- Fix VMD AERSID quirk Device ID matching (Jon Derrick) * pci/host-vmd: x86/PCI: Apply VMD's AERSID fixup generically
2018-10-19x86/kvm/nVMX: tweak shadow fieldsVitaly Kuznetsov2-9/+6
It seems we have some leftovers from times when 'unrestricted guest' wasn't exposed to L1. Stop shadowing GUEST_CS_{BASE,LIMIT,AR_SELECTOR} and GUEST_ES_BASE, shadow GUEST_SS_AR_BYTES as it was found that some hypervisors (e.g. Hyper-V without Enlightened VMCS) access it pretty often. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-19Merge tag 'kvmarm-for-v4.20' of ↵Paolo Bonzini16-40/+235
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.20 - Improved guest IPA space support (32 to 52 bits) - RAS event delivery for 32bit - PMU fixes - Guest entry hardening - Various cleanups
2018-10-19x86/intel_rdt: Prevent pseudo-locking from using stale pointersJithu Joseph4-12/+55
When the last CPU in an rdt_domain goes offline, its rdt_domain struct gets freed. Current pseudo-locking code is unaware of this scenario and tries to dereference the freed structure in a few places. Add checks to prevent pseudo-locking code from doing this. While further work is needed to seamlessly restore resource groups (not just pseudo-locking) to their configuration when the domain is brought back online, the immediate issue of invalid pointers is addressed here. Fixes: f4e80d67a5274 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information") Fixes: 443810fe61605 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing") Fixes: 746e08590b864 ("x86/intel_rdt: Create character device exposing pseudo-locked region") Fixes: 33dc3e410a0d9 ("x86/intel_rdt: Make CPU information accessible for pseudo-locked regions") Signed-off-by: Jithu Joseph <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/231f742dbb7b00a31cc104416860e27dba6b072d.1539384145.git.reinette.chatre@intel.com
2018-10-19x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernelsChristoph Hellwig1-2/+0
We already build the swiotlb code for 32-bit kernels with PAE support, but the code to actually use swiotlb has only been enabled for 64-bit kernels for an unknown reason. Before Linux v4.18 we paper over this fact because the networking code, the SCSI layer and some random block drivers implemented their own bounce buffering scheme. [ mingo: Changelog fixes. ] Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers") Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma") Reported-by: Matthew Whitehead <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Matthew Whitehead <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-18Merge branches 'acpi-pm' and 'pm-sleep'Rafael J. Wysocki10-247/+334
* acpi-pm: ACPI / PM: LPIT: Register sysfs attributes based on FADT * pm-sleep: x86-32, hibernate: Adjust in_suspend after resumed on 32bit system x86-32, hibernate: Set up temporary text mapping for 32bit system x86-32, hibernate: Switch to relocated restore code during resume on 32bit system x86-32, hibernate: Switch to original page table after resumed x86-32, hibernate: Use the page size macro instead of constant value x86-32, hibernate: Use temp_pgt as the temporary page table x86, hibernate: Rename temp_level4_pgt to temp_pgt x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system x86, hibernate: Extract the common code of 64/32 bit system x86-32/asm/power: Create stack frames in hibernate_asm_32.S PM / hibernate: Check the success of generating md5 digest before hibernation x86, hibernate: Fix nosave_regions setup for hibernation PM / sleep: Show freezing tasks that caused a suspend abort PM / hibernate: Documentation: fix image_size default value
2018-10-18kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stackSteven Rostedt (VMware)1-7/+35
Andy had some concerns about using regs_get_kernel_stack_nth() in a new function regs_get_kernel_argument() as if there's any error in the stack code, it could cause a bad memory access. To be on the safe side, call probe_kernel_read() on the stack address to be extra careful in accessing the memory. A helper function, regs_get_kernel_stack_nth_addr(), was added to just return the stack address (or NULL if not on the stack), that will be used to find the address (and could be used by other functions) and read the address with kernel_probe_read(). Requested-by: Andy Lutomirski <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-18x86/mcelog: Remove one mce_helper definitionSebastian Andrzej Siewior1-3/+0
Commit 5de97c9f6d85f ("x86/mce: Factor out and deprecate the /dev/mcelog driver") moved the old interface into one file including mce_helper definition as static and "extern". Remove one. Fixes: 5de97c9f6d85f ("x86/mce: Factor out and deprecate the /dev/mcelog driver") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> CC: "H. Peter Anvin" <[email protected]> CC: Ingo Molnar <[email protected]> CC: Thomas Gleixner <[email protected]> CC: Tony Luck <[email protected]> CC: linux-edac <[email protected]> CC: x86-ml <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2018-10-17KVM: VMX: enable nested virtualization by defaultPaolo Bonzini1-1/+1
With live migration support and finally a good solution for exception event injection, nested VMX should be ready for having a stable userspace ABI. The results of syzkaller fuzzing are not perfect but not horrible either (and might be partially due to running on GCE, so that effectively we're testing three-level nesting on a fork of upstream KVM!). Enabling it by default seems like a nice way to conclude the 4.20 pull request. :) Unfortunately, enabling nested SVM in 2009 (commit 4b6e4dca701) was a bit premature. However, until live migration support is in place we can reasonably expect that it does not offer much in terms of ABI guarantees. Therefore we are still in time to break things and conform as much as possible to the interface used for VMX. Suggested-by: Jim Mattson <[email protected]> Suggested-by: Liran Alon <[email protected]> Reviewed-by: Liran Alon <[email protected]> Celebrated-by: Liran Alon <[email protected]> Celebrated-by: Wanpeng Li <[email protected]> Celebrated-by: Wincy Van <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/x86: Use 32bit xor to clear registers in svm.cUros Bizjak2-16/+18
x86_64 zero-extends 32bit xor operation to a full 64bit register. Also add a comment and remove unnecessary instruction suffix in vmx.c Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOADJim Mattson1-0/+5
This is a per-VM capability which can be enabled by userspace so that the faulting linear address will be included with the information about a pending #PF in L2, and the "new DR6 bits" will be included with the information about a pending #DB in L2. With this capability enabled, the L1 hypervisor can now intercept #PF before CR2 is modified. Under VMX, the L1 hypervisor can now intercept #DB before DR6 and DR7 are modified. When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should generally provide an appropriate payload when injecting a #PF or #DB exception via KVM_SET_VCPU_EVENTS. However, to support restoring old checkpoints, this payload is not required. Note that bit 16 of the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. This is the reverse of DR6.RTM, which is cleared in this scenario. This capability also enables exception.pending in struct kvm_vcpu_events, which allows userspace to distinguish between pending and injected exceptions. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm: vmx: Defer setting of DR6 until #DB deliveryJim Mattson2-31/+62
When exception payloads are enabled by userspace (which is not yet possible) and a #DB is raised in L2, defer the setting of DR6 until later. Under VMX, this allows the L1 hypervisor to intercept the fault before DR6 is modified. Under SVM, DR6 is modified before L1 can intercept the fault (as has always been the case with DR7). Note that the payload associated with a #DB exception includes only the "new DR6 bits." When the payload is delievered, DR6.B0-B3 will be cleared and DR6.RTM will be set prior to merging in the new DR6 bits. Also note that bit 16 in the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. Though the reverse of DR6.RTM, this makes the #DB payload field compatible with both the pending debug exceptions field under VMX and the exit qualification for #DB exceptions under VMX. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm: x86: Defer setting of CR2 until #PF deliveryJim Mattson4-21/+62
When exception payloads are enabled by userspace (which is not yet possible) and a #PF is raised in L2, defer the setting of CR2 until the #PF is delivered. This allows the L1 hypervisor to intercept the fault before CR2 is modified. For backwards compatibility, when exception payloads are not enabled by userspace, kvm_multiple_exception modifies CR2 when the #PF exception is raised. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm: x86: Add payload operands to kvm_multiple_exceptionJim Mattson1-7/+15
kvm_multiple_exception now takes two additional operands: has_payload and payload, so that updates to CR2 (and DR6 under VMX) can be delayed until the exception is delivered. This is necessary to properly emulate VMX or SVM hardware behavior for nested virtualization. The new behavior is triggered by vcpu->kvm->arch.exception_payload_enabled, which will (later) be set by a new per-VM capability, KVM_CAP_EXCEPTION_PAYLOAD. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm: x86: Add exception payload fields to kvm_vcpu_eventsJim Mattson3-18/+51
The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a later commit) adds the following fields to struct kvm_vcpu_events: exception_has_payload, exception_payload, and exception.pending. With this capability set, all of the details of vcpu->arch.exception, including the payload for a pending exception, are reported to userspace in response to KVM_GET_VCPU_EVENTS. With this capability clear, the original ABI is preserved, and the exception.injected field is set for either pending or injected exceptions. When userspace calls KVM_SET_VCPU_EVENTS with KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer translated to exception.pending. KVM_SET_VCPU_EVENTS can now only establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/fpu: Fix i486 + no387 boot crash by only saving FPU registers on context ↵Sebastian Andrzej Siewior1-1/+1
switch if there is an FPU Booting an i486 with "no387 nofxsr" ends with with the following crash: math_emulate: 0060:c101987d Kernel panic - not syncing: Math emulation needed in kernel on the first context switch in user land. The reason is that copy_fpregs_to_fpstate() tries FNSAVE which does not work as the FPU is turned off. This bug was introduced in: f1c8cd0176078 ("x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active") Add a check for X86_FEATURE_FPU before trying to save FPU registers (we have such a check in switch_fpu_finish() already). Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: f1c8cd0176078 ("x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/fpu: Remove second definition of fpu in __fpu__restore_sig()Sebastian Andrzej Siewior1-1/+0
Commit: c5bedc6847c3b ("x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active") introduced the 'fpu' variable at top of __restore_xstate_sig(), which now shadows the other definition: arch/x86/kernel/fpu/signal.c:318:28: warning: symbol 'fpu' shadows an earlier one arch/x86/kernel/fpu/signal.c:271:20: originally declared here Remove the shadowed definition of 'fpu', as the two definitions are the same. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: c5bedc6847c3b ("x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/entry/64: Further improve paranoid_entry commentsAndy Lutomirski1-6/+4
Commit: 16561f27f94e ("x86/entry: Add some paranoid entry/exit CR3 handling comments") ... added some comments. This improves them a bit: - When I first read the new comments, it was unclear to me whether they were referring to the case where paranoid_entry interrupted other entry code or where paranoid_entry was itself interrupted. Clarify it. - Remove the EBX comment. We no longer use EBX as a SWAPGS indicator. Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/c47daa1888dc2298e7e1d3f82bd76b776ea33393.1539542111.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/entry/32: Clear the CS high bitsJan Kiszka1-6/+7
Even if not on an entry stack, the CS's high bits must be initialized because they are unconditionally evaluated in PARANOID_EXIT_TO_KERNEL_MODE. Failing to do so broke the boot on Galileo Gen2 and IOT2000 boards. [ bp: Make the commit message tone passive and impartial. ] Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack") Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Joerg Roedel <[email protected]> Acked-by: Joerg Roedel <[email protected]> CC: "H. Peter Anvin" <[email protected]> CC: Andrea Arcangeli <[email protected]> CC: Andy Lutomirski <[email protected]> CC: Boris Ostrovsky <[email protected]> CC: Brian Gerst <[email protected]> CC: Dave Hansen <[email protected]> CC: David Laight <[email protected]> CC: Denys Vlasenko <[email protected]> CC: Eduardo Valentin <[email protected]> CC: Greg KH <[email protected]> CC: Ingo Molnar <[email protected]> CC: Jiri Kosina <[email protected]> CC: Josh Poimboeuf <[email protected]> CC: Juergen Gross <[email protected]> CC: Linus Torvalds <[email protected]> CC: Peter Zijlstra <[email protected]> CC: Thomas Gleixner <[email protected]> CC: Will Deacon <[email protected]> CC: [email protected] CC: [email protected] CC: [email protected] CC: [email protected] CC: linux-mm <[email protected]> CC: x86-ml <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/kconfig: Remove redundant 'default n' lines from all x86 Kconfig'sBartlomiej Zolnierkiewicz3-9/+0
'default n' is the default value for any bool or tristate Kconfig setting so there is no need to write it explicitly. Also, since commit: f467c5640c29 ("kconfig: only write '# CONFIG_FOO is not set' for visible symbols") ... the Kconfig behavior is the same regardless of 'default n' being present or not: ... One side effect of (and the main motivation for) this change is making the following two definitions behave exactly the same: config FOO bool config FOO bool default n With this change, neither of these will generate a '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied). That might make it clearer to people that a bare 'default n' is redundant. ... Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/20181016134217eucas1p2102984488b89178a865162553369025b%[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17kvm: x86: Add has_payload and payload to kvm_queued_exceptionJim Mattson2-0/+10
The payload associated with a #PF exception is the linear address of the fault to be loaded into CR2 when the fault is delivered. The payload associated with a #DB exception is a mask of the DR6 bits to be set (or in the case of DR6.RTM, cleared) when the fault is delivered. Add fields has_payload and payload to kvm_queued_exception to track payloads for pending exceptions. The new fields are introduced here, but for now, they are just cleared. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/nVMX: nested state migration for Enlightened VMCSVitaly Kuznetsov3-20/+65
Add support for get/set of nested state when Enlightened VMCS is in use. A new KVM_STATE_NESTED_EVMCS flag to indicate eVMCS on the vCPU was enabled is added. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/nVMX: allow bare VMXON state migrationVitaly Kuznetsov1-7/+8
It is perfectly valid for a guest to do VMXON and not do VMPTRLD. This state needs to be preserved on migration. Cc: [email protected] Fixes: 8fcc4b5923af5de58b80b53a069453b135693304 Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinitVitaly Kuznetsov1-2/+10
vcpu->arch.pv_eoi is accessible through both HV_X64_MSR_VP_ASSIST_PAGE and MSR_KVM_PV_EOI_EN so on migration userspace may try to restore them in any order. Values match, however, kvm_lapic_enable_pv_eoi() uses different length: for Hyper-V case it's the whole struct hv_vp_assist_page, for KVM native case it is 8. In case we restore KVM-native MSR last cache will be reinitialized with len=8 so trying to access VP assist page beyond 8 bytes with kvm_read_guest_cached() will fail. Check if we re-initializing cache for the same address and preserve length in case it was greater. Signed-off-by: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/hyperv: don't clear VP assist pages on initVitaly Kuznetsov1-1/+7
VP assist pages may hold valuable data which needs to be preserved across migration. Clean PV EOI portion of the data on init, the guest is responsible for making sure there's no garbage in the rest. This will be used for nVMX migration, eVMCS address needs to be preserved. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: optimize prepare_vmcs02{,_full} for Enlightened VMCS caseVitaly Kuznetsov1-53/+65
When Enlightened VMCS is in use by L1 hypervisor we can avoid vmwriting VMCS fields which did not change. Our first goal is to achieve minimal impact on traditional VMCS case so we're not wrapping each vmwrite() with an if-changed checker. We also can't utilize static keys as Enlightened VMCS usage is per-guest. This patch implements the simpliest solution: checking fields in groups. We skip single vmwrite() statements as doing the check will cost us something even in non-evmcs case and the win is tiny. Unfortunately, this makes prepare_vmcs02_full{,_full}() code Enlightened VMCS-dependent (and a bit ugly). Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: implement enlightened VMPTRLD and VMCLEARVitaly Kuznetsov1-7/+108
Per Hyper-V TLFS 5.0b: "The L1 hypervisor may choose to use enlightened VMCSs by writing 1 to the corresponding field in the VP assist page (see section 7.8.7). Another field in the VP assist page controls the currently active enlightened VMCS. Each enlightened VMCS is exactly one page (4 KB) in size and must be initially zeroed. No VMPTRLD instruction must be executed to make an enlightened VMCS active or current. After the L1 hypervisor performs a VM entry with an enlightened VMCS, the VMCS is considered active on the processor. An enlightened VMCS can only be active on a single processor at the same time. The L1 hypervisor can execute a VMCLEAR instruction to transition an enlightened VMCS from the active to the non-active state. Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is unsupported and can result in unexpected behavior." Keep Enlightened VMCS structure for the current L2 guest permanently mapped from struct nested_vmx instead of mapping it every time. Suggested-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: add enlightened VMCS stateVitaly Kuznetsov1-17/+423
Adds hv_evmcs pointer and implement copy_enlightened_to_vmcs12() and copy_enlightened_to_vmcs12(). prepare_vmcs02()/prepare_vmcs02_full() separation is not valid for Enlightened VMCS, do full sync for now. Suggested-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capabilityVitaly Kuznetsov4-0/+64
Enlightened VMCS is opt-in. The current version does not contain all fields supported by nested VMX so we must not advertise the corresponding VMX features if enlightened VMCS is enabled. Userspace is given the enlightened VMCS version supported by KVM as part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to be advertised to the nested hypervisor, currently done via a cpuid leaf for Hyper-V. Suggested-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: VMX: refactor evmcs_sanitize_exec_ctrls()Vitaly Kuznetsov1-61/+47
Split off EVMCS1_UNSUPPORTED_* macros so we can re-use them when enabling Enlightened VMCS for Hyper-V on KVM. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: hyperv: define VP assist page helpersLadi Prosek5-6/+29
The state related to the VP assist page is still managed by the LAPIC code in the pv_eoi field. Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: x86: reintroduce pte_list_remove, but including mmu_spte_clear_track_bitsWei Yang1-3/+9
rmap_remove() removes the sptep after locating the correct rmap_head but, in several cases, the caller has already known the correct rmap_head. This patch introduces a new pte_list_remove(); because it is known that the spte is present (or it would not have an rmap_head), it is safe to remove the tracking bits without any previous check. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: x86: rename pte_list_remove to __pte_list_removeWei Yang1-8/+8
This is a patch preparing for further change. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm/x86 : fix some typoPeng Hao1-2/+2
Signed-off-by: Peng Hao <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/VMX: Change hv flush logic when ept tables are mismatched.Lan Tianyu1-7/+8
If ept table pointers are mismatched, flushing tlb for each vcpus via hv flush interface still helps to reduce vmexits which are triggered by IPI and INEPT emulation. Signed-off-by: Lan Tianyu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/x86: Use 32bit xor to clear registerUros Bizjak1-2/+2
x86_64 zero-extends 32bit xor to a full 64bit register. Use %k asm operand modifier to force 32bit register and save 268 bytes in kvm.o Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/x86: Use assembly instruction mnemonics instead of .byte streamsUros Bizjak3-40/+21
Recently the minimum required version of binutils was changed to 2.20, which supports all VMX instruction mnemonics. The patch removes all .byte #defines and uses real instruction mnemonics instead. The compiler is now able to pass memory operand to the instruction, so there is no need for memory clobber anymore. Also, the compiler adds CC register clobber automatically to all extended asm clauses, so the patch also removes explicit CC clobber. The immediate benefit of the patch is removal of many unnecesary register moves, resulting in 1434 saved bytes in vmx.o: text data bss dec hex filename 151257 18246 8500 178003 2b753 vmx.o 152691 18246 8500 179437 2bced vmx-old.o Some examples of improvement include removal of unneeded moves of %rsp to %rax in front of invept and invvpid instructions: a57e: b9 01 00 00 00 mov $0x1,%ecx a583: 48 89 04 24 mov %rax,(%rsp) a587: 48 89 e0 mov %rsp,%rax a58a: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp) a591: 00 00 a593: 66 0f 38 80 08 invept (%rax),%rcx to: a45c: 48 89 04 24 mov %rax,(%rsp) a460: b8 01 00 00 00 mov $0x1,%eax a465: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp) a46c: 00 00 a46e: 66 0f 38 80 04 24 invept (%rsp),%rax and the ability to use more optimal registers and memory operands in the instruction: 8faa: 48 8b 44 24 28 mov 0x28(%rsp),%rax 8faf: 4c 89 c2 mov %r8,%rdx 8fb2: 0f 79 d0 vmwrite %rax,%rdx to: 8e7c: 44 0f 79 44 24 28 vmwrite 0x28(%rsp),%r8 Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/x86: Fix invvpid and invept register operand size in 64-bit modeUros Bizjak1-2/+2
Register operand size of invvpid and invept instruction in 64-bit mode has always 64 bits. Adjust inline function argument type to reflect correct size. Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()Vitaly Kuznetsov1-0/+6
We don't use root page role for nested_mmu, however, optimizing out re-initialization in case nothing changed is still valuable as this is done for every nested vmentry. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is neededVitaly Kuznetsov2-34/+63
MMU reconfiguration in init_kvm_tdp_mmu()/kvm_init_shadow_mmu() can be avoided if the source data used to configure it didn't change; enhance MMU extended role with the required fields and consolidate common code in kvm_calc_mmu_role_common(). Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/nVMX: introduce source data cache for kvm_init_shadow_ept_mmu()Vitaly Kuznetsov2-15/+52
MMU re-initialization is expensive, in particular, update_permission_bitmask() and update_pkru_bitmask() are. Cache the data used to setup shadow EPT MMU and avoid full re-init when it is unchanged. Signed-off-by: Vitaly Kuznetsov <[email protected]> Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu: make space for source data caching in struct kvm_mmuVitaly Kuznetsov3-9/+35
In preparation to MMU reconfiguration avoidance we need a space to cache source data. As this partially intersects with kvm_mmu_page_role, create 64bit sized union kvm_mmu_role holding both base and extended data. No functional change. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>