aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-01-27Merge branch 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2-23/+29
Pull workqueue updates from Tejun Heo: "Just a couple tracepoint patches" * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove workqueue_work event class workqueue: add worker function to workqueue_execute_end tracepoint
2020-01-27io-wq: make the io_wq ref countedJens Axboe1-1/+10
In preparation for sharing an io-wq across different users, add a reference count that manages destruction of it. Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-27io_uring: fix refcounting with batched allocations at OOMPavel Begunkov1-2/+5
In case of out of memory the second argument of percpu_ref_put_many() in io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly wrong. Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-27io_uring: add comment for drain_nextPavel Begunkov1-0/+7
Draining the middle of a link is tricky, so leave a comment there Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-27io_uring: don't attempt to copy iovec for READ/WRITEJens Axboe1-2/+1
For the non-vectored variant of READV/WRITEV, we don't need to setup an async io context, and we flag that appropriately in the io_op_defs array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c we didn't have these opcodes, so the check there was added just for the READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a single check for needing async context, that covers all four of these read/write variants that don't use an iovec. Signed-off-by: Jens Axboe <[email protected]>
2020-01-27scripts/find-unused-docs: Fix massive false positivesGeert Uytterhoeven1-1/+1
scripts/find-unused-docs.sh invokes scripts/kernel-doc to find out if a source file contains kerneldoc or not. However, as it passes the no longer supported "-text" option to scripts/kernel-doc, the latter prints out its help text, causing all files to be considered containing kerneldoc. Get rid of these false positives by removing the no longer supported "-text" option from the scripts/kernel-doc invocation. Cc: [email protected] # 4.16+ Fixes: b05142675310d2ac ("scripts: kernel-doc: get rid of unused output formats") Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2020-01-27Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremapLinus Torvalds366-600/+506
Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap
2020-01-27Merge tag 'for-5.6/libata-2020-01-27' of git://git.kernel.dk/linux-blockLinus Torvalds4-18/+65
Pull libata updates from Jens Axboe: "As usual pretty quiet, mostly trivial fixes (or dead code removal), outside of various fixes for ahci_bcrm" * tag 'for-5.6/libata-2020-01-27' of git://git.kernel.dk/linux-block: ata/acard_ahci: remove unused variable n_elem ata: pata_macio: fix comparing pointer to 0 ata: ahci_brcm: BCM7216 reset is self de-asserting ata: ahci_brcm: Perform reset after obtaining resources ata: brcm: fix reset controller API usage ata: brcm: mark PM functions as __maybe_unused ata: ahci_brcm: Support BCM7216 reset controller name dt-bindings: ata: Document BCM7216 AHCI controller compatible ata: ahci_brcm: Add a shutdown callback ata: ahci_brcm: Manage reset line during suspend/resume
2020-01-27Merge tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-blockLinus Torvalds17-262/+571
Pull block driver updates from Jens Axboe: "Like the core side, not a lot of changes here, just two main items: - Series of patches (via Coly) with fixes for bcache (Coly, Christoph) - MD pull request from Song" * tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits) bcache: reap from tail of c->btree_cache in bch_mca_scan() bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() bcache: remove member accessed from struct btree bcache: print written and keys in trace_bcache_btree_write bcache: avoid unnecessary btree nodes flushing in btree_flush_write() bcache: add code comments for state->pool in __btree_sort() lib: crc64: include <linux/crc64.h> for 'crc64_be' bcache: use read_cache_page_gfp to read the superblock bcache: store a pointer to the on-disk sb in the cache and cached_dev structures bcache: return a pointer to the on-disk sb from read_super bcache: transfer the sb_page reference to register_{bdev,cache} bcache: fix use-after-free in register_bcache() bcache: properly initialize 'path' and 'err' in register_bcache() bcache: rework error unwinding in register_bcache bcache: use a separate data structure for the on-disk super block bcache: cached_dev_free needs to put the sb page md/raid1: introduce wait_for_serialization md/raid1: use bucket based mechanism for IO serialization md: introduce a new struct for IO serialization md: don't destroy serial_info_pool if serialize_policy is true ...
2020-01-27Merge tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-blockLinus Torvalds11-57/+131
Pull core block updates from Jens Axboe: "This may be the most quiet round we've had in years. I'm not complaining. Really not a lot to detail here, outside of spelling and documentation improvements/fixes, we have: - Allow t10-pi to be modular (Herbert) - Remove dead code in bfq (Alex) - Mark zone management requests with REQ_SYNC (Chaitanya) - BFQ division improvement (Wen) - Small series improving plugging (Pavel)" * tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block: partitions/ldm: fix spelling mistake "to" -> "too" block, bfq: improve arithmetic division in bfq_delta() block/bfq: remove unused bfq_class_rt which never used block: mark zone-mgmt bios with REQ_SYNC blk-mq: Document functions for sending request block: Allow t10-pi to be modular blk-mq: optimise blk_mq_flush_plug_list() list: introduce list_for_each_continue() blk-mq: optimise rq sort function
2020-01-27Merge tag 'pnp-5.6-rc1' of ↵Linus Torvalds1-24/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull PNP updates from Rafael Wysocki: "Get rid of unused variable and function (yu kuai)" * tag 'pnp-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PNP: isapnp: remove defined but not used function 'isapnp_checksum' PNP: isapnp: remove set but not used variable 'checksum'
2020-01-27Merge tag 'devprop-5.6-rc1' of ↵Linus Torvalds7-155/+662
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework updates from Rafael Wysocki: "Add support for reference properties in sofrware nodes (Dmitry Torokhov) and a basic test for property entries along with fixes on top of it (Dmitry Torokhov, Qian Cai, Alan Maguire)" * tag 'devprop-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: introduce CONFIG_KUNIT_DRIVER_PE_TEST usb: dwc3: use proper initializers for property entries drivers/base/test: fix global-out-of-bounds error software node: add basic tests for property entries software node: remove separate handling of references platform/x86: intel_cht_int33fe: use inline reference properties software node: implement reference properties software node: allow embedding of small arrays into property_entry software node: replace is_array with is_inline
2020-01-27dm: fix potential for q->make_request_fn NULL pointerMike Snitzer1-2/+7
Move blk_queue_make_request() to dm.c:alloc_dev() so that q->make_request_fn is never NULL during the lifetime of a DM device (even one that is created without a DM table). Otherwise generic_make_request() will crash simply by doing: dmsetup create -n test mount /dev/dm-N /mnt While at it, move ->congested_data initialization out of dm.c:alloc_dev() and into the bio-based specific init method. Reported-by: Stefan Bader <[email protected]> BugLink: https://bugs.launchpad.net/bugs/1860231 Fixes: ff36ab34583a ("dm: remove request-based logic from make_request_fn wrapper") Depends-on: c12c9a3c3860c ("dm: various cleanups to md->queue initialization code") Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
2020-01-27Merge tag 'acpi-5.6-rc1' of ↵Linus Torvalds195-246/+511
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to the most recent upstream revision (20200110), add new hardware support to a handful of ACPI drivers, make the ACPI fan driver expose power states information for fans, add some more quirks, fix bugs and clean up assorted things. Specifics: - Update the ACPICA code in the kernel to upstream revision 20200110 including: - Update of copyright notices to 2020 (Bob Moore). - Dispatcher fix to always generate buffer objects for the ASL create_field() operator (Maximilian Luz). - Debugger cleanup (Colin Ian King). - Disassembler change to create buffer fields in ACPI_PARSE_LOAD_PASS1 (Erik Kaneda). - UNIX line ending support for non-windows builds in acpisrc (Erik Kaneda). - Update the list of ACPICA maintainers (Rafael Wysocki). - Add Intel Tiger Lake ACPI device IDs to the ACPI DPTF, ACPI fan, int340x_thermal and intel-hid drivers (Gayatri Kammela). - Make the ACPI fan driver create additional sysfs attributes to expose power states information for fans (Srinivas Pandruvada). - Fix up the ACPI battery driver to deal with unexpected battery capacity information in a better way (Hans de Goede). - Add ACPI backlight quirks for Lenovo E41-25/45 and MSI MS-7721 boards (Aaron Ma, Hans de Goede). - Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch to the ACPI button driver (Jason Ekstrand). - Drop TIMER_DEFERRABLE from the GHES polling mode timer function flags to make it run precisely at the configured time (Bhaskar Upadhaya). - Fix race condition related to the reference counting of query handlers in the ACPI EC driver (Rafael Wysocki). - Fix ACPI tools build issue (Zhengyuan Liu). - Replace dma_request_slave_channel() with dma_request_chan() in the firmware guide documentation for ACPI (Peter Ujfalusi). - Fix typo in a comment and clean up function parameter data type inconsistencies (Kacper Piwiński, Tian Tao)" * tag 'acpi-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) ACPICA: Update version to 20200110 ACPICA: All acpica: Update copyrights to 2020 Including tool signons. apei/ghes: Do not delay GHES polling ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch ACPI: PPTT: Consistently use unsigned int as parameter type ACPI: EC: Reference count query handlers under lock ACPICA: Update the list of maintainers ACPICA: Update version to 20191213 ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator ACPICA: acpisrc: add unix line ending support for non-windows build ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1 ACPICA: debugger: fix spelling mistake "adress" -> "address" ACPI: video: Do not export a non working backlight interface on MSI MS-7721 boards docs: firmware-guide: ACPI: Replace dma_request_slave_channel() with dma_request_chan() thermal: int340x_thermal: Add Tiger Lake ACPI device IDs platform/x86: intel-hid: Add Tiger Lake ACPI device ID ACPI: fan: Add Tiger Lake ACPI device ID ACPI: DPTF: Add Tiger Lake ACPI device IDs ACPI: fan: Expose fan performance state information tools/power/acpi: fix compilation error ...
2020-01-27Merge tag 'pm-5.6-rc1' of ↵Linus Torvalds65-606/+3829
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add ACPI support to the intel_idle driver along with an admin guide document for it, add support for CPR (Core Power Reduction) to the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support in a few places, add some new sysfs attributes, debugfs files and tracepoints, fix bugs and clean up a bunch of things all over. Specifics: - Update the ACPI processor driver in order to export acpi_processor_evaluate_cst() to the code outside of it, add ACPI support to the intel_idle driver based on that and clean up that driver somewhat (Rafael Wysocki). - Add an admin guide document for the intel_idle driver (Rafael Wysocki). - Clean up cpuidle core and drivers, enable compilation testing for some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael Wysocki, Yangtao Li). - Fix reference counting of OPP (operating performance points) table structures (Viresh Kumar). - Add support for CPR (Core Power Reduction) to the AVS (Adaptive Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King, YueHaibing). - Add support for TigerLake Mobile and JasperLake to the Intel RAPL power capping driver (Zhang Rui). - Update cpufreq drivers: - Add i.MX8MP support to imx-cpufreq-dt (Anson Huang). - Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva). - Fix cpufreq policy reference counting issues in s3c and brcmstb-avs (chenqiwu). - Fix ACPI table reference counting issue and HiSilicon quirk handling in the CPPC driver (Hanjun Guo). - Clean up spelling mistake in intel_pstate (Harry Pan). - Convert the kirkwood and tegra186 drivers to using devm_platform_ioremap_resource() (Yangtao Li). - Update devfreq core: - Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi). - Clean up the handing of transition statistics and allow them to be reset by writing 0 to the 'trans_stat' devfreq device attribute in sysfs (Kamil Konieczny). - Add 'devfreq_summary' to debugfs (Chanwoo Choi). - Clean up kerneldoc comments and Kconfig indentation (Krzysztof Kozlowski, Randy Dunlap). - Update devfreq drivers: - Add dynamic scaling for the imx8m DDR controller and clean up imx8m-ddrc (Leonard Crestez, YueHaibing). - Fix DT node reference counting and nitialization error code path in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency for it (Chanwoo Choi, Yangtao Li). - Fix DT node reference counting in rockchip-dfi and make it use devm_platform_ioremap_resource() (Yangtao Li). - Fix excessive stack usage in exynos-ppmu (Arnd Bergmann). - Fix initialization error code paths in exynos-bus (Yangtao Li). - Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof Kozlowski). - Add tracepoints for tracking usage_count updates unrelated to status changes in PM-runtime (Michał Mirosław). - Add sysfs attribute to control the "sync on suspend" behavior during system-wide suspend (Jonas Meurer). - Switch system-wide suspend tests over to 64-bit time (Alexandre Belloni). - Make wakeup sources statistics in debugfs cover deleted ones which used to be the case some time ago (zhuguangqing). - Clean up computations carried out during hibernation, update messages related to hibernation and fix a spelling mistake in one of them (Wen Yang, Luigi Semenzato, Colin Ian King). - Add mailmap entry for maintainer e-mail address that has not been functional for several years (Rafael Wysocki)" * tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits) cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG intel_idle: Clean up irtl_2_usec() intel_idle: Move 3 functions closer to their callers intel_idle: Annotate initialization code and data structures intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit() intel_idle: Rearrange intel_idle_cpuidle_driver_init() intel_idle: Clean up NULL pointer check in intel_idle_init() intel_idle: Fold intel_idle_probe() into intel_idle_init() intel_idle: Eliminate __setup_broadcast_timer() cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings cpuidle: sysfs: fix warnings when compiling with W=1 cpuidle: coupled: fix warnings when compiling with W=1 cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior PM / devfreq: Add debugfs support with devfreq_summary file Documentation: admin-guide: PM: Add intel_idle document cpuidle: arm: Enable compile testing for some of drivers PM-runtime: add tracepoints for usage_count changes cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether" PM: hibernate: fix spelling mistake "shapshot" -> "snapshot" ...
2020-01-27security: remove EARLY_LSM_COUNT which never usedAlex Shi1-1/+0
This macro is never used from it was introduced in commit e6b1db98cf4d5 ("security: Support early LSMs"), better to remove it. Signed-off-by: Alex Shi <[email protected]> Acked-by: Serge Hallyn <[email protected]> Signed-off-by: James Morris <[email protected]>
2020-01-27Merge tag 'regulator-v5.6' of ↵Linus Torvalds38-156/+2014
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "Hardly anything going on in the core this time around with the regulator API and pretty quiet on the driver front: - An API for comparing regulators, useful for devices that need to check if supply voltages exactly match rather than just nominally match. - Conversion of several DT bindings to YAML format. - Conversion of I2C drivers to probe_new(). - New drivers for Monolithic MPQ7920 and MP8859, and Rohm BD71828" * tag 'regulator-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (34 commits) dt-bindings: regulator: add document bindings for mpq7920 regulator: core: Fix exported symbols to the exported GPL version regulator: mpq7920: Fix incorrect defines regulator: vqmmc-ipq4019: Fix platform_no_drv_owner.cocci warnings regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage regulator fix for "regulator: core: Add regulator_is_equal() helper" regulator: core: Add regulator_is_equal() helper regulator: mpq7920: Convert to use .probe_new regulator: mpq7920: Remove unneeded fields from struct mpq7920_regulator_info regulator: vqmmc-ipq4019: Trivial clean up regulator: vqmmc-ipq4019: Remove ipq4019_regulator_remove regulator: bindings: Drop document bindings for mpq7920 dt-bindings: Drop entry for Monolithic Power System, MPS regulator: bd718x7: Simplify the code by removing struct bd718xx_pmic_inits regulator: add IPQ4019 SDHCI VQMMC LDO driver regulator: Convert i2c drivers to use .probe_new regulator: mpq7920: Check the correct variable in mpq7920_regulator_register() regulator: mpq7920: Fix Woverflow warning on conversion regulator: mp8859: tidy up white space in probe regulator: mpq7920: add mpq7920 regulator driver ...
2020-01-27Merge tag 'spi-v5.6' of ↵Linus Torvalds37-628/+1208
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "Not much going on in the core for SPI this time but a reasonable amount of change in the drivers: - Removal of dmal_request_slave_channel() from Peter Ujfalusi. - More conversions of drivers to GPIO descriptors from Linus Walleij. - A big rework of the sh-msiof driver from Geert Uytterhoeven moving it over to the generic native chipselect support. - DMA support for the uniphier driver from Kunihiko Hayashi. - New driver support for HiSilcon v3xx SPI NOR controllers from John Garry" * tag 'spi-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (52 commits) dt-binding: spi: add NPCM PSPI reset binding spi: pxa2xx: Avoid touching SSCR0_SSE on MMP2 spi: spi-fsl-qspi: Ensure width is respected in spi-mem operations spi: npcm-pspi: modify reset support spi: npcm-pspi: improve spi transfer performance spi: spi-ti-qspi: fix warning spi: npcm-pspi: fix 16 bit send and receive support spi: pxa2xx: Add support for Intel Comet Lake PCH-V spi: fsl: simplify error path in of_fsl_spi_probe() spi: fsl-lpspi: fix only one cs-gpio working spi: spi-ti-qspi: optimize byte-transfers spi: spi-ti-qspi: support large flash devices spi: spi-qcom-qspi: Use device managed memory for clk_bulk_data MAINTAINERS: Add a maintainer for the HiSilicon v3xx SFC driver spi: Add HiSilicon v3xx SPI NOR flash controller driver dt-bindings: spi_atmel: add microchip,sam9x60-spi spi: bcm2835: Raise maximum number of slaves to 4 spi: sh-msiof: Do not redefine STR while compile testing spi: rspi: Add support for GPIO chip selects spi: rspi: Add support for multiple native chip selects ...
2020-01-27Merge tag 'regmap-v5.6' of ↵Linus Torvalds3-10/+62
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "This is quite a busy release for a subsystem that's usually very quiet, though still a small set of updates in the grand scheme of things: - A fix for writes to non-incrementing registers. - An iopoll() style helper for use with atomic safe regmaps, making it easier to transition from raw memory mapped I/O. - Some constification" * tag 'regmap-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix writes to non incrementing registers regmap: add iopoll-like atomic polling macro regmap-i2c: constify regmap_bus structures
2020-01-27KVM: x86: Use a typedef for fastop functionsSean Christopherson1-7/+7
Add a typedef to for the fastop function prototype to make the code more readable. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: X86: Add 'else' to unify fastop and execute call pathMiaohe Lin1-4/+2
It also helps eliminate some duplicated code. Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86: inline memslot_valid_for_gptePaolo Bonzini1-13/+4
The function now has a single caller, so there is no point in keeping it separate. Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Use huge pages for DAX-backed filesSean Christopherson1-5/+4
Walk the host page tables to identify hugepage mappings for ZONE_DEVICE pfns, i.e. DAX pages. Explicitly query kvm_is_zone_device_pfn() when deciding whether or not to bother walking the host page tables, as DAX pages do not set up the head/tail infrastructure, i.e. will return false for PageCompound() even when using huge pages. Zap ZONE_DEVICE sptes when disabling dirty logging, e.g. if live migration fails, to allow KVM to rebuild large pages for DAX-based mappings. Presumably DAX favors large pages, and worst case scenario is a minor performance hit as KVM will need to re-fault all DAX-based pages. Suggested-by: Barret Rhoden <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jason Zeng <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Liran Alon <[email protected]> Cc: linux-nvdimm <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()Sean Christopherson1-36/+3
Remove the late "lpage is disallowed" check from set_spte() now that the initial check is performed after acquiring mmu_lock. Fold the guts of the remaining helper, __mmu_gfn_lpage_is_disallowed(), into kvm_mmu_hugepage_adjust() to eliminate the unnecessary slot !NULL check. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()Sean Christopherson2-37/+30
Fold max_mapping_level() into kvm_mmu_hugepage_adjust() now that HugeTLB mappings are handled in kvm_mmu_hugepage_adjust(), i.e. there isn't a need to pre-calculate the max mapping level. Co-locating all hugepage checks eliminates a memslot lookup, at the cost of performing the __mmu_gfn_lpage_is_disallowed() checks while holding mmu_lock. The latency of lpage_is_disallowed() is likely negligible relative to the rest of the code run while holding mmu_lock, and can be offset to some extent by eliminating the mmu_gfn_lpage_is_disallowed() check in set_spte() in a future patch. Eliminating the check in set_spte() is made possible by performing the initial lpage_is_disallowed() checks while holding mmu_lock. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Zap any compound page when collapsing sptesSean Christopherson1-1/+1
Zap any compound page, e.g. THP or HugeTLB pages, when zapping sptes that can potentially be converted to huge sptes after disabling dirty logging on the associated memslot. Note, this approach could result in false positives, e.g. if a random compound page is mapped into the guest, but mapping non-huge compound pages into the guest is far from the norm, and toggling dirty logging is not a frequent operation. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)Sean Christopherson1-10/+3
Remove logic to retrieve the original gfn now that HugeTLB mappings are are identified in FNAME(fetch), i.e. FNAME(page_fault) no longer adjusts the level or gfn. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Rely on host page tables to find HugeTLB mappingsSean Christopherson2-70/+29
Remove KVM's HugeTLB specific logic and instead rely on walking the host page tables (already done for THP) to identify HugeTLB mappings. Eliminating the HugeTLB-only logic avoids taking mmap_sem and calling find_vma() for all hugepage compatible page faults, and simplifies KVM's page fault code by consolidating all hugepage adjustments into a common helper. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Drop level optimization from fast_page_fault()Sean Christopherson1-4/+3
Remove fast_page_fault()'s optimization to stop the shadow walk if the iterator level drops below the intended map level. The intended map level is only acccurate for HugeTLB mappings (THP mappings are detected after fast_page_fault()), i.e. it's not required for correctness, and a future patch will also move HugeTLB mapping detection to after fast_page_fault(). Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Walk host page tables to find THP mappingsSean Christopherson1-2/+31
Explicitly walk the host page tables to identify THP mappings instead of relying solely on the metadata in struct page. This sets the stage for using a common method of identifying huge mappings regardless of the underlying implementation (HugeTLB vs THB vs DAX), and hopefully avoids the pitfalls of relying on metadata to identify THP mappings, e.g. see commit 169226f7e0d2 ("mm: thp: handle page cache THP correctly in PageTransCompoundMap") and the need for KVM to explicitly check for a THP compound page. KVM will also naturally work with 1gb THP pages, if they are ever supported. Walking the tables for THP mappings is likely marginally slower than querying metadata, but a future patch will reuse the walk to identify HugeTLB mappings, at which point eliminating the existing VMA lookup for HugeTLB will make this a net positive. Cc: Andrea Arcangeli <[email protected]> Cc: Barret Rhoden <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Refactor THP adjust to prep for changing querySean Christopherson2-24/+23
Refactor transparent_hugepage_adjust() in preparation for walking the host page tables to identify hugepage mappings, initially for THP pages, and eventualy for HugeTLB and DAX-backed pages as well. The latter cases support 1gb pages, i.e. the adjustment logic needs access to the max allowed level. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27x86/mm: Introduce lookup_address_in_mm()Sean Christopherson2-0/+15
Add a helper, lookup_address_in_mm(), to traverse the page tables of a given mm struct. KVM will use the helper to retrieve the host mapping level, e.g. 4k vs. 2mb vs. 1gb, of a compound (or DAX-backed) page without having to resort to implementation specific metadata. E.g. KVM currently uses different logic for HugeTLB vs. THP, and would add a third variant for DAX-backed files. Cc: Dan Williams <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Play nice with read-only memslots when querying host page sizeSean Christopherson1-1/+1
Avoid the "writable" check in __gfn_to_hva_many(), which will always fail on read-only memslots due to gfn_to_hva() assuming writes. Functionally, this allows x86 to create large mappings for read-only memslots that are backed by HugeTLB mappings. Note, the changelog for commit 05da45583de9 ("KVM: MMU: large page support") states "If the largepage contains write-protected pages, a large pte is not used.", but "write-protected" refers to pages that are temporarily read-only, e.g. read-only memslots didn't even exist at the time. Fixes: 4d8b81abc47b ("KVM: introduce readonly memslot") Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> [Redone using kvm_vcpu_gfn_to_memslot_prot. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Use vcpu-specific gva->hva translation when querying host page sizeSean Christopherson4-7/+7
Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27mm: thp: KVM: Explicitly check for THP when populating secondary MMUSean Christopherson6-9/+31
Add a helper, is_transparent_hugepage(), to explicitly check whether a compound page is a THP and use it when populating KVM's secondary MMU. The explicit check fixes a bug where a remapped compound page, e.g. for an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP, which results in KVM incorrectly creating a huge page in its secondary MMU. Fixes: 936a5fe6e6148 ("thp: kvm mmu transparent hugepage support") Reported-by: [email protected] Cc: Andrea Arcangeli <[email protected]> Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: x86/mmu: Enforce max_level on HugeTLB mappingsSean Christopherson1-2/+3
Limit KVM's mapping level for HugeTLB based on its calculated max_level. The max_level check prior to invoking host_mapping_level() only filters out the case where KVM cannot create a 2mb mapping, it doesn't handle the scenario where KVM can create a 2mb but not 1gb mapping, and the host is using a 1gb HugeTLB mapping. Fixes: 2f57b7051fe8 ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level") Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Return immediately if __kvm_gfn_to_hva_cache_init() failsSean Christopherson1-4/+8
Check the result of __kvm_gfn_to_hva_cache_init() and return immediately instead of relying on the kvm_is_error_hva() check to detect errors so that it's abundantly clear KVM intends to immediately bail on an error. Note, the hva check is still mandatory to handle errors on subqeuesnt calls with the same generation. Similarly, always return -EFAULT on error so that multiple (bad) calls for a given generation will get the same result, e.g. on an illegal gfn wrap, propagating the return from __kvm_gfn_to_hva_cache_init() would cause the initial call to return -EINVAL and subsequent calls to return -EFAULT. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Clean up __kvm_gfn_to_hva_cache_init() and its callersSean Christopherson1-9/+12
Barret reported a (technically benign) bug where nr_pages_avail can be accessed without being initialized if gfn_to_hva_many() fails. virt/kvm/kvm_main.c:2193:13: warning: 'nr_pages_avail' may be used uninitialized in this function [-Wmaybe-uninitialized] Rather than simply squashing the warning by initializing nr_pages_avail, fix the underlying issues by reworking __kvm_gfn_to_hva_cache_init() to return immediately instead of continuing on. Now that all callers check the result and/or bail immediately on a bad hva, there's no need to explicitly nullify the memslot on error. Reported-by: Barret Rhoden <[email protected]> Fixes: f1b9dd5eb86c ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init") Cc: Jim Mattson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Check for a bad hva before dropping into the ghc slow pathSean Christopherson1-6/+6
When reading/writing using the guest/host cache, check for a bad hva before checking for a NULL memslot, which triggers the slow path for handing cross-page accesses. Because the memslot is nullified on error by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after crossing into a new page, then the kvm_{read,write}_guest() slow path could potentially write/access the first chunk prior to detecting the bad hva. Arguably, performing a partial access is semantically correct from an architectural perspective, but that behavior is certainly not intended. In the original implementation, memslot was not explicitly nullified and therefore the partial access behavior varied based on whether the memslot itself was null, or if the hva was simply bad. The current behavior was introduced as a seemingly unintentional side effect in commit f1b9dd5eb86c ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init"), which justified the change with "since some callers don't check the return code from this function, it sit seems prudent to clear ghc->memslot in the event of an error". Regardless of intent, the partial access is dependent on _not_ checking the result of the cache initialization, which is arguably a bug in its own right, at best simply weird. Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Cc: Jim Mattson <[email protected]> Cc: Andrew Honig <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27kvm/x86: export kvm_vector_hashing_enabled() is unnecessaryPeng Hao1-1/+0
kvm_vector_hashing_enabled() is just called in kvm.ko module. Signed-off-by: Peng Hao <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: VMX: remove duplicated segment cache clearMiaohe Lin1-2/+0
vmx_set_segment() clears segment cache unconditionally, so we should not clear it again by calling vmx_segment_cache_clear(). Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27Adding 'else' to reduce checking.Haiwei Li1-2/+2
These two conditions are in conflict, adding 'else' to reduce checking. Signed-off-by: Haiwei Li <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: nVMX: Check GUEST_DR7 on vmentry of nested guestsKrish Sadhukhan3-1/+11
According to section "Checks on Guest Control Registers, Debug Registers, and and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry of nested guests: If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7 field must be 0. In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the latter synthesizes a #GP if any bit in the high dword in the former is set. Hence this field needs to be checked in software. Signed-off-by: Krish Sadhukhan <[email protected]> Reviewed-by: Karl Heubaum <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: remove unused guest_enterAlex Shi1-9/+0
After commit 61bd0f66ff92 ("KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN"), no one use this function anymore, So better to remove it. Signed-off-by: Alex Shi <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Move running VCPU from ARM to common codePaolo Bonzini8-50/+34
For ring-based dirty log tracking, it will be more efficient to account writes during schedule-out or schedule-in to the currently running VCPU. We would like to do it even if the write doesn't use the current VCPU's address space, as is the case for cached writes (see commit 4e335d9e7ddb, "Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02). Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct. There is already something similar in KVM/ARM; one important difference is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c: we have to update both the architecture-independent vcpu_{load,put} and the preempt notifiers. Another change made in the process is to allow using kvm_get_running_vcpu() in preemptible code. This is allowed because preempt notifiers ensure that the value does not change even after the VCPU thread is migrated. Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: X86: Drop x86_set_memory_region()Peter Xu3-18/+12
The helper x86_set_memory_region() is only used in vmx_set_tss_addr() and kvm_arch_destroy_vm(). Push the lock upper in both cases. With that, drop x86_set_memory_region(). This prepares to allow __x86_set_memory_region() to return a HVA mapped, because the HVA will need to be protected by the lock too even after __x86_set_memory_region() returns. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: X86: Don't take srcu lock in init_rmode_identity_map()Peter Xu1-7/+3
We've already got the slots_lock, so we should be safe. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Add build-time error check on kvm_run sizePeter Xu1-0/+1
It's already going to reach 2400 Bytes (which is over half of page size on 4K page archs), so maybe it's good to have this build-time check in case it overflows when adding new fields. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27KVM: Remove kvm_read_guest_atomic()Peter Xu2-13/+0
Remove kvm_read_guest_atomic() because it's not used anywhere. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-01-27x86/kvm/hyper-v: remove stale evmcs_already_enabled check from ↵Vitaly Kuznetsov1-5/+0
nested_enable_evmcs() In nested_enable_evmcs() evmcs_already_enabled check doesn't really do anything: controls are already sanitized and we return '0' regardless. Just drop the check. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>