Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull workqueue updates from Tejun Heo:
"Just a couple tracepoint patches"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove workqueue_work event class
workqueue: add worker function to workqueue_execute_end tracepoint
|
|
In preparation for sharing an io-wq across different users, add a
reference count that manages destruction of it.
Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In case of out of memory the second argument of percpu_ref_put_many() in
io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly
wrong.
Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Draining the middle of a link is tricky, so leave a comment there
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
For the non-vectored variant of READV/WRITEV, we don't need to setup an
async io context, and we flag that appropriately in the io_op_defs
array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c
we didn't have these opcodes, so the check there was added just for the
READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a
single check for needing async context, that covers all four of these
read/write variants that don't use an iovec.
Signed-off-by: Jens Axboe <[email protected]>
|
|
scripts/find-unused-docs.sh invokes scripts/kernel-doc to find out if a
source file contains kerneldoc or not.
However, as it passes the no longer supported "-text" option to
scripts/kernel-doc, the latter prints out its help text, causing all
files to be considered containing kerneldoc.
Get rid of these false positives by removing the no longer supported
"-text" option from the scripts/kernel-doc invocation.
Cc: [email protected] # 4.16+
Fixes: b05142675310d2ac ("scripts: kernel-doc: get rid of unused output formats")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Pull ioremap updates from Christoph Hellwig:
"Remove the ioremap_nocache API (plus wrappers) that are always
identical to ioremap"
* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
remove ioremap_nocache and devm_ioremap_nocache
MIPS: define ioremap_nocache to ioremap
|
|
Pull libata updates from Jens Axboe:
"As usual pretty quiet, mostly trivial fixes (or dead code removal),
outside of various fixes for ahci_bcrm"
* tag 'for-5.6/libata-2020-01-27' of git://git.kernel.dk/linux-block:
ata/acard_ahci: remove unused variable n_elem
ata: pata_macio: fix comparing pointer to 0
ata: ahci_brcm: BCM7216 reset is self de-asserting
ata: ahci_brcm: Perform reset after obtaining resources
ata: brcm: fix reset controller API usage
ata: brcm: mark PM functions as __maybe_unused
ata: ahci_brcm: Support BCM7216 reset controller name
dt-bindings: ata: Document BCM7216 AHCI controller compatible
ata: ahci_brcm: Add a shutdown callback
ata: ahci_brcm: Manage reset line during suspend/resume
|
|
Pull block driver updates from Jens Axboe:
"Like the core side, not a lot of changes here, just two main items:
- Series of patches (via Coly) with fixes for bcache (Coly,
Christoph)
- MD pull request from Song"
* tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits)
bcache: reap from tail of c->btree_cache in bch_mca_scan()
bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
bcache: remove member accessed from struct btree
bcache: print written and keys in trace_bcache_btree_write
bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
bcache: add code comments for state->pool in __btree_sort()
lib: crc64: include <linux/crc64.h> for 'crc64_be'
bcache: use read_cache_page_gfp to read the superblock
bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
bcache: return a pointer to the on-disk sb from read_super
bcache: transfer the sb_page reference to register_{bdev,cache}
bcache: fix use-after-free in register_bcache()
bcache: properly initialize 'path' and 'err' in register_bcache()
bcache: rework error unwinding in register_bcache
bcache: use a separate data structure for the on-disk super block
bcache: cached_dev_free needs to put the sb page
md/raid1: introduce wait_for_serialization
md/raid1: use bucket based mechanism for IO serialization
md: introduce a new struct for IO serialization
md: don't destroy serial_info_pool if serialize_policy is true
...
|
|
Pull core block updates from Jens Axboe:
"This may be the most quiet round we've had in years. I'm not
complaining. Really not a lot to detail here, outside of spelling and
documentation improvements/fixes, we have:
- Allow t10-pi to be modular (Herbert)
- Remove dead code in bfq (Alex)
- Mark zone management requests with REQ_SYNC (Chaitanya)
- BFQ division improvement (Wen)
- Small series improving plugging (Pavel)"
* tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block:
partitions/ldm: fix spelling mistake "to" -> "too"
block, bfq: improve arithmetic division in bfq_delta()
block/bfq: remove unused bfq_class_rt which never used
block: mark zone-mgmt bios with REQ_SYNC
blk-mq: Document functions for sending request
block: Allow t10-pi to be modular
blk-mq: optimise blk_mq_flush_plug_list()
list: introduce list_for_each_continue()
blk-mq: optimise rq sort function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull PNP updates from Rafael Wysocki:
"Get rid of unused variable and function (yu kuai)"
* tag 'pnp-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PNP: isapnp: remove defined but not used function 'isapnp_checksum'
PNP: isapnp: remove set but not used variable 'checksum'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
"Add support for reference properties in sofrware nodes (Dmitry
Torokhov) and a basic test for property entries along with fixes on
top of it (Dmitry Torokhov, Qian Cai, Alan Maguire)"
* tag 'devprop-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
software node: introduce CONFIG_KUNIT_DRIVER_PE_TEST
usb: dwc3: use proper initializers for property entries
drivers/base/test: fix global-out-of-bounds error
software node: add basic tests for property entries
software node: remove separate handling of references
platform/x86: intel_cht_int33fe: use inline reference properties
software node: implement reference properties
software node: allow embedding of small arrays into property_entry
software node: replace is_array with is_inline
|
|
Move blk_queue_make_request() to dm.c:alloc_dev() so that
q->make_request_fn is never NULL during the lifetime of a DM device
(even one that is created without a DM table).
Otherwise generic_make_request() will crash simply by doing:
dmsetup create -n test
mount /dev/dm-N /mnt
While at it, move ->congested_data initialization out of
dm.c:alloc_dev() and into the bio-based specific init method.
Reported-by: Stefan Bader <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1860231
Fixes: ff36ab34583a ("dm: remove request-based logic from make_request_fn wrapper")
Depends-on: c12c9a3c3860c ("dm: various cleanups to md->queue initialization code")
Cc: [email protected]
Signed-off-by: Mike Snitzer <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to the most recent upstream
revision (20200110), add new hardware support to a handful of ACPI
drivers, make the ACPI fan driver expose power states information for
fans, add some more quirks, fix bugs and clean up assorted things.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20200110
including:
- Update of copyright notices to 2020 (Bob Moore).
- Dispatcher fix to always generate buffer objects for the ASL
create_field() operator (Maximilian Luz).
- Debugger cleanup (Colin Ian King).
- Disassembler change to create buffer fields in
ACPI_PARSE_LOAD_PASS1 (Erik Kaneda).
- UNIX line ending support for non-windows builds in acpisrc (Erik
Kaneda).
- Update the list of ACPICA maintainers (Rafael Wysocki).
- Add Intel Tiger Lake ACPI device IDs to the ACPI DPTF, ACPI fan,
int340x_thermal and intel-hid drivers (Gayatri Kammela).
- Make the ACPI fan driver create additional sysfs attributes to
expose power states information for fans (Srinivas Pandruvada).
- Fix up the ACPI battery driver to deal with unexpected battery
capacity information in a better way (Hans de Goede).
- Add ACPI backlight quirks for Lenovo E41-25/45 and MSI MS-7721
boards (Aaron Ma, Hans de Goede).
- Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch to
the ACPI button driver (Jason Ekstrand).
- Drop TIMER_DEFERRABLE from the GHES polling mode timer function
flags to make it run precisely at the configured time (Bhaskar
Upadhaya).
- Fix race condition related to the reference counting of query
handlers in the ACPI EC driver (Rafael Wysocki).
- Fix ACPI tools build issue (Zhengyuan Liu).
- Replace dma_request_slave_channel() with dma_request_chan() in the
firmware guide documentation for ACPI (Peter Ujfalusi).
- Fix typo in a comment and clean up function parameter data type
inconsistencies (Kacper Piwiński, Tian Tao)"
* tag 'acpi-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
ACPICA: Update version to 20200110
ACPICA: All acpica: Update copyrights to 2020 Including tool signons.
apei/ghes: Do not delay GHES polling
ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch
ACPI: PPTT: Consistently use unsigned int as parameter type
ACPI: EC: Reference count query handlers under lock
ACPICA: Update the list of maintainers
ACPICA: Update version to 20191213
ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator
ACPICA: acpisrc: add unix line ending support for non-windows build
ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
ACPICA: debugger: fix spelling mistake "adress" -> "address"
ACPI: video: Do not export a non working backlight interface on MSI MS-7721 boards
docs: firmware-guide: ACPI: Replace dma_request_slave_channel() with dma_request_chan()
thermal: int340x_thermal: Add Tiger Lake ACPI device IDs
platform/x86: intel-hid: Add Tiger Lake ACPI device ID
ACPI: fan: Add Tiger Lake ACPI device ID
ACPI: DPTF: Add Tiger Lake ACPI device IDs
ACPI: fan: Expose fan performance state information
tools/power/acpi: fix compilation error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add ACPI support to the intel_idle driver along with an admin
guide document for it, add support for CPR (Core Power Reduction) to
the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support
in a few places, add some new sysfs attributes, debugfs files and
tracepoints, fix bugs and clean up a bunch of things all over.
Specifics:
- Update the ACPI processor driver in order to export
acpi_processor_evaluate_cst() to the code outside of it, add ACPI
support to the intel_idle driver based on that and clean up that
driver somewhat (Rafael Wysocki).
- Add an admin guide document for the intel_idle driver (Rafael
Wysocki).
- Clean up cpuidle core and drivers, enable compilation testing for
some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
Wysocki, Yangtao Li).
- Fix reference counting of OPP (operating performance points) table
structures (Viresh Kumar).
- Add support for CPR (Core Power Reduction) to the AVS (Adaptive
Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
YueHaibing).
- Add support for TigerLake Mobile and JasperLake to the Intel RAPL
power capping driver (Zhang Rui).
- Update cpufreq drivers:
- Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
- Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
- Fix cpufreq policy reference counting issues in s3c and
brcmstb-avs (chenqiwu).
- Fix ACPI table reference counting issue and HiSilicon quirk
handling in the CPPC driver (Hanjun Guo).
- Clean up spelling mistake in intel_pstate (Harry Pan).
- Convert the kirkwood and tegra186 drivers to using
devm_platform_ioremap_resource() (Yangtao Li).
- Update devfreq core:
- Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
- Clean up the handing of transition statistics and allow them to
be reset by writing 0 to the 'trans_stat' devfreq device
attribute in sysfs (Kamil Konieczny).
- Add 'devfreq_summary' to debugfs (Chanwoo Choi).
- Clean up kerneldoc comments and Kconfig indentation (Krzysztof
Kozlowski, Randy Dunlap).
- Update devfreq drivers:
- Add dynamic scaling for the imx8m DDR controller and clean up
imx8m-ddrc (Leonard Crestez, YueHaibing).
- Fix DT node reference counting and nitialization error code path
in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
for it (Chanwoo Choi, Yangtao Li).
- Fix DT node reference counting in rockchip-dfi and make it use
devm_platform_ioremap_resource() (Yangtao Li).
- Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
- Fix initialization error code paths in exynos-bus (Yangtao Li).
- Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
Kozlowski).
- Add tracepoints for tracking usage_count updates unrelated to
status changes in PM-runtime (Michał Mirosław).
- Add sysfs attribute to control the "sync on suspend" behavior
during system-wide suspend (Jonas Meurer).
- Switch system-wide suspend tests over to 64-bit time (Alexandre
Belloni).
- Make wakeup sources statistics in debugfs cover deleted ones which
used to be the case some time ago (zhuguangqing).
- Clean up computations carried out during hibernation, update
messages related to hibernation and fix a spelling mistake in one
of them (Wen Yang, Luigi Semenzato, Colin Ian King).
- Add mailmap entry for maintainer e-mail address that has not been
functional for several years (Rafael Wysocki)"
* tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
intel_idle: Clean up irtl_2_usec()
intel_idle: Move 3 functions closer to their callers
intel_idle: Annotate initialization code and data structures
intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
intel_idle: Rearrange intel_idle_cpuidle_driver_init()
intel_idle: Clean up NULL pointer check in intel_idle_init()
intel_idle: Fold intel_idle_probe() into intel_idle_init()
intel_idle: Eliminate __setup_broadcast_timer()
cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
cpuidle: sysfs: fix warnings when compiling with W=1
cpuidle: coupled: fix warnings when compiling with W=1
cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
PM / devfreq: Add debugfs support with devfreq_summary file
Documentation: admin-guide: PM: Add intel_idle document
cpuidle: arm: Enable compile testing for some of drivers
PM-runtime: add tracepoints for usage_count changes
cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
...
|
|
This macro is never used from it was introduced in commit e6b1db98cf4d5
("security: Support early LSMs"), better to remove it.
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: James Morris <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"Hardly anything going on in the core this time around with the
regulator API and pretty quiet on the driver front:
- An API for comparing regulators, useful for devices that need to
check if supply voltages exactly match rather than just nominally
match.
- Conversion of several DT bindings to YAML format.
- Conversion of I2C drivers to probe_new().
- New drivers for Monolithic MPQ7920 and MP8859, and Rohm BD71828"
* tag 'regulator-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (34 commits)
dt-bindings: regulator: add document bindings for mpq7920
regulator: core: Fix exported symbols to the exported GPL version
regulator: mpq7920: Fix incorrect defines
regulator: vqmmc-ipq4019: Fix platform_no_drv_owner.cocci warnings
regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage
regulator fix for "regulator: core: Add regulator_is_equal() helper"
regulator: core: Add regulator_is_equal() helper
regulator: mpq7920: Convert to use .probe_new
regulator: mpq7920: Remove unneeded fields from struct mpq7920_regulator_info
regulator: vqmmc-ipq4019: Trivial clean up
regulator: vqmmc-ipq4019: Remove ipq4019_regulator_remove
regulator: bindings: Drop document bindings for mpq7920
dt-bindings: Drop entry for Monolithic Power System, MPS
regulator: bd718x7: Simplify the code by removing struct bd718xx_pmic_inits
regulator: add IPQ4019 SDHCI VQMMC LDO driver
regulator: Convert i2c drivers to use .probe_new
regulator: mpq7920: Check the correct variable in mpq7920_regulator_register()
regulator: mpq7920: Fix Woverflow warning on conversion
regulator: mp8859: tidy up white space in probe
regulator: mpq7920: add mpq7920 regulator driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"Not much going on in the core for SPI this time but a reasonable
amount of change in the drivers:
- Removal of dmal_request_slave_channel() from Peter Ujfalusi.
- More conversions of drivers to GPIO descriptors from Linus Walleij.
- A big rework of the sh-msiof driver from Geert Uytterhoeven moving
it over to the generic native chipselect support.
- DMA support for the uniphier driver from Kunihiko Hayashi.
- New driver support for HiSilcon v3xx SPI NOR controllers from John
Garry"
* tag 'spi-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (52 commits)
dt-binding: spi: add NPCM PSPI reset binding
spi: pxa2xx: Avoid touching SSCR0_SSE on MMP2
spi: spi-fsl-qspi: Ensure width is respected in spi-mem operations
spi: npcm-pspi: modify reset support
spi: npcm-pspi: improve spi transfer performance
spi: spi-ti-qspi: fix warning
spi: npcm-pspi: fix 16 bit send and receive support
spi: pxa2xx: Add support for Intel Comet Lake PCH-V
spi: fsl: simplify error path in of_fsl_spi_probe()
spi: fsl-lpspi: fix only one cs-gpio working
spi: spi-ti-qspi: optimize byte-transfers
spi: spi-ti-qspi: support large flash devices
spi: spi-qcom-qspi: Use device managed memory for clk_bulk_data
MAINTAINERS: Add a maintainer for the HiSilicon v3xx SFC driver
spi: Add HiSilicon v3xx SPI NOR flash controller driver
dt-bindings: spi_atmel: add microchip,sam9x60-spi
spi: bcm2835: Raise maximum number of slaves to 4
spi: sh-msiof: Do not redefine STR while compile testing
spi: rspi: Add support for GPIO chip selects
spi: rspi: Add support for multiple native chip selects
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"This is quite a busy release for a subsystem that's usually very
quiet, though still a small set of updates in the grand scheme of
things:
- A fix for writes to non-incrementing registers.
- An iopoll() style helper for use with atomic safe regmaps, making
it easier to transition from raw memory mapped I/O.
- Some constification"
* tag 'regmap-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: fix writes to non incrementing registers
regmap: add iopoll-like atomic polling macro
regmap-i2c: constify regmap_bus structures
|
|
Add a typedef to for the fastop function prototype to make the code more
readable.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
It also helps eliminate some duplicated code.
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The function now has a single caller, so there is no point
in keeping it separate.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Walk the host page tables to identify hugepage mappings for ZONE_DEVICE
pfns, i.e. DAX pages. Explicitly query kvm_is_zone_device_pfn() when
deciding whether or not to bother walking the host page tables, as DAX
pages do not set up the head/tail infrastructure, i.e. will return false
for PageCompound() even when using huge pages.
Zap ZONE_DEVICE sptes when disabling dirty logging, e.g. if live
migration fails, to allow KVM to rebuild large pages for DAX-based
mappings. Presumably DAX favors large pages, and worst case scenario is
a minor performance hit as KVM will need to re-fault all DAX-based
pages.
Suggested-by: Barret Rhoden <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Zeng <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Liran Alon <[email protected]>
Cc: linux-nvdimm <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove the late "lpage is disallowed" check from set_spte() now that the
initial check is performed after acquiring mmu_lock. Fold the guts of
the remaining helper, __mmu_gfn_lpage_is_disallowed(), into
kvm_mmu_hugepage_adjust() to eliminate the unnecessary slot !NULL check.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Fold max_mapping_level() into kvm_mmu_hugepage_adjust() now that HugeTLB
mappings are handled in kvm_mmu_hugepage_adjust(), i.e. there isn't a
need to pre-calculate the max mapping level. Co-locating all hugepage
checks eliminates a memslot lookup, at the cost of performing the
__mmu_gfn_lpage_is_disallowed() checks while holding mmu_lock.
The latency of lpage_is_disallowed() is likely negligible relative to
the rest of the code run while holding mmu_lock, and can be offset to
some extent by eliminating the mmu_gfn_lpage_is_disallowed() check in
set_spte() in a future patch. Eliminating the check in set_spte() is
made possible by performing the initial lpage_is_disallowed() checks
while holding mmu_lock.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Zap any compound page, e.g. THP or HugeTLB pages, when zapping sptes
that can potentially be converted to huge sptes after disabling dirty
logging on the associated memslot. Note, this approach could result in
false positives, e.g. if a random compound page is mapped into the
guest, but mapping non-huge compound pages into the guest is far from
the norm, and toggling dirty logging is not a frequent operation.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove logic to retrieve the original gfn now that HugeTLB mappings are
are identified in FNAME(fetch), i.e. FNAME(page_fault) no longer adjusts
the level or gfn.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove KVM's HugeTLB specific logic and instead rely on walking the host
page tables (already done for THP) to identify HugeTLB mappings.
Eliminating the HugeTLB-only logic avoids taking mmap_sem and calling
find_vma() for all hugepage compatible page faults, and simplifies KVM's
page fault code by consolidating all hugepage adjustments into a common
helper.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove fast_page_fault()'s optimization to stop the shadow walk if the
iterator level drops below the intended map level. The intended map
level is only acccurate for HugeTLB mappings (THP mappings are detected
after fast_page_fault()), i.e. it's not required for correctness, and
a future patch will also move HugeTLB mapping detection to after
fast_page_fault().
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Explicitly walk the host page tables to identify THP mappings instead
of relying solely on the metadata in struct page. This sets the stage
for using a common method of identifying huge mappings regardless of the
underlying implementation (HugeTLB vs THB vs DAX), and hopefully avoids
the pitfalls of relying on metadata to identify THP mappings, e.g. see
commit 169226f7e0d2 ("mm: thp: handle page cache THP correctly in
PageTransCompoundMap") and the need for KVM to explicitly check for a
THP compound page. KVM will also naturally work with 1gb THP pages, if
they are ever supported.
Walking the tables for THP mappings is likely marginally slower than
querying metadata, but a future patch will reuse the walk to identify
HugeTLB mappings, at which point eliminating the existing VMA lookup for
HugeTLB will make this a net positive.
Cc: Andrea Arcangeli <[email protected]>
Cc: Barret Rhoden <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Refactor transparent_hugepage_adjust() in preparation for walking the
host page tables to identify hugepage mappings, initially for THP pages,
and eventualy for HugeTLB and DAX-backed pages as well. The latter
cases support 1gb pages, i.e. the adjustment logic needs access to the
max allowed level.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a helper, lookup_address_in_mm(), to traverse the page tables of a
given mm struct. KVM will use the helper to retrieve the host mapping
level, e.g. 4k vs. 2mb vs. 1gb, of a compound (or DAX-backed) page
without having to resort to implementation specific metadata. E.g. KVM
currently uses different logic for HugeTLB vs. THP, and would add a
third variant for DAX-backed files.
Cc: Dan Williams <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Avoid the "writable" check in __gfn_to_hva_many(), which will always fail
on read-only memslots due to gfn_to_hva() assuming writes. Functionally,
this allows x86 to create large mappings for read-only memslots that
are backed by HugeTLB mappings.
Note, the changelog for commit 05da45583de9 ("KVM: MMU: large page
support") states "If the largepage contains write-protected pages, a
large pte is not used.", but "write-protected" refers to pages that are
temporarily read-only, e.g. read-only memslots didn't even exist at the
time.
Fixes: 4d8b81abc47b ("KVM: introduce readonly memslot")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
[Redone using kvm_vcpu_gfn_to_memslot_prot. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the
correct set of memslots is used when handling x86 page faults in SMM.
Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a helper, is_transparent_hugepage(), to explicitly check whether a
compound page is a THP and use it when populating KVM's secondary MMU.
The explicit check fixes a bug where a remapped compound page, e.g. for
an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP,
which results in KVM incorrectly creating a huge page in its secondary
MMU.
Fixes: 936a5fe6e6148 ("thp: kvm mmu transparent hugepage support")
Reported-by: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Limit KVM's mapping level for HugeTLB based on its calculated max_level.
The max_level check prior to invoking host_mapping_level() only filters
out the case where KVM cannot create a 2mb mapping, it doesn't handle
the scenario where KVM can create a 2mb but not 1gb mapping, and the
host is using a 1gb HugeTLB mapping.
Fixes: 2f57b7051fe8 ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Check the result of __kvm_gfn_to_hva_cache_init() and return immediately
instead of relying on the kvm_is_error_hva() check to detect errors so
that it's abundantly clear KVM intends to immediately bail on an error.
Note, the hva check is still mandatory to handle errors on subqeuesnt
calls with the same generation. Similarly, always return -EFAULT on
error so that multiple (bad) calls for a given generation will get the
same result, e.g. on an illegal gfn wrap, propagating the return from
__kvm_gfn_to_hva_cache_init() would cause the initial call to return
-EINVAL and subsequent calls to return -EFAULT.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Barret reported a (technically benign) bug where nr_pages_avail can be
accessed without being initialized if gfn_to_hva_many() fails.
virt/kvm/kvm_main.c:2193:13: warning: 'nr_pages_avail' may be
used uninitialized in this function [-Wmaybe-uninitialized]
Rather than simply squashing the warning by initializing nr_pages_avail,
fix the underlying issues by reworking __kvm_gfn_to_hva_cache_init() to
return immediately instead of continuing on. Now that all callers check
the result and/or bail immediately on a bad hva, there's no need to
explicitly nullify the memslot on error.
Reported-by: Barret Rhoden <[email protected]>
Fixes: f1b9dd5eb86c ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init")
Cc: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When reading/writing using the guest/host cache, check for a bad hva
before checking for a NULL memslot, which triggers the slow path for
handing cross-page accesses. Because the memslot is nullified on error
by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after
crossing into a new page, then the kvm_{read,write}_guest() slow path
could potentially write/access the first chunk prior to detecting the
bad hva.
Arguably, performing a partial access is semantically correct from an
architectural perspective, but that behavior is certainly not intended.
In the original implementation, memslot was not explicitly nullified
and therefore the partial access behavior varied based on whether the
memslot itself was null, or if the hva was simply bad. The current
behavior was introduced as a seemingly unintentional side effect in
commit f1b9dd5eb86c ("kvm: Disallow wraparound in
kvm_gfn_to_hva_cache_init"), which justified the change with "since some
callers don't check the return code from this function, it sit seems
prudent to clear ghc->memslot in the event of an error".
Regardless of intent, the partial access is dependent on _not_ checking
the result of the cache initialization, which is arguably a bug in its
own right, at best simply weird.
Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.")
Cc: Jim Mattson <[email protected]>
Cc: Andrew Honig <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
kvm_vector_hashing_enabled() is just called in kvm.ko module.
Signed-off-by: Peng Hao <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
vmx_set_segment() clears segment cache unconditionally, so we should not
clear it again by calling vmx_segment_cache_clear().
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
These two conditions are in conflict, adding 'else' to reduce checking.
Signed-off-by: Haiwei Li <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
According to section "Checks on Guest Control Registers, Debug Registers, and
and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
of nested guests:
If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
field must be 0.
In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the
latter synthesizes a #GP if any bit in the high dword in the former is set.
Hence this field needs to be checked in software.
Signed-off-by: Krish Sadhukhan <[email protected]>
Reviewed-by: Karl Heubaum <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
After commit 61bd0f66ff92 ("KVM: PPC: Book3S HV: Fix guest time accounting
with VIRT_CPU_ACCOUNTING_GEN"), no one use this function anymore, So better
to remove it.
Signed-off-by: Alex Shi <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
For ring-based dirty log tracking, it will be more efficient to account
writes during schedule-out or schedule-in to the currently running VCPU.
We would like to do it even if the write doesn't use the current VCPU's
address space, as is the case for cached writes (see commit 4e335d9e7ddb,
"Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02).
Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct.
There is already something similar in KVM/ARM; one important difference
is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c:
we have to update both the architecture-independent vcpu_{load,put} and
the preempt notifiers.
Another change made in the process is to allow using kvm_get_running_vcpu()
in preemptible code. This is allowed because preempt notifiers ensure
that the value does not change even after the VCPU thread is migrated.
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The helper x86_set_memory_region() is only used in vmx_set_tss_addr()
and kvm_arch_destroy_vm(). Push the lock upper in both cases. With
that, drop x86_set_memory_region().
This prepares to allow __x86_set_memory_region() to return a HVA
mapped, because the HVA will need to be protected by the lock too even
after __x86_set_memory_region() returns.
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
We've already got the slots_lock, so we should be safe.
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
It's already going to reach 2400 Bytes (which is over half of page
size on 4K page archs), so maybe it's good to have this build-time
check in case it overflows when adding new fields.
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove kvm_read_guest_atomic() because it's not used anywhere.
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
nested_enable_evmcs()
In nested_enable_evmcs() evmcs_already_enabled check doesn't really do
anything: controls are already sanitized and we return '0' regardless.
Just drop the check.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Liran Alon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|