Age | Commit message (Collapse) | Author | Files | Lines |
|
There's an OF helper called of_dma_is_coherent(), which checks if a
device has a "dma-coherent" property to see if the device is coherent
for DMA.
But on some platforms devices are coherent by default, and on some
platforms it's not possible to update existing device trees to add the
"dma-coherent" property.
So add a Kconfig symbol to allow arch code to tell
of_dma_is_coherent() that devices are coherent by default, regardless
of the presence of the property.
Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
when the system has a coherent cache.
Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper")
Cc: [email protected] # v3.16+
Reported-by: Christian Zigotzky <[email protected]>
Tested-by: Christian Zigotzky <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
According to the ARM ARM, registers CNT{P,V}_TVAL_EL0 have bits [63:32]
RES0 [1]. When reading the register, the value is truncated to the least
significant 32 bits [2], and on writes, TimerValue is treated as a signed
32-bit integer [1, 2].
When the guest behaves correctly and writes 32-bit values, treating TVAL
as an unsigned 64 bit register works as expected. However, things start
to break down when the guest writes larger values, because
(u64)0x1_ffff_ffff = 8589934591. but (s32)0x1_ffff_ffff = -1, and the
former will cause the timer interrupt to be asserted in the future, but
the latter will cause it to be asserted now. Let's treat TVAL as a
signed 32-bit register on writes, to match the behaviour described in
the architecture, and the behaviour experimentally exhibited by the
virtual timer on a non-vhe host.
[1] Arm DDI 0487E.a, section D13.8.18
[2] Arm DDI 0487E.a, section D11.2.4
Signed-off-by: Alexandru Elisei <[email protected]>
[maz: replaced the read-side mask with lower_32_bits]
Signed-off-by: Marc Zyngier <[email protected]>
Fixes: 8fa761624871 ("KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation")
Link: https://lore.kernel.org/r/[email protected]
|
|
Let the code never use unsupported event counters. Change
kvm_pmu_handle_pmcr() to only reset supported counters and
kvm_pmu_vcpu_reset() to only stop supported counters.
Other actions are filtered on the supported counters in
kvm/sysregs.c
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
At the moment a SW_INCR counter always overflows on 32-bit
boundary, independently on whether the n+1th counter is
programmed as CHAIN.
Check whether the SW_INCR counter is a 64b counter and if so,
implement the 64b logic.
Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
At the moment we update the chain bitmap on type setting. This
does not take into account the enable state of the odd register.
Let's make sure a counter is never considered as chained if
the high counter is disabled.
We recompute the chain state on enable/disable and type changes.
Also let create_perf_event() use the chain bitmap and not use
kvm_pmu_idx_has_chain_evtype().
Suggested-by: Marc Zyngier <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1
if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR.
For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to
be set for the corresponding event counter but we also need
the PMCR.E bit to be set.
Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Andrew Murray <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add default MDIO_BCM_IPROC Kconfig setting such that it is default
on for IPROC family of devices.
Signed-off-by: Scott Branden <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Multicast and broadcast packets can be looped from egress to ingress
pre segmentation with dev_loopback_xmit. That function unconditionally
sets ip_summed to CHECKSUM_UNNECESSARY.
udp_rcv_segment segments gso packets in the udp rx path. Segmentation
usually executes on egress, and does not expect packets of this type.
__udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
the offsets are not correct for gso_make_checksum.
UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
to the correct pseudo header checksum. Reset ip_summed to this type.
(CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)
Reported-by: syzbot <[email protected]>
Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The old netem mailing list was inactive and recently was targeted by
spammers. Switch to just using netdev mailing list which is where all
the real change happens.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.
In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.
Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory. Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.
[ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195] ? __schedule+0x29b/0x630
[ 1626.609197] ? wait_for_completion+0xe0/0x170
[ 1626.609198] schedule+0x30/0xb0
[ 1626.609200] schedule_timeout+0x1f6/0x2f0
[ 1626.609202] ? blk_finish_plug+0x21/0x2e
[ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206] ? wait_for_completion+0xe0/0x170
[ 1626.609208] wait_for_completion+0x108/0x170
[ 1626.609210] ? wake_up_q+0x70/0x70
[ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214] ? xfs_bwrite+0x25/0x60
[ 1626.609215] xfs_buf_iowait+0x22/0xf0
[ 1626.609218] __xfs_buf_submit+0x12e/0x250
[ 1626.609220] xfs_bwrite+0x25/0x60
[ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228] super_cache_scan+0x152/0x1a0
[ 1626.609231] do_shrink_slab+0x12c/0x2d0
[ 1626.609233] shrink_slab+0x9c/0x2a0
[ 1626.609235] shrink_node+0xd7/0x470
[ 1626.609237] do_try_to_free_pages+0xbf/0x380
[ 1626.609240] try_to_free_pages+0xd9/0x1f0
[ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251] ? ___slab_alloc+0x238/0x560
[ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259] skb_page_frag_refill+0x97/0xd0
[ 1626.609274] sk_page_frag_refill+0x1d/0x80
[ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304] tcp_sendmsg+0x27/0x40
[ 1626.609307] sock_sendmsg+0x54/0x60
[ 1626.609308] ___sys_sendmsg+0x29f/0x320
[ 1626.609313] ? sock_poll+0x66/0xb0
[ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320] ? ep_send_events_proc+0xe6/0x230
[ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329] ? __hrtimer_init+0xb0/0xb0
[ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334] ? ep_poll+0x26c/0x4a0
[ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339] ? release_sock+0x43/0x90
[ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342] __sys_sendmsg+0x47/0x80
[ 1626.609347] do_syscall_64+0x5f/0x1c0
[ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.
Signed-off-by: Mike Christie <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Tested-by: Masato Suzuki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"The performance deterioration departement provides a few non-scary
fixes and improvements:
- Update the cached HLE state when the TSX state is changed via the
new control register. This ensures feature bit consistency.
- Exclude the new Zhaoxin CPUs from Spectre V2 and SWAPGS
vulnerabilities"
* tag 'x86-pti-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerability
x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2
x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The interrupt departement provides:
- A mechanism to shield isolated tasks from managed interrupts:
The affinity of managed interrupts is completely controlled by the
kernel and user space has no influence on them. The reason is that
the automatically assigned affinity correlates to the multi-queue
CPU handling of block devices.
If the generated affinity mask spaws both housekeeping and isolated
CPUs the interrupt could be routed to an isolated CPU which would
then be disturbed by I/O submitted by a housekeeping CPU.
The new mechamism ensures that as long as one housekeeping CPU is
online in the assigned affinity mask the interrupt is routed to a
housekeeping CPU.
If there is no online housekeeping CPU in the affinity mask, then
the interrupt is routed to an isolated CPU to keep the device queue
intact, but unless the isolated CPU submits I/O by itself these
interrupts are not raised.
- A small addon to the device tree irqdomain core code to avoid
duplication in irq chip drivers
- Conversion of the SiFive PLIC to hierarchical domains
- The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI,
NXP INTMUX, Meson A1 GPIO
- The first cut of support for the new ARM GICv4.1
- The usual pile of fixes and improvements in core and driver code"
* tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq, sched/isolation: Isolate from handling managed interrupts
irqchip/gic-v4.1: Allow direct invalidation of VLPIs
irqchip/gic-v4.1: Suppress per-VLPI doorbell
irqchip/gic-v4.1: Add VPE INVALL callback
irqchip/gic-v4.1: Add VPE eviction callback
irqchip/gic-v4.1: Add VPE residency callback
irqchip/gic-v4.1: Add mask/unmask doorbell callbacks
irqchip/gic-v4.1: Plumb skeletal VPE irqchip
irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP
irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set
irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP
irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
irqchip/gic-v3: Add GICv4.1 VPEID size discovery
irqchip/gic-v3: Detect GICv4.1 supporting RVPEID
irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells
irqdomain: Fix a memory leak in irq_domain_push_irq()
irqchip: Add NXP INTMUX interrupt multiplexer support
dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
irqchip: Define EXYNOS_IRQ_COMBINER
irqchip/meson-gpio: Add support for meson a1 SoCs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Thomas Gleixner:
"A small set of SMP core code changes:
- Rework the smp function call core code to avoid the allocation of
an additional cpumask
- Remove the not longer required GFP argument from on_each_cpu_cond()
and on_each_cpu_cond_mask() and fixup the callers"
* tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove allocation mask from on_each_cpu_cond.*()
smp: Add a smp_cond_func_t argument to smp_call_function_many()
smp: Use smp_cond_func_t as type for the conditional function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timekeeping and timers departement provides:
- Time namespace support:
If a container migrates from one host to another then it expects
that clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime
these clocks can differ significantly on two hosts, in the worst
case time goes backwards which is a violation of the POSIX
requirements.
The time namespace addresses this problem. It allows to set offsets
for clock MONOTONIC and BOOTTIME once after creation and before
tasks are associated with the namespace. These offsets are taken
into account by timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided
by this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric
potential use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (ie where host time
offsets = 0) is in the noise and great effort was made to ensure
that especially in the VDSO. If time namespace is disabled in the
kernel configuration the code is compiled out.
Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
feature and kept on for more than a year addressing review
comments, finding better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure
that the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code"
* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
alarmtimer: Use wakeup source from alarmtimer platform device
alarmtimer: Make alarmtimer platform device child of RTC device
alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
hrtimer: Add missing sparse annotation for __run_timer()
lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
clocksource/drivers/exynos_mct: Rename Exynos to lowercase
clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
clocksource/drivers/bcm2835_timer: Fix memory leak of timer
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects update from Thomas Gleixner:
"A single commit for debug objects which fixes a pile of potential data
races detected by KCSAN"
* tag 'core-debugobjects-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Fix various data races
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog updates from Thomas Gleixner:
"A set of watchdog/softlockup related improvements:
- Enforce that the watchdog timestamp is always valid on boot. The
original implementation caused a watchdog disabled gap of one
second in the boot process due to truncation of the underlying
sched clock.
The sched clock is divided by 1e9 to convert nanoseconds to
seconds. So for the first second of the boot process the result is
0 which is at the same time the indicator to disable the watchdog.
The trivial fix is to change the disabled indicator to ULONG_MAX.
- Two cleanup patches removing unused and redundant code which got
forgotten to be cleaned up in previous changes"
* tag 'core-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/softlockup: Enforce that timestamp is valid on boot
watchdog/softlockup: Remove obsolete check of last reported task
watchdog: Remove soft_lockup_hrtimer_cnt and related code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the generic VDSO code which missed 5.5:
- Make the update to the coarse timekeeper unconditional.
This is required because the coarse timekeeper interfaces in the
VDSO do not depend on a VDSO capable clocksource. If the system
does not have a VDSO capable clocksource and the update is
depending on the VDSO capable clocksource, the coarse VDSO
interfaces would operate on stale data forever.
- Invert the logic of __arch_update_vdso_data() to avoid further head
scratching.
Tripped over this several times while analyzing the update problem
above"
* tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/vdso: Update coarse timekeeper unconditionally
lib/vdso: Make __arch_update_vdso_data() logic understandable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux update from Paul Moore:
"This is one of the bigger SELinux pull requests in recent years with
28 patches. Everything is passing our test suite and the highlights
are below:
- Mark CONFIG_SECURITY_SELINUX_DISABLE as deprecated. We're some time
away from actually attempting to remove this in the kernel, but the
only distro we know that still uses it (Fedora) is working on
moving away from this so we want to at least let people know we are
planning to remove it.
- Reorder the SELinux hooks to help prevent bad things when SELinux
is disabled at runtime. The proper fix is to remove the
CONFIG_SECURITY_SELINUX_DISABLE functionality (see above) and just
take care of it at boot time (e.g. "selinux=0").
- Add SELinux controls for the kernel lockdown functionality,
introducing a new SELinux class/permissions: "lockdown { integrity
confidentiality }".
- Add a SELinux control for move_mount(2) that reuses the "file {
mounton }" permission.
- Improvements to the SELinux security label data store lookup
functions to speed up translations between our internal label
representations and the visible string labels (both directions).
- Revisit a previous fix related to SELinux inode auditing and
permission caching and do it correctly this time.
- Fix the SELinux access decision cache to cleanup properly on error.
In some extreme cases this could limit the cache size and result in
a decrease in performance.
- Enable SELinux per-file labeling for binderfs.
- The SELinux initialized and disabled flags were wrapped with
accessors to ensure they are accessed correctly.
- Mark several key SELinux structures with __randomize_layout.
- Changes to the LSM build configuration to only build
security/lsm_audit.c when needed.
- Changes to the SELinux build configuration to only build the IB
object cache when CONFIG_SECURITY_INFINIBAND is enabled.
- Move a number of single-caller functions into their callers.
- Documentation fixes (/selinux -> /sys/fs/selinux).
- A handful of cleanup patches that aren't worth mentioning on their
own, the individual descriptions have plenty of detail"
* tag 'selinux-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (28 commits)
selinux: fix regression introduced by move_mount(2) syscall
selinux: do not allocate ancillary buffer on first load
selinux: remove redundant allocation and helper functions
selinux: remove redundant selinux_nlmsg_perm
selinux: fix wrong buffer types in policydb.c
selinux: reorder hooks to make runtime disable less broken
selinux: treat atomic flags more carefully
selinux: make default_noexec read-only after init
selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND.
selinux: remove redundant msg_msg_alloc_security
Documentation,selinux: fix references to old selinuxfs mount point
selinux: deprecate disabling SELinux and runtime
selinux: allow per-file labelling for binderfs
selinuxfs: use scnprintf to get real length for inode
selinux: remove set but not used variable 'sidtab'
selinux: ensure the policy has been loaded before reading the sidtab stats
selinux: ensure we cleanup the internal AVC counters on error in avc_update()
selinux: randomize layout of key structures
selinux: clean up selinux_enabled/disabled/enforcing_boot
selinux: remove unnecessary selinux cred request
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"One small audit patch for the Linux v5.6 merge window, and
unsurprisingly it passes our test suite with flying colors"
* tag 'audit-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Add __rcu annotation to RCU pointer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cgroup2 interface for hugetlb controller. I think this was the last
remaining bit which was missing from cgroup2
- fixes for race and a spurious warning in threaded cgroup handling
- other minor changes
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
iocost: Fix iocost_monitor.py due to helper type mismatch
cgroup: Prevent double killing of css when enabling threaded cgroup
cgroup: fix function name in comment
mm: hugetlb controller for cgroups v2
|
|
Pull workqueue updates from Tejun Heo:
"Just a couple tracepoint patches"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove workqueue_work event class
workqueue: add worker function to workqueue_execute_end tracepoint
|
|
In preparation for sharing an io-wq across different users, add a
reference count that manages destruction of it.
Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In case of out of memory the second argument of percpu_ref_put_many() in
io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly
wrong.
Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Draining the middle of a link is tricky, so leave a comment there
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
For the non-vectored variant of READV/WRITEV, we don't need to setup an
async io context, and we flag that appropriately in the io_op_defs
array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c
we didn't have these opcodes, so the check there was added just for the
READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a
single check for needing async context, that covers all four of these
read/write variants that don't use an iovec.
Signed-off-by: Jens Axboe <[email protected]>
|
|
scripts/find-unused-docs.sh invokes scripts/kernel-doc to find out if a
source file contains kerneldoc or not.
However, as it passes the no longer supported "-text" option to
scripts/kernel-doc, the latter prints out its help text, causing all
files to be considered containing kerneldoc.
Get rid of these false positives by removing the no longer supported
"-text" option from the scripts/kernel-doc invocation.
Cc: [email protected] # 4.16+
Fixes: b05142675310d2ac ("scripts: kernel-doc: get rid of unused output formats")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Pull ioremap updates from Christoph Hellwig:
"Remove the ioremap_nocache API (plus wrappers) that are always
identical to ioremap"
* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
remove ioremap_nocache and devm_ioremap_nocache
MIPS: define ioremap_nocache to ioremap
|
|
Pull libata updates from Jens Axboe:
"As usual pretty quiet, mostly trivial fixes (or dead code removal),
outside of various fixes for ahci_bcrm"
* tag 'for-5.6/libata-2020-01-27' of git://git.kernel.dk/linux-block:
ata/acard_ahci: remove unused variable n_elem
ata: pata_macio: fix comparing pointer to 0
ata: ahci_brcm: BCM7216 reset is self de-asserting
ata: ahci_brcm: Perform reset after obtaining resources
ata: brcm: fix reset controller API usage
ata: brcm: mark PM functions as __maybe_unused
ata: ahci_brcm: Support BCM7216 reset controller name
dt-bindings: ata: Document BCM7216 AHCI controller compatible
ata: ahci_brcm: Add a shutdown callback
ata: ahci_brcm: Manage reset line during suspend/resume
|
|
Pull block driver updates from Jens Axboe:
"Like the core side, not a lot of changes here, just two main items:
- Series of patches (via Coly) with fixes for bcache (Coly,
Christoph)
- MD pull request from Song"
* tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits)
bcache: reap from tail of c->btree_cache in bch_mca_scan()
bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
bcache: remove member accessed from struct btree
bcache: print written and keys in trace_bcache_btree_write
bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
bcache: add code comments for state->pool in __btree_sort()
lib: crc64: include <linux/crc64.h> for 'crc64_be'
bcache: use read_cache_page_gfp to read the superblock
bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
bcache: return a pointer to the on-disk sb from read_super
bcache: transfer the sb_page reference to register_{bdev,cache}
bcache: fix use-after-free in register_bcache()
bcache: properly initialize 'path' and 'err' in register_bcache()
bcache: rework error unwinding in register_bcache
bcache: use a separate data structure for the on-disk super block
bcache: cached_dev_free needs to put the sb page
md/raid1: introduce wait_for_serialization
md/raid1: use bucket based mechanism for IO serialization
md: introduce a new struct for IO serialization
md: don't destroy serial_info_pool if serialize_policy is true
...
|
|
Pull core block updates from Jens Axboe:
"This may be the most quiet round we've had in years. I'm not
complaining. Really not a lot to detail here, outside of spelling and
documentation improvements/fixes, we have:
- Allow t10-pi to be modular (Herbert)
- Remove dead code in bfq (Alex)
- Mark zone management requests with REQ_SYNC (Chaitanya)
- BFQ division improvement (Wen)
- Small series improving plugging (Pavel)"
* tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block:
partitions/ldm: fix spelling mistake "to" -> "too"
block, bfq: improve arithmetic division in bfq_delta()
block/bfq: remove unused bfq_class_rt which never used
block: mark zone-mgmt bios with REQ_SYNC
blk-mq: Document functions for sending request
block: Allow t10-pi to be modular
blk-mq: optimise blk_mq_flush_plug_list()
list: introduce list_for_each_continue()
blk-mq: optimise rq sort function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull PNP updates from Rafael Wysocki:
"Get rid of unused variable and function (yu kuai)"
* tag 'pnp-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PNP: isapnp: remove defined but not used function 'isapnp_checksum'
PNP: isapnp: remove set but not used variable 'checksum'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
"Add support for reference properties in sofrware nodes (Dmitry
Torokhov) and a basic test for property entries along with fixes on
top of it (Dmitry Torokhov, Qian Cai, Alan Maguire)"
* tag 'devprop-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
software node: introduce CONFIG_KUNIT_DRIVER_PE_TEST
usb: dwc3: use proper initializers for property entries
drivers/base/test: fix global-out-of-bounds error
software node: add basic tests for property entries
software node: remove separate handling of references
platform/x86: intel_cht_int33fe: use inline reference properties
software node: implement reference properties
software node: allow embedding of small arrays into property_entry
software node: replace is_array with is_inline
|
|
Move blk_queue_make_request() to dm.c:alloc_dev() so that
q->make_request_fn is never NULL during the lifetime of a DM device
(even one that is created without a DM table).
Otherwise generic_make_request() will crash simply by doing:
dmsetup create -n test
mount /dev/dm-N /mnt
While at it, move ->congested_data initialization out of
dm.c:alloc_dev() and into the bio-based specific init method.
Reported-by: Stefan Bader <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1860231
Fixes: ff36ab34583a ("dm: remove request-based logic from make_request_fn wrapper")
Depends-on: c12c9a3c3860c ("dm: various cleanups to md->queue initialization code")
Cc: [email protected]
Signed-off-by: Mike Snitzer <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to the most recent upstream
revision (20200110), add new hardware support to a handful of ACPI
drivers, make the ACPI fan driver expose power states information for
fans, add some more quirks, fix bugs and clean up assorted things.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20200110
including:
- Update of copyright notices to 2020 (Bob Moore).
- Dispatcher fix to always generate buffer objects for the ASL
create_field() operator (Maximilian Luz).
- Debugger cleanup (Colin Ian King).
- Disassembler change to create buffer fields in
ACPI_PARSE_LOAD_PASS1 (Erik Kaneda).
- UNIX line ending support for non-windows builds in acpisrc (Erik
Kaneda).
- Update the list of ACPICA maintainers (Rafael Wysocki).
- Add Intel Tiger Lake ACPI device IDs to the ACPI DPTF, ACPI fan,
int340x_thermal and intel-hid drivers (Gayatri Kammela).
- Make the ACPI fan driver create additional sysfs attributes to
expose power states information for fans (Srinivas Pandruvada).
- Fix up the ACPI battery driver to deal with unexpected battery
capacity information in a better way (Hans de Goede).
- Add ACPI backlight quirks for Lenovo E41-25/45 and MSI MS-7721
boards (Aaron Ma, Hans de Goede).
- Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch to
the ACPI button driver (Jason Ekstrand).
- Drop TIMER_DEFERRABLE from the GHES polling mode timer function
flags to make it run precisely at the configured time (Bhaskar
Upadhaya).
- Fix race condition related to the reference counting of query
handlers in the ACPI EC driver (Rafael Wysocki).
- Fix ACPI tools build issue (Zhengyuan Liu).
- Replace dma_request_slave_channel() with dma_request_chan() in the
firmware guide documentation for ACPI (Peter Ujfalusi).
- Fix typo in a comment and clean up function parameter data type
inconsistencies (Kacper Piwiński, Tian Tao)"
* tag 'acpi-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
ACPICA: Update version to 20200110
ACPICA: All acpica: Update copyrights to 2020 Including tool signons.
apei/ghes: Do not delay GHES polling
ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch
ACPI: PPTT: Consistently use unsigned int as parameter type
ACPI: EC: Reference count query handlers under lock
ACPICA: Update the list of maintainers
ACPICA: Update version to 20191213
ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator
ACPICA: acpisrc: add unix line ending support for non-windows build
ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
ACPICA: debugger: fix spelling mistake "adress" -> "address"
ACPI: video: Do not export a non working backlight interface on MSI MS-7721 boards
docs: firmware-guide: ACPI: Replace dma_request_slave_channel() with dma_request_chan()
thermal: int340x_thermal: Add Tiger Lake ACPI device IDs
platform/x86: intel-hid: Add Tiger Lake ACPI device ID
ACPI: fan: Add Tiger Lake ACPI device ID
ACPI: DPTF: Add Tiger Lake ACPI device IDs
ACPI: fan: Expose fan performance state information
tools/power/acpi: fix compilation error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add ACPI support to the intel_idle driver along with an admin
guide document for it, add support for CPR (Core Power Reduction) to
the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support
in a few places, add some new sysfs attributes, debugfs files and
tracepoints, fix bugs and clean up a bunch of things all over.
Specifics:
- Update the ACPI processor driver in order to export
acpi_processor_evaluate_cst() to the code outside of it, add ACPI
support to the intel_idle driver based on that and clean up that
driver somewhat (Rafael Wysocki).
- Add an admin guide document for the intel_idle driver (Rafael
Wysocki).
- Clean up cpuidle core and drivers, enable compilation testing for
some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
Wysocki, Yangtao Li).
- Fix reference counting of OPP (operating performance points) table
structures (Viresh Kumar).
- Add support for CPR (Core Power Reduction) to the AVS (Adaptive
Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
YueHaibing).
- Add support for TigerLake Mobile and JasperLake to the Intel RAPL
power capping driver (Zhang Rui).
- Update cpufreq drivers:
- Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
- Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
- Fix cpufreq policy reference counting issues in s3c and
brcmstb-avs (chenqiwu).
- Fix ACPI table reference counting issue and HiSilicon quirk
handling in the CPPC driver (Hanjun Guo).
- Clean up spelling mistake in intel_pstate (Harry Pan).
- Convert the kirkwood and tegra186 drivers to using
devm_platform_ioremap_resource() (Yangtao Li).
- Update devfreq core:
- Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
- Clean up the handing of transition statistics and allow them to
be reset by writing 0 to the 'trans_stat' devfreq device
attribute in sysfs (Kamil Konieczny).
- Add 'devfreq_summary' to debugfs (Chanwoo Choi).
- Clean up kerneldoc comments and Kconfig indentation (Krzysztof
Kozlowski, Randy Dunlap).
- Update devfreq drivers:
- Add dynamic scaling for the imx8m DDR controller and clean up
imx8m-ddrc (Leonard Crestez, YueHaibing).
- Fix DT node reference counting and nitialization error code path
in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
for it (Chanwoo Choi, Yangtao Li).
- Fix DT node reference counting in rockchip-dfi and make it use
devm_platform_ioremap_resource() (Yangtao Li).
- Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
- Fix initialization error code paths in exynos-bus (Yangtao Li).
- Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
Kozlowski).
- Add tracepoints for tracking usage_count updates unrelated to
status changes in PM-runtime (Michał Mirosław).
- Add sysfs attribute to control the "sync on suspend" behavior
during system-wide suspend (Jonas Meurer).
- Switch system-wide suspend tests over to 64-bit time (Alexandre
Belloni).
- Make wakeup sources statistics in debugfs cover deleted ones which
used to be the case some time ago (zhuguangqing).
- Clean up computations carried out during hibernation, update
messages related to hibernation and fix a spelling mistake in one
of them (Wen Yang, Luigi Semenzato, Colin Ian King).
- Add mailmap entry for maintainer e-mail address that has not been
functional for several years (Rafael Wysocki)"
* tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
intel_idle: Clean up irtl_2_usec()
intel_idle: Move 3 functions closer to their callers
intel_idle: Annotate initialization code and data structures
intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
intel_idle: Rearrange intel_idle_cpuidle_driver_init()
intel_idle: Clean up NULL pointer check in intel_idle_init()
intel_idle: Fold intel_idle_probe() into intel_idle_init()
intel_idle: Eliminate __setup_broadcast_timer()
cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
cpuidle: sysfs: fix warnings when compiling with W=1
cpuidle: coupled: fix warnings when compiling with W=1
cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
PM / devfreq: Add debugfs support with devfreq_summary file
Documentation: admin-guide: PM: Add intel_idle document
cpuidle: arm: Enable compile testing for some of drivers
PM-runtime: add tracepoints for usage_count changes
cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
...
|
|
This macro is never used from it was introduced in commit e6b1db98cf4d5
("security: Support early LSMs"), better to remove it.
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: James Morris <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"Hardly anything going on in the core this time around with the
regulator API and pretty quiet on the driver front:
- An API for comparing regulators, useful for devices that need to
check if supply voltages exactly match rather than just nominally
match.
- Conversion of several DT bindings to YAML format.
- Conversion of I2C drivers to probe_new().
- New drivers for Monolithic MPQ7920 and MP8859, and Rohm BD71828"
* tag 'regulator-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (34 commits)
dt-bindings: regulator: add document bindings for mpq7920
regulator: core: Fix exported symbols to the exported GPL version
regulator: mpq7920: Fix incorrect defines
regulator: vqmmc-ipq4019: Fix platform_no_drv_owner.cocci warnings
regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage
regulator fix for "regulator: core: Add regulator_is_equal() helper"
regulator: core: Add regulator_is_equal() helper
regulator: mpq7920: Convert to use .probe_new
regulator: mpq7920: Remove unneeded fields from struct mpq7920_regulator_info
regulator: vqmmc-ipq4019: Trivial clean up
regulator: vqmmc-ipq4019: Remove ipq4019_regulator_remove
regulator: bindings: Drop document bindings for mpq7920
dt-bindings: Drop entry for Monolithic Power System, MPS
regulator: bd718x7: Simplify the code by removing struct bd718xx_pmic_inits
regulator: add IPQ4019 SDHCI VQMMC LDO driver
regulator: Convert i2c drivers to use .probe_new
regulator: mpq7920: Check the correct variable in mpq7920_regulator_register()
regulator: mpq7920: Fix Woverflow warning on conversion
regulator: mp8859: tidy up white space in probe
regulator: mpq7920: add mpq7920 regulator driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"Not much going on in the core for SPI this time but a reasonable
amount of change in the drivers:
- Removal of dmal_request_slave_channel() from Peter Ujfalusi.
- More conversions of drivers to GPIO descriptors from Linus Walleij.
- A big rework of the sh-msiof driver from Geert Uytterhoeven moving
it over to the generic native chipselect support.
- DMA support for the uniphier driver from Kunihiko Hayashi.
- New driver support for HiSilcon v3xx SPI NOR controllers from John
Garry"
* tag 'spi-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (52 commits)
dt-binding: spi: add NPCM PSPI reset binding
spi: pxa2xx: Avoid touching SSCR0_SSE on MMP2
spi: spi-fsl-qspi: Ensure width is respected in spi-mem operations
spi: npcm-pspi: modify reset support
spi: npcm-pspi: improve spi transfer performance
spi: spi-ti-qspi: fix warning
spi: npcm-pspi: fix 16 bit send and receive support
spi: pxa2xx: Add support for Intel Comet Lake PCH-V
spi: fsl: simplify error path in of_fsl_spi_probe()
spi: fsl-lpspi: fix only one cs-gpio working
spi: spi-ti-qspi: optimize byte-transfers
spi: spi-ti-qspi: support large flash devices
spi: spi-qcom-qspi: Use device managed memory for clk_bulk_data
MAINTAINERS: Add a maintainer for the HiSilicon v3xx SFC driver
spi: Add HiSilicon v3xx SPI NOR flash controller driver
dt-bindings: spi_atmel: add microchip,sam9x60-spi
spi: bcm2835: Raise maximum number of slaves to 4
spi: sh-msiof: Do not redefine STR while compile testing
spi: rspi: Add support for GPIO chip selects
spi: rspi: Add support for multiple native chip selects
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"This is quite a busy release for a subsystem that's usually very
quiet, though still a small set of updates in the grand scheme of
things:
- A fix for writes to non-incrementing registers.
- An iopoll() style helper for use with atomic safe regmaps, making
it easier to transition from raw memory mapped I/O.
- Some constification"
* tag 'regmap-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: fix writes to non incrementing registers
regmap: add iopoll-like atomic polling macro
regmap-i2c: constify regmap_bus structures
|
|
Add a typedef to for the fastop function prototype to make the code more
readable.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
It also helps eliminate some duplicated code.
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The function now has a single caller, so there is no point
in keeping it separate.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Walk the host page tables to identify hugepage mappings for ZONE_DEVICE
pfns, i.e. DAX pages. Explicitly query kvm_is_zone_device_pfn() when
deciding whether or not to bother walking the host page tables, as DAX
pages do not set up the head/tail infrastructure, i.e. will return false
for PageCompound() even when using huge pages.
Zap ZONE_DEVICE sptes when disabling dirty logging, e.g. if live
migration fails, to allow KVM to rebuild large pages for DAX-based
mappings. Presumably DAX favors large pages, and worst case scenario is
a minor performance hit as KVM will need to re-fault all DAX-based
pages.
Suggested-by: Barret Rhoden <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Zeng <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Liran Alon <[email protected]>
Cc: linux-nvdimm <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove the late "lpage is disallowed" check from set_spte() now that the
initial check is performed after acquiring mmu_lock. Fold the guts of
the remaining helper, __mmu_gfn_lpage_is_disallowed(), into
kvm_mmu_hugepage_adjust() to eliminate the unnecessary slot !NULL check.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Fold max_mapping_level() into kvm_mmu_hugepage_adjust() now that HugeTLB
mappings are handled in kvm_mmu_hugepage_adjust(), i.e. there isn't a
need to pre-calculate the max mapping level. Co-locating all hugepage
checks eliminates a memslot lookup, at the cost of performing the
__mmu_gfn_lpage_is_disallowed() checks while holding mmu_lock.
The latency of lpage_is_disallowed() is likely negligible relative to
the rest of the code run while holding mmu_lock, and can be offset to
some extent by eliminating the mmu_gfn_lpage_is_disallowed() check in
set_spte() in a future patch. Eliminating the check in set_spte() is
made possible by performing the initial lpage_is_disallowed() checks
while holding mmu_lock.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Zap any compound page, e.g. THP or HugeTLB pages, when zapping sptes
that can potentially be converted to huge sptes after disabling dirty
logging on the associated memslot. Note, this approach could result in
false positives, e.g. if a random compound page is mapped into the
guest, but mapping non-huge compound pages into the guest is far from
the norm, and toggling dirty logging is not a frequent operation.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove logic to retrieve the original gfn now that HugeTLB mappings are
are identified in FNAME(fetch), i.e. FNAME(page_fault) no longer adjusts
the level or gfn.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove KVM's HugeTLB specific logic and instead rely on walking the host
page tables (already done for THP) to identify HugeTLB mappings.
Eliminating the HugeTLB-only logic avoids taking mmap_sem and calling
find_vma() for all hugepage compatible page faults, and simplifies KVM's
page fault code by consolidating all hugepage adjustments into a common
helper.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|