Age | Commit message (Collapse) | Author | Files | Lines |
|
Add missing cleanup in balloon_probe() if the call to
balloon_connect_vsp() fails. Also correctly handle cleanup in
balloon_remove() when dm_state is DM_INIT_ERROR because
balloon_resume() failed.
Signed-off-by: Shradha Gupta <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/20220516045058.GA7933@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Wei Liu <[email protected]>
|
|
The latest storvsc code has already removed the support for windows 7 and
earlier. There is still some code logic remaining which is there to support
pre Windows 8 OS. This patch removes these stale logic.
This patch majorly does three things :
1. Removes vmscsi_size_delta and its logic, as the vmscsi_request struct is
same for all the OS post windows 8 there is no need of delta.
2. Simplify sense_buffer_size logic, as there is single buffer size for
all the post windows 8 OS.
3. Embed the vmscsi_win8_extension structure inside the vmscsi_request,
as there is no separate handling required for different OS.
Signed-off-by: Saurabh Sengar <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
[ Similarly to commit a765ed47e4516 ("PCI: hv: Fix synchronization
between channel callback and hv_compose_msi_msg()"): ]
The (on-stack) teardown packet becomes invalid once the completion
timeout in hv_pci_bus_exit() has expired and hv_pci_bus_exit() has
returned. Prevent the channel callback from accessing the invalid
packet by removing the ID associated to such packet from the VMbus
requestor in hv_pci_bus_exit().
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer. Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hv_pci_onchannelcallback().
While at it, remove a redundant validation in hv_pci_generic_compl():
hv_pci_onchannelcallback() already ensures that all processed incoming
packets are "at least as large as [in fact larger than] a response".
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
According to Dexuan, the hypervisor folks beleive that multi-msi
allocations are not correct. compose_msi_msg() will allocate multi-msi
one by one. However, multi-msi is a block of related MSIs, with alignment
requirements. In order for the hypervisor to allocate properly aligned
and consecutive entries in the IOMMU Interrupt Remapping Table, there
should be a single mapping request that requests all of the multi-msi
vectors in one shot.
Dexuan suggests detecting the multi-msi case and composing a single
request related to the first MSI. Then for the other MSIs in the same
block, use the cached information. This appears to be viable, so do it.
Suggested-by: Dexuan Cui <[email protected]>
Signed-off-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Currently if compose_msi_msg() is called multiple times, it will free any
previous IRTE allocation, and generate a new allocation. While nothing
prevents this from occurring, it is extraneous when Linux could just reuse
the existing allocation and avoid a bunch of overhead.
However, when future IRTE allocations operate on blocks of MSIs instead of
a single line, freeing the allocation will impact all of the lines. This
could cause an issue where an allocation of N MSIs occurs, then some of
the lines are retargeted, and finally the allocation is freed/reallocated.
The freeing of the allocation removes all of the configuration for the
entire block, which requires all the lines to be retargeted, which might
not happen since some lines might already be unmasked/active.
Signed-off-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Tested-by: Dexuan Cui <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The DRM Hyper-V driver has special case code for running on the first
released versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions
are now out of support (except for extended security updates) and lack
support for performance features that are needed for effective production
usage of Linux guests.
The negotiation of the VMbus protocol versions required by these old
Hyper-V versions has been removed from the VMbus driver. So now remove
the handling of these VMbus protocol versions from the DRM Hyper-V
driver.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Deepak Rawat <[email protected]>
Reviewed-by: Andrea Parri (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The hyperv_fb driver has special case code for running on the first
released versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions
are now out of support (except for extended security updates) and lack
support for performance features that are needed for effective production
usage of Linux guests.
The negotiation of the VMbus protocol versions required by these old
Hyper-V versions has been removed from the VMbus driver. So now remove
the handling of these VMbus protocol versions from the hyperv_fb driver.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Andrea Parri (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The storvsc driver has special case code for running on the first released
versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions are now
out of support (except for extended security updates) and lack support
for performance features like multiple VMbus channels that are needed for
effective production usage of Linux guests.
The negotiation of the VMbus protocol versions required by these old
Hyper-V versions has been removed from the VMbus driver. So now remove
the handling of these VMbus protocol versions from the storvsc driver.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Andrea Parri (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The VMbus driver has special case code for running on the first released
versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions are now
out of support (except for extended security updates) and lack the
performance features needed for effective production usage of Linux
guests.
Simplify the code by removing the negotiation of the VMbus protocol
versions required for these releases of Hyper-V, and by removing the
special case code for handling these VMbus protocol versions.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Andrea Parri (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
In newer versions of Hyper-V, the x86/x64 PMU can be virtualized
into guest VMs by explicitly enabling it. Linux kernels are typically
built to automatically enable the hardlockup detector if the PMU is
found. To prevent the possibility of false positives due to the
vagaries of VM scheduling, disable the PMU-based hardlockup detector
by default in a VM on Hyper-V. The hardlockup detector can still be
enabled by overriding the default with the nmi_watchdog=1 option on
the kernel boot line or via sysctl at runtime.
This change mimics the approach taken with KVM guests in
commit 692297d8f968 ("watchdog: introduce the hardlockup_detector_disable()
function").
Linux on ARM64 does not provide a PMU-based hardlockup detector, so
there's no corresponding disable in the Hyper-V init code on ARM64.
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Add error message when the size of requested framebuffer is more than
the allocated size by vmbus mmio region for framebuffer.
Signed-off-by: Saurabh Sengar <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Currently when the pci-hyperv driver finishes probing and initializing the
PCI device, it sets the PCI_COMMAND_MEMORY bit; later when the PCI device
is registered to the core PCI subsystem, the core PCI driver's BAR detection
and initialization code toggles the bit multiple times, and each toggling of
the bit causes the hypervisor to unmap/map the virtual BARs from/to the
physical BARs, which can be slow if the BAR sizes are huge, e.g., a Linux VM
with 14 GPU devices has to spend more than 3 minutes on BAR detection and
initialization, causing a long boot time.
Reduce the boot time by not setting the PCI_COMMAND_MEMORY bit when we
register the PCI device (there is no need to have it set in the first place).
The bit stays off till the PCI device driver calls pci_enable_device().
With this change, the boot time of such a 14-GPU VM is reduced by almost
3 minutes.
Link: https://lore.kernel.org/lkml/[email protected]/
Tested-by: Boqun Feng (Microsoft) <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Cc: Jake Oshins <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first
MSI of the N allocated. This is because only the first msi_desc is cached
and it is shared by all the MSIs of the multi-MSI block. This means that
hv_arch_irq_unmask() gets the correct address, but the wrong data (always
0).
This can break MSIs.
Lets assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0.
hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure
the MSI address and data (0) to vector 34 of CPU0. This is correct. Then
hv_arch_irq_unmask is called on MSI1. It uses another hypercall to
configure the MSI address and data (0) to vector 33 of CPU0. This is
wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux
will observe extra instances of MSI1 and no instances of MSI0 despite the
endpoint device behaving correctly.
For the multi-MSI case, we need unique address and data info for each MSI,
but the cached msi_desc does not provide that. However, that information
can be gotten from the int_desc cached in the chip_data by
compose_msi_msg(). Fix the multi-MSI case to use that cached information
instead. Since hv_set_msi_entry_from_desc() is no longer applicable,
remove it.
Signed-off-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
With no users of hv_pkt_iter_next_raw() and no "external" users of
hv_pkt_iter_first_raw(), the iterator functions can be refactored
and simplified to remove some indirection/code.
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
So that isolated guests can communicate with the host via hv_sock
channels.
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer. Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hvs_stream_dequeue().
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver
within the guest VM. Hyper-V can send packets with erroneous values or
modify packet fields after they are processed by the guest. To defend
against these scenarios, copy the incoming packet after validating its
length and offset fields using hv_pkt_iter_{first,next}(). Use
HVS_PKT_LEN(HVS_MTU_SIZE) to initialize the buffer which holds the
copies of the incoming packets. In this way, the packet can no longer
be modified by the host.
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The function returns NULL if the ring buffer doesn't contain enough
readable bytes to constitute a packet descriptor. The ring buffer's
write_index is in memory which is shared with the Hyper-V host, an
erroneous or malicious host could thus change its value and overturn
the result of hvs_stream_has_data().
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Dexuan wrote:
"[...] when we disable AccelNet, the host PCI VSP driver sends a
PCI_EJECT message first, and the channel callback may set
hpdev->state to hv_pcichild_ejecting on a different CPU. This can
cause hv_compose_msi_msg() to exit from the loop and 'return', and
the on-stack variable 'ctxt' is invalid. Now, if the response
message from the host arrives, the channel callback will try to
access the invalid 'ctxt' variable, and this may cause a crash."
Schematically:
Hyper-V sends PCI_EJECT msg
hv_pci_onchannelcallback()
state = hv_pcichild_ejecting
hv_compose_msi_msg()
alloc and init comp_pkt
state == hv_pcichild_ejecting
Hyper-V sends VM_PKT_COMP msg
hv_pci_onchannelcallback()
retrieve address of comp_pkt
'free' comp_pkt and return
comp_pkt->completion_func()
Dexuan also showed how the crash can be triggered after introducing
suitable delays in the driver code, thus validating the 'assumption'
that the host can still normally respond to the guest's compose_msi
request after the host has started to eject the PCI device.
Fix the synchronization by leveraging the requestor lock as follows:
- Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while
holding the requestor lock) associated to the completion packet.
- Retrieve the address *and call ->completion_func() within a same
(requestor) critical section in hv_pci_onchannelcallback().
Reported-by: Wei Hu <[email protected]>
Reported-by: Dexuan Cui <[email protected]>
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
To abtract the lock and unlock operations on the requestor spin lock.
The helpers will come in handy in hv_pci.
No functional change.
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The function can be used to retrieve and clear/remove a transation ID
from a channel requestor, provided the memory address corresponding to
the ID equals a specified address. The function, and its 'lockless'
variant __vmbus_request_addr_match(), will be used by hv_pci.
Refactor vmbus_request_addr() to reuse the 'newly' introduced code.
No functional change.
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The function can be used to send a VMbus packet and retrieve the
corresponding transaction ID. It will be used by hv_pci.
No functional change.
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Currently, pointers to guest memory are passed to Hyper-V as transaction
IDs in hv_pci. In the face of errors or malicious behavior in Hyper-V,
hv_pci should not expose or trust the transaction IDs returned by
Hyper-V to be valid guest memory addresses. Instead, use small integers
generated by vmbus_requestor as request (transaction) IDs.
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
vmbus_request_addr() returns 0 (zero) if the transaction ID passed
to as argument is 0. This is unfortunate for two reasons: first,
netvsc_send_completion() does not check for a NULL cmd_rqst (before
dereferencing the corresponding NVSP message); second, 0 is a *valid*
value of cmd_rqst in netvsc_send_tx_complete(), cf. the call of
vmbus_sendpacket() in netvsc_send_pkt().
vmbus_request_addr() has included the code in question since its
introduction with commit e8b7db38449ac ("Drivers: hv: vmbus: Add
vmbus_requestor data structure for VMBus hardening"); such code was
motivated by the early use of vmbus_requestor by hv_storvsc. Since
hv_storvsc moved to a tag-based mechanism to generate and retrieve
transaction IDs with commit bf5fd8cae3c8f ("scsi: storvsc: Use
blk_mq_unique_tag() to generate requestIDs"), vmbus_request_addr()
can be modified to return VMBUS_RQST_ERROR if the ID is 0. This
change solves the issues in hv_netvsc (and makes the handling of
messages with transaction ID of 0 consistent with the semantics
"the ID is not contained in the requestor/invalid ID").
vmbus_next_request_id(), vmbus_request_addr() should still reserve
the ID of 0 for Hyper-V, because Hyper-V will "ignore" (not respond
to) VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED packets/requests with
transaction ID of 0 from the guest.
Fixes: bf5fd8cae3c8f ("scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs")
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
If the allocation of multiple MSI vectors for multi-MSI fails in the core
PCI framework, the framework will retry the allocation as a single MSI
vector, assuming that meets the min_vecs specified by the requesting
driver.
Hyper-V advertises that multi-MSI is supported, but reuses the VECTOR
domain to implement that for x86. The VECTOR domain does not support
multi-MSI, so the alloc will always fail and fallback to a single MSI
allocation.
In short, Hyper-V advertises a capability it does not implement.
Hyper-V can support multi-MSI because it coordinates with the hypervisor
to map the MSIs in the IOMMU's interrupt remapper, which is something the
VECTOR domain does not have. Therefore the fix is simple - copy what the
x86 IOMMU drivers (AMD/Intel-IR) do by removing
X86_IRQ_ALLOC_CONTIGUOUS_VECTORS after calling the VECTOR domain's
pci_msi_prepare().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Hyper-V may offer an Initial Machine Configuration (IMC) synthetic
device to guest VMs. The device may be used by Windows guests to get
specialization information, such as the hostname. But the device
is not used in Linux and there is no Linux driver, so it is
unsupported.
Currently, the IMC device GUID is not recognized by the VMbus driver,
which results in an "Unknown GUID" error message during boot. Add
the GUID to the list of known but unsupported devices so that the
error message is not generated. Other than avoiding the error message,
there is no change in guest behavior.
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
- Fix a corner case when calculating sched runqueue variables
That fix also removes a check for a zero divisor in the code, without
mentioning it. Vincent clarified that it's ok after I whined about it:
https://lore.kernel.org/all/CAKfTPtD2QEyZ6ADd5WrwETMOX0XOwJGnVddt7VHgfURdqgOS-Q@mail.gmail.com/
* tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/pelt: Fix attach_entity_load_avg() corner case
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Partly revert a change to our timer_interrupt() that caused lockups
with high res timers disabled.
- Fix a bug in KVM TCE handling that could corrupt kernel memory.
- Two commits fixing Power9/Power10 perf alternative event selection.
Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic
Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin.
* tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix 32bit compile
powerpc/perf: Fix power10 event alternatives
powerpc/perf: Fix power9 event alternatives
KVM: PPC: Fix TCE handling for VFIO
powerpc/time: Always set decrementer in timer_interrupt()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Sapphire Rapids CPU support
- Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)
* tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Read the reported error count from the proper register on
synopsys_edac
* tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Read the error count from the correct register
|
|
Since commit 559089e0a93d ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an
opt-in strategy, because it caused a number of problems that weren't
noticed until x86 enabled it too.
One of the issues was fixed by Nick Piggin in commit 3b8000ae185c
("mm/vmalloc: huge vmalloc backing pages should be split rather than
compound"), but I'm still worried about page protection issues, and
VM_FLUSH_RESET_PERMS in particular.
However, like the hash table allocation case (commit f2edd118d02d:
"page_alloc: use vmalloc_huge for large system hash"), the use of
kvmalloc() should be safe from any such games, since the returned
pointer might be a SLUB allocation, and as such no user should
reasonably be using it in any odd ways.
We also know that the allocations are fairly large, since it falls back
to the vmalloc case only when a kmalloc() fails. So using a hugepage
mapping seems both safe and relevant.
This patch does show a weakness in the opt-in strategy: since the opt-in
flag is in the 'vm_flags', not the usual gfp_t allocation flags, very
few of the usual interfaces actually expose it.
That's not much of an issue in this case that already used one of the
fairly specialized low-level vmalloc interfaces for the allocation, but
for a lot of other vmalloc() users that might want to opt in, it's going
to be very inconvenient.
We'll either have to fix any compatibility problems, or expose it in the
gfp flags (__GFP_COMP would have made a lot of sense) to allow normal
vmalloc() users to use hugepage mappings. That said, the cases that
really matter were probably already taken care of by the hash tabel
allocation.
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Menzel <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use vmalloc_huge() in alloc_large_system_hash() so that large system
hash (>= PMD_SIZE) could benefit from huge pages.
Note that vmalloc_huge only allocates huge pages for systems with
HAVE_ARCH_HUGE_VMALLOC.
Signed-off-by: Song Liu <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull ksmbd server fixes from Steve French:
- cap maximum sector size reported to avoid mount problems
- reference count fix
- fix filename rename race
* tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
ksmbd: increment reference count of parent fp
ksmbd: remove filename in ksmbd_file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Assorted fixes
* tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: remove redundant READ_ONCE() in cmpxchg loop
ARC: atomic: cleanup atomic-llsc definitions
arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirely
ARC: dts: align SPI NOR node name with dtschema
ARC: Remove a redundant memset()
ARC: fix typos in comments
ARC: entry: fix syscall_trace_exit argument
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix for an information leak caused by copying a buffer to
userspace without checking for error first in the sr driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sr: Do not leak information in ioctl
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A simple cleanup patch and a refcount fix for Xen on Arm"
* tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Fix some refcount leaks
xen: Convert kmap() to kmap_local_page()
|
|
Pull more drm fixes from Dave Airlie:
"Maarten was away, so Maxine stepped up and sent me the drm-fixes
merge, so no point leaving it for another week.
The big change is an OF revert around bridge/panels, it may have some
driver fallout, but hopefully this revert gets them shook out in the
next week easier.
Otherwise it's a bunch of locking/refcounts across drivers, a radeon
dma_resv logic fix and some raspberry pi panel fixes.
panel:
- revert of patch that broke panel/bridge issues
dma-buf:
- remove unused header file.
amdgpu:
- partial revert of locking change
radeon:
- fix dma_resv logic inversion
panel:
- pi touchscreen panel init fixes
vc4:
- build fix
- runtime pm refcount fix
vmwgfx:
- refcounting fix"
* tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: partial revert "remove ctx->lock" v2
Revert "drm: of: Lookup if child node has panel or bridge"
Revert "drm: of: Properly try all possible cases for bridge/panel detection"
drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
drm/vmwgfx: Fix gem refcounting and memory evictions
drm/vc4: Fix build error when CONFIG_DRM_VC4=y && CONFIG_RASPBERRYPI_FIRMWARE=m
drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
dma-buf-map: remove renamed header file
drm/radeon: fix logic inversion in radeon_sync_resv
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a new set of keycodes to be used by marine navigation systems
- minor fixes to omap4-keypad and cypress-sf drivers
* tag 'input-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: add Marine Navigation Keycodes
Input: omap4-keypad - fix pm_runtime_get_sync() error checking
Input: cypress-sf - register a callback to disable the regulators
|
|
Pull block fixes from Jens Axboe:
"Just two small regression fixes for bcache"
* tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()
bcache: put bch_bio_map() back to correct location in journal_write_unlocked()
|
|
Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one fixing a potential leak for the iovec for
larger requests added in this cycle, and one fixing a theoretical leak
with CQE_SKIP and IOPOLL"
* tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
io_uring: fix leaks on IOPOLL and CQE_SKIP
io_uring: free iovec if file assignment fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE
perf.data files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on
s390, where it is not supported.
* tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Fix error message for test case 71 on s390, where it is not supported
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
perf script: Always allow field 'data_src' for auxtrace
perf clang: Fix header include for LLVM >= 14
|
|
Add a struct page forward declaration to cacheflush_32.h.
Fixes this build warning:
CC drivers/crypto/xilinx/zynqmp-sha.o
In file included from arch/sparc/include/asm/cacheflush.h:11,
from include/linux/cacheflush.h:5,
from drivers/crypto/xilinx/zynqmp-sha.c:6:
arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
38 | void sparc_flush_page_to_ram(struct page *page);
Exposed by commit 0e03b8fd2936 ("crypto: xilinx - Turn SHA into a
tristate and allow COMPILE_TEST") but not Fixes: that commit because the
underlying problem is older.
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: [email protected]
Acked-by: Sam Ravnborg <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for the raspberrypi panel initialisation, one fix for a logic
inversion in radeon, a build and pm refcounting fix for vc4, two reverts
for drm_of_get_bridge that caused a number of regression and a locking
regression for amdgpu.
Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20220422084403.2xrhf3jusdej5yo4@houat
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix some syzbot-detected bugs, as well as other bugs found by I/O
injection testing.
Change ext4's fallocate to consistently drop set[ug]id bits when an
fallocate operation might possibly change the user-visible contents of
a file.
Also, improve handling of potentially invalid values in the the
s_overhead_cluster superblock field to avoid ext4 returning a negative
number of free blocks"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix a potential race while discarding reserved buffers after an abort
ext4: update the cached overhead value in the superblock
ext4: force overhead calculation if the s_overhead_cluster makes no sense
ext4: fix overhead calculation to account for the reserved gdt blocks
ext4, doc: fix incorrect h_reserved size
ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
ext4: fix use-after-free in ext4_search_dir
ext4: fix bug_on in start_this_handle during umount filesystem
ext4: fix symlink file size not match to file content
ext4: fix fallocate to use file_modified to update permissions consistently
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to avoid a NULL pointer dereference in the pata_marvell
driver with adapters not supporting DMA, from Zheyu"
* tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_marvell: Check the 'bmdma_addr' beforing reading
|
|
Pull kvm fixes from Paolo Bonzini:
"The main and larger change here is a workaround for AMD's lack of
cache coherency for encrypted-memory guests.
I have another patch pending, but it's waiting for review from the
architecture maintainers.
RISC-V:
- Remove 's' & 'u' as valid ISA extension
- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
- Fix NMI watchdog in guests on AMD
- Fix for SEV cache incoherency issues
- Don't re-acquire SRCU lock in complete_emulated_io()
- Avoid NULL pointer deref if VM creation fails
- Fix race conditions between APICv disabling and vCPU creation
- Bugfixes for disabling of APICv
- Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
- Do not use bitfields larger than 32-bits, they differ between GCC
and clang"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: introduce and use more page size-related constants
kvm: selftests: do not use bitfields larger than 32-bits for PTEs
KVM: SEV: add cache flush to solve SEV cache incoherency issues
KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
KVM: selftests: Silence compiler warning in the kvm_page_table_test
KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
KVM: SPDX style and spelling fixes
KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
RISC-V: KVM: Restrict the extensions that can be disabled
RISC-V: KVM: Remove 's' & 'u' as valid ISA extension
|
|
Test case 71 'Convert perf time to TSC' is not supported on s390.
Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.
The root cause is function evlist__open() called from
test__perf_time_to_tsc(). evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.
The PMU driver returns -ENOENT in this case. This error is leads to the
failure.
Fix this by returning TEST_SKIP on -ENOENT.
Output before:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: FAILED!
Output after:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: Skip (perf_read_tsc_conversion is not supported)
This also happens on an x86_64 virtual machine:
# uname -m
x86_64
$ ./perf test -F 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : FAILED!
$
Committer testing:
Continues to work on x86_64:
$ perf test 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : Ok
$
Fixes: 290fa68bdc458863 ("perf test tsc: Fix error message when not supported")
Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Sumanth Korikkar <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Chengdong Li <[email protected]>
Cc: [email protected]
Cc: Heiko Carstens <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|