Age | Commit message (Collapse) | Author | Files | Lines |
|
With TASK_SIZE now reflecting the maximum size of the address space for
a process the code for arch_get_unmapped_area[_topdown] can be simplified.
Just let the logic pick a suitable address and deal with the page table
upgrade after the address has been selected.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The TASK_SIZE for a process should be maximum possible size of the address
space, 2GB for a 31-bit process and 8PB for a 64-bit process. The number
of page table levels required for a given memory layout is a consequence
of the mapped memory areas and their location.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw fixes from Cornelia Huck:
Two fixes for the new vfio-ccw support.
|
|
The guarded storage interface allows to register a control block for
each thread that is activated with the guarded storage broadcast event.
To retrieve the complete state of a process from the kernel a register
set for the stored broadcast control block is required.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add use_cmma field to mm_context_t, like we do for storage keys.
Signed-off-by: Claudio Imbrenda <[email protected]>
Acked-by: Janosch Frank <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add PGSTE manipulation functions:
* set_pgste_bits sets specific bits in a PGSTE
* get_pgste returns the whole PGSTE
* pgste_perform_essa manipulates a PGSTE to set specific storage states
* ESSA_[SG]ET_* macros used to indicate the action for manipulate_pgste
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
When vfio_ccw_mdev_reset fails during the remove process of the mdev,
the current implementation simply returns.
The failure indicates that the subchannel device is in a NOT_OPER state,
thus the right thing to do should be removing the mdev.
While we are at here, reverse the condition check to make the code more
concise and readable.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Remove several unnecessary checks for the @private pointer, since it
can never be NULL in these places.
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
The CAD instruction never worked quite as expected for the spinlock
code. It has been disabled by default with git commit 61b0b01686d48220,
if the "cad" kernel parameter is specified it is enabled for both user
space and the spinlock code. Leave the option to enable the instruction
for user space but remove it from the spinlock code.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add a couple more __atomic_xxx function to atomic_ops.h and use them
to replace the compare-and-swap inlines in the spinlock code. This
changes the type of the lock value from unsigned int to int.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
There are three different code levels in regard to the identification
of guest samples. They differ in the way the LPP instruction is used.
1) Old kernels without the LPP instruction. The guest program parameter
is always zero.
2) Newer kernels load the process pid into the program parameter with LPP.
The guest program parameter is non-zero if the guest executes in a
process != idle.
3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
non-zero even for the idle task. The guest program parameter is non-zero
if the guest is running.
All kernels load the process pid to CR4 on context switch. The CPU sampling
code uses the value in CR4 to decide between guest and host samples in case
the guest program parameter is zero. The three cases:
1) CR4==pid, gpp==0
2) CR4==pid, gpp==pid
3) CR4==pid, gpp==((1UL << 31) | pid)
The load-control instruction to load the pid into CR4 is expensive and the
goal is to remove it. To distinguish the host CR4 from the guest pid for
the idle process the maximum value 0xffff for the PASN is used.
This adds a fourth case for a guest OS with an updated kernel:
4) CR4==0xffff, gpp=((1UL << 31) | pid)
The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff)
to identify guest samples. This works nicely with all 4 cases, the only
possible issue would be a guest with an old kernel (gpp==0) and a process
pid of 0xffff. Well, don't do that..
Suggested-by: Christian Borntraeger <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Move a struct definition to get rid of a forward declaration.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Users complained that they are hitting the limit of 64 functions (some
physical functions can spawn lots of virtual functions). Double the
default limit. With the latest savings in static data usage this
increases the image size by only 520 bytes.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Commit c506fff3d3a8 ("s390/pci: resize iomap") reduced the iomap
to NR_FUNCTIONS * PCI_BAR_COUNT elements. Since we only support
functions with 64bit BARs we can cut that number in half.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Address space identifiers are already defined in <asm/pci_insn.h>.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
barsize was never used. Get rid of it.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The 32-bit lctl instruction is quite a bit slower than the 64-bit
counter part lctlg. Use the faster instruction.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw branch to add the basic channel I/O passthrough
intrastructure based on vfio.
The focus is on supporting dasd-eckd(cu_type/dev_type = 0x3990/0x3390)
as the target device.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add Cornelia Huck and myself as the vfio-ccw driver maintainers.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Although Linux does not use format-0 channel command words (CCW0)
these are a non-optional part of the platform spec, and for the sake
of platform compliance, and possibly some non-Linux guests, we have
to support CCW0.
Making the kernel execute a format 0 channel program is too much hassle
because we would need to allocate and use memory which can be addressed
by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
support by translating the channel program into an equivalent CCW1
program instead.
Based upon an orginal patch by Kai Yue Wang.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Add file Documentation/s390/vfio-ccw.txt that includes details
of vfio-ccw.
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
The current implementation doesn't check if the subchannel is in a
proper device state when handling an event. Let's introduce
a finite state machine to manage the state/event change.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce a singlethreaded workqueue to handle the I/O interrupts.
With the work added to this queue, we store the I/O results to the
io_region of the subchannel, then signal the userspace program to
handle the results.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
VFIO_CCW_IO_IRQ information.
Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
was performed, trigger a signal on this fd.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce VFIO_DEVICE_RESET ioctl for vfio-ccw to make it possible
to hot-reset the device.
We try to achieve a reset by first disabling the subchannel and
then enabling it again: this should clear all state at the subchannel.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce device information about vfio-ccw: VFIO_DEVICE_FLAGS_CCW.
Realize VFIO_DEVICE_GET_REGION_INFO ioctl for vfio-ccw.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
We implement the basic ccw command handling infrastructure
here:
1. Translate the ccw commands.
2. Issue the translated ccw commands to the device.
3. Once we get the execution result, update the guest SCSW
with it.
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
To provide user-space a set of interfaces to:
1. pass in a ccw program to perform an I/O operation.
2. read back I/O results of the completed I/O operations.
We introduce an MMIO region for the vfio-ccw device here.
This region is defined to content:
1. areas to store arguments that an ssch required.
2. areas to store the I/O results.
Using pwrite/pread to the device on this region, a user-space program
could write/read data to/from the vfio-ccw device.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
To make vfio support subchannel devices, we need to leverage the
mediated device framework to create a mediated device for the
subchannel device.
This registers the subchannel device to the mediated device
framework during probe to enable mediated device creation.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce ccwchain structure and helper functions that can be used to
handle a channel program issued from a virtual machine.
The following limitations apply:
1. Supports only prefetch enabled mode.
2. Supports idal(c64) ccw chaining.
3. Supports 4k idaw.
4. Supports ccw1.
5. Supports direct ccw chaining by translating them to idal ccws.
CCW translation requires to leverage the vfio_(un)pin_pages interfaces
to pin/unpin sets of mem pages frequently. Currently we have a lack of
support to do this in an efficient way. So we introduce pfn_array data
structure and helper functions to handle pin/unpin operations here.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
To make vfio support subchannel devices, we need a css driver for
the vfio subchannels. This patch adds a basic vfio-ccw subchannel
driver for this purpose.
To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
and configure VFIO as required.
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Define vfio-ccw device API strings. CCW vendor driver using mediated
device framework should use this string for device_api attribute.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Export the common I/O interfaces those are needed by an I/O
subchannel driver to actually talk to the subchannel.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Cc: Sebastian Ott <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Acked-by: Sebastian Ott <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
For future code reuse purpose, this decouples the cio code with
the ccw device specific parts from ccw_device_cancel_halt_clear,
and makes a new common I/O interface named cio_cancel_halt_clear.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Cc: Sebastian Ott <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Acked-by: Sebastian Ott <[email protected]>
Message-Id: <[email protected]>
[CH: Fix typo]
Signed-off-by: Cornelia Huck <[email protected]>
|
|
The return code of hw_perf_event_update() is not evaluated by
its callers. Hence, simplify the function by removing the
return code.
Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Using a register variable for r4 is not necessary. Let the
compiler decide the register to be used.
Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Make clear that the event definitions relate to the counter
facility (cf) and not to the sampling facility (sf).
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add the event names for the IBM z13/z13s specific CPU-MF counters.
Also improve the merging of the generic and model specific events
so that their sysfs attribute definitions completely reside in
memory. Hence, flagging the generic event attribute definitions
as initdata too.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Complete the IBM z13 support and support counters from the
MT-diagnostic counter set. Note that this counter set is
available only if SMT is enabled.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The validate_event() function just checked for reserved counters
in particular CPU-MF counter sets. Because the number of counters
in counter sets vary among different hardware models, remove the
explicit check to tolerate new models.
Reserved counters are not accounted and, thus, will return zero.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Use the highest counter number that can be specified for the
ecctr (extract CPU counter) instruction for perf.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Deferred struct page initialization works on s390. However it makes
only sense for the fake numa case, since the kthreads that initialize
struct pages are started per node. Without fake numa there is just a
single node and therefore no gain.
However there is no reason to not enable this feature. Therefore
select the config option and enable the feature in all config files.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
gmap.c deals mostly with KVM-related memory management, so a lot
of changes to this file will come via the KVM tree. Reflect this
in MAINTAINERS. Please note that there are intricate ties to
arch/s390/mm/pgtable.c. If changes are needed in both files,
this will continue to be submitted via the s390 tree (or a
topic branch if necessary).
Signed-off-by: Christian Borntraeger <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Since linux v3.14 with commit 38dfac843cb6d7be1 ("vmcore: prevent PT_NOTE
p_memsz overflow during header update") on s390 we get the following
message in the kdump kernel:
Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b,
n_descsz=0x6b6b6b6b
The reason for this is that we don't create a final zero note in
the ELF header which the proc/vmcore code uses to find out the end
of the notes section (see also kernel/kexec_core.c:final_note()).
It still worked on s390 by chance because we (most of the time?) have the
byte pattern 0x6b6b6b6b after the notes section which also makes the notes
parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
interpreded as note size:
if ((real_sz + sz) > max_sz) {
pr_warn("Warning: Exceeded p_memsz, dropping P ...);
break;
}
So fix this and add the missing final note to the ELF header.
We don't have to adjust the memory size for ELF header ("alloc_size")
because the new ELF note still fits into the 0x1000 base memory.
Cc: [email protected] # v4.4+
Signed-off-by: Michael Holzheu <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
HAVE_ARCH_EARLY_PFN_TO_NID selects a not present Kconfig
option. Therefore remove it.
Given that the first call of early_pfn_to_nid() happens after
numa_setup() finished to establish the memory to node mapping, there
is no need to implement an architecture private version of
__early_pfn_to_nid() like (only) ia64 does.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
On some z/VM systems the query host access command is not supported for
temp disks, though the corresponding feature code is set.
This does not have any impact beside that the information is not available.
Suppress the full blown command reject error messages to not confuse the
user. The error is still logged in the s390dbf.
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Some storage servers might not support the query host access feature.
Check if the corresponding feature code is set.
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|