Age | Commit message (Collapse) | Author | Files | Lines |
|
Add PGSTE manipulation functions:
* set_pgste_bits sets specific bits in a PGSTE
* get_pgste returns the whole PGSTE
* pgste_perform_essa manipulates a PGSTE to set specific storage states
* ESSA_[SG]ET_* macros used to indicate the action for manipulate_pgste
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The CAD instruction never worked quite as expected for the spinlock
code. It has been disabled by default with git commit 61b0b01686d48220,
if the "cad" kernel parameter is specified it is enabled for both user
space and the spinlock code. Leave the option to enable the instruction
for user space but remove it from the spinlock code.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add a couple more __atomic_xxx function to atomic_ops.h and use them
to replace the compare-and-swap inlines in the spinlock code. This
changes the type of the lock value from unsigned int to int.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
There are three different code levels in regard to the identification
of guest samples. They differ in the way the LPP instruction is used.
1) Old kernels without the LPP instruction. The guest program parameter
is always zero.
2) Newer kernels load the process pid into the program parameter with LPP.
The guest program parameter is non-zero if the guest executes in a
process != idle.
3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
non-zero even for the idle task. The guest program parameter is non-zero
if the guest is running.
All kernels load the process pid to CR4 on context switch. The CPU sampling
code uses the value in CR4 to decide between guest and host samples in case
the guest program parameter is zero. The three cases:
1) CR4==pid, gpp==0
2) CR4==pid, gpp==pid
3) CR4==pid, gpp==((1UL << 31) | pid)
The load-control instruction to load the pid into CR4 is expensive and the
goal is to remove it. To distinguish the host CR4 from the guest pid for
the idle process the maximum value 0xffff for the PASN is used.
This adds a fourth case for a guest OS with an updated kernel:
4) CR4==0xffff, gpp=((1UL << 31) | pid)
The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff)
to identify guest samples. This works nicely with all 4 cases, the only
possible issue would be a guest with an old kernel (gpp==0) and a process
pid of 0xffff. Well, don't do that..
Suggested-by: Christian Borntraeger <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Move a struct definition to get rid of a forward declaration.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Users complained that they are hitting the limit of 64 functions (some
physical functions can spawn lots of virtual functions). Double the
default limit. With the latest savings in static data usage this
increases the image size by only 520 bytes.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Commit c506fff3d3a8 ("s390/pci: resize iomap") reduced the iomap
to NR_FUNCTIONS * PCI_BAR_COUNT elements. Since we only support
functions with 64bit BARs we can cut that number in half.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Address space identifiers are already defined in <asm/pci_insn.h>.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
barsize was never used. Get rid of it.
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The 32-bit lctl instruction is quite a bit slower than the 64-bit
counter part lctlg. Use the faster instruction.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw branch to add the basic channel I/O passthrough
intrastructure based on vfio.
The focus is on supporting dasd-eckd(cu_type/dev_type = 0x3990/0x3390)
as the target device.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add Cornelia Huck and myself as the vfio-ccw driver maintainers.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Although Linux does not use format-0 channel command words (CCW0)
these are a non-optional part of the platform spec, and for the sake
of platform compliance, and possibly some non-Linux guests, we have
to support CCW0.
Making the kernel execute a format 0 channel program is too much hassle
because we would need to allocate and use memory which can be addressed
by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
support by translating the channel program into an equivalent CCW1
program instead.
Based upon an orginal patch by Kai Yue Wang.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Add file Documentation/s390/vfio-ccw.txt that includes details
of vfio-ccw.
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
The current implementation doesn't check if the subchannel is in a
proper device state when handling an event. Let's introduce
a finite state machine to manage the state/event change.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce a singlethreaded workqueue to handle the I/O interrupts.
With the work added to this queue, we store the I/O results to the
io_region of the subchannel, then signal the userspace program to
handle the results.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
VFIO_CCW_IO_IRQ information.
Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
was performed, trigger a signal on this fd.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce VFIO_DEVICE_RESET ioctl for vfio-ccw to make it possible
to hot-reset the device.
We try to achieve a reset by first disabling the subchannel and
then enabling it again: this should clear all state at the subchannel.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce device information about vfio-ccw: VFIO_DEVICE_FLAGS_CCW.
Realize VFIO_DEVICE_GET_REGION_INFO ioctl for vfio-ccw.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
We implement the basic ccw command handling infrastructure
here:
1. Translate the ccw commands.
2. Issue the translated ccw commands to the device.
3. Once we get the execution result, update the guest SCSW
with it.
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
To provide user-space a set of interfaces to:
1. pass in a ccw program to perform an I/O operation.
2. read back I/O results of the completed I/O operations.
We introduce an MMIO region for the vfio-ccw device here.
This region is defined to content:
1. areas to store arguments that an ssch required.
2. areas to store the I/O results.
Using pwrite/pread to the device on this region, a user-space program
could write/read data to/from the vfio-ccw device.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
To make vfio support subchannel devices, we need to leverage the
mediated device framework to create a mediated device for the
subchannel device.
This registers the subchannel device to the mediated device
framework during probe to enable mediated device creation.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Introduce ccwchain structure and helper functions that can be used to
handle a channel program issued from a virtual machine.
The following limitations apply:
1. Supports only prefetch enabled mode.
2. Supports idal(c64) ccw chaining.
3. Supports 4k idaw.
4. Supports ccw1.
5. Supports direct ccw chaining by translating them to idal ccws.
CCW translation requires to leverage the vfio_(un)pin_pages interfaces
to pin/unpin sets of mem pages frequently. Currently we have a lack of
support to do this in an efficient way. So we introduce pfn_array data
structure and helper functions to handle pin/unpin operations here.
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
To make vfio support subchannel devices, we need a css driver for
the vfio subchannels. This patch adds a basic vfio-ccw subchannel
driver for this purpose.
To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
and configure VFIO as required.
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Define vfio-ccw device API strings. CCW vendor driver using mediated
device framework should use this string for device_api attribute.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
Export the common I/O interfaces those are needed by an I/O
subchannel driver to actually talk to the subchannel.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Cc: Sebastian Ott <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Acked-by: Sebastian Ott <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
|
|
For future code reuse purpose, this decouples the cio code with
the ccw device specific parts from ccw_device_cancel_halt_clear,
and makes a new common I/O interface named cio_cancel_halt_clear.
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Cc: Sebastian Ott <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Acked-by: Sebastian Ott <[email protected]>
Message-Id: <[email protected]>
[CH: Fix typo]
Signed-off-by: Cornelia Huck <[email protected]>
|
|
The return code of hw_perf_event_update() is not evaluated by
its callers. Hence, simplify the function by removing the
return code.
Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Using a register variable for r4 is not necessary. Let the
compiler decide the register to be used.
Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Make clear that the event definitions relate to the counter
facility (cf) and not to the sampling facility (sf).
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add the event names for the IBM z13/z13s specific CPU-MF counters.
Also improve the merging of the generic and model specific events
so that their sysfs attribute definitions completely reside in
memory. Hence, flagging the generic event attribute definitions
as initdata too.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Complete the IBM z13 support and support counters from the
MT-diagnostic counter set. Note that this counter set is
available only if SMT is enabled.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The validate_event() function just checked for reserved counters
in particular CPU-MF counter sets. Because the number of counters
in counter sets vary among different hardware models, remove the
explicit check to tolerate new models.
Reserved counters are not accounted and, thus, will return zero.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Use the highest counter number that can be specified for the
ecctr (extract CPU counter) instruction for perf.
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Deferred struct page initialization works on s390. However it makes
only sense for the fake numa case, since the kthreads that initialize
struct pages are started per node. Without fake numa there is just a
single node and therefore no gain.
However there is no reason to not enable this feature. Therefore
select the config option and enable the feature in all config files.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
gmap.c deals mostly with KVM-related memory management, so a lot
of changes to this file will come via the KVM tree. Reflect this
in MAINTAINERS. Please note that there are intricate ties to
arch/s390/mm/pgtable.c. If changes are needed in both files,
this will continue to be submitted via the s390 tree (or a
topic branch if necessary).
Signed-off-by: Christian Borntraeger <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Since linux v3.14 with commit 38dfac843cb6d7be1 ("vmcore: prevent PT_NOTE
p_memsz overflow during header update") on s390 we get the following
message in the kdump kernel:
Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b,
n_descsz=0x6b6b6b6b
The reason for this is that we don't create a final zero note in
the ELF header which the proc/vmcore code uses to find out the end
of the notes section (see also kernel/kexec_core.c:final_note()).
It still worked on s390 by chance because we (most of the time?) have the
byte pattern 0x6b6b6b6b after the notes section which also makes the notes
parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
interpreded as note size:
if ((real_sz + sz) > max_sz) {
pr_warn("Warning: Exceeded p_memsz, dropping P ...);
break;
}
So fix this and add the missing final note to the ELF header.
We don't have to adjust the memory size for ELF header ("alloc_size")
because the new ELF note still fits into the 0x1000 base memory.
Cc: [email protected] # v4.4+
Signed-off-by: Michael Holzheu <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
HAVE_ARCH_EARLY_PFN_TO_NID selects a not present Kconfig
option. Therefore remove it.
Given that the first call of early_pfn_to_nid() happens after
numa_setup() finished to establish the memory to node mapping, there
is no need to implement an architecture private version of
__early_pfn_to_nid() like (only) ia64 does.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
On some z/VM systems the query host access command is not supported for
temp disks, though the corresponding feature code is set.
This does not have any impact beside that the information is not available.
Suppress the full blown command reject error messages to not confuse the
user. The error is still logged in the s390dbf.
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Some storage servers might not support the query host access feature.
Check if the corresponding feature code is set.
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
ptep_notify and gmap_shadow_notify both need a guest address and
therefore retrieve them from the available virtual host address.
As they operate on the same guest address, we can calculate it once
and then pass it on. As a gmap normally has more than one shadow gmap,
we also do not recalculate for each of them any more.
Signed-off-by: Janosch Frank <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
There is no need for the __ASSEMBLY__ ifdefery anymore since the
architecture level set code that deals with facility bits was
converted to C in the meantime.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Use the actual size of the facility list array within the lowcore
structure for the MAX_FACILITY_BIT define instead of a comment which
states what this is good for. This makes it a bit harder to break
things.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Provide the remaining stsi information via debugfs files. This also
might be useful for debugging purposes.
Suggested-by: Christian Borntraeger <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Provide the raw stsi 15,1,x data contents via debugfs. This makes it
much easier to debug unexpected scheduling domains on machines that
provide cpu topology information.
Therefore this file adds a new 's390/stsi' debugfs directory with a
file for each possible topology nesting level that is allowed by the
architecture. The files will be created regardless if the machine
supports all, or any, level. If a level is not supported, or no data
is available, user space can recognize this with a -EINVAL error code
when trying to read such data.
In addition a 'topology' symlink is created that points to the file
that contains the data that is used to create the scheduling domains.
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Introduce a top-level 's390' directory which should be used when
adding new s390 specific debug feature files and/or directories.
This makes hopefully sure that the contents of the s390 directory will
be a bit more structured. Right now we have a couple of top-level
files where it is not easy to tell to which subsystem they belong to.
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
User space needs some information about the secure key(s)
before actually invoking the pkey and/or paes funcionality.
This patch introduces a new ioctl API and in kernel API to
verify the the secure key blob and give back some
information about the key (type, bitsize, old MKVP).
Both APIs are described in detail in the header files
arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|