Age | Commit message (Collapse) | Author | Files | Lines |
|
After commit e2b3d202d1dba8f3546ed28224ce485bc50010be
("powerpc: Switch 16GB and 16MB explicit hugepages to a
different page table format"), we don't need to support
is_hugepd() for 64K page size.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
All unrecovered machine check errors on PowerNV should cause an
immediate panic. There are 2 reasons that this is the right policy:
it's not safe to continue, and we're already trying to reboot.
Firstly, if we go through the recovery process and do not successfully
recover, we can't be sure about the state of the machine, and it is
not safe to recover and proceed.
Linux knows about the following sources of Machine Check Errors:
- Uncorrectable Errors (UE)
- Effective - Real Address Translation (ERAT)
- Segment Lookaside Buffer (SLB)
- Translation Lookaside Buffer (TLB)
- Unknown/Unrecognised
In the SLB, TLB and ERAT cases, we can further categorise these as
parity errors, multihit errors or unknown/unrecognised.
We can handle SLB errors by flushing and reloading the SLB. We can
handle TLB and ERAT multihit errors by flushing the TLB. (It appears
we may not handle TLB and ERAT parity errors: I will investigate
further and send a followup patch if appropriate.)
This leaves us with uncorrectable errors. Uncorrectable errors are
usually the result of ECC memory detecting an error that it cannot
correct, but they also crop up in the context of PCI cards failing
during DMA writes, and during CAPI error events.
There are several types of UE, and there are 3 places a UE can occur:
Skiboot, the kernel, and userspace. For Skiboot errors, we have the
facility to make some recoverable. For userspace, we can simply kill
(SIGBUS) the affected process. We have no meaningful way to deal with
UEs in kernel space or in unrecoverable sections of Skiboot.
Currently, these unrecovered UEs fall through to
machine_check_expection() in traps.c, which calls die(), which OOPSes
and sends SIGBUS to the process. This sometimes allows us to stumble
onwards. For example we've seen UEs kill the kernel eehd and
khugepaged. However, the process killed could have held a lock, or it
could have been a more important process, etc: we can no longer make
any assertions about the state of the machine. Similarly if we see a
UE in skiboot (and again we've seen this happen), we're not in a
position where we can make any assertions about the state of the
machine.
Likewise, for unknown or unrecognised errors, we're not able to say
anything about the state of the machine.
Therefore, if we have an unrecovered MCE, the most appropriate thing
to do is to panic.
The second reason is that since e784b6499d9c ("powerpc/powernv: Invoke
opal_cec_reboot2() on unrecoverable machine check errors."), we
attempt a special OPAL reboot on an unhandled MCE. This is so the
hardware can record error data for later debugging.
The comments in that commit assert that we are heading down the panic
path anyway. At the moment this is not always true. With UEs in kernel
space, for instance, they are marked as recoverable by the hardware,
so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't
panic() but would simply die() and OOPS. It doesn't make sense to be
staggering on if we've just tried to reboot: we should panic().
Explicitly panic() on unrecovered MCEs on PowerNV.
Update the comments appropriately.
This fixes some hangs following EEH events on cxlflash setups.
Signed-off-by: Daniel Axtens <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Reviewed-by: Ian Munsie <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
pi_buff is being memset before it is sanity checked. Move the
memset after the null pi_buff sanity check to avoid an oops.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This avoid errors like
unsigned int usize = 1 << 30;
int size = 1 << 30;
unsigned long addr = 64UL << 30 ;
value = _ALIGN_DOWN(addr, usize); -> 0
value = _ALIGN_DOWN(addr, size); -> 0x1000000000
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
native_hpte_clear() is called in real mode from two places:
- Early in boot during htab initialisation if firmware assisted dump is
active.
- Late in the kexec path.
In both contexts there is no need to disable interrupts are they are
already disabled. Furthermore, locking around the tlbie() is only required
for pre POWER5 hardware.
On POWER5 or newer hardware concurrent tlbie()s work as expected and on pre
POWER5 hardware concurrent tlbie()s could result in deadlock. This code
would only be executed at crashdump time, during which all bets are off,
concurrent tlbie()s are unlikely and taking locks is unsafe therefore the
best course of action is to simply do nothing. Concurrent tlbie()s are not
possible in the first case as secondary CPUs have not come up yet.
Signed-off-by: Cyril Bur <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
For some reason, only the little-endian flavor of
powerpc provided the zero_bytemask() implementation.
Reported-by: Michal Sojka <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Enable the "wake-on-filer" (aka. wake on user defined packet)
wake on lan capability for the eTSEC ethernet nodes.
Cc: Li Yang <[email protected]>
Cc: Zhao Chenhui <[email protected]>
Signed-off-by: Claudiu Manoil <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
arch/tile added word-at-a-time.h after the patch that added generic-y
entries; the generic-y entry is now stale.
arch/h8300 is newer than the generic-y patch for word-at-a-time.h,
and needs a generic-y entry.
arch/powerpc seems to have gotten a generic-y entry by mistake in
the first patch; this change removes it.
Signed-off-by: Chris Metcalf <[email protected]>
|
|
Always include a timeout when waiting for secondary cpus to enter OPAL
in the kexec path, rather than only when crashing.
Signed-off-by: Samuel Mendoza-Jonas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
show_interrupts() expects the irq_chip name to be max 8 characters
otherwise everything get misaligned
# cat /proc/interrupts
CPU0
17: 0 CPM PIC 0 Level error
19: 0 MPC8XX SIU 15 Level tbint
20: 90 CPM PIC 4 Level cpm_uart
38: 29746 MPC8XX SIU 5 Level fs_enet-mac
39: 0 MPC8XX SIU 7 Level fs_enet-mac
47: 401 CPM PIC 5 Level fsl_spi
68: 1 MPC8XX SIU 2 Level phy_interrupt, phy_interrupt, phy_interrupt
LOC: 7225485 Local timer interrupts for timer event device
LOC: 9 Local timer interrupts for others
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
MCE: 0 Machine check exceptions
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
During the MSI bitmap test on boot kmemleak spews the following trace:
unreferenced object 0xc00000016e86c900 (size 64):
comm "swapper/0", pid 1, jiffies 4294893173 (age 518.024s)
hex dump (first 32 bytes):
00 00 01 ff 7f ff 7f 37 00 00 00 00 00 00 00 00
.......7........
ff ff ff ff ff ff ff ff 01 ff ff ff ff
ff ff ff
................
backtrace:
[<c00000000003eebc>] .zalloc_maybe_bootmem+0x3c/0x380
[<c000000000042d6c>] .msi_bitmap_alloc+0x3c/0xb0
[<c000000000a9aff8>] .msi_bitmap_selftest+0x30/0x2b4
[<c0000000000090f4>] .do_one_initcall+0xd4/0x270
[<c000000000a8e250>] .kernel_init_freeable+0x1a0/0x280
[<c000000000009b5c>] .kernel_init+0x1c/0x120
[<c000000000007fbc>] .ret_from_kernel_thread+0x58/0x9c
Add a flag to msi_bitmap for tracking allocations from slab and memblock
so we can properly free/handle memory in msi_bitmap_free().
Signed-off-by: Denis Kirjanov <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
[mpe: Reword changelog & use bitmap_from_slab in the if]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The derive_parent() has similar semantics to what we have in newly introduced
of_helpers module. The replacement reduces code base and propagates the actual
error code to the caller.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In case we have node without '/' strrchr() returns NULL which might lead to
crash. Replace strrchr() by kbasename() and modify condition to avoid such
behaviour.
Suggested-by: Segher Boessenkool <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The helper kstrndup() will do the same in one line.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In case we have a full node name like /foo/bar and /foo is not found the
parent_path left unfreed. So, free a memory before return to a caller.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Extract a new module to share the code between other modules.
There is no functional change.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
|
|
As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.
Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
'nvram_create_os_partition' should be 'nvram_create_partition'.
Use __func__ to have it right, as done elsewhere in this file.
Signed-off-by: Christophe Jaillet <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
If 'nvram_write_header' fails, then 'new_part' should be freed, otherwise,
there is a memory leak.
Signed-off-by: Christophe Jaillet <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Based directly on ppc64_defconfig using merge_config.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This add helper virt_to_pfn and remove the opencoded usage of the
same.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
powerpc has a link register (lr) used for calling functions. We "bl
<func>" to call a function, and "blr" to return back to the call site.
The lr is only a single register, so if we call another function from
inside this function (ie. nested calls), software must save away the
lr on the software stack before calling the new function. Before
returning (ie. before the "blr"), the lr is restored by software from
the software stack.
This makes branch prediction quite difficult for the processor as it
will only know the branch target just before the "blr".
To help with this, modern powerpc processors keep a (non-architected)
hardware stack of lr called a "link stack". When a "bl <func>" is
run, the lr is pushed onto this stack. When a "blr" is called, the
branch predictor pops the lr value from the top of the link stack, and
uses it to predict the branch target. Hence the processor pipeline
knows a lot earlier the branch target.
This works great but there are some cases where you call "bl" but
without a matching "blr". Once such case is when trying to determine
the program counter (which can't be read directly). Here you "bl+4;
mflr" to get the program counter. If you do this, the link stack will
get out of sync with reality, causing the branch predictor to
mis-predict subsequent function returns.
To avoid this, modern micro-architectures have a special case of bl.
Using the form "bcl 20,31,+4", ensures the processor doesn't push to
the link stack.
The 32 and 64 bit variants of __get_datapage() use a "bl; mflr" to
determine the loaded address of the VDSO. The current versions of
these attempt to use this special bl variant.
Unfortunately they use +8 rather than the required +4. Hence the
current code results in the link stack getting out of sync with
reality and hence the resulting performance degradation.
This patch moves it to bcl+4 by moving __kernel_datapage_offset out of
__get_datapage().
With this patch, running a gettimeofday() (which uses
__get_datapage()) microbenchmark we get a decent bump in performance
on POWER7/8.
For the benchmark in tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
POWER8:
64bit gets ~4% improvement
32bit gets ~9% improvement
POWER7:
64bit gets ~7% improvement
Signed-off-by: Michael Neuling <[email protected]>
Reported-by: Aaron Sawdey <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
For no reason other than it looks ugly.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This patch defines macros for the three bolted SLB indexes we use.
Switch the functions that take the indexes as an argument to use the
enum.
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Andy Lutomirski says:
Some dynamic loaders may be slightly faster if a GNU hash is
available.
This is unlikely to have any measurable effect on the time it takes
to resolve vdso symbols (since there are so few of them). In some
contexts, it can be a win for a different reason: if every DSO has a
GNU hash section, then libc can avoid calculating SysV hashes at
all. Both musl and glibc appear to have this optimization.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This struct is unused, which is now a build error with gcc 6:
error: 'os_area_db_id_video_mode' defined but not used
There doesn't seem to be any good reason to keep it around so remove it,
it's in the history if anyone needs it.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Refresh and remove obsolete CONFIG_EXT3_FS.
Signed-off-by: Geoff Levand <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently, little endian is only supported on powernv and pseries,
however, Kconfigs still allow us to include other platforms in a LE
kernel, this may result in space wasting or even build error if some
BE-only platforms always assume they are built for a BE kernel. So just
modify the Kconfigs of BE-only platforms to remove them from being built
for a LE kernel.
For 32bit only platforms, nothing needs to be done, because
CPU_LITTLE_ENDIAN depends on PPC64. For 64bit supported platforms, add
CPU_BIG_ENDIAN to dependencies explicitly, so that these platforms will
be disabled for LE [Suggested-by: Cédric Le Goater <[email protected]>].
Signed-off-by: Boqun Feng <[email protected]>
Acked-by: Geoff Levand <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
changed CONFIG_SCSI_DH from tristate to bool.
Our defconfigs have CONFIG_SCSI_DH=m, which the kconfig machinery warns
us is invalid, but instead of converting it to =y it leaves it unset.
This means we loose the CONFIG_SCSI_DH code and everything that depends
on it.
So convert the values in the defconfigs to =y.
Fixes: 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
Signed-off-by: Michael Ellerman <[email protected]>
|
|
changes
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for things we merged for v4.3 (VPD, MSI, and bridge
window management), and a new Renesas R8A7794 SoC device ID.
Details:
Resource management:
- Revert pci_read_bridge_bases() unification (Bjorn Helgaas)
- Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn
Helgaas)
MSI:
- Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson)
Renesas R-Car host bridge driver:
- Add R8A7794 support (Sergei Shtylyov)
Miscellaneous:
- Fix devfn for VPD access through function 0 (Alex Williamson)
- Use function 0 VPD only for identical functions (Alex Williamson)"
* tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: rcar: Add R8A7794 support
PCI: Use function 0 VPD for identical functions, regular VPD for others
PCI: Fix devfn for VPD access through function 0
PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses
PCI: Clear IORESOURCE_UNSET when clipping a bridge window
PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
|
|
Pull KVM fixes from Paolo Bonzini:
"AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
bug fixes too"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: disable halt_poll_ns as default for s390x
KVM: x86: fix off-by-one in reserved bits check
KVM: x86: use correct page table format to check nested page table reserved bits
KVM: svm: do not call kvm_set_cr0 from init_vmcb
KVM: x86: trap AMD MSRs for the TSeg base and mask
KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
kvm: svm: reset mmu on VCPU reset
|
|
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.
Architectures are now free to set their own halt_poll_ns
default value.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The selftest passes on 64-bit LE & BE, and 32-bit.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
functions) has to be protected via the kvm->srcu lock.
The kvmppc_h_logical_ci_load() and -store() functions are missing
this lock so far, so let's add it there, too.
This fixes the problem that the kernel reports "suspicious RCU usage"
when lock debugging is enabled.
Cc: [email protected] # v4.1+
Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
In guest_exit_cont we call kvmhv_commence_exit which expects the trap
number as the argument. However r3 doesn't contain the trap number at
this point and as a result we would be calling the function with a
spurious trap number.
Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
r12 contains the trap number.
Cc: [email protected] # v4.1+
Fixes: eddb60fb1443
Signed-off-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This fixes a bug which results in stale vcore pointers being left in
the per-cpu preempted vcore lists when a VM is destroyed. The result
of the stale vcore pointers is usually either a crash or a lockup
inside collect_piggybacks() when another VM is run. A typical
lockup message looks like:
[ 472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
[ 472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
[ 472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
[ 472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
[ 472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
[ 472.162337] REGS: c000001e41bff520 TRAP: 0901 Not tainted (4.2.0-kvm+)
[ 472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24848844 XER: 00000000
[ 472.162588] CFAR: c00000000096b0ac SOFTE: 1
GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
GPR12: c000000000111130 c00000000fdae400
[ 472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
[ 472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
[ 472.163179] Call Trace:
[ 472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
[ 472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
[ 472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
[ 472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
[ 472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[ 472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
[ 472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
[ 472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
[ 472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
[ 472.164060] Instruction dump:
[ 472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
[ 472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070
The bug is that kvmppc_run_vcpu does not correctly handle the case
where a vcpu task receives a signal while its guest vcpu is executing
in the guest as a result of being piggy-backed onto the execution of
another vcore. In that case we need to wait for the vcpu to finish
executing inside the guest, and then remove this vcore from the
preempted vcores list. That way, we avoid leaving this vcpu's vcore
on the preempted vcores list when the vcpu gets interrupted.
Fixes: ec2571650826
Reported-by: Thomas Huth <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Pull KVM fixes from Paolo Bonzini:
"Mostly stable material, a lot of ARM fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
sched: access local runqueue directly in single_task_running
arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
arm64: KVM: Remove all traces of the ThumbEE registers
arm: KVM: Disable virtual timer even if the guest is not using it
arm64: KVM: Disable virtual timer even if the guest is not using it
arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
KVM: s390: Replace incorrect atomic_or with atomic_andnot
arm: KVM: Fix incorrect device to IPA mapping
arm64: KVM: Fix user access for debug registers
KVM: vmx: fix VPID is 0000H in non-root operation
KVM: add halt_attempted_poll to VCPU stats
kvm: fix zero length mmio searching
kvm: fix double free for fast mmio eventfd
kvm: factor out core eventfd assign/deassign logic
kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
KVM: make the declaration of functions within 80 characters
KVM: arm64: add workaround for Cortex-A57 erratum #852523
KVM: fix polling for guest halt continued even if disable it
arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
arm64: KVM: set {v,}TCR_EL2 RES1 bits
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"This is a rather large update post rc1 due to the final steps of
cleanups and API changes which had to wait for the preparatory patches
to hit your tree.
- Regression fixes for ARM GIC irqchips
- Regression fixes and lockdep anotations for renesas irq chips
- The leftovers of the cleanup and preparatory patches which have
been ignored by maintainers
- Final conversions of the newly merged users of obsolete APIs
- Final removal of obsolete APIs
- Final removal of ARM artifacts which had been introduced during the
conversion of ARM to the generic interrupt code.
- Final split of the irq_data into chip specific and common data to
reflect the needs of hierarchical irq domains.
- Treewide removal of the first argument of interrupt flow handlers,
i.e. the irq number, which is not used by the majority of handlers
and simple to retrieve from the other argument the irq descriptor.
- A few comment updates and build warning fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
arm64: Remove ununsed set_irq_flags
ARM: Remove ununsed set_irq_flags
sh: Kill off set_irq_flags usage
irqchip: Kill off set_irq_flags usage
gpu/drm: Kill off set_irq_flags usage
genirq: Remove irq argument from irq flow handlers
genirq: Move field 'msi_desc' from irq_data into irq_common_data
genirq: Move field 'affinity' from irq_data into irq_common_data
genirq: Move field 'handler_data' from irq_data into irq_common_data
genirq: Move field 'node' from irq_data into irq_common_data
irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag
irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag
genirq: Provide IRQD_FORWARDED_TO_VCPU status flag
genirq: Simplify irq_data_to_desc()
genirq: Remove __irq_set_handler_locked()
pinctrl/pistachio: Use irq_set_handler_locked
gpio: vf610: Use irq_set_handler_locked
powerpc/mpc8xx: Use irq_set_handler_locked()
powerpc/ipic: Use irq_set_handler_locked()
powerpc/cpm2: Use irq_set_handler_locked()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix 32-bit TCE table init in kdump kernel from Nish
- Fix kdump with non-power-of-2 crashkernel= from Nish
- Abort cxl_pci_enable_device_hook() if PCI channel is offline from
Andrew
- Fix to release DRC when configure_connector() fails from Bharata
- Wire up sys_userfaultfd()
- Fix race condition in tearing down MSI interrupts from Paul
- Fix unbalanced pci_dev_get() in cxl_probe() from Daniel
- Fix cxl build failure due to -Wunused-variable gcc behaviour change
from Ian
- Tell the toolchain to use ABI v2 when building an LE boot wrapper
from Benh
- Fix THP to recompute hash value after a failed update from Aneesh
- 32-bit memcpy/memset: only use dcbz once cache is enabled from
Christophe
* tag 'powerpc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc32: memset: only use dcbz once cache is enabled
powerpc32: memcpy: only use dcbz once cache is enabled
powerpc/mm: Recompute hash value after a failed update
powerpc/boot: Specify ABI v2 when building an LE boot wrapper
cxl: Fix build failure due to -Wunused-variable behaviour change
cxl: Fix unbalanced pci_dev_get in cxl_probe
powerpc/MSI: Fix race condition in tearing down MSI interrupts
powerpc: Wire up sys_userfaultfd()
powerpc/pseries: Release DRC when configure_connector fails
cxl: abort cxl_pci_enable_device_hook() if PCI channel is offline
powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=
powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel
|
|
When find_and_init_phbs() looks for the probe-only property, it seems to
trust the firmware to be correctly written, and assumes that there is a
parameter to the property.
It is conceivable that the firmware could not be that perfect, and it could
expose this property naked (at least one arm64 platform seems to exhibit
this exact behaviour). The setup code the ends up making a decision based
on whatever the property pointer points to, which is likely to be junk.
Instead, switch to the common of_pci.c implementation that doesn't suffer
from this problem and ignore the property if the firmware couldn't make up
its mind.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
|
|
memset() uses instruction dcbz to speed up clearing by not wasting time
loading cache line with data that will be overwritten.
Some platform like mpc52xx do no have cache active at startup and
can therefore not use memset(). Allthough no part of the code
explicitly uses memset(), GCC may make calls to it.
This patch modifies memset() such that at startup, memset()
unconditionally skip the optimised bloc that uses dcbz instruction.
Once the initial MMU is set up, in machine_init() we patch memset()
by replacing this inconditional jump by a NOP
Tested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
memcpy() uses instruction dcbz to speed up copy by not wasting time
loading cache line with data that will be overwritten.
Some platform like mpc52xx do no have cache active at startup and
can therefore not use memcpy(). Allthough no part of the code
explicitly uses memcpy(), GCC makes calls to it.
This patch modifies memcpy() such that at startup, memcpy()
unconditionally jumps to generic_memcpy() which doesn't use
the dcbz instruction.
Once the initial MMU is set up, in machine_init() we patch memcpy()
by replacing this inconditional jump by a NOP
Reported-by: Michal Sojka <[email protected]>
Tested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Most interrupt flow handlers do not use the irq argument. Those few
which use it can retrieve the irq number from the irq descriptor.
Remove the argument.
Search and replace was done with coccinelle and some extra helper
scripts around it. Thanks to Julia for her help!
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Jiang Liu <[email protected]>
|
|
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.
Search and replacement was done with coccinelle:
@@
struct irq_data *d;
expression E1;
@@
-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: [email protected]
|
|
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.
Search and replacement was done with coccinelle:
@@
struct irq_data *d;
expression E1;
@@
-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: [email protected]
|
|
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.
Search and replacement was done with coccinelle:
@@
struct irq_data *d;
expression E1;
@@
-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: [email protected]
|
|
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.
Search and replacement was done with coccinelle:
@@
struct irq_data *d;
expression E1;
@@
-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Anatolij Gustschin <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: [email protected]
|