Age | Commit message | Author | Files | Lines |
|
Currently, if you are in xmon without an oops etc., to view the kernel
version you have to type "d $linux_banner" - not necessarily obvious. As
this is useful information, append it to the output of the "e" command.
Example output:
$mon> e
cpu 0x1: Vector: 0 at [c0000000f879ba80]
pc: c000000000081718: sysrq_handle_xmon+0x68/0x80
lr: c000000000081718: sysrq_handle_xmon+0x68/0x80
sp: c0000000f879bbe0
msr: 8000000000009033
current = 0xc0000000f604d5c0
paca = 0xc00000000fdc0480 softe: 0 irq_happened: 0x01
pid = 2467, comm = bash
Linux version 4.4.0-rc2-00008-gc51af91c3ab3-dirty (rashmica@circle) (gcc
version 5.1.1 20150629 (GCC) ) #45 SMP Wed Nov 25 10:25:12 AEDT 2015
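A minimal sketch of the change, assuming the banner is printed from
excprint() in arch/powerpc/xmon/xmon.c (the exact hook point and
surrounding code are assumptions):

/* xmon has its own printf; print the banner last so the kernel
 * version appears at the end of the "e" output. */
extern const char linux_banner[];

static void excprint(struct pt_regs *fp)
{
        /* ... existing dump of pc/lr/sp/msr/current/paca/pid ... */
        printf("%s", linux_banner);
}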
Signed-off-by: Rashmica Gupta <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
All users of QPACE have upgraded to QPACE2 so remove the Cell QPACE code.
Signed-off-by: Rashmica Gupta <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The kernel prints warnings about various EPOW events for user
information/action after parsing EPOW interrupts. At times the EPOW
reset event warning below is seen flooding the kernel log over a
period of time.
May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
These EPOW reset events are spurious in nature and are triggered by
firmware without an actual EPOW event being reset. This patch avoids these
multiple EPOW reset warnings by using a counter variable. This variable
is incremented every time an EPOW event is reported. Upon receiving an EPOW
reset event the same variable is checked to filter out spurious events and
decremented accordingly.
This patch also improves the log messages to better describe the EPOW event
being reported, and merges adjacent log messages into a single one to reduce
the number of lines printed per event.
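A hedged sketch of the counter scheme described above; the variable and
function names are illustrative, not the exact patch:

/* Count outstanding EPOW events so that firmware-generated "reset"
 * notifications with no preceding event can be ignored. */
static int num_epow_events;

static void handle_epow_event(bool is_reset)
{
        if (!is_reset) {
                num_epow_events++;
                pr_err("Non critical power or cooling issue detected\n");
        } else if (num_epow_events > 0) {
                /* A previously reported event really was cleared. */
                num_epow_events--;
                pr_info("Non critical power or cooling issue cleared\n");
        }
        /* else: spurious reset from firmware; print nothing. */
}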
Signed-off-by: Kamalesh Babulal <[email protected]>
Signed-off-by: Vipin K Parashar <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Merge the two TM fixes we merged in 4.4. We are about to merge selftests
for these, and without the fixes the selftests will oops.
powerpc fixes for 4.4 #2
- tm: Block signal return from setting invalid MSR state from Michael Neuling
- tm: Check for already reclaimed tasks from Michael Neuling
|
|
Print MSR TM bits in oops messages. This appends them to the end
like this:
MSR: 8000000502823031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[TE]>
You get the TM[] section only if at least one TM MSR bit is set. Inside
the TM[], E means Enabled (bit 32), S means Suspended (bit 33), and T
means Transactional (bit 34).
If no bits are set, you get no TM[] output.
This includes a rework of printbits() to handle this case.
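A sketch of the decoding, close to but not necessarily identical with
the final printbits()/print_tm_bits() code:

/* Emit ",TM[...]" only when at least one TM bit is set; E/S/T map
 * to MSR bits 32/33/34 as described above. */
static void print_tm_bits(unsigned long val)
{
        if (val & (MSR_TM | MSR_TS_S | MSR_TS_T)) {
                printk(",TM[");
                if (val & MSR_TM)
                        printk("E");    /* Enabled */
                if (val & MSR_TS_S)
                        printk("S");    /* Suspended */
                if (val & MSR_TS_T)
                        printk("T");    /* Transactional */
                printk("]");
        }
}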
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered; however, they are currently only
RELEASE+ACQUIRE, which is not fully ordered.
So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee the fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics").
This patch depends on the patch "powerpc: Make value-returning atomics
fully ordered" for the PPC_ATOMIC_ENTRY_BARRIER definition.
Cc: [email protected] # 3.2+
Signed-off-by: Boqun Feng <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
According to memory-barriers.txt:
> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...
This means these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER, the barrier before the actual operation, is
currently "lwsync" if SMP=y. A leading "lwsync" cannot guarantee fully
ordered atomics, according to Paul McKenney:
https://lkml.org/lkml/2015/10/14/970
To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully ordered semantics.
This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.
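A sketch of the resulting definition (SMP case only; the exact layout
of asm/synch.h is assumed, UP variants omitted):

#ifdef CONFIG_SMP
/* A full "sync" before the ll/sc sequence; a leading lwsync is not
 * enough to make the whole operation appear fully ordered. */
#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n"
#define PPC_ATOMIC_EXIT_BARRIER  "\n" stringify_in_c(sync) "\n"
#endif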
Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics")
Cc: [email protected] # 3.2+
Signed-off-by: Boqun Feng <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The slot information of the base page size hash pte is stored in the
pgtable_t w.r.t. transparent hugepages. We need to make sure we don't
index beyond the pgtable_t size.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This will bulk read 4 hash pte slot entries and should reduce the loop
overhead.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Remove the related functions and #defines
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
They don't need to track 4k subpage slot details and hence don't need
the second half of pgtable_t.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Use the #define instead of open-coding the same value.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The pte and pmd table sizes depend on config items, so don't hard-code
them. This makes sure we use the right value when masking pmd entries
and also when checking pmd_bad.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
For a pte entry we will have _PAGE_PTE set. Our pte page
address has a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate a hugepd, i.e.
for pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicates a PTE
2) bits [2..6] non zero -> indicates a hugepd.
They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) otherwise a pointer to the next table.
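A hedged illustration of the decode rules above, not the exact kernel
helper; the function name and explicit bit mask are illustrative:

static inline bool entry_is_hugepd(unsigned long pd)
{
        if (pd & _PAGE_PTE)             /* rule 1: it is a pte */
                return false;
        /* rule 2: bits [2..6] non-zero mark a hugepd and encode its
         * size; bit 1 (_PAGE_PRESENT) is skipped. Otherwise (rule 3)
         * the entry is a pointer to the next table. */
        return (pd & 0x7cUL) != 0;
}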
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We support THP only with book3s_64 and a 64K page size. Move the
THP details to hash64-64k.h to make that clear.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
W.r.t. hugetlb, we support two formats for the pmd. With book3s_64 and
a 64K linux page size, we can have a pte at the pmd level, hence we
don't need to support hugepd there. For everything else hugepd is
supported and pmd_huge() is 0.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The only difference here is that we apply the WIMG mapping early, so the
rflags passed to updatepp will also be changed.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Instead of open-coding it in multiple code paths, export the helper
and add more documentation. Also make sure we don't make assumptions
regarding pte bit positions.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This is similar to the 64K insert. Maybe we want to consolidate the two.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Convert from asm to C
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
No real change, only style changes
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This frees up 11 bits in the pte_t. In a later patch we also change
the pte_t format so that we can start supporting migration ptes
at the pmd level. We now track the 4k subpage valid bit as below:
If we have _PAGE_COMBO set, we override _PAGE_F_GIX_SHIFT
and _PAGE_F_SECOND. Together we have 4 bits, each of them
used to indicate whether any of the 4 4k subpages in that group
is valid. I.e.,
[ group 1 bit ] [ group 2 bit ] ..... [ group 4 bit ]
[ subpage 1 - 4 ] [ subpage 5 - 8 ] ..... [ subpage 13 - 16 ]
We still track each 4k subpage slot number and secondary hash
information in the second half of pgtable_t. Removing the subpage
tracking would have a significant overhead on the aim9 and ebizzy
benchmarks, and to support THP with 4K subpages, we do need a
pgtable_t of 4096 bytes.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We should not depend on pte bit positions in asm code. Fix this simply
by moving that part of the code to C.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Move the booke-related headers below booke/32 or booke/64.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Functions which operate on pte bits are moved to hash*.h, and other
generic functions are moved to pgtable.h.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This enables us to keep the hash64-related bits together, and makes the
code easier to follow.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We convert them to static inline functions here, as we did with pte_val
in the previous patch.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We also convert a few #defines to static inlines in this patch, for
better type checking.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We copy only the needed PTE bit defines from pte-common.h to the
respective hash-related headers. This should greatly simplify later
patches in which we are going to change the pte format for the hash
config.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We are going to drop pte_common.h in a later patch. The idea is that
the hash code should not be required to define all PTE bits. Having the
PTE bits defined in pte_common.h made the code unnecessarily complex.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We also move __ASSEMBLY__ towards the end of the header. This avoids
having #ifndef __ASSEMBLY__ all over the header.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This makes a further copy of the pte defines to book3s/64/hash*.h. This
removes the dependency on pgtable-ppc64-4k.h and pgtable-ppc64-64k.h.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In this patch we do:
cp pgtable-ppc32.h book3s/32/pgtable.h
cp pgtable-ppc64.h book3s/64/pgtable.h
This enables us to make further changes to the hash-specific config.
We will change the page table format for 64-bit hash in later patches.
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This is the same bug we fixed as part of 09567e7fd44291bfc08accfdd67ad8f467842332
("powerpc/mm: Check paca psize is up to date for huge mappings"). Please
check that for details. The difference here is that faults were
happening on a 4K page at an address previously mapped by hugetlb.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Originally the mpc85xx-pci-edac driver bound directly to the PCI
controller node.
Commit
905e75c46dba ("powerpc/fsl-pci: Unify pci/pcie initialization code")
turned the PCI controller code into a platform device. Since we can't
have two drivers binding to the same device, the EDAC code was changed
to be called into as a library-style submodule. However, this doesn't
work if the EDAC driver is built as a module.
Commit
8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting")
exposed another problem with this approach -- mpc85xx_pci_err_probe()
was being called in the same early boot phase that the PCI controller
is initialized, rather than in the device_initcall phase that the EDAC
layer expects. This caused a crash on boot.
To fix this, the PCI controller code now creates a child platform device
specifically for EDAC, which the mpc85xx-pci-edac driver binds to.
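A hedged sketch of that arrangement; the helper name and error handling
are illustrative, though platform_device_register_data() is the standard
API for creating such a child device:

/* Called from the fsl-pci controller setup: create a child device
 * that the mpc85xx-pci-edac driver can bind to at device_initcall
 * time, instead of calling into the EDAC code directly. */
static void fsl_pci_register_edac(struct platform_device *pci_pdev)
{
        struct platform_device *edac;

        edac = platform_device_register_data(&pci_pdev->dev,
                                             "mpc85xx-pci-edac",
                                             PLATFORM_DEVID_AUTO, NULL, 0);
        if (IS_ERR(edac))
                dev_warn(&pci_pdev->dev,
                         "failed to register EDAC child device\n");
}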
Reported-by: Michael Ellerman <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Scott Wood <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Doug Thompson <[email protected]>
Cc: Jia Hongtao <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: linux-edac <[email protected]>
Cc: [email protected]
Cc: Masanari Iida <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Rob Herring <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
|
|
* pci/aspm:
PCI/ASPM: Make sysfs link_state_store() consistent with link_state_show()
* pci/hotplug:
PCI: pciehp: Always protect pciehp_disable_slot() with hotplug mutex
* pci/misc:
x86/PCI: Simplify pci_bios_{read,write}
PCI: Simplify config space size computation
PCI: Limit config space size for Netronome NFP6000 family
PCI: Add Netronome vendor and device IDs
PCI: Support PCIe devices with short cfg_size
x86/PCI: Clarify AMD Fam10h config access restrictions comment
PCI: Print warnings for all invalid expansion ROM headers
PCI: Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask
* pci/msi:
PCI/MSI: Remove empty pci_msi_init_pci_dev()
PCI/MSI: Initialize MSI capability for all architectures
|
|
Bit 7 of the "Header Type" register indicates a multi-function device when
set. Bits 0-6 contain encoded values, where 0x1 indicates a PCI-PCI
bridge. It is incorrect to test this as though it were a mask.
For example, while the PCI 3.0 spec only defines values 0x0, 0x1, and 0x2,
it's conceivable that a future spec could define 0x3 to mean something
else; then tests for "(hdr_type & 0x7f) & PCI_HEADER_TYPE_BRIDGE" would
incorrectly succeed for this new 0x3 header type.
Test bits 0-6 of the Header Type for equality with PCI_HEADER_TYPE_BRIDGE.
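A sketch of the difference (the helper is illustrative;
pci_read_config_byte() and PCI_HEADER_TYPE are the real interfaces):

static bool hdr_type_is_bridge(struct pci_dev *dev)
{
        u8 hdr_type;

        pci_read_config_byte(dev, PCI_HEADER_TYPE, &hdr_type);

        /* Buggy form: "(hdr_type & 0x7f) & PCI_HEADER_TYPE_BRIDGE"
         * would also match a future encoded value such as 0x3. */
        return (hdr_type & 0x7f) == PCI_HEADER_TYPE_BRIDGE;
}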
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Two DSCR tests have a hack in them:
/*
 * XXX: Force a context switch out so that DSCR
 * current value is copied into the thread struct
 * which is required for the child to inherit the
 * changed value.
 */
sleep(1);
We should not be working around this in the testcase; it is a kernel bug.
Fix it by copying the current DSCR to the child, instead of the value the
thread struct held at the last context switch.
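A sketch of the fix in copy_thread(); the surrounding feature check is
assumed, and the key change is reading the live SPR rather than the
stale thread-struct copy:

        if (cpu_has_feature(CPU_FTR_DSCR)) {
                p->thread.dscr_inherit = current->thread.dscr_inherit;
                /* Read the current DSCR directly instead of the value
                 * saved in the thread struct at the last context
                 * switch, so the child inherits the live value. */
                p->thread.dscr = mfspr(SPRN_DSCR);
        }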
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs()
and restore_sprs()") moved the restore of SPRs after the call to _switch().
There is an issue with this approach - new tasks do not return through
_switch(), they are set up by copy_thread() to directly return through
ret_from_fork() or ret_from_kernel_thread(). This means restore_sprs() is
not getting called for new tasks.
Fix this by moving restore_sprs() before _switch().
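A heavily simplified sketch of the reordering in __switch_to():

        /* restore_sprs() must run before _switch(): new tasks return
         * through ret_from_fork()/ret_from_kernel_thread(), not
         * through _switch(), so anything after it is skipped for them. */
        restore_sprs(old_thread, new_thread);

        last = _switch(old_thread, new_thread);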
Fixes: 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()")
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit a0e72cf12b1a ("powerpc: Create msr_check_and_{set,clear}()")
removed a call to check_if_tm_restore_required() in the
enable_kernel_*() functions. Add them back in.
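A sketch of one of the affected functions with the call restored;
the details around it are assumed:

void enable_kernel_fp(void)
{
        WARN_ON(preemptible());

        msr_check_and_set(MSR_FP);

        if (current->thread.regs && (current->thread.regs->msr & MSR_FP)) {
                check_if_tm_restore_required(current);  /* re-added */
                __giveup_fpu(current);
        }
}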
Fixes: a0e72cf12b1a ("powerpc: Create msr_check_and_{set,clear}()")
Reported-by: Rashmica Gupta <[email protected]>
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.
x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
total). We should do something similar for powerpc, and since we do
not use private slots here, we can set the value to 512 directly.
While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.
Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination. The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).
Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point. If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.
This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).
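A sketch of the check in kvmppc_set_msr_hv(); the MSR_TS_MASK test
follows the description, the surrounding assignment is assumed:

static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
{
        /*
         * Both TS bits set (0b11) is an illegal transactional state,
         * so force the TS field to 00 (non-transactional).
         */
        if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
                msr &= ~MSR_TS_MASK;
        vcpu->arch.shregs.msr = msr;
}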
Cc: [email protected] # v3.9+
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
The vcpu_book3s variable is assigned but never used. So remove it.
Found using cppcheck.
Signed-off-by: Geyslan G. Bem <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
In the old DABR register, the BT (Breakpoint Translation) bit
is bit number 61. In the new DAWRX register, the WT (Watchpoint
Translation) bit is bit number 59. So to move the DABR-BT bit
into the position of the DAWRX-WT bit, it has to be shifted by
two, not only by one. This fixes hardware watchpoints in gdb of
older guests that only use the H_SET_DABR/X interface instead
of the new H_SET_MODE interface.
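The actual fix is in assembly, but the arithmetic can be illustrated in
C (bit numbers use IBM big-endian numbering, as in the description; the
function is purely illustrative):

/* DABR[BT] is bit 61, DAWRX[WT] is bit 59 (IBM numbering, bit 0 is
 * the MSB of the 64-bit register), so BT must move left by two. */
static unsigned long dabr_bt_to_dawrx_wt(unsigned long dabr)
{
        unsigned long bt = (dabr >> (63 - 61)) & 1;     /* extract BT */

        return bt << (63 - 59);                         /* place at WT */
}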
Cc: [email protected] # v3.14+
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.
This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.
Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction). Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.
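A hedged sketch of the new check at the top of kvmppc_handle_exit_hv();
field and constant names follow standard KVM HV code, but the exact body
is assumed:

        /* A trap taken with MSR.HV=1 happened in the guest entry/exit
         * code itself, not in the guest: report it to userspace rather
         * than handling it as a guest exit. */
        if (vcpu->arch.shregs.msr & MSR_HV) {
                printk(KERN_EMERG "KVM trap in HV mode!\n");
                run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
                return RESUME_HOST;
        }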
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This reverts commit 527d10ef3a315d3cb9dc098dacd61889a6c26439.
The reverted commit breaks cxlflash devices following an EEH reset (and
possibly other cxl devices, however this has not been tested).
The reverted commit changed the behaviour of eeh_reset_device() so that PHB
PEs are not unfrozen following the completion of the reset. This should not
be problematic, as no device resources should have been associated with the
PHB PE.
However, when attempting to load the cxlflash driver after a reset, the
driver attempts to read Vital Product Data through a call to
pci_read_vpd() (which is called on the physical cxl device, not on the
virtual AFU device). pci_read_vpd() in turn attempts to read from the cxl
device's config space. This fails, as the PE it's trying to read from is
still frozen. In turn, the driver gets an -ENODEV and fails to initialise.
It appears this issue only affects some parts of the VPD area, as "lspci
-vvv", which only reads a subset of the VPD bytes, is not broken by the
original patch.
At this stage, we don't fully understand why we're trying to read a frozen
PE, and we don't know how this affects other cxl devices. It is possible
that there is an underlying bug in the cxl driver or the powerpc CAPI
support code, or alternatively a bug in the PCI resource allocation/mapping
code that is incorrectly mapping resources to PE#0.
As such, this fix is incomplete, however it is necessary to prevent a
serious regression in CAPI support.
In the meantime, revert the commit, especially as it was intended to be a
non-functional change.
Cc: Gavin Shan <[email protected]>
Cc: Ian Munsie <[email protected]>
Cc: Daniel Axtens <[email protected]>
Signed-off-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|