Age | Commit message (Collapse) | Author | Files | Lines |
|
This file was originally cloned off of the MPC8641D-HPCN reference
platform, which actually had a PHY IRQ line connected. However this
board does not. The bogus entry was largely inert and went undetected
until commit 321beec5047af83db90c88114b7e664b156f49fe ("net: phy: Use
interrupts when available in NOLINK state") was added to the tree.
With the above commit, the board fails to NFS boot since it sits waiting
for a PHY IRQ event that of course never arrives. Removing the bogus
entries from the DTS file fixes the issue.
Cc: Andrew Lunn <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The OPAL event calls return a mask of events that are active in big
endian format. This is checked when unmasking the events in the
irqchip by comparison with a cached value. The cached value was stored
in big endian format but should've been converted to CPU endian
first.
This bug leads to OPAL event delivery being delayed or dropped on some
systems. Symptoms may include a non-functional console.
The bug is fixed by calling opal_handle_events(...) instead of
duplicating code in opal_event_unmask(...).
Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events")
Cc: [email protected] # v4.2+
Reported-by: Douglas L Lehr <[email protected]>
Signed-off-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.
It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).
Acked-by: Peter Zijlstra <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Remove a bunch of unnecessary fallback functions and group
things in a more logical way.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Most of __switch_to() is housekeeping, TLB batching, timekeeping etc.
Move these away from the more complex and critical context switching
code.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Create a single function that flushes everything (FP, VMX, VSX, SPE).
Doing this all at once means we only do one MSR write.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Create a single function that gives everything up (FP, VMX, VSX, SPE).
Doing this all at once means we only do one MSR write.
A context switch microbenchmark using yield():
http://ozlabs.org/~anton/junkcode/context_switch2.c
./context_switch2 --test=yield --fp --altivec --vector 0 0
shows an improvement of 3% on POWER8.
Signed-off-by: Anton Blanchard <[email protected]>
[mpe: giveup_all() needs to be EXPORT_SYMBOL'ed]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
More consolidation of our MSR available bit handling.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add a boot option that strictly manages the MSR unavailable bits.
This catches kernel uses of FP/Altivec/SPE that would otherwise
corrupt user state.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The enable_kernel_*() functions leave the relevant MSR bits enabled
until we exit the kernel sometime later. Create disable versions
that wrap the kernel use of FP, Altivec VSX or SPE.
While we don't want to disable it normally for performance reasons
(MSR writes are slow), it will be used for a debug boot option that
does this and catches bad uses in other areas of the kernel.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Create helper functions to set and clear MSR bits after first
checking if they are already set. Grouping them will make it
easy to avoid the MSR writes in a subsequent optimisation.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Move the MSR modification into c. Removing it from the assembly
function will allow us to avoid costly MSR writes by batching them
up.
Check the FP and VMX bits before calling the relevant giveup_*()
function. This makes giveup_vsx() and flush_vsx_to_thread() perform
more like their sister functions, and allows us to use
flush_vsx_to_thread() in the signal code.
Move the check_if_tm_restore_required() check in.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Move the MSR modification into new c functions. Removing it from
the low level functions will allow us to avoid costly MSR writes
by batching them up.
Move the check_if_tm_restore_required() check into these new functions.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We used to allow giveup_*() to be called with a NULL task struct
pointer. Now those cases are handled in the caller we can remove
the checks. We can also remove giveup_altivec_notask() which is also
unused.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
mtmsrd_isync() will do an mtmsrd followed by an isync on older
processors. On newer processors we avoid the isync via a feature fixup.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Instead of having multiple giveup_*_maybe_transactional() functions,
separate out the TM check into a new function called
check_if_tm_restore_required().
This will make it easier to optimise the giveup_*() functions in a
subsequent patch.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The UP only lazy floating point and vector optimisations were written
back when SMP was not common, and neither glibc nor gcc used vector
instructions. Now SMP is very common, glibc aggressively uses vector
instructions and gcc autovectorises.
We want to add new optimisations that apply to both UP and SMP, but
in preparation for that remove these UP only optimisations.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
No need to execute mflr twice.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Move all our context switch SPR save and restore code into two
helpers. We do a few optimisations:
- Group all mfsprs and all mtsprs. In many cases an mtspr sets a
scoreboarding bit that an mfspr waits on, so the current practise of
mfspr A; mtspr A; mfpsr B; mtspr B is the worst scheduling we can
do.
- SPR writes are slow, so check that the value is changing before
writing it.
A context switch microbenchmark using yield():
http://ozlabs.org/~anton/junkcode/context_switch2.c
./context_switch2 --test=yield 0 0
shows an improvement of almost 10% on POWER8.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Similar to the non TM load_up_*() functions, don't disable the MSR
bits on the way out.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Writing the MSR is slow, so we want to avoid it whenever possible.
A subsequent patch will add a debug option that strictly manages the
FP/VMX/VSX unavailable bits. For now just remove it, matching what
we do in other areas of the kernel (eg enable_kernel_altivec()).
A context switch microbenchmark using yield():
http://ozlabs.org/~anton/junkcode/context_switch2.c
./context_switch2 --test=yield --fp 0 0
shows an improvement of almost 3% on POWER8.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently, if HV KVM is configured but PR KVM isn't, we don't include
a test to see whether we were interrupted in KVM guest context for the
set of interrupts which get delivered directly to the guest by hardware
if they occur in the guest. This includes things like program
interrupts.
However, the recent bug where userspace could set the MSR for a VCPU
to have an illegal value in the TS field, and thus cause a TM Bad Thing
type of program interrupt on the hrfid that enters the guest, showed that
we can never be completely sure that these interrupts can never occur
in the guest entry/exit code. If one of these interrupts does happen
and we have HV KVM configured but not PR KVM, then we end up trying to
run the handler in the host with the MMU set to the guest MMU context,
which generally ends badly.
Thus, for robustness it is better to have the test in every interrupt
vector, so that if some way is found to trigger some interrupt in the
guest entry/exit path, we can handle it without immediately crashing
the host.
This means that the distinction between KVMTEST and KVMTEST_PR goes
away. Thus we delete KVMTEST_PR and associated macros and use KVMTEST
everywhere that we previously used either KVMTEST_PR or KVMTEST. It
also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
so we deleted SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Let's reuse the new common function for VPCU lookup by id.
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Dominik Dingel <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
[split out the new function into a separate patch]
|
|
This platform driver has a OF device ID table but the OF module
alias information is not created so module autoloading won't work.
Signed-off-by: Luis de Bethencourt <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
It is common practice with powerpc to use 'rN' to refer to register 'N'. However
when using the pt_regs_offset table we have to use 'gprN'.
So add aliases such that both 'rN' and 'gprN' can be used.
For example, we can currently do:
$ su -
$ cd /sys/kernel/debug/tracing
$ echo "p:probe/sys_fchownat sys_fchownat %gpr3:s32 +0(%gpr4):string %gpr5:s32 %gpr6:s32 %gpr7:s32" > kprobe_events
$ echo 1 > events/probe/sys_fchownat/enable
$ touch /tmp/foo
$ chown root /tmp/foo
$ echo 0 > events/enable
$ cat trace
chown-2925 [014] d... 76.160657: sys_fchownat: (SyS_fchownat+0x8/0x1a0) arg1=-100 arg2="/tmp/foo" arg3=0 arg4=-1 arg5=0
Instead we'd like to be able to use:
$ echo "p:probe/sys_fchownat sys_fchownat %r3:s32 +0(%r4):string %r5:s32 %r6:s32 %r7:s32" > kprobe_events
Signed-off-by: Rashmica Gupta <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Most architectures use NR_syscalls as the #define for the number of syscalls.
We use __NR_syscalls, and then define NR_syscalls as __NR_syscalls.
__NR_syscalls is not used outside arch code, whereas NR_syscalls is. So as
NR_syscalls must be defined and __NR_syscalls does not, replace __NR_syscalls
with NR_syscalls.
Signed-off-by: Rashmica Gupta <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This function has been unused since commit 14cf11af6cf6 ("powerpc: Merge enough
to start building in arch/powerpc."), so remove it.
Signed-off-by: Rashmica Gupta <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
if the language is not english objdump output is not parsed correctly
and format is "". Later, "ld -m $format" fails.
This patch adds "LANG=C" to force english output for objdump.
Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The xmon and cascade irq handlers must not run as threads.
pmac_pic_lock is already a raw_spinlock, but the irq flag
IRQF_NO_THREAD needs to be set as well.
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In another patch kvm_is_visible_gfn is maken return bool due to this
function only returns zero or one as its return value, let's also make
kvmppc_visible_gpa return bool to keep consistent.
No functional change.
Signed-off-by: Yaowei Bai <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't
support MSI") moved dev->msi_cap and dev->msix_cap initialization from the
pci_init_capabilities() path (used on all architectures) to the
pci_setup_device() path (not used on Open Firmware architectures).
This broke MSI or MSI-X on Open Firmware machines. 4d9aac397a5d
("powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case")
fixed it for PowerPC but not for SPARC.
Set up MSI and MSI-X (initialize msi_cap and msix_cap and disable MSI and
MSI-X) in pci_init_capabilities() so all architectures do it the same way.
This reverts 4d9aac397a5d since this patch fixes the problem generically
for both PowerPC and SPARC.
[bhelgaas: changelog, make pci_msi_setup_pci_dev() static]
Fixes: 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
platform_driver does not need to set an owner because
platform_driver_register() will set it.
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently we can hit a scenario where we'll tm_reclaim() twice. This
results in a TM bad thing exception because the second reclaim occurs
when not in suspend mode.
The scenario in which this can happen is the following. We attempt to
deliver a signal to userspace. To do this we need obtain the stack
pointer to write the signal context. To get this stack pointer we
must tm_reclaim() in case we need to use the checkpointed stack
pointer (see get_tm_stackpointer()). Normally we'd then return
directly to userspace to deliver the signal without going through
__switch_to().
Unfortunatley, if at this point we get an error (such as a bad
userspace stack pointer), we need to exit the process. The exit will
result in a __switch_to(). __switch_to() will attempt to save the
process state which results in another tm_reclaim(). This
tm_reclaim() now causes a TM Bad Thing exception as this state has
already been saved and the processor is no longer in TM suspend mode.
Whee!
This patch checks the state of the MSR to ensure we are TM suspended
before we attempt the tm_reclaim(). If we've already saved the state
away, we should no longer be in TM suspend mode. This has the
additional advantage of checking for a potential TM Bad Thing
exception.
Found using syscall fuzzer.
Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
Cc: [email protected] # v3.9+
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently we allow both the MSR T and S bits to be set by userspace on
a signal return. Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).
This patch checks for this case in both the 32 and 64 bit signals
code. If both T and S are set, we mark the context as invalid.
Found using a syscall fuzzer.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Cc: [email protected] # v3.9+
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The selftest passes on 64-bit LE and 32-bit BE.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Pull second batch of kvm updates from Paolo Bonzini:
"Four changes:
- x86: work around two nasty cases where a benign exception occurs
while another is being delivered. The endless stream of exceptions
causes an infinite loop in the processor, which not even NMIs or
SMIs can interrupt; in the virt case, there is no possibility to
exit to the host either.
- x86: support for Skylake per-guest TSC rate. Long supported by
AMD, the patches mostly move things from there to common
arch/x86/kvm/ code.
- generic: remove local_irq_save/restore from the guest entry and
exit paths when context tracking is enabled. The patches are a few
months old, but we discussed them again at kernel summit. Andy
will pick up from here and, in 4.5, try to remove it from the user
entry/exit paths.
- PPC: Two bug fixes, see merge commit 370289756becc for details"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: x86: rename update_db_bp_intercept to update_bp_intercept
KVM: svm: unconditionally intercept #DB
KVM: x86: work around infinite loop in microcode when #AC is delivered
context_tracking: avoid irq_save/irq_restore on guest entry and exit
context_tracking: remove duplicate enabled check
KVM: VMX: Dump TSC multiplier in dump_vmcs()
KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
KVM: VMX: Enable and initialize VMX TSC scaling
KVM: x86: Use the correct vcpu's TSC rate to compute time scale
KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
KVM: x86: Replace call-back compute_tsc_offset() with a common function
KVM: x86: Replace call-back set_tsc_khz() with a common function
KVM: x86: Add a common TSC scaling function
KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
KVM: x86: Collect information for setting TSC scaling ratio
KVM: x86: declare a few variables as __read_mostly
KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
KVM: PPC: Book3S HV: Don't dynamically split core when already split
...
|
|
Pull block IO poll support from Jens Axboe:
"Various groups have been doing experimentation around IO polling for
(really) fast devices. The code has been reviewed and has been
sitting on the side for a few releases, but this is now good enough
for coordinated benchmarking and further experimentation.
Currently O_DIRECT sync read/write are supported. A framework is in
the works that allows scalable stats tracking so we can auto-tune
this. And we'll add libaio support as well soon. Fow now, it's an
opt-in feature for test purposes"
* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
direct-io: be sure to assign dio->bio_bdev for both paths
directio: add block polling support
NVMe: add blk polling support
block: add block polling support
blk-mq: return tag/queue combo in the make_request_fn handlers
block: change ->make_request_fn() and users to return a queue cookie
|
|
Removal started in commit 5bbeed12bdc3 ("sparc32: drop unused
kmap_atomic_to_page"). Let's do it across the whole tree.
Signed-off-by: Nicolas Pitre <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.
Signed-off-by: Jens Axboe <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Keith Busch <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Resource management:
- Add support for Enhanced Allocation devices (Sean O. Stalley)
- Add Enhanced Allocation register entries (Sean O. Stalley)
- Handle IORESOURCE_PCI_FIXED when sizing resources (David Daney)
- Handle IORESOURCE_PCI_FIXED when assigning resources (David Daney)
- Handle Enhanced Allocation capability for SR-IOV devices (David Daney)
- Clear IORESOURCE_UNSET when reverting to firmware-assigned address (Bjorn Helgaas)
- Make Enhanced Allocation bitmasks more obvious (Bjorn Helgaas)
- Expand Enhanced Allocation BAR output (Bjorn Helgaas)
- Add of_pci_check_probe_only to parse "linux,pci-probe-only" (Marc Zyngier)
- Fix lookup of linux,pci-probe-only property (Marc Zyngier)
- Add sparc mem64 resource parsing for root bus (Yinghai Lu)
PCI device hotplug:
- pciehp: Queue power work requests in dedicated function (Guenter Roeck)
Driver binding:
- Add builtin_pci_driver() to avoid registration boilerplate (Paul Gortmaker)
Virtualization:
- Set SR-IOV NumVFs to zero after enumeration (Alexander Duyck)
- Remove redundant validation of SR-IOV offset/stride registers (Alexander Duyck)
- Remove VFs in reverse order if virtfn_add() fails (Alexander Duyck)
- Reorder pcibios_sriov_disable() (Alexander Duyck)
- Wait 1 second between disabling VFs and clearing NumVFs (Alexander Duyck)
- Fix sriov_enable() error path for pcibios_enable_sriov() failures (Alexander Duyck)
- Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs (Ben Shelton)
- Don't try to restore VF BARs (Wei Yang)
MSI:
- Don't alloc pcibios-irq when MSI is enabled (Joerg Roedel)
- Add msi_controller setup_irqs() method for special multivector setup (Lucas Stach)
- Export all remapped MSIs to sysfs attributes (Romain Bezut)
- Disable MSI on SiS 761 (Ondrej Zary)
AER:
- Clear error status registers during enumeration and restore (Taku Izumi)
Generic host bridge driver:
- Fix lookup of linux,pci-probe-only property (Marc Zyngier)
- Allow multiple hosts with different map_bus() methods (David Daney)
- Pass starting bus number to pci_scan_root_bus() (David Daney)
- Fix address window calculation for non-zero starting bus (David Daney)
Altera host bridge driver:
- Add msi.h to ARM Kbuild (Ley Foon Tan)
- Add Altera PCIe host controller driver (Ley Foon Tan)
- Add Altera PCIe MSI driver (Ley Foon Tan)
APM X-Gene host bridge driver:
- Remove msi_controller assignment (Duc Dang)
Broadcom iProc host bridge driver:
- Fix header comment "Corporation" misspelling (Florian Fainelli)
- Fix code comment to match code (Ray Jui)
- Remove unused struct iproc_pcie.irqs[] (Ray Jui)
- Call pci_fixup_irqs() for ARM64 as well as ARM (Ray Jui)
- Fix PCIe reset logic (Ray Jui)
- Improve link detection logic (Ray Jui)
- Update PCIe device tree bindings (Ray Jui)
- Add outbound mapping support (Ray Jui)
Freescale i.MX6 host bridge driver:
- Return real error code from imx6_add_pcie_port() (Fabio Estevam)
- Add PCIE_PHY_RX_ASIC_OUT_VALID definition (Fabio Estevam)
Freescale Layerscape host bridge driver:
- Remove ls_pcie_establish_link() (Minghuan Lian)
- Ignore PCIe controllers in Endpoint mode (Minghuan Lian)
- Factor out SCFG related function (Minghuan Lian)
- Update ls_add_pcie_port() (Minghuan Lian)
- Remove unused fields from struct ls_pcie (Minghuan Lian)
- Add support for LS1043a and LS2080a (Minghuan Lian)
- Add ls_pcie_msi_host_init() (Minghuan Lian)
HiSilicon host bridge driver:
- Add HiSilicon SoC Hip05 PCIe driver (Zhou Wang)
Marvell MVEBU host bridge driver:
- Return zero for reserved or unimplemented config space (Russell King)
- Use exact config access size; don't read/modify/write (Russell King)
- Use of_get_available_child_count() (Russell King)
- Use for_each_available_child_of_node() to walk child nodes (Russell King)
- Report full node name when reporting a DT error (Russell King)
- Use port->name rather than "PCIe%d.%d" (Russell King)
- Move port parsing and resource claiming to separate function (Russell King)
- Fix memory leaks and refcount leaks (Russell King)
- Split port parsing and resource claiming from port setup (Russell King)
- Use gpio_set_value_cansleep() (Russell King)
- Use devm_kcalloc() to allocate an array (Russell King)
- Use gpio_desc to carry around gpio (Russell King)
- Improve clock/reset handling (Russell King)
- Add PCI Express root complex capability block (Russell King)
- Remove code restricting accesses to slot 0 (Russell King)
NVIDIA Tegra host bridge driver:
- Wrap static pgprot_t initializer with __pgprot() (Ard Biesheuvel)
Renesas R-Car host bridge driver:
- Build pci-rcar-gen2.c only on ARM (Geert Uytterhoeven)
- Build pcie-rcar.c only on ARM (Geert Uytterhoeven)
- Make PCI aware of the I/O resources (Phil Edworthy)
- Remove dependency on ARM-specific struct hw_pci (Phil Edworthy)
- Set root bus nr to that provided in DT (Phil Edworthy)
- Fix I/O offset for multiple host bridges (Phil Edworthy)
ST Microelectronics SPEAr13xx host bridge driver:
- Fix dw_pcie_cfg_read/write() usage (Gabriele Paoloni)
Synopsys DesignWare host bridge driver:
- Make "clocks" and "clock-names" optional DT properties (Bhupesh Sharma)
- Use exact access size in dw_pcie_cfg_read() (Gabriele Paoloni)
- Simplify dw_pcie_cfg_read/write() interfaces (Gabriele Paoloni)
- Require config accesses to be naturally aligned (Gabriele Paoloni)
- Make "num-lanes" an optional DT property (Gabriele Paoloni)
- Move calculation of bus addresses to DRA7xx (Gabriele Paoloni)
- Replace ARM pci_sys_data->align_resource with global function pointer (Gabriele Paoloni)
- Factor out MSI msg setup (Lucas Stach)
- Implement multivector MSI IRQ setup (Lucas Stach)
- Make get_msi_addr() return phys_addr_t, not u32 (Lucas Stach)
- Set up high part of MSI target address (Lucas Stach)
- Fix PORT_LOGIC_LINK_WIDTH_MASK (Zhou Wang)
- Revert "PCI: designware: Program ATU with untranslated address" (Zhou Wang)
- Use of_pci_get_host_bridge_resources() to parse DT (Zhou Wang)
- Make driver arch-agnostic (Zhou Wang)
Miscellaneous:
- Make x86 pci_subsys_init() static (Alexander Kuleshov)
- Turn off Request Attributes to avoid Chelsio T5 Completion erratum (Hariprasad Shenai)"
* tag 'pci-v4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
PCI: altera: Add Altera PCIe MSI driver
PCI: hisi: Add HiSilicon SoC Hip05 PCIe driver
PCI: layerscape: Add ls_pcie_msi_host_init()
PCI: layerscape: Add support for LS1043a and LS2080a
PCI: layerscape: Remove unused fields from struct ls_pcie
PCI: layerscape: Update ls_add_pcie_port()
PCI: layerscape: Factor out SCFG related function
PCI: layerscape: Ignore PCIe controllers in Endpoint mode
PCI: layerscape: Remove ls_pcie_establish_link()
PCI: designware: Make "clocks" and "clock-names" optional DT properties
PCI: designware: Make driver arch-agnostic
ARM/PCI: Replace pci_sys_data->align_resource with global function pointer
PCI: designware: Use of_pci_get_host_bridge_resources() to parse DT
Revert "PCI: designware: Program ATU with untranslated address"
PCI: designware: Move calculation of bus addresses to DRA7xx
PCI: designware: Make "num-lanes" an optional DT property
PCI: designware: Require config accesses to be naturally aligned
PCI: designware: Simplify dw_pcie_cfg_read/write() interfaces
PCI: designware: Use exact access size in dw_pcie_cfg_read()
PCI: spear: Fix dw_pcie_cfg_read/write() usage
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Kconfig: remove BE-only platforms from LE kernel build from Boqun
Feng
- Refresh ps3_defconfig from Geoff Levand
- Emit GNU & SysV hashes for the vdso from Michael Ellerman
- Define an enum for the bolted SLB indexes from Anshuman Khandual
- Use a local to avoid multiple calls to get_slb_shadow() from Michael
Ellerman
- Add gettimeofday() benchmark from Michael Neuling
- Avoid link stack corruption in __get_datapage() from Michael Neuling
- Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar
K.V
- Add ppc64le_defconfig from Michael Ellerman
- pseries: extract of_helpers module from Andy Shevchenko
- Correct string length in pseries_of_derive_parent() from Nathan
Fontenot
- Free the MSI bitmap if it was slab allocated from Denis Kirjanov
- Shorten irq_chip name for the SIU from Christophe Leroy
- Wait 1s for secondaries to enter OPAL during kexec from Samuel
Mendoza-Jonas
- Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V
- powerpc/pseries/hvcserver: don't memset pi_buff if it is null from
Colin Ian King
- Disable hugepd for 64K page size, from Aneesh Kumar K.V
- Differentiate between hugetlb and THP during page walk from Aneesh
Kumar K.V
- Make PCI non-optional for pseries from Michael Ellerman
- Individual System V IPC system calls from Sam bobroff
- Add selftest of unmuxed IPC calls from Michael Ellerman
- discard .exit.data at runtime from Stephen Rothwell
- Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul
Gortmaker
- Use of_get_next_parent to simplify code from Christophe Jaillet
- Paginate some xmon output from Sam bobroff
- Add some more elements to the xmon PACA dump from Michael Ellerman
- Allow the tm-syscall selftest to build with old headers from Michael
Ellerman
- Run EBB selftests only on POWER8 from Denis Kirjanov
- Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael
Ellerman
- Avoid reference to potentially freed memory in prom.c from Christophe
Jaillet
- Quieten boot wrapper output with run_cmd from Geoff Levand
- EEH fixes and cleanups from Gavin Shan
- Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
- Use of_get_next_parent() in of_get_ibm_chip_id() from Michael
Ellerman
- Fix section mismatch warning in msi_bitmap_alloc() from Denis
Kirjanov
- Fix ps3-lpm white space from Rudhresh Kumar J
- Fix ps3-vuart null dereference from Colin King
- nvram: Add missing kfree in error path from Christophe Jaillet
- nvram: Fix function name in some errors messages, from Christophe
Jaillet
- drivers/macintosh: adb: fix misleading Kconfig help text from Aaro
Koskinen
- agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
- cxl: Free virtual PHB when removing from Andrew Donnellan
- scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from
Michael Ellerman
- scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building
with O= from Michael Ellerman
- Freescale updates from Scott: Highlights include 64-bit book3e
kexec/kdump support, a rework of the qoriq clock driver, device tree
changes including qoriq fman nodes, support for a new 85xx board, and
some fixes.
- MPC5xxx updates from Anatolij: Highlights include a driver for
MPC512x LocalPlus Bus FIFO with its device tree binding
documentation, mpc512x device tree updates and some minor fixes.
* tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits)
powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
powerpc/pseries: Correct string length in pseries_of_derive_parent()
powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
powerpc: handle error case in cpm_muram_alloc()
powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
powerpc/book3e-64: Enable kexec
powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
powerpc/book3e-64/kexec: Enable SMP release
powerpc/book3e-64/kexec: create an identity TLB mapping
powerpc/book3e-64: Don't limit paca to 256 MiB
powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
powerpc/book3e: support CONFIG_RELOCATABLE
powerpc/booke64: Fix args to copy_and_flush
powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
powerpc/e6500: kexec: Handle hardware threads
...
|
|
Merge patch-bomb from Andrew Morton:
- inotify tweaks
- some ocfs2 updates (many more are awaiting review)
- various misc bits
- kernel/watchdog.c updates
- Some of mm. I have a huge number of MM patches this time and quite a
lot of it is quite difficult and much will be held over to next time.
* emailed patches from Andrew Morton <[email protected]>: (162 commits)
selftests: vm: add tests for lock on fault
mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
mm: introduce VM_LOCKONFAULT
mm: mlock: add new mlock system call
mm: mlock: refactor mlock, munlock, and munlockall code
kasan: always taint kernel on report
mm, slub, kasan: enable user tracking by default with KASAN=y
kasan: use IS_ALIGNED in memory_is_poisoned_8()
kasan: Fix a type conversion error
lib: test_kasan: add some testcases
kasan: update reference to kasan prototype repo
kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
kasan: various fixes in documentation
kasan: update log messages
kasan: accurately determine the type of the bad access
kasan: update reported bug types for kernel memory accesses
kasan: update reported bug types for not user nor kernel memory accesses
mm/kasan: prevent deadlock in kasan reporting
mm/kasan: don't use kasan shadow pointer in generic functions
mm/kasan: MODULE_VADDR is not available on all archs
...
|
|
In static micro-threading modes, the dynamic micro-threading code
is supposed to be disabled, because subcores can't make independent
decisions about what micro-threading mode to put the core in - there is
only one micro-threading mode for the whole core. The code that
implements dynamic micro-threading checks for this, except that the
check was missed in one case. This means that it is possible for a
subcore in static 2-way micro-threading mode to try to put the core
into 4-way micro-threading mode, which usually leads to stuck CPUs,
spinlock lockups, and other stalls in the host.
The problem was in the can_split_piggybacked_subcores() function, which
should always return false if the system is in a static micro-threading
mode. This fixes the problem by making can_split_piggybacked_subcores()
use subcore_config_ok() for its checks, as subcore_config_ok() includes
the necessary check for the static micro-threading modes.
Credit to Gautham Shenoy for working out that the reason for the hangs
and stalls we were seeing was that we were trying to do dynamic 4-way
micro-threading while we were in static 2-way mode.
Fixes: b4deba5c41e9
Cc: [email protected] # v4.3
Signed-off-by: Paul Mackerras <[email protected]>
|
|
When handling a hypervisor data or instruction storage interrupt (HDSI
or HISI), we look up the SLB entry for the address being accessed in
order to translate the effective address to a virtual address which can
be looked up in the guest HPT. This lookup can occasionally fail due
to the guest replacing an SLB entry without invalidating the evicted
SLB entry. In this situation an ERAT (effective to real address
translation cache) entry can persist and be used by the hardware even
though there is no longer a corresponding SLB entry.
Previously we would just deliver a data or instruction storage interrupt
(DSI or ISI) to the guest in this case. However, this is not correct
and has been observed to cause guests to crash, typically with a
data storage protection interrupt on a store to the vmemmap area.
Instead, what we do now is to synthesize a data or instruction segment
interrupt. That should cause the guest to reload an appropriate entry
into the SLB and retry the faulting instruction. If it still faults,
we should find an appropriate SLB entry next time and be able to handle
the fault.
Tested-by: Thomas Huth <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The previous patch introduced a flag that specified pages in a VMA should
be placed on the unevictable LRU, but they should not be made present when
the area is created. This patch adds the ability to set this state via
the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
When used with MCL_CURRENT, all current mappings will be marked with
VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the mm->def_flags
will be marked with VM_LOCKED | VM_LOCKONFAULT. When used with both
MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior is
maintained after adding MCL_ONFAULT. If a call to mlockall(MCL_FUTURE) is
followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
new VMAs will be unlocked. This remains true with or without MCL_ONFAULT
in either mlockall() invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears for VMA flags on all VMAs and in the mm->def_flags
field.
Signed-off-by: Eric B Munson <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the setup_nr_nodes(), we have already initialized
node_possible_map. So it is safe to use for_each_node here.
There are many places in the kernel that use hardcoded 'for' loop with
nr_node_ids, because all other architectures have numa nodes populated
serially. That should be reason we had maintained the same for
powerpc.
But, since sparse numa node ids possible on powerpc, we unnecessarily
allocate memory for non existent numa nodes.
For e.g., on a system with 0,1,16,17 as numa nodes nr_node_ids=18 and
we allocate memory for nodes 2-14. This patch we allocate memory for
only existing numa nodes.
The patch is boot tested on a 4 node tuleta, confirming with printks
that it works as expected.
Signed-off-by: Raghavendra K T <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Nishanth Aravamudan <[email protected]>
Cc: Greg Kurz <[email protected]>
Cc: Grant Likely <[email protected]>
Cc: Nikunj A Dadhania <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
probe_kernel_address() is basically the same as the (later added)
probe_kernel_read().
The return value on EFAULT is a bit different: probe_kernel_address()
returns number-of-bytes-not-copied whereas probe_kernel_read() returns
-EFAULT. All callers have been checked, none cared.
probe_kernel_read() can be overridden by the architecture whereas
probe_kernel_address() cannot. parisc, blackfin and um do this, to insert
additional checking. Hence this patch possibly fixes obscure bugs,
although there are only two probe_kernel_address() callsites outside
arch/.
My first attempt involved removing probe_kernel_address() entirely and
converting all callsites to use probe_kernel_read() directly, but that got
tiresome.
This patch shrinks mm/slab_common.o by 218 bytes. For a single
probe_kernel_address() callsite.
Cc: Steven Miao <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.
s390:
A bunch of fixes and optimizations for interrupt and time handling.
PPC:
Mostly bug fixes.
ARM:
No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86:
Quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic
interrupt controller will have to wait for 4.5. These will let
KVM expose Hyper-V devices.
- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
"This is mostly maintenance updates across the subsystem, with a
notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
maintainer of that"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
apparmor: clarify CRYPTO dependency
selinux: Use a kmem_cache for allocation struct file_security_struct
selinux: ioctl_has_perm should be static
selinux: use sprintf return value
selinux: use kstrdup() in security_get_bools()
selinux: use kmemdup in security_sid_to_context_core()
selinux: remove pointless cast in selinux_inode_setsecurity()
selinux: introduce security_context_str_to_sid
selinux: do not check open perm on ftruncate call
selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
KEYS: Merge the type-specific data with the payload data
KEYS: Provide a script to extract a module signature
KEYS: Provide a script to extract the sys cert list from a vmlinux file
keys: Be more consistent in selection of union members used
certs: add .gitignore to stop git nagging about x509_certificate_list
KEYS: use kvfree() in add_key
Smack: limited capability for changing process label
TPM: remove unnecessary little endian conversion
vTPM: support little endian guests
char: Drop owner assignment from i2c_driver
...
|