2017-07-02 | powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check | Thiago Jung Bauermann | 1 file, -1/+5
request_buffer can hold 254 requests, so if it already has that number of entries we can't add a new one. Also, define a constant to show where the number comes from. Fixes: e3ee15dc5d19 ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()") Reviewed-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Thiago Jung Bauermann <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
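As a minimal sketch of the shape of this fix (the constant name, struct fields and function signature below are illustrative, not necessarily the identifiers in the actual patch), the change amounts to naming the limit and rejecting additions once it is reached:

    /* Sketch only: request_buffer holds at most 254 requests. */
    #define MAX_24X7_REQUESTS	254

    static int add_event_to_24x7_request(struct hv_24x7_request_buffer *request_buffer)
    {
    	if (request_buffer->num_requests >= MAX_24X7_REQUESTS) {
    		pr_devel("request buffer full, cannot add a new request\n");
    		return -ENOSPC;
    	}
    	/* ... append the new request and bump num_requests ... */
    	return 0;
    }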
2017-07-02 | powerpc/perf/hv-24x7: Fix passing of catalog version number | Thiago Jung Bauermann | 1 file, -5/+5
H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from the first catalog page obtained previously. This is a 64 bit number, but create_events_from_catalog truncates it to 32-bit. This worked on POWER8, but POWER9 actually uses the upper bits so the call fails with H_P3 because the hypervisor doesn't recognize the version. This patch also adds the hcall return code to the error message, which is helpful when debugging the problem. Fixes: 5c5cd7b50259 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events") Reviewed-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Thiago Jung Bauermann <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
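A hedged sketch of the type fix; the helper name and fields are close to, but not guaranteed to match, the driver's actual code:

    /* Sketch: the catalog version must stay 64-bit all the way to the hcall. */
    u64 version = be64_to_cpu(page_0->version);	/* was truncated to u32 */
    long hret = h_get_24x7_catalog_page(page, version, page_index);

    if (hret)
    	pr_err("H_GET_24X7_CATALOG_PAGE failed: %ld (version %llu, index %lu)\n",
    	       hret, version, page_index);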
2017-07-02 | powerpc/mm: Enable ZONE_DEVICE on powerpc | Oliver O'Halloran | 1 file, -0/+1
Flip the switch. Running around and screaming "IT'S ALIVE" is optional, but recommended. Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-02 | powerpc/mm: Wire up hpte_removebolted for powernv | Anton Blanchard | 1 file, -0/+33
Adds support for removing bolted (i.e. kernel linear mapping) mappings on powernv. This is needed to support memory hot-unplug operations, which are required for the teardown of DAX/PMEM devices. Reviewed-by: Balbir Singh <[email protected]> Reviewed-by: Rashmica Gupta <[email protected]> Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-02 | powerpc/mm: Add devmap support for ppc64 | Oliver O'Halloran | 7 files, -7/+55
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S. This is used to differentiate device backed memory from transparent huge pages since they are handled in more or less the same manner by the core mm code. Cc: Aneesh Kumar K.V <[email protected]> Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
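A rough sketch of what devmap support typically adds on an architecture: a software PTE bit plus helpers to set and query it (the choice of bit below is illustrative):

    /* Sketch: _PAGE_DEVMAP marks PTEs backed by ZONE_DEVICE memory. */
    #define _PAGE_DEVMAP	_RPAGE_SW1	/* illustrative software bit */

    static inline pte_t pte_mkdevmap(pte_t pte)
    {
    	return __pte(pte_val(pte) | _PAGE_DEVMAP);
    }

    static inline int pte_devmap(pte_t pte)
    {
    	return !!(pte_val(pte) & _PAGE_DEVMAP);
    }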
2017-07-02 | powerpc/vmemmap: Add altmap support | Oliver O'Halloran | 2 files, -5/+26
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An altmap is a driver-provided region that is used to provide the backing storage for the struct pages of ZONE_DEVICE memory. In situations where a large amount of ZONE_DEVICE memory is being added to the system, the altmap reduces pressure on main system memory by allowing the mm/ metadata to be stored on the device itself rather than in main memory. Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-02 | powerpc/vmemmap: Reshuffle vmemmap_free() | Oliver O'Halloran | 1 file, -23/+25
Removes an indentation level and shuffles some code around to make the following patch cleaner. No functional changes. Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-02 | mm, x86: Add ARCH_HAS_ZONE_DEVICE to Kconfig | Oliver O'Halloran | 1 file, -0/+1
Currently ZONE_DEVICE depends on X86_64 and this will get unwieldy as new architectures (and platforms) get ZONE_DEVICE support. Move to an arch-selected Kconfig option to save us the trouble. Cc: [email protected] Acked-by: Ingo Molnar <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-02 | powerpc/hugetlbfs: Export HPAGE_SHIFT | Oliver O'Halloran | 1 file, -0/+1
Export it so it can be referenced inside a module. Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
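On powerpc, HPAGE_SHIFT is a variable rather than a compile-time constant, so the change is essentially a one-line export; a sketch:

    /* arch/powerpc/mm/hugetlbpage.c (sketch) */
    unsigned int HPAGE_SHIFT;
    EXPORT_SYMBOL(HPAGE_SHIFT);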
2017-07-02 | powerpc: use spin loop primitives in some functions | Nicholas Piggin | 4 files, -10/+23
Use the different spin loop primitives in some simple powerpc spin loops, including those which will spin as a common case. This will help to test the spin loop primitives before more conversions are done. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Add some includes of <linux/processor.h>] Signed-off-by: Michael Ellerman <[email protected]>
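The conversion pattern looks roughly like this, using the spin_begin()/spin_cpu_relax()/spin_end() primitives implemented in the commit below (the polled condition is a stand-in):

    #include <linux/processor.h>

    /* Sketch: drop thread priority while busy-waiting, restore it after. */
    spin_begin();
    while (!READ_ONCE(done))	/* 'done' is illustrative */
    	spin_cpu_relax();
    spin_end();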
2017-07-02 | powerpc/64: implement spin loop primitives | Nicholas Piggin | 1 file, -0/+20
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-01 | arm: sunxi: Revert changes merged through net-next. | Maxime Ripard | 14 files, -175/+1
This reverts commits 2c0cba482e79 ("arm: sun8i: sunxi-h3-h5: Add dt node for the syscon control module") to 2428fd0fe550 ("arm64: defconfig: Enable dwmac-sun8i driver on defconfig") and 3432a86e641c ("arm: sun8i: orangepipc: use internal phy-mode") to 5a79b4f2a5e7 ("arm: sun8i: orangepi-2: use internal phy-mode"), which should have been merged through the arm-soc tree and ended up causing merge conflicts and build failures. Signed-off-by: Maxime Ripard <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 6 files, -10/+15
Pull x86 fixes from Thomas Gleixner:
 "Fixlets for x86:
  - Prevent kexec crash when KASLR is enabled, which was caused by an address calculation bug
  - Restore the freeing of PUDs on memory hot remove
  - Correct a negated pointer check in the intel uncore performance monitoring driver
  - Plug a memory leak in an error exit path in the RDT code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Fix memory leak on mount failure
  x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
  x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization
  perf/x86/intel/uncore: Fix wrong box pointer check
  x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD
2017-07-01 | KVM: PPC: Book3S HV: Close race with testing for signals on guest entry | Paul Mackerras | 3 files, -79/+127
At present, interrupts are hard-disabled fairly late in the guest entry path, in the assembly code. Since we check for pending signals for the vCPU(s) task(s) earlier in the guest entry path, it is possible for a signal to be delivered before we enter the guest but not be noticed until after we exit the guest for some other reason. Similarly, it is possible for the scheduler to request a reschedule while we are in the guest entry path, and we won't notice until after we have run the guest, potentially for a whole timeslice.

Furthermore, with a radix guest on POWER9, we can take the interrupt with the MMU on. In this case we end up leaving interrupts hard-disabled after the guest exit, and they are likely to stay hard-disabled until we exit to userspace or context-switch to another process. This was masking the fact that we were also not setting the RI (recoverable interrupt) bit in the MSR, meaning that if we had taken an interrupt, it would have crashed the host kernel with an unrecoverable interrupt message.

To close these races, we need to check for signals and reschedule requests after hard-disabling interrupts, and then keep interrupts hard-disabled until we enter the guest. If there is a signal or a reschedule request from another CPU, it will send an IPI, which will cause a guest exit.

This puts the interrupt disabling before we call kvmppc_start_thread() for all the secondary threads of this core that are going to run vCPUs. The reason for that is that once we have started the secondary threads there is no easy way to back out without going through at least part of the guest entry path. However, kvmppc_start_thread() includes some code for radix guests which needs to call smp_call_function(), which must be called with interrupts enabled. To solve this problem, this patch moves that code into a separate function that is called earlier.

When the guest exit is caused by an external interrupt, a hypervisor doorbell or a hypervisor maintenance interrupt, we now handle these using the replay facility. __kvmppc_vcore_entry() now returns the trap number that caused the exit on this thread, and instead of the assembly code jumping to the handler entry, we return to C code with interrupts still hard-disabled and set the irq_happened flag in the PACA, so that when we do local_irq_enable() the appropriate handler gets called.

With all this, we now have the interrupt soft-enable flag clear while we are in the guest. This is useful because code in the real-mode hypercall handlers that checks whether interrupts are enabled will now see that they are disabled, which is correct, since interrupts are hard-disabled in the real-mode code.

Signed-off-by: Paul Mackerras <[email protected]>
2017-07-01 | KVM: PPC: Book3S HV: Simplify dynamic micro-threading code | Paul Mackerras | 4 files, -54/+39
Since commit b009031f74da ("KVM: PPC: Book3S HV: Take out virtual core piggybacking code", 2016-09-15), we only have at most one vcore per subcore. Previously, the fact that there might be more than one vcore per subcore meant that we had the notion of a "master vcore", which was the vcore that controlled thread 0 of the subcore. We also needed a list per subcore in the core_info struct to record which vcores belonged to each subcore. Now that there can only be one vcore in the subcore, we can replace the list with a simple pointer and get rid of the notion of the master vcore (and in fact treat every vcore as a master vcore). We can also get rid of the subcore_vm[] field in the core_info struct since it is never read. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-07-01 | Merge tag 'perf-core-for-mingo-4.13-20170630' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core | Ingo Molnar | 1 file, -1/+1
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

Intel PT enhancements:
 - Support "ptwrite" instruction, a way to stuff 32 or 64 bit values into the Intel PT trace (Adrian Hunter)
 - Support power events in Intel PT to report changes to C-state (Adrian Hunter)
 - Synthesize Intel PT events as PERF_RECORD_SAMPLE records with a perf_event_attr.type (PERF_TYPE_SYNTH) just after the range used by the kernel, i.e. right after what is allocated for PMUs, at INT_MAX + 1U; attr.config will have the identification for the synthesized event and the PERF_SAMPLE_RAW payload will have its fields (Adrian Hunter)

Infrastructure changes:
 - Remove warning() and error(), using instead pr_warning() and pr_error(), consolidating error reporting (Arnaldo Carvalho de Melo)
 - Add platform dependency to 'perf test 15' (Thomas Richter)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30 | x86/intel_rdt: Fix memory leak on mount failure | Vikas Shivappa | 1 file, -1/+3
If mount fails, the kn_info directory is not freed, causing a memory leak. Add the missing error handling path. Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
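A hedged sketch of the shape of the fix; kernfs_remove() is the cleanup primitive, but the function boundaries and names here are illustrative:

    /* Sketch: unwind kn_info if a later step of the mount fails. */
    static struct dentry *rdt_mount_sketch(struct file_system_type *type, int flags,
    				       struct kernfs_root *root)
    {
    	struct dentry *dentry = kernfs_mount(type, flags, root,
    					     RDTGROUP_SUPER_MAGIC, NULL);

    	if (IS_ERR(dentry))
    		kernfs_remove(kn_info);	/* the previously-missing cleanup */
    	return dentry;
    }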
2017-06-30 | ARM: Prepare for randomized task_struct | Arnd Bergmann | 3 files, -7/+10
With the new task struct randomization, we can run into a build failure for certain random seeds, which will place fields beyond the allowed immediate size in the assembly:

    arch/arm/kernel/entry-armv.S: Assembler messages:
    arch/arm/kernel/entry-armv.S:803: Error: bad immediate value for offset (4096)

Only two constants in asm-offset.h are affected, and I'm changing both of them here to work correctly in all configurations. One more macro has the problem, but is currently unused, so this removes it instead of adding complexity. Suggested-by: Ard Biesheuvel <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> [kees: Adjust commit log slightly] Signed-off-by: Kees Cook <[email protected]>
2017-06-30 | Merge tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux | Linus Torvalds | 1 file, -7/+1
Pull powerpc fixes from Michael Ellerman:
 "Hopefully the last two powerpc fixes for 4.12. The CXL one is larger than I'd usually send at rc7, but it fixes new code this cycle, so better to have it working for the release. It was actually sent a few weeks back but got blocked in testing behind another fix that was causing issues. We are still tracking one crash in v4.12-rc7, but only one person has reproduced it and the commit identified by bisect doesn't touch any of the relevant code, so I think it's 50/50 whether that commit is actually the problem or it's some code layout / toolchain issue.

  Two fixes for code we merged this cycle:
   - cxl: Fixes for Coherent Accelerator Interface Architecture 2.0
   - Avoid miscompilation w/GCC 4.6.3 on 32-bit - don't inline copy_to/from_user()

  Thanks to Al Viro, Larry Finger, Christophe Lombard"

* tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()
  cxl: Fixes for Coherent Accelerator Interface Architecture 2.0
2017-06-30 | Merge tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu | Linus Torvalds | 1 file, -1/+1
Pull IOMMU fixes from Joerg Roedel:
 "Two fixes:
  - A fix for AMD IOMMU interrupt remapping code when IRQs are forwarded directly to KVM guests
  - Fixed check in the recently merged code to allow tboot with Intel VT-d disabled"

* tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix interrupt remapping when disable guest_mode
  iommu/vt-d: Correctly disable Intel IOMMU force on
2017-06-30 | ARM: dma-mapping: Remove traces of NOMMU code | Vladimir Murzin | 1 file, -27/+2
DMA operations for the NOMMU case have just been factored out into a separate compilation unit, so don't keep dead code. Tested-by: Benjamin Gaignard <[email protected]> Tested-by: Andras Szemzo <[email protected]> Tested-by: Alexandre TORGUE <[email protected]> Signed-off-by: Vladimir Murzin <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Russell King <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2017-06-30 | ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus | Vladimir Murzin | 1 file, -2/+6
Now we have a dedicated non-cacheable region for consistent DMA operations. However, that region can still be marked as bufferable by the MPU, so it'd be safer to have barriers by default. M-class machines that didn't need them until now also likely won't need them in the future; therefore, we offer this as an option. Tested-by: Benjamin Gaignard <[email protected]> Tested-by: Andras Szemzo <[email protected]> Tested-by: Alexandre TORGUE <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Signed-off-by: Vladimir Murzin <[email protected]> Acked-by: Russell King <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2017-06-30 | ARM: NOMMU: Introduce dma operations for noMMU | Vladimir Murzin | 4 files, -4/+232
R/M classes of cpus can have memory covered by an MPU, which in turn might configure RAM as Normal, i.e. bufferable and cacheable. This breaks dma_alloc_coherent() and friends, since data can now get stuck in caches or be buffered. This patch factors out DMA support for the NOMMU configuration into a separate entity which provides dedicated dma_ops. We have to handle several cases there:
 - configurations with MMU/MPU setup
 - configurations without MMU/MPU setup
 - the special case of M-class, since caches and the MPU there are optional
In general we rely on the default DMA area for coherent allocations and/or per-device memory reserves suitable for coherent DMA, so if such regions are set, coherent allocations come from there. In case the MMU/MPU was not set up, we fall back to the normal page allocator for DMA memory allocation. In case we run on M-class cpus, for configurations without cache support (like Cortex-M3/M4) dma operations are forced to be coherent and wired up with dma-noop (the decision is made based on the cacheid global variable); however, if caches are detected there and no DMA coherent region is given (either default or per-device), dma is disallowed even if the MPU is not set up, because M-class cpus implement a system memory map which defines part of the address space as Normal memory. Reported-by: Alexandre Torgue <[email protected]> Reported-by: Andras Szemzo <[email protected]> Tested-by: Benjamin Gaignard <[email protected]> Tested-by: Andras Szemzo <[email protected]> Tested-by: Alexandre TORGUE <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Signed-off-by: Vladimir Murzin <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Russell King <[email protected]> [hch: removed the dma_supported() implementation that isn't required anymore] Signed-off-by: Christoph Hellwig <[email protected]>
2017-06-30 | drivers: dma-mapping: allow dma_common_mmap() for NOMMU | Vladimir Murzin | 6 files, -0/+6
Currently, the internals of dma_common_mmap() are compiled out if the build is done for either NOMMU or a target which explicitly says it does not have/want coherent DMA mmap. It turned out that dma_common_mmap() can be handy in a NOMMU setup (at least for ARM). This patch converts existing NOMMU targets to use ARCH_NO_COHERENT_DMA_MMAP, thus when CONFIG_MMU is gone from dma_common_mmap() their behaviour stays unchanged. ARM is not converted to ARCH_NO_COHERENT_DMA_MMAP because it 1) already has an mmap callback which can handle NOMMU (to some extent) 2) already defines a dummy pgprot_noncached() for NOMMU builds. c6x and frv stay untouched since they already have ARCH_NO_COHERENT_DMA_MMAP. Cc: Steven Miao <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Michal Simek <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Max Filippov <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Vladimir Murzin <[email protected]> Tested-by: Benjamin Gaignard <[email protected]>
2017-06-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 64 files, -395/+340
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <[email protected]>
2017-06-30 | x86/PCI: Avoid AMD SB7xx EHCI USB wakeup defect | Kai-Heng Feng | 1 file, -0/+15
On an AMD Carrizo laptop, when EHCI runtime PM is enabled, EHCI ports do not assert PME# for device plug/unplug events while in D3. As Alan Stern points out [1], the PME signal is not enabled when the controller is in D3, therefore it's not being woken up when new devices get plugged in. Testing shows the PME signal works when the EHCI power state is D2. Clear the PCI_PM_CAP_PME_D3 and PCI_PM_CAP_PME_D3cold bits in dev->pme_support to indicate the device will not assert PME# from those states. [1] http://lkml.kernel.org/r/[email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=196091 Link: https://support.amd.com/TechDocs/46837.pdf (Section 23) Link: https://support.amd.com/TechDocs/42413.pdf (Appendix A2) Signed-off-by: Kai-Heng Feng <[email protected]> [bhelgaas: changelog, add parens in quirk] Signed-off-by: Bjorn Helgaas <[email protected]>
2017-06-30 | arm64: fix endianness annotation for 'struct jit_ctx' and friends | Luc Van Oostenryck | 1 file, -3/+3
struct jit_ctx::image is used to store a pointer to the jitted instructions, which are always little-endian. These instructions are thus correctly converted from native order to little-endian before being stored, but the pointer 'image' is declared as for native-order values. Fix this by declaring the field as __le32* instead of u32*. Same for the pointer used in jit_fill_hole() to initialize the image. Signed-off-by: Luc Van Oostenryck <[email protected]> Signed-off-by: Will Deacon <[email protected]>
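A sketch of the annotation fix: once the field is __le32 *, every store goes through an explicit endianness conversion that sparse can verify:

    /* Sketch: 'image' holds little-endian instructions on any host. */
    struct jit_ctx {
    	__le32 *image;	/* was u32 *; jitted instructions are always LE */
    	/* ... */
    };

    static inline void emit(struct jit_ctx *ctx, int idx, u32 insn)
    {
    	ctx->image[idx] = cpu_to_le32(insn);	/* explicit native -> LE */
    }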
2017-06-30 | arm64: cpuinfo: constify attribute_group structures. | Arvind Yadav | 1 file, -1/+1
attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Will Deacon <[email protected]>
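The change itself is mechanical; a hedged before/after sketch with illustrative names:

    /* Before: static struct attribute_group cpuregs_attr_group = { ... }; */
    static const struct attribute_group cpuregs_attr_group = {
    	.attrs = cpuregs_id_attrs,	/* illustrative attribute list */
    };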
2017-06-30 | KVM: x86: remove ignored type attribute | Nick Desaulniers | 1 file, -1/+1
The macro insn_fetch marks the 'type' argument as having a specified alignment. Type attributes can only be applied to structs, unions, or enums, but insn_fetch is only ever invoked with integral types, so Clang produces 19 -Wignored-attributes warnings for this source file. Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-06-30 | Merge tag 'kvmarm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD | Paolo Bonzini | 33 files, -138/+281
KVM/ARM updates for 4.13:
 - vcpu request overhaul
 - allow timer and PMU to have their interrupt number selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisoning
 - the usual crop of fixes and cleanups

Conflicts:
	arch/s390/include/asm/kvm_host.h
2017-06-30 | objtool, x86: Add several functions and files to the objtool whitelist | Josh Poimboeuf | 14 files, -5/+36
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30 | x86/mm: Delete a big outdated comment about TLB flushing | Andy Lutomirski | 1 file, -36/+0
The comment describes the old explicit IPI-based flush logic, which is long gone. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/55e44997e56086528140c5180f8337dc53fb7ffc.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30 | x86/mm: Don't reenter flush_tlb_func_common() | Andy Lutomirski | 1 file, -2/+15
It was historically possible to have two concurrent TLB flushes targeting the same CPU: one initiated locally and one initiated remotely. This can now cause an OOPS in leave_mm() at arch/x86/mm/tlb.c:47:

    if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
    	BUG();

with this call trace:

    flush_tlb_func_local arch/x86/mm/tlb.c:239 [inline]
    flush_tlb_mm_range+0x26d/0x370 arch/x86/mm/tlb.c:317

Without reentrancy, this OOPS is impossible: leave_mm() is only called if we're not in TLBSTATE_OK, but then we're unexpectedly in TLBSTATE_OK in leave_mm(). This can be caused by flush_tlb_func_remote() happening between the two checks and calling leave_mm(), resulting in two consecutive leave_mm() calls on the same CPU with no intervening switch_mm() calls. We never saw this OOPS before because the old leave_mm() implementation didn't put us back in TLBSTATE_OK, so the assertion didn't fire. Nadav noticed the reentrancy issue in a different context, but neither of us realized that it caused a problem yet. Reported-by: Levin, Alexander (Sasha Levin) <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Nadav Amit <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Fixes: 3d28ebceaffa ("x86/mm: Rework lazy TLB to track the actual loaded mm") Link: http://lkml.kernel.org/r/855acf733268d521c9f2e191faee2dcc23a29729.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30 | x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings | Paolo Abeni | 1 file, -2/+5
According to the Intel datasheet, the REP MOVSB instruction exposes a pretty heavy setup cost (50 ticks), which hurts short string copy operations. This change tries to avoid this cost by calling the explicit loop available in the unrolled code for strings shorter than 64 bytes. The 64 bytes cutoff value is arbitrary from the code logic point of view - it has been selected based on measurements, as the largest value that still ensures a measurable gain. Micro benchmarks of the __copy_from_user() function with lengths in the [0-63] range show this performance gain (the shorter the string, the larger the gain):
 - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
 - in the [72%-9%] range on Intel Core i7-4810MQ
Other tested CPUs - namely Intel Atom S1260 and AMD Opteron 8216 - show no difference, because they do not expose the ERMS feature bit. Signed-off-by: Paolo Abeni <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Alan Cox <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com [ Clarified the changelog. ] Signed-off-by: Ingo Molnar <[email protected]>
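In C-flavoured pseudocode the cutoff looks roughly like the following; the real change is in the copy_user_enhanced_fast_string() assembly, and the helper named here is the existing x86 unrolled-copy routine:

    /* Sketch: short copies skip REP MOVSB and its heavy setup cost. */
    if (len < 64)
    	return copy_user_generic_unrolled(to, from, len);	/* explicit loop */
    /* otherwise fall through to the ERMS REP MOVSB fast path */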
2017-06-30 | perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static | Colin Ian King | 1 file, -2/+2
A few minor clean-ups: constify the lbr_desc[] array and make local function lbr_from_signext_quirk_rd() static to fix a sparse warning: "symbol 'lbr_from_signext_quirk_rd' was not declared. Should it be static?" Signed-off-by: Colin Ian King <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30 | x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging | Kirill A. Shutemov | 1 file, -6/+12
KASLR uses a hack to detect whether we booted via startup_32() or startup_64(): it checks what is loaded into cr3 and compares it to _pgtables. _pgtables is the array of page tables where early code allocates page tables from. KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but that's not true if we booted with 5-level paging enabled. In this case the top level page table is allocated separately and only the first p4d page table is allocated from the array. Let's modify the check to cover both 4- and 5-level paging cases. The patch also renames 'level4p' to 'top_level_pgt' as it now can hold the page table for the 4th or 5th level, depending on configuration. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30 | x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug | Baoquan He | 3 files, -7/+2
Kernel text KASLR is separated into physical address and virtual address randomization. And for virtual address randomization, we only randomize to get an offset between 16M and KERNEL_IMAGE_SIZE. So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, not the original kernel loading address 'output'. The bug will cause kernel boot failure if the kernel is loaded at a different position than the address, 16M, which is decided at compile time. Kexec/kdump is such a practical case. To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as the initial value. Tested-by: Dave Young <[email protected]> Signed-off-by: Baoquan He <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
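The fix itself reduces to the initial value; a sketch (surrounding context elided):

    /* Sketch: start from the link-time address, not the load address 'output'. */
    unsigned long virt_addr = LOAD_PHYSICAL_ADDR;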
2017-06-30 | x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization | Baoquan He | 1 file, -0/+2
For kernel text KASLR, the virtual address is confined to an area of 1G, [0xffffffff80000000, 0xffffffffc0000000). For the implementation of virtual address randomization, we only randomize to get an offset between 16M and 1G, then add this offset to the starting address, 0xffffffff80000000. Here 16M is the offset which is decided at linking stage. So the amount of the local variable 'virt_addr', which represents the offset, plus the kernel output size cannot exceed KERNEL_IMAGE_SIZE. Add a debug check for the offset. If out of bounds, print an error message and hang there. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Baoquan He <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
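A hedged sketch of the added check; 'needed_size' stands in for the kernel output size the message refers to, and the error text is illustrative:

    /* Sketch: the randomized offset plus image size must fit KERNEL_IMAGE_SIZE. */
    if (virt_addr + needed_size > KERNEL_IMAGE_SIZE)
    	error("Virtual address randomization put the kernel out of bounds");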
2017-06-30 | MIPS: Avoid accidental raw backtrace | James Hogan | 1 file, -0/+2
Since commit 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") show_backtrace() invokes the raw backtracer when cp0_status & ST0_KSU indicates user mode, to fix issues on EVA kernels where user and kernel address spaces overlap. However this is used by show_stack(), which creates its own pt_regs on the stack and leaves cp0_status uninitialised in most of the code paths. This results in non-deterministic use of the raw backtracer depending on the previous stack content. show_stack() deals exclusively with kernel-mode stacks anyway, so explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure we get a useful backtrace. Fixes: 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") Signed-off-by: James Hogan <[email protected]> Cc: [email protected] Cc: <[email protected]> # 3.15+ Patchwork: https://patchwork.linux-mips.org/patch/16656/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-30 | MIPS: Perform post-DMA cache flushes on systems with MAARs | Paul Burton | 1 file, -5/+18
Recent CPUs from Imagination Technologies such as the I6400 or P6600 are able to speculatively fetch data from memory into caches. This means that if used in a system with non-coherent DMA they require that caches be invalidated after a device performs DMA, and before the CPU reads the DMA'd data, in order to ensure that stale values weren't speculatively prefetched. Such CPUs also introduced Memory Accessibility Attribute Registers (MAARs) in order to control the regions in which they are allowed to speculate. Thus we can use the presence of MAARs as a good indication that the CPU requires the above cache maintenance. Use the presence of MAARs to determine the result of cpu_needs_post_dma_flush() in the default case, in order to handle these recent CPUs correctly. Note that the return type of cpu_needs_post_dma_flush() is changed to bool, such that it's clearer what's happening when cpu_has_maar is cast to bool for the return value. If this patch were backported to a pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is currently 1ull<<30, so when truncated to an int gives a non-zero value anyway, but even so the implicit conversion from long long int to bool makes it clearer to understand what will happen than the implicit conversion from long long int to int would. The bool return type also fits this usage better semantically, so seems like an all-round win. Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting the return type change. Signed-off-by: Paul Burton <[email protected]> Reviewed-by: Bryan O'Donoghue <[email protected]> Tested-by: Bryan O'Donoghue <[email protected]> Cc: Ed Blake <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/16363/ Signed-off-by: Ralf Baechle <[email protected]>
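A sketch of the resulting logic; cpu_has_maar is the real feature test, while the early-out and the set of special-cased CPUs are abbreviated and illustrative:

    /* Sketch: MAARs imply speculative cache fills, so flush after DMA. */
    static inline bool cpu_needs_post_dma_flush(struct device *dev)
    {
    	if (plat_device_is_coherent(dev))
    		return false;

    	switch (boot_cpu_type()) {
    	case CPU_R10000:
    	case CPU_R12000:
    	case CPU_BMIPS5000:
    		return true;
    	default:
    		/* Presence of MAARs suggests speculative prefetching. */
    		return cpu_has_maar;
    	}
    }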
2017-06-30 | MIPS: Fix IRQ tracing & lockdep when rescheduling | Paul Burton | 1 file, -0/+3
When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler from arch/mips/kernel/entry.S we disable interrupts. This is true regardless of whether we reach work_resched from syscall_exit_work, resume_userspace or by looping after calling schedule(). Although we disable interrupts in these paths we don't call trace_hardirqs_off() before calling into C code which may acquire locks, and we therefore leave lockdep with an inconsistent view of whether interrupts are disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are both enabled.

Without tracing this interrupt state lockdep will print warnings such as the following once a task returns from a syscall via syscall_exit_partial with TIF_NEED_RESCHED set:

    [   49.927678] ------------[ cut here ]------------
    [   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
    [   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
    [   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
    [   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
    [   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
    [   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
    [   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
    [   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
    [   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
    [   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
    [   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
    [   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
    [   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
    [   50.072327]         ...
    [   50.076087] Call Trace:
    [   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
    [   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
    [   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
    [   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
    [   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
    [   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
    [   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
    [   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
    [   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
    [   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
    [   50.148405] possible reason: unannotated irqs-off.
    [   50.154600] irq event stamp: 400463
    [   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
    [   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
    [   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
    [   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128

Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off() when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking schedule() following the work_resched label because:
 1) Interrupts are disabled regardless of the path we take to reach work_resched() & schedule().
 2) Performing the tracing here avoids the need to do it in paths which disable interrupts but don't call out to C code before hitting a path which uses the RESTORE_SOME macro that will call trace_hardirqs_on() or trace_hardirqs_off() as appropriate.

We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling syscall_trace_leave() for similar reasons, ensuring that lockdep has a consistent view of state after we re-enable interrupts.

Signed-off-by: Paul Burton <[email protected]>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: [email protected]
Cc: stable <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/15385/
Signed-off-by: Ralf Baechle <[email protected]>
2017-06-30 | MIPS: pm-cps: Drop manual cache-line alignment of ready_count | Paul Burton | 1 file, -8/+1
We allocate memory for a ready_count variable per-CPU, which is accessed via a cached non-coherent TLB mapping to perform synchronisation between threads within the core using LL/SC instructions. In order to ensure that the variable is contained within its own data cache line we allocate 2 lines worth of memory & align the resulting pointer to a line boundary. This is however unnecessary, since kmalloc is guaranteed to return memory which is at least cache-line aligned (see ARCH_DMA_MINALIGN). Stop the redundant manual alignment. Besides cleaning up the code & avoiding needless work, this has the side effect of avoiding an arithmetic error found by Bryan on 64 bit systems due to the 32 bit size of the former dlinesz. This led the ready_count variable to have its upper 32b cleared erroneously for MIPS64 kernels, causing problems when ready_count was later used on MIPS64 via cpuidle. Signed-off-by: Paul Burton <[email protected]> Fixes: 3179d37ee1ed ("MIPS: pm-cps: add PM state entry code for CPS systems") Reported-by: Bryan O'Donoghue <[email protected]> Reviewed-by: Bryan O'Donoghue <[email protected]> Tested-by: Bryan O'Donoghue <[email protected]> Cc: [email protected] Cc: stable <[email protected]> # v3.16+ Patchwork: https://patchwork.linux-mips.org/patch/15383/ Signed-off-by: Ralf Baechle <[email protected]>
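The simplification leans on kmalloc()'s minimum alignment guarantee; a hedged sketch of the resulting allocation:

    /* Sketch: kmalloc() returns memory aligned to at least ARCH_DMA_MINALIGN,
     * i.e. cache-line aligned, so the former 2-line over-allocation and
     * manual pointer rounding are unnecessary. */
    ready_count = kmalloc(sizeof(*ready_count), GFP_KERNEL);
    if (!ready_count)
    	return -ENOMEM;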
2017-06-30 | tile: remove unneeded extra-y in Makefile | Masahiro Yamada | 1 file, -1/+0
This extra-y is unneeded because vdso.lds is generated according to the dependency. Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | kbuild: thin archives make default for all archs | Nicholas Piggin | 2 files, -9/+1
Make thin archive builds the default, but keep the config option to allow exemptions if any breakage can't be quickly solved. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | x86/um: thin archives build fix | Nicholas Piggin | 1 file, -1/+1
The linker does not like vdso-syms.lds in input archive files. Make it an extra-y instead. Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: [email protected] Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | tile: thin archives fix linking | Nicholas Piggin | 1 file, -7/+7
The VDSO symbols can't be linked into built-in.o when building with thin archives, so change this to linking a new object file that is included into the built-in.o. Cc: Chris Metcalf <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | ia64: thin archives fix linking | Nicholas Piggin | 1 file, -6/+9
The VDSO symbols can't be linked into built-in.o when building with thin archives, so change this to linking a new object file that is included into the built-in.o. Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: [email protected] Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | sh: thin archives fix linking | Nicholas Piggin | 1 file, -8/+8
The VDSO symbols can't be linked into built-in.o when building with thin archives, so change this to linking a new object file that is included into the built-in.o. Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: [email protected] Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | ia64: remove unneeded extra-y in Makefile.gate | Masahiro Yamada | 1 file, -3/+1
All the files listed in "extra-y" are generated according to the dependency. They are still needed in "targets" to include .*.cmd for incremental building. Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-30 | tile: fix dependency and .*.cmd inclusion for incremental build | Masahiro Yamada | 1 file, -6/+5
Build targets using if_changed(_rule) must depend on FORCE so that they are evaluated every time. In order to include .*.cmd files correctly, build targets added to "targets" must not be prefixed with $(obj)/ because it is done by scripts/Makefile.lib . Signed-off-by: Masahiro Yamada <[email protected]>