aboutsummaryrefslogtreecommitdiff
path: root/arch/arm64/mm
AgeCommit message (Collapse)AuthorFilesLines
2016-04-21arm64/dma-mapping: Extend DMA ops workaround to PCI devicesRobin Murphy1-0/+4
PCI devices now suffer the same hiccup as platform devices, in that they get their DMA ops configured before they have been added to their bus, and thus before we know whether they have successfully registered with an IOMMU or not. Until the necessary driver core changes to reorder calls during device creation have been worked out, extend our delayed notifier trick onto the PCI bus so as to avoid broken DMA ops once IOMMUs get plugged into the PCI code. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-19arm64: mm: Show bss segment in kernel memory layoutKefeng Wang1-0/+2
Show the bss segment information as with text and data in Virtual memory kernel layout. Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-19arm64: mm: make pr_cont() per line in Virtual kernel memory layoutKefeng Wang1-11/+10
Each line with single pr_cont() in Virtual kernel memory layout, or the dump of the kernel memory layout in dmesg is not aligned when PRINTK_TIME enabled, due to the missing time stamps. Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-19arm64: mm: remove the redundant codeHuang Shijie1-4/+0
We already re-enable interrupts where necessary in the entry code, so there is no need to do it again in do_page fault. This patch removes the redundant code. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Huang Shijie <shijie.huang@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15arm64: Implement ptep_set_access_flags() for hardware AF/DBMCatalin Marinas1-0/+50
When hardware updates of the access and dirty states are enabled, the default ptep_set_access_flags() implementation based on calling set_pte_at() directly is potentially racy. This triggers the "racy dirty state clearing" warning in set_pte_at() because an existing writable PTE is overridden with a clean entry. There are two main scenarios for this situation: 1. The CPU getting an access fault does not support hardware updates of the access/dirty flags. However, a different agent in the system (e.g. SMMU) can do this, therefore overriding a writable entry with a clean one could potentially lose the automatically updated dirty status 2. A more complex situation is possible when all CPUs support hardware AF/DBM: a) Initial state: shareable + writable vma and pte_none(pte) b) Read fault taken by two threads of the same process on different CPUs c) CPU0 takes the mmap_sem and proceeds to handling the fault. It eventually reaches do_set_pte() which sets a writable + clean pte. CPU0 releases the mmap_sem d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The pte entry it reads is present, writable and clean and it continues to pte_mkyoung() e) CPU1 calls ptep_set_access_flags() If between (d) and (e) the hardware (another CPU) updates the dirty state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit marking the entry clean again. This patch implements an arm64-specific ptep_set_access_flags() function to perform an atomic update of the PTE flags. Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Julien Grall <julien.grall@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> # 4.3+ [will: reworded comment] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15arm64, numa: Add NUMA support for arm64 platforms.Ganapatrao Kulkarni3-5/+427
Attempt to get the memory and CPU NUMA node via of_numa. If that fails, default the dummy NUMA node and map all memory and CPUs to node 0. Tested-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15arm64: Move unflatten_device_tree() call earlier.David Daney2-3/+0
In order to extract NUMA information from the device tree, we need to have the tree in its unflattened form. Move the call to bootmem_init() in the tail of paging_init() into setup_arch, and adjust header files so that its declaration is visible. Move the unflatten_device_tree() call between the calls to paging_init() and bootmem_init(). Follow on patches add NUMA handling to bootmem_init(). Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15arm64: Add cpu_panic_kernel helperSuzuki K Poulose1-2/+1
During the activation of a secondary CPU, we could report serious configuration issues and hence request to crash the kernel. We do this for CPU ASID bit check now. We will need it also for handling mismatched exception levels for the CPUs with VHE. Hence, add a helper to do the same for reusability. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64: mm: Add trace_irqflags annotations to do_debug_exception()James Morse1-10/+23
With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS enabled, lockdep will compare current->hardirqs_enabled with the flags from local_irq_save(). When a debug exception occurs, interrupts are disabled in entry.S, but lockdep isn't told, resulting in: DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) ------------[ cut here ]------------ WARNING: at ../kernel/locking/lockdep.c:3523 Modules linked in: CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204 Hardware name: ARM Juno development board (r1) (DT) task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000 PC is at check_flags.part.35+0x17c/0x184 LR is at check_flags.part.35+0x17c/0x184 pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5 [...] ---[ end trace 74631f9305ef5020 ]--- Call trace: [<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184 [<ffffff80080ffe30>] lock_acquire+0xa8/0xc4 [<ffffff8008093038>] breakpoint_handler+0x118/0x288 [<ffffff8008082434>] do_debug_exception+0x3c/0xa8 [<ffffff80080854b4>] el1_dbg+0x18/0x6c [<ffffff80081e82f4>] do_filp_open+0x64/0xdc [<ffffff80081d6e60>] do_sys_open+0x140/0x204 [<ffffff80081d6f58>] SyS_openat+0x10/0x18 [<ffffff8008085d30>] el0_svc_naked+0x24/0x28 possible reason: unannotated irqs-off. irq event stamp: 65857 hardirqs last enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4 hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4 softirqs last enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290 softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0 This patch adds the annotations to do_debug_exception(), while trying not to call trace_hardirqs_off() if el1_dbg() interrupted a task that already had irqs disabled. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64: cover the .head.text section in the .text segment mappingArd Biesheuvel1-5/+5
Keeping .head.text out of the .text mapping buys us very little: its actual payload is only 4 KB, most of which is padding, but the page alignment may add up to 2 MB (in case of CONFIG_DEBUG_ALIGN_RODATA=y) of additional padding to the uncompressed kernel Image. Also, on 4 KB granule kernels, the 4 KB misalignment of .text forces us to map the adjacent 56 KB of code without the PTE_CONT attribute, and since this region contains things like the vector table and the GIC interrupt handling entry point, this region is likely to benefit from the reduced TLB pressure that results from PTE_CONT mappings. So remove the alignment between the .head.text and .text sections, and use the [_text, _etext) rather than the [_stext, _etext) interval for mapping the .text segment. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64: use 'segment' rather than 'chunk' to describe mapped kernel regionsArd Biesheuvel1-7/+7
Replace the poorly defined term chunk with segment, which is a term that is already used by the ELF spec to describe contiguous mappings with the same permission attributes of statically allocated ranges of an executable. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64: mm: move vmemmap region right below the linear regionArd Biesheuvel2-12/+18
This moves the vmemmap region right below PAGE_OFFSET, aka the start of the linear region, and redefines its size to be a power of two. Due to the placement of PAGE_OFFSET in the middle of the address space, whose size is a power of two as well, this guarantees that virt to page conversions and vice versa can be implemented efficiently, by masking and shifting rather than ordinary arithmetic. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64: mm: free __init memory via the linear mappingArd Biesheuvel1-1/+2
The implementation of free_initmem_default() expects __init_begin and __init_end to be covered by the linear mapping, which is no longer the case. So open code it instead, using addresses that are explicitly translated from kernel virtual to linear virtual. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64: add the initrd region to the linear mapping explicitlyArd Biesheuvel1-0/+29
Instead of going out of our way to relocate the initrd if it turns out to occupy memory that is not covered by the linear mapping, just add the initrd to the linear mapping. This puts the burden on the bootloader to pass initrd= and mem= options that are mutually consistent. Note that, since the placement of the linear region in the PA space is also dependent on the placement of the kernel Image, which may reside anywhere in memory, we may still end up with a situation where the initrd and the kernel Image are simply too far apart to be covered by the linear region. Since we now leave it up to the bootloader to pass the initrd in memory that is guaranteed to be accessible by the kernel, add a mention of this to the arm64 boot protocol specification as well. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14arm64/mm: ensure memstart_addr remains sufficiently alignedArd Biesheuvel1-2/+6
After choosing memstart_addr to be the highest multiple of ARM64_MEMSTART_ALIGN less than or equal to the first usable physical memory address, we clip the memblocks to the maximum size of the linear region. Since the kernel may be high up in memory, we take care not to clip the kernel itself, which means we have to clip some memory from the bottom if this occurs, to ensure that the distance between the first and the last usable physical memory address can be covered by the linear region. However, we fail to update memstart_addr if this clipping from the bottom occurs, which means that we may still end up with virtual addresses that wrap into the userland range. So increment memstart_addr as appropriate to prevent this from happening. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-24Merge tag 'arm64-upstream' of ↵Linus Torvalds3-40/+30
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull second set of arm64 updates from Catalin Marinas: - KASLR bug fixes: use callee-saved register, boot-time I-cache maintenance - inv_entry asm macro fix (EL0 check typo) - pr_notice("Virtual kernel memory layout...") splitting - Clean-ups: use p?d_set_huge consistently, allow preemption around copy_to_user_page, remove unused __local_flush_icache_all() * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mm: allow preemption in copy_to_user_page arm64: consistently use p?d_set_huge arm64: kaslr: use callee saved register to preserve SCTLR across C call arm64: Split pr_notice("Virtual kernel memory layout...") into multiple pr_cont() arm64: drop unused __local_flush_icache_all() arm64: fix KASLR boot-time I-cache maintenance arm64/kernel: fix incorrect EL0 check in inv_entry macro
2016-03-24arm64: mm: allow preemption in copy_to_user_pageMark Rutland1-4/+0
Currently we disable preemption in copy_to_user_page; a behaviour that we inherited from the 32-bit arm code. This was necessary for older cores without broadcast data cache maintenance, and ensured that cache lines were dirtied and cleaned by the same CPU. On these systems dirty cache line migration was not possible, so this was sufficient to guarantee coherency. On contemporary systems, cache coherence protocols permit (dirty) cache lines to migrate between CPUs as a result of speculation, prefetching, and other behaviours. To account for this, in ARMv8 data cache maintenance operations are broadcast and affect all data caches in the domain associated with the VA (i.e. ISH for kernel and user mappings). In __switch_to we ensure that tasks can be safely migrated in the middle of a maintenance sequence, using a dsb(ish) to ensure prior explicit memory accesses are observed and cache maintenance operations are completed before a task can be run on another CPU. Given the above, it is not necessary to disable preemption in copy_to_user_page. This patch removes the preempt_{disable,enable} calls, permitting preemption. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-24arm64: consistently use p?d_set_hugeMark Rutland1-4/+2
Commit 324420bf91f60582 ("arm64: add support for ioremap() block mappings") added new p?d_set_huge functions which do the hard work to generate and set a correct block entry. These differ from open-coded huge page creation in the early page table code by explicitly setting the P?D_TYPE_SECT bits (which are implicitly retained by mk_sect_prot() for any valid prot), but are otherwise identical (and cannot fail on arm64). For simplicity and consistency, make use of these in the initial page table creation code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21arm64: Split pr_notice("Virtual kernel memory layout...") into multiple ↵Catalin Marinas1-32/+28
pr_cont() The printk() implementation has a limit of LOG_LINE_MAX (== 1024 - 32) buffer per call which the arm64 mem_init() breaches when printing the virtual memory layout with CONFIG_KASAN enabled. The result is that the last line is no longer printed. This patch splits the call into a pr_notice() + additional pr_cont() calls. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-2/+2
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-17Merge tag 'arm64-upstream' of ↵Linus Torvalds9-268/+655
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Here are the main arm64 updates for 4.6. There are some relatively intrusive changes to support KASLR, the reworking of the kernel virtual memory layout and initial page table creation. Summary: - Initial page table creation reworked to avoid breaking large block mappings (huge pages) into smaller ones. The ARM architecture requires break-before-make in such cases to avoid TLB conflicts but that's not always possible on live page tables - Kernel virtual memory layout: the kernel image is no longer linked to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of the vmalloc space, allowing the kernel to be loaded (nearly) anywhere in physical RAM - Kernel ASLR: position independent kernel Image and modules being randomly mapped in the vmalloc space with the randomness is provided by UEFI (efi_get_random_bytes() patches merged via the arm64 tree, acked by Matt Fleming) - Implement relative exception tables for arm64, required by KASLR (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but actual x86 conversion to deferred to 4.7 because of the merge dependencies) - Support for the User Access Override feature of ARMv8.2: this allows uaccess functions (get_user etc.) to be implemented using LDTR/STTR instructions. Such instructions, when run by the kernel, perform unprivileged accesses adding an extra level of protection. The set_fs() macro is used to "upgrade" such instruction to privileged accesses via the UAO bit - Half-precision floating point support (part of ARMv8.2) - Optimisations for CPUs with or without a hardware prefetcher (using run-time code patching) - copy_page performance improvement to deal with 128 bytes at a time - Sanity checks on the CPU capabilities (via CPUID) to prevent incompatible secondary CPUs from being brought up (e.g. weird big.LITTLE configurations) - valid_user_regs() reworked for better sanity check of the sigcontext information (restored pstate information) - ACPI parking protocol implementation - CONFIG_DEBUG_RODATA enabled by default - VDSO code marked as read-only - DEBUG_PAGEALLOC support - ARCH_HAS_UBSAN_SANITIZE_ALL enabled - Erratum workaround Cavium ThunderX SoC - set_pte_at() fix for PROT_NONE mappings - Code clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits) arm64: kasan: Fix zero shadow mapping overriding kernel image shadow arm64: kasan: Use actual memory node when populating the kernel image shadow arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission arm64: Fix misspellings in comments. arm64: efi: add missing frame pointer assignment arm64: make mrs_s prefixing implicit in read_cpuid arm64: enable CONFIG_DEBUG_RODATA by default arm64: Rework valid_user_regs arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly arm64: KVM: Move kvm_call_hyp back to its original localtion arm64: mm: treat memstart_addr as a signed quantity arm64: mm: list kernel sections in order arm64: lse: deal with clobbered IP registers after branch via PLT arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR arm64: kconfig: add submenu for 8.2 architectural features arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot arm64: Add support for Half precision floating point arm64: Remove fixmap include fragility arm64: Add workaround for Cavium erratum 27456 arm64: mm: Mark .rodata as RO ...
2016-03-17mm: remove VM_FAULT_MINORJan Kara1-1/+1
The define has a comment from Nick Piggin from 2007: /* For backwards compat. Remove me quickly. */ I guess 9 years should not be too hurried sense of 'quickly' even for kernel measures. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: cleanup *pte_alloc* interfacesKirill A. Shutemov1-1/+1
There are few things about *pte_alloc*() helpers worth cleaning up: - 'vma' argument is unused, let's drop it; - most __pte_alloc() callers do speculative check for pmd_none(), before taking ptl: let's introduce pte_alloc() macro which does the check. The only direct user of __pte_alloc left is userfaultfd, which has different expectation about atomicity wrt pmd. - pte_alloc_map() and pte_alloc_map_lock() are redefined using pte_alloc(). [sudeep.holla@arm.com: fix build for arm64 hugetlbpage] [sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-11arm64: kasan: Fix zero shadow mapping overriding kernel image shadowCatalin Marinas1-8/+5
With the 16KB and 64KB page size configurations, SWAPPER_BLOCK_SIZE is PAGE_SIZE and ARM64_SWAPPER_USES_SECTION_MAPS is 0. Since kimg_shadow_end is not page aligned (_end shifted by KASAN_SHADOW_SCALE_SHIFT), the edges of previously mapped kernel image shadow via vmemmap_populate() may be overridden by subsequent calls to kasan_populate_zero_shadow(), leading to kernel panics like below: ------------------------------------------------------------------------------ Unable to handle kernel paging request at virtual address fffffc100135068c pgd = fffffc8009ac0000 [fffffc100135068c] *pgd=00000009ffee0003, *pud=00000009ffee0003, *pmd=00000009ffee0003, *pte=00e0000081a00793 Internal error: Oops: 9600004f [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc4+ #1984 Hardware name: Juno (DT) task: fffffe09001a0000 ti: fffffe0900200000 task.ti: fffffe0900200000 PC is at __memset+0x4c/0x200 LR is at kasan_unpoison_shadow+0x34/0x50 pc : [<fffffc800846f1cc>] lr : [<fffffc800821ff54>] pstate: 00000245 sp : fffffe0900203db0 x29: fffffe0900203db0 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: fffffc80099b69d0 x24: 0000000000000001 x23: 0000000000000000 x22: 0000000000002000 x21: dffffc8000000000 x20: 1fffff9001350a8c x19: 0000000000002000 x18: 0000000000000008 x17: 0000000000000147 x16: ffffffffffffffff x15: 79746972100e041d x14: ffffff0000000000 x13: ffff000000000000 x12: 0000000000000000 x11: 0101010101010101 x10: 1fffffc11c000000 x9 : 0000000000000000 x8 : fffffc100135068c x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000004 x3 : fffffc100134f651 x2 : 0000000000000400 x1 : 0000000000000000 x0 : fffffc100135068c Process swapper/0 (pid: 1, stack limit = 0xfffffe0900200020) Call trace: [<fffffc800846f1cc>] __memset+0x4c/0x200 [<fffffc8008220044>] __asan_register_globals+0x5c/0xb0 [<fffffc8008a09d34>] _GLOBAL__sub_I_65535_1_sunrpc_cache_lookup+0x1c/0x28 [<fffffc8008f20d28>] kernel_init_freeable+0x104/0x274 [<fffffc80089e1948>] kernel_init+0x10/0xf8 [<fffffc8008093a00>] ret_from_fork+0x10/0x50 ------------------------------------------------------------------------------ This patch aligns kimg_shadow_start and kimg_shadow_end to SWAPPER_BLOCK_SIZE in all configurations. Fixes: f9040773b7bb ("arm64: move kernel image to base of vmalloc area") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2016-03-11arm64: kasan: Use actual memory node when populating the kernel image shadowCatalin Marinas1-1/+2
With the 16KB or 64KB page configurations, the generic vmemmap_populate() implementation warns on potential offnode page_structs via vmemmap_verify() because the arm64 kasan_init() passes NUMA_NO_NODE instead of the actual node for the kernel image memory. Fixes: f9040773b7bb ("arm64: move kernel image to base of vmalloc area") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: James Morse <james.morse@arm.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com>
2016-03-09arm64: hugetlb: partial revert of 66b3923a1a0fWill Deacon1-14/+0
Commit 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") introduced support for huge pages using the contiguous bit in the PTE as opposed to block mappings, which may be slightly unwieldy (512M) in 64k page configurations. Unfortunately, this support has resulted in some late regressions when running the libhugetlbfs test suite with 64k pages and CONFIG_DEBUG_VM as a result of a BUG: | readback (2M: 64): ------------[ cut here ]------------ | kernel BUG at fs/hugetlbfs/inode.c:446! | Internal error: Oops - BUG: 0 [#1] SMP | Modules linked in: | CPU: 7 PID: 1448 Comm: readback Not tainted 4.5.0-rc7 #148 | Hardware name: linux,dummy-virt (DT) | task: fffffe0040964b00 ti: fffffe00c2668000 task.ti: fffffe00c2668000 | PC is at remove_inode_hugepages+0x44c/0x480 | LR is at remove_inode_hugepages+0x264/0x480 Rather than revert the entire patch, simply avoid advertising the contiguous huge page sizes for now while people are actively working on a fix. This patch can then be reverted once things have been sorted out. Cc: David Woods <dwoods@ezchip.com> Reported-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-04Merge tag 'arm64-fixes' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "Arm64 fix for -rc7. Without it, our struct page array can overflow the vmemmap region on systems with a large PHYS_OFFSET. Nothing else on the radar at the moment, so hopefully that's it for 4.5 from us. Summary: Ensure struct page array fits within vmemmap area" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: vmemmap: use virtual projection of linear region
2016-03-04arm64: make mrs_s prefixing implicit in read_cpuidMark Rutland1-1/+1
Commit 0f54b14e76f5302a ("arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro") changed read_cpuid to require a SYS_ prefix on register names, to allow manual assembly of registers unknown by the toolchain, using tables in sysreg.h. This interacts poorly with commit 42b55734030c1f72 ("efi/arm64: Check for h/w support before booting a >4 KB granular kernel"), which is curretly queued via the tip tree, and uses read_cpuid without a SYS_ prefix. Due to this, a build of next-20160304 fails if EFI and 64K pages are selected. To avoid this issue when trees are merged, move the required SYS_ prefixing into read_cpuid, and revert all of the updated callsites to pass plain register names. This effectively reverts the bulk of commit 0f54b14e76f5302a. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-02arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenlyArd Biesheuvel1-0/+7
Commit 8439e62a1561 ("arm64: mm: use bit ops rather than arithmetic in pa/va translations") changed the boundary check against PAGE_OFFSET from an arithmetic comparison to a bit test. This means we now silently assume that PAGE_OFFSET is a power of 2 that divides the kernel virtual address space into two equal halves. So make that assumption explicit. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-29arm64: mm: treat memstart_addr as a signed quantityArd Biesheuvel1-2/+2
Commit c031a4213c11 ("arm64: kaslr: randomize the linear region") implements randomization of the linear region, by subtracting a random multiple of PUD_SIZE from memstart_addr. This causes the virtual mapping of system RAM to move upwards in the linear region, and at the same time causes memstart_addr to assume a value which may be negative if the offset of system RAM in the physical space is smaller than its offset relative to PAGE_OFFSET in the virtual space. Since memstart_addr is effectively an offset now, redefine its type as s64 so that expressions involving shifting or division preserve its sign. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-29arm64: mm: list kernel sections in orderArd Biesheuvel1-2/+2
In the boot log, instead of listing .init first, list .text, .rodata, .init and .data in the same order they appear in memory Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-27mm: ASLR: use get_random_long()Daniel Cashman1-2/+2
Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-26arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDRKefeng Wang1-3/+1
Use VA_START macro in asm/memory.h instead of private LOWEST_ADDR definition in dump.c. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-26arm64: vmemmap: use virtual projection of linear regionArd Biesheuvel1-2/+2
Commit dd006da21646 ("arm64: mm: increase VA range of identity map") made some changes to the memory mapping code to allow physical memory to reside at an offset that exceeds the size of the virtual mapping. However, since the size of the vmemmap area is proportional to the size of the VA area, but it is populated relative to the physical space, we may end up with the struct page array being mapped outside of the vmemmap region. For instance, on my Seattle A0 box, I can see the following output in the dmesg log. vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000 ( 8 GB maximum) 0xffffffbfc0000000 - 0xffffffbfd0000000 ( 256 MB actual) We can fix this by deciding that the vmemmap region is not a projection of the physical space, but of the virtual space above PAGE_OFFSET, i.e., the linear region. This way, we are guaranteed that the vmemmap region is of sufficient size, and we can even reduce the size by half. Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-26arm64: Add workaround for Cavium erratum 27456Andrew Pinski1-0/+12
On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI instructions may cause the icache to become corrupted if it contains data for a non-current ASID. This patch implements the workaround (which invalidates the local icache when switching the mm) by using code patching. Signed-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-26arm64: mm: Mark .rodata as ROJeremy Linton2-7/+16
Currently the .rodata section is actually still executable when DEBUG_RODATA is enabled. This changes that so the .rodata is actually read only, no execute. It also adds the .rodata section to the mem_init banner. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: added vm_struct vmlinux_rodata in map_kernel()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-26arm64/mm: remove unnecessary boundary checkMiles Chen1-2/+0
Remove the unnecessary boundary check since there is a huge gap between user and kernel address that they would never overlap. (arm64 does not have enough levels of page tables to cover 64-bit virtual address) See Documentation/arm64/memory.txt Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-25arm64: Rename cpuid_feature field extract routinesSuzuki K Poulose1-1/+1
Now that we have a clear understanding of the sign of a feature, rename the routines to reflect the sign, so that it is not misused. The cpuid_feature_extract_field() now accepts a 'sign' parameter. Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-25arm64: Ensure the secondary CPUs have safe ASIDBits sizeSuzuki K Poulose1-0/+18
Adds a hook for checking whether a secondary CPU has the features used already by the kernel during early boot, based on the boot CPU and plugs in the check for ASID size. The ID_AA64MMFR0_EL1:ASIDBits determines the size of the mm context id and is used in the early boot to make decisions. The value is picked up from the Boot CPU and cannot be delayed until other CPUs are up. If a secondary CPU has a smaller size than that of the Boot CPU, things will break horribly and the usual SANITY check is not good enough to prevent the system from crashing. So, crash the system with enough information. Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-25arm64: Add helper for extracting ASIDBitsSuzuki K Poulose1-13/+23
Add a helper to extract ASIDBits on the current cpu Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-24arm64: kaslr: randomize the linear regionArd Biesheuvel1-2/+20
When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), and entropy has been provided by the bootloader, randomize the placement of RAM inside the linear region if sufficient space is available. For instance, on a 4KB granule/3 levels kernel, the linear region is 256 GB in size, and we can choose any 1 GB aligned offset that is far enough from the top of the address space to fit the distance between the start of the lowest memblock and the top of the highest memblock. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-24arm64: add support for kernel ASLRArd Biesheuvel2-12/+34
This adds support for KASLR is implemented, based on entropy provided by the bootloader in the /chosen/kaslr-seed DT property. Depending on the size of the address space (VA_BITS) and the page size, the entropy in the virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all 4 levels), with the sidenote that displacements that result in the kernel image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB granule kernels, respectively) are not allowed, and will be rounded up to an acceptable value. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is randomized independently from the core kernel. This makes it less likely that the location of core kernel data structures can be determined by an adversary, but causes all function calls from modules into the core kernel to be resolved via entries in the module PLTs. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is randomized by choosing a page aligned 128 MB region inside the interval [_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of entropy (depending on page size), independently of the kernel randomization, but still guarantees that modules are within the range of relative branch and jump instructions (with the caveat that, since the module region is shared with other uses of the vmalloc area, modules may need to be loaded further away if the module region is exhausted) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-24arm64: switch to relative exception tablesArd Biesheuvel1-1/+1
Instead of using absolute addresses for both the exception location and the fixup, use offsets relative to the exception table entry values. Not only does this cut the size of the exception table in half, it is also a prerequisite for KASLR, since absolute exception table entries are subject to dynamic relocation, which is incompatible with the sorting of the exception table that occurs at build time. This patch also introduces the _ASM_EXTABLE preprocessor macro (which exists on x86 as well) and its _asm_extable assembly counterpart, as shorthands to emit exception table entries. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-19arm64: User die() instead of panic() in do_page_fault()Catalin Marinas1-2/+2
The former gives better error reporting on unhandled permission faults (introduced by the UAO patches). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-19arm64: mm: allow the kernel to handle alignment faults on user accessesEunTaik Lee1-1/+8
Although we don't expect to take alignment faults on access to normal memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into system calls, leading to things like get_user accessing device memory. Rather than OOPS the kernel, allow any exception fixups to run and return something like -EFAULT back to userspace. This makes the behaviour more consistent with userspace, even though applications with access to device mappings can easily cause other issues if they try hard enough. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com> [will: dropped __kprobes annotation and rewrote commit mesage] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-18arm64: allow kernel Image to be loaded anywhere in physical memoryArd Biesheuvel2-2/+64
This relaxes the kernel Image placement requirements, so that it may be placed at any 2 MB aligned offset in physical memory. This is accomplished by ignoring PHYS_OFFSET when installing memblocks, and accounting for the apparent virtual offset of the kernel Image. As a result, virtual address references below PAGE_OFFSET are correctly mapped onto physical references into the kernel Image regardless of where it sits in memory. Special care needs to be taken for dealing with memory limits passed via mem=, since the generic implementation clips memory top down, which may clip the kernel image itself if it is loaded high up in memory. To deal with this case, we simply add back the memory covering the kernel image, which may result in more memory to be retained than was passed as a mem= parameter. Since mem= should not be considered a production feature, a panic notifier handler is installed that dumps the memory limit at panic time if one was set. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18arm64: defer __va translation of initrd_start and initrd_endArd Biesheuvel1-4/+9
Before deferring the assignment of memstart_addr in a subsequent patch, to the moment where all memory has been discovered and possibly clipped based on the size of the linear region and the presence of a mem= command line parameter, we need to ensure that memstart_addr is not used to perform __va translations before it is assigned. One such use is in the generic early DT discovery of the initrd location, which is recorded as a virtual address in the globals initrd_start and initrd_end. So wire up the generic support to declare the initrd addresses, and implement it without __va() translations, and perform the translation after memstart_addr has been assigned. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18arm64: move kernel image to base of vmalloc areaArd Biesheuvel4-53/+119
This moves the module area to right before the vmalloc area, and moves the kernel image to the base of the vmalloc area. This is an intermediate step towards implementing KASLR, which allows the kernel image to be located anywhere in the vmalloc area. Since other subsystems such as hibernate may still need to refer to the kernel text or data segments via their linears addresses, both are mapped in the linear region as well. The linear alias of the text region is mapped read-only/non-executable to prevent inadvertent modification or execution. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18arm64: decouple early fixmap init from linear mappingArd Biesheuvel1-10/+6
Since the early fixmap page tables are populated using pages that are part of the static footprint of the kernel, they are covered by the initial kernel mapping, and we can refer to them without using __va/__pa translations, which are tied to the linear mapping. Since the fixmap page tables are disjoint from the kernel mapping up to the top level pgd entry, we can refer to bm_pte[] directly, and there is no need to walk the page tables and perform __pa()/__va() translations at each step. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18arm64: add support for ioremap() block mappingsArd Biesheuvel1-0/+41
This wires up the existing generic huge-vmap feature, which allows ioremap() to use PMD or PUD sized block mappings. It also adds support to the unmap path for dealing with block mappings, which will allow us to unmap the __init region using unmap_kernel_range() in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>