blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2017-07-06	Merge tag 'for-linus-4.13-rc1-tag' of ↵	Linus Torvalds	13	-147/+259
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Other than fixes and cleanups it contains: - support > 32 VCPUs at domain restore - support for new sysfs nodes related to Xen - some performance tuning for Linux running as Xen guest" * tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: allow userspace access during hypercalls x86: xen: remove unnecessary variable in xen_foreach_remap_area() xen: allocate page for shared info page from low memory xen: avoid deadlock in xenbus driver xen: add sysfs node for hypervisor build id xen: sync include/xen/interface/version.h xen: add sysfs node for guest type doc,xen: document hypervisor sysfs nodes for xen xen/vcpu: Handle xen_vcpu_setup() failure at boot xen/vcpu: Handle xen_vcpu_setup() failure in hotplug xen/pv: Fix OOPS on restore for a PV, !SMP domain xen/pvh*: Support > 32 VCPUs at domain restore xen/vcpu: Simplify xen_vcpu related code xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU xen: avoid type warning in xchg_xen_ulong xen: fix HYPERVISOR_dm_op() prototype xen: don't print error message in case of missing Xenstore entry arm/xen: Adjust one function call together with a variable assignment arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi() arm/xen: Improve a size determination in __set_phys_to_machine_multi()
2017-07-06	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	63	-562/+1855
	Pull KVM updates from Paolo Bonzini: "PPC: - Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending. ARM: - VCPU request overhaul - allow timer and PMU to have their interrupt number selected from userspace - workaround for Cavium erratum 30115 - handling of memory poisonning - the usual crop of fixes and cleanups s390: - initial machine check forwarding - migration support for the CMMA page hinting information - cleanups and fixes x86: - nested VMX bugfixes and improvements - more reliable NMI window detection on AMD - APIC timer optimizations Generic: - VCPU request overhaul + documentation of common code patterns - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) Update my email address kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 kvm: x86: mmu: allow A/D bits to be disabled in an mmu x86: kvm: mmu: make spte mmio mask more explicit x86: kvm: mmu: dead code thanks to access tracking KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code KVM: PPC: Book3S HV: Close race with testing for signals on guest entry KVM: PPC: Book3S HV: Simplify dynamic micro-threading code KVM: x86: remove ignored type attribute KVM: LAPIC: Fix lapic timer injection delay KVM: lapic: reorganize restart_apic_timer KVM: lapic: reorganize start_hv_timer kvm: nVMX: Check memory operand to INVVPID KVM: s390: Inject machine check into the nested guest KVM: s390: Inject machine check into the guest tools/kvm_stat: add new interactive command 'b' tools/kvm_stat: add new command line switch '-i' tools/kvm_stat: fix error on interactive command 'g' KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit ...
2017-07-06	mm: memcontrol: per-lruvec stats infrastructure	Johannes Weiner	3	-2/+1
	lruvecs are at the intersection of the NUMA node and memcg, which is the scope for most paging activity. Introduce a convenient accounting infrastructure that maintains statistics per node, per memcg, and the lruvec itself. Then convert over accounting sites for statistics that are already tracked in both nodes and memcgs and can be easily switched. [[email protected]: fix crash in the new cgroup stat keeping code] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: don't track uncharged pages at all Link: http://lkml.kernel.org/r/[email protected] [[email protected]: add missing free_percpu()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: hexagon: fix build error caused by include file order] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm/hugetlb: allow architectures to override huge_pte_clear()	Punit Agrawal	1	-1/+1
	When unmapping a hugepage range, huge_pte_clear() is used to clear the page table entries that are marked as not present. huge_pte_clear() internally just ends up calling pte_clear() which does not correctly deal with hugepages consisting of contiguous page table entries. Add a size argument to address this issue and allow architectures to override huge_pte_clear() by wrapping it in a #ifndef block. Update s390 implementation with the size parameter as well. Note that the change only affects huge_pte_clear() - the other generic hugetlb functions don't need any change. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Punit Agrawal <[email protected]> Acked-by: Martin Schwidefsky <[email protected]> [s390 bits] Cc: Heiko Carstens <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Steve Capper <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm/hugetlb: add size parameter to huge_pte_offset()	Punit Agrawal	11	-12/+20
	A poisoned or migrated hugepage is stored as a swap entry in the page tables. On architectures that support hugepages consisting of contiguous page table entries (such as on arm64) this leads to ambiguity in determining the page table entry to return in huge_pte_offset() when a poisoned entry is encountered. Let's remove the ambiguity by adding a size parameter to convey additional information about the requested address. Also fixup the definition/usage of huge_pte_offset() throughout the tree. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Punit Agrawal <[email protected]> Acked-by: Steve Capper <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: James Hogan <[email protected]> (odd fixer:METAG ARCHITECTURE) Cc: Ralf Baechle <[email protected]> (supporter:MIPS) Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	arm64: hugetlb: remove spurious calls to huge_ptep_offset()	Steve Capper	1	-23/+14
	We don't need to call huge_ptep_offset as our accessors are already supplied with the pte_t *. This patch removes those spurious calls. [[email protected]: resolve rebase conflicts due to patch re-ordering] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steve Capper <[email protected]> Signed-off-by: Punit Agrawal <[email protected]> Cc: David Woods <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	arm64: hugetlb: refactor find_num_contig()	Steve Capper	1	-9/+8
	Patch series "Support for contiguous pte hugepages", v4. This patchset updates the hugetlb code to fix issues arising from contiguous pte hugepages (such as on arm64). Compared to v3, This version addresses a build failure on arm64 by including two cleanup patches. Other than the arm64 cleanups, the rest are generic code changes. The remaining arm64 support based on these patches will be posted separately. The patches are based on v4.12-rc2. Previous related postings can be found at [0], [1], [2], and [3]. The patches fall into three categories - * Patch 1-2 - arm64 cleanups required to greatly simplify changing huge_pte_offset() prototype in Patch 5. Catalin, Will - are you happy for these patches to go via mm? * Patches 3-4 address issues with gup * Patches 5-8 relate to passing a size argument to hugepage helpers to disambiguate the size of the referred page. These changes are required to enable arch code to properly handle swap entries for contiguous pte hugepages. The changes to huge_pte_offset() (patch 5) touch multiple architectures but I've managed to minimise these changes for the other affected functions - huge_pte_clear() and set_huge_pte_at(). These patches gate the enabling of contiguous hugepages support on arm64 which has been requested for systems using !4k page granule. The ARM64 architecture supports two flavours of hugepages - * Block mappings at the pud/pmd level These are regular hugepages where a pmd or a pud page table entry points to a block of memory. Depending on the PAGE_SIZE in use the following size of block mappings are supported - PMD PUD --- --- 4K: 2M 1G 16K: 32M 64K: 512M For certain applications/usecases such as HPC and large enterprise workloads, folks are using 64k page size but the minimum hugepage size of 512MB isn't very practical. To overcome this ... * Using the Contiguous bit The architecture provides a contiguous bit in the translation table entry which acts as a hint to the mmu to indicate that it is one of a contiguous set of entries that can be cached in a single TLB entry. We use the contiguous bit in Linux to increase the mapping size at the pmd and pte (last) level. The number of supported contiguous entries varies by page size and level of the page table. Using the contiguous bit allows additional hugepage sizes - CONT PTE PMD CONT PMD PUD -------- --- -------- --- 4K: 64K 2M 32M 1G 16K: 2M 32M 1G 64K: 2M 512M 16G Of these, 64K with 4K and 2M with 64K pages have been explicitly requested by a few different users. Entries with the contiguous bit set are required to be modified all together - which makes things like memory poisoning and migration impossible to do correctly without knowing the size of hugepage being dealt with - the reason for adding size parameter to a few of the hugepage helpers in this series. This patch (of 8): As we regularly check for contiguous pte's in the huge accessors, remove this extra check from find_num_contig. [[email protected]: resolve rebase conflicts due to patch re-ordering] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steve Capper <[email protected]> Signed-off-by: Punit Agrawal <[email protected]> Cc: David Woods <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	powerpc/mm/hugetlb: add support for 1G huge pages	Aneesh Kumar K.V	3	-2/+16
	POWER9 supports hugepages of size 2M and 1G in radix MMU mode. This patch enables the usage of 1G page size for hugetlbfs. This also update the helper such we can do 1G page allocation at runtime. We still don't enable 1G page size on DD1 version. This is to avoid doing workaround mentioned in commit 6d3a0379ebdc ("powerpc/mm: Add radix__tlb_flush_pte_p9_dd1()"). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Michael Ellerman <[email protected]> (powerpc) Cc: Anshuman Khandual <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE	Aneesh Kumar K.V	6	-3/+14
	This moves the #ifdef in C code to a Kconfig dependency. Also we move the gigantic_page_supported() function to be arch specific. This allows architectures to conditionally enable runtime allocation of gigantic huge page. Architectures like ppc64 supports different gigantic huge page size (16G and 1G) based on the translation mode selected. This provides an opportunity for ppc64 to enable runtime allocation only w.r.t 1G hugepage. No functional change in this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Michael Ellerman <[email protected]> (powerpc) Cc: Anshuman Khandual <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	powerpc/hugetlb: enable hugetlb migration for ppc64	Aneesh Kumar K.V	1	-0/+5
	Link: http://lkml.kernel.org/r/1494926612-23928-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	powerpc/mm/hugetlb: remove follow_huge_addr for powerpc	Aneesh Kumar K.V	1	-64/+0
	With generic code now handling hugetlb entries at pgd level and also supporting hugepage directory format, we can now remove the powerpc sepcific follow_huge_addr implementation. Link: http://lkml.kernel.org/r/1494926612-23928-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	powerpc/hugetlb: add follow_huge_pd implementation for ppc64	Aneesh Kumar K.V	1	-0/+43
	Link: http://lkml.kernel.org/r/1494926612-23928-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory	Michal Hocko	6	-12/+12
	arch_add_memory gets for_device argument which then controls whether we want to create memblocks for created memory sections. Simplify the logic by telling whether we want memblocks directly rather than going through pointless negation. This also makes the api easier to understand because it is clear what we want rather than nothing telling for_device which can mean anything. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Tested-by: Dan Williams <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daniel Kiper <[email protected]> Cc: David Rientjes <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Igor Mammedov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Tobias Regnery <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm, memory_hotplug: do not associate hotadded memory to zones until online	Michal Hocko	6	-64/+7
	The current memory hotplug implementation relies on having all the struct pages associate with a zone/node during the physical hotplug phase (arch_add_memory->__add_pages->__add_section->__add_zone). In the vast majority of cases this means that they are added to ZONE_NORMAL. This has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory hotadd without sparsemem") and it wasn't a big deal back then because movable onlining didn't exist yet. Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable memory and portion memory") and then things got more complicated. Rather than reconsidering the zone association which was no longer needed (because the memory hotplug already depended on SPARSEMEM) a convoluted semantic of zone shifting has been developed. Only the currently last memblock or the one adjacent to the zone_movable can be onlined movable. This essentially means that the online type changes as the new memblocks are added. Let's simulate memory hot online manually $ echo 0x100000000 > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory32/valid_zones Normal Movable $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal /sys/devices/system/memory/memory34/valid_zones:Normal Movable $ echo online_movable > /sys/devices/system/memory/memory34/state $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Normal This is an awkward semantic because an udev event is sent as soon as the block is onlined and an udev handler might want to online it based on some policy (e.g. association with a node) but it will inherently race with new blocks showing up. This patch changes the physical online phase to not associate pages with any zone at all. All the pages are just marked reserved and wait for the onlining phase to be associated with the zone as per the online request. There are only two requirements - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses the latter one is not an inherent requirement and can be changed in the future. It preserves the current behavior and made the code slightly simpler. This is subject to change in future. This means that the same physical online steps as above will lead to the following state: Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Implementation: The current move_pfn_range is reimplemented to check the above requirements (allow_online_pfn_range) and then updates the respective zone (move_pfn_range_to_zone), the pgdat and links all the pages in the pfn range with the zone/node. __add_pages is updated to not require the zone and only initializes sections in the range. This allowed to simplify the arch_add_memory code (s390 could get rid of quite some of code). devm_memremap_pages is the only user of arch_add_memory which relies on the zone association because it only hooks into the memory hotplug only half way. It uses it to associate the new memory with ZONE_DEVICE but doesn't allow it to be {on,off}lined via sysfs. This means that this particular code path has to call move_pfn_range_to_zone explicitly. The original zone shifting code is kept in place and will be removed in the follow up patch for an easier review. Please note that this patch also changes the original behavior when offlining a memory block adjacent to another zone (Normal vs. Movable) used to allow to change its movable type. This will be handled later. [[email protected]: simplify zone_intersects()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: remove duplicate call for set_page_links] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: remove unused local `i'] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Wei Yang <[email protected]> Tested-by: Dan Williams <[email protected]> Tested-by: Reza Arbab <[email protected]> Acked-by: Heiko Carstens <[email protected]> # For s390 bits Acked-by: Vlastimil Babka <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daniel Kiper <[email protected]> Cc: David Rientjes <[email protected]> Cc: Igor Mammedov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Tobias Regnery <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm, memory_hotplug: get rid of is_zone_device_section	Michal Hocko	6	-6/+6
	Device memory hotplug hooks into regular memory hotplug only half way. It needs memory sections to track struct pages but there is no need/desire to associate those sections with memory blocks and export them to the userspace via sysfs because they cannot be onlined anyway. This is currently expressed by for_device argument to arch_add_memory which then makes sure to associate the given memory range with ZONE_DEVICE. register_new_memory then relies on is_zone_device_section to distinguish special memory hotplug from the regular one. While this works now, later patches in this series want to move __add_zone outside of arch_add_memory path so we have to come up with something else. Add want_memblock down the __add_pages path and use it to control whether the section->memblock association should be done. arch_add_memory then just trivially want memblock for everything but for_device hotplug. remove_memory_section doesn't need is_zone_device_section either. We can simply skip all the memblock specific cleanup if there is no memblock for the given section. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Tested-by: Dan Williams <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daniel Kiper <[email protected]> Cc: David Rientjes <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Igor Mammedov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Tobias Regnery <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mm, THP, swap: delay splitting THP during swap out	Huang Ying	1	-0/+1
	Patch series "THP swap: Delay splitting THP during swapping out", v11. This patchset is to optimize the performance of Transparent Huge Page (THP) swap. Recently, the performance of the storage devices improved so fast that we cannot saturate the disk bandwidth with single logical CPU when do page swap out even on a high-end server machine. Because the performance of the storage device improved faster than that of single logical CPU. And it seems that the trend will not change in the near future. On the other hand, the THP becomes more and more popular because of increased memory size. So it becomes necessary to optimize THP swap performance. The advantages of the THP swap support include: - Batch the swap operations for the THP to reduce lock acquiring/releasing, including allocating/freeing the swap space, adding/deleting to/from the swap cache, and writing/reading the swap space, etc. This will help improve the performance of the THP swap. - The THP swap space read/write will be 2M sequential IO. It is particularly helpful for the swap read, which are usually 4k random IO. This will improve the performance of the THP swap too. - It will help the memory fragmentation, especially when the THP is heavily used by the applications. The 2M continuous pages will be free up after THP swapping out. - It will improve the THP utilization on the system with the swap turned on. Because the speed for khugepaged to collapse the normal pages into the THP is quite slow. After the THP is split during the swapping out, it will take quite long time for the normal pages to collapse back into the THP after being swapped in. The high THP utilization helps the efficiency of the page based memory management too. There are some concerns regarding THP swap in, mainly because possible enlarged read/write IO size (for swap in/out) may put more overhead on the storage device. To deal with that, the THP swap in should be turned on only when necessary. For example, it can be selected via "always/never/madvise" logic, to be turned on globally, turned off globally, or turned on only for VMA with MADV_HUGEPAGE, etc. This patchset is the first step for the THP swap support. The plan is to delay splitting THP step by step, finally avoid splitting THP during the THP swapping out and swap out/in the THP as a whole. As the first step, in this patchset, the splitting huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP and adding the THP into the swap cache. This will reduce lock acquiring/releasing for the locks used for the swap cache management. With the patchset, the swap out throughput improves 15.5% (from about 3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case with 8 processes. The test is done on a Xeon E5 v3 system. The swap device used is a RAM simulated PMEM (persistent memory) device. To test the sequential swapping out, the test case creates 8 processes, which sequentially allocate and write to the anonymous pages until the RAM and part of the swap device is used up. This patch (of 5): In this patch, splitting huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP (Transparent Huge Page) and adding the THP into the swap cache. This will batch the corresponding operation, thus improve THP swap out throughput. This is the first step for the THP swap optimization. The plan is to delay splitting the THP step by step and avoid splitting the THP finally. In this patch, one swap cluster is used to hold the contents of each THP swapped out. So, the size of the swap cluster is changed to that of the THP (Transparent Huge Page) on x86_64 architecture (512). For other architectures which want such THP swap optimization, ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for the architecture. In effect, this will enlarge swap cluster size by 2 times on x86_64. Which may make it harder to find a free cluster when the swap space becomes fragmented. So that, this may reduce the continuous swap space allocation and sequential write in theory. The performance test in 0day shows no regressions caused by this. In the future of THP swap optimization, some information of the swapped out THP (such as compound map count) will be recorded in the swap_cluster_info data structure. The mem cgroup swap accounting functions are enhanced to support charge or uncharge a swap cluster backing a THP as a whole. The swap cluster allocate/free functions are added to allocate/free a swap cluster for a THP. A fair simple algorithm is used for swap cluster allocation, that is, only the first swap device in priority list will be tried to allocate the swap cluster. The function will fail if the trying is not successful, and the caller will fallback to allocate a single swap slot instead. This works good enough for normal cases. If the difference of the number of the free swap clusters among multiple swap devices is significant, it is possible that some THPs are split earlier than necessary. For example, this could be caused by big size difference among multiple swap devices. The swap cache functions is enhanced to support add/delete THP to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages. This may be enhanced in the future with multi-order radix tree. But because we will split the THP soon during swapping out, that optimization doesn't make much sense for this first step. The THP splitting functions are enhanced to support to split THP in swap cache during swapping out. The page lock will be held during allocating the swap cluster, adding the THP into the swap cache and splitting the THP. So in the code path other than swapping out, if the THP need to be split, the PageSwapCache(THP) will be always false. The swap cluster is only available for SSD, so the THP swap optimization in this patchset has no effect for HDD. [[email protected]: fix two issues in THP optimize patch] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: extensive cleanups and simplifications, reduce code size] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Suggested-by: Andrew Morton <[email protected]> [for config option] Acked-by: Kirill A. Shutemov <[email protected]> [for changes in huge_memory.c and huge_mm.h] Cc: Andrea Arcangeli <[email protected]> Cc: Ebru Akagunduz <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	tile: provide default ioremap declaration	Logan Gunthorpe	1	-0/+11
	Add a default ioremap function which was not provided in all circumstances. (Only when CONFIG_PCI and CONFIG_TILEGX was set). I have designs to use them in scatterlist.c where they'd likely never be called with this architecture, but it is needed to compile. Thus, if the function is ever hit it returns NULL. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Stephen Bates <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mn10300: use generic fb.h	Tobias Klauser	2	-23/+1
	The mn10300 arch uses a verbatim copy of the asm-generic version and does not add any own implementations to the header, so use asm-generic/fb.h instead of duplicating code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobias Klauser <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	mn10300: remove wrapper header for asm/device.h	Tobias Klauser	2	-1/+1
	mn10300's asm/device.h is merely including asm-generic/device.h. Thus, the arch specific header can be omitted and the generic header can be used directly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobias Klauser <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-06	Merge tag 'pinctrl-v4.13-1' of ↵	Linus Torvalds	13	-912/+284
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "This is the big bulk of pin control changes for the v4.13 series: Core: - The documentation is moved over to RST. - We now have agreed bindings for enabling input and output buffers without actually enabling input and/or output on a pin. We are chiseling out some details of pin control electronics. New drivers: - ZTE ZX - Renesas RZA1 - MIPS Ingenic JZ47xx: also switch over existing drivers in the tree to use this pin controller and consolidate earlier spread out code. - Microschip MCP23S08: this driver is migrated from the GPIO subsystem and totally rewritten to use proper pin control. All users are switched over. New subdrivers: - Renesas R8A7743 and R8A7745. - Allwinner Sunxi A83T R_PIO. - Marvell MVEBU Armada CP110 and AP806. - Intel Cannon Lake PCH. - Qualcomm IPQ8074. Notable improvements: - IRQ support on the Marvell MVEBU Armada 37xx. - Meson driver supports HDMI CEC, AO, I2S, SPDIF and PWM. - Rockchip driver now supports iomux-route switching for RK3228, RK3328 and RK3399. - Rockchip A10 and A20 are merged into a single driver. - STM32 has improved GPIO support. - Samsung Exynos drivers are split per ARMv7 and ARMv8. - Marvell MVEBU is converted to use regmap for register access. Maintenance: - Several Renesas SH-PFC refactorings and updates. - Serious code size cut for Mediatek MT7623. - Misc janitorial and MAINTAINERS fixes" * tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (137 commits) pinctrl: samsung: Remove bogus irq_[un]mask from resource management pinctrl: rza1: make structures rza1_gpiochip_template and rza1_pinmux_ops static pinctrl: rza1: Remove unneeded wrong check for wrong variable pinctrl: qcom: Add ipq8074 pinctrl driver pinctrl: freescale: imx7d: make of_device_ids const. pinctrl: DT: extend the pinmux property to support integers array pinctrl: generic: Add output-enable property pinctrl: armada-37xx: Fix number of pin in sdio_sb pinctrl: armada-37xx: Fix uart2 group selection register mask pinctrl: bcm2835: Avoid warning from __irq_do_set_handler pinctrl: sh-pfc: r8a7795: Add PWM support MAINTAINERS: Add Qualcomm pinctrl drivers section arm: dts: dt-bindings: Add Renesas RZ/A1 pinctrl header dt-bindings: pinctrl: Add RZ/A1 bindings doc pinctrl: Renesas RZ/A1 pin and gpio controller pinctrl: sh-pfc: r8a7792: Add SCIF1 and SCIF2 pin groups pinctrl.txt: move it to the driver-api book pinctrl: ingenic: checking for NULL instead of IS_ERR() pinctrl: uniphier: fix WARN_ON() of pingroups dump on LD20 pinctrl: uniphier: fix WARN_ON() of pingroups dump on LD11 ...
2017-07-06	video: adp8870: move header file out of I2C realm	Wolfram Sang	1	-1/+1
	include/linux/i2c is not for client devices. Move the header file to a more appropriate location. Signed-off-by: Wolfram Sang <[email protected]> Acked-by: Daniel Thompson <[email protected]> Acked-by: Bartlomiej Zolnierkiewicz <[email protected]> Signed-off-by: Lee Jones <[email protected]>
2017-07-06	backlight: adp8860: Move header file out of I2C realm	Wolfram Sang	1	-1/+1
	include/linux/i2c is not for client devices. Move the header file to a more appropriate location. Signed-off-by: Wolfram Sang <[email protected]> Acked-by: Daniel Thompson <[email protected]> Acked-by: Michael Hennerich <[email protected]> Signed-off-by: Lee Jones <[email protected]>
2017-07-05	Merge branch 'parisc-4.13-2' of ↵	Linus Torvalds	1	-2/+0
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull another parisc update from Helge Deller: "Christoph Hellwig provided one patch for the parisc architecture to drop the DMA_ERROR_CODE define from the parisc architecture" * 'parisc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: ->mapping_error
2017-07-05	Merge tag 'arm64-upstream' of ↵	Linus Torvalds	44	-207/+912
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: fix endianness annotation for 'struct jit_ctx' and friends arm64: cpuinfo: constify attribute_group structures. arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() arm64: ptrace: Remove redundant overrun check from compat_vfp_set() arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn() arm64: fix endianness annotation in get_kaslr_seed() arm64: add missing conversion to __wsum in ip_fast_csum() arm64: fix endianness annotation in acpi_parking_protocol.c arm64: use readq() instead of readl() to read 64bit entry_point arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm() arm64: fix endianness annotation for aarch64_insn_write() arm64: fix endianness annotation in aarch64_insn_read() arm64: fix endianness annotation in call_undef_hook() arm64: fix endianness annotation for debug-monitors.c ras: mark stub functions as 'inline' arm64: pass endianness info to sparse arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels arm64: signal: Allow expansion of the signal frame acpi: apei: check for pending errors when probing GHES entries ...
2017-07-05	um: add dummy ioremap and iounmap functions	Logan Gunthorpe	1	-0/+17
	The user mode architecture does not provide ioremap or iounmap, and because of this, the arch won't build when the functions are used in some core libraries. I have designs to use these functions in scatterlist.c where they'd almost certainly never be called on the um architecture but it does need to compile. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Stephen Bates <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Allow building and running on older hosts	Florian Fainelli	2	-4/+12
	Commit a78ff1112263 ("um: add extended processor state save/restore support") and b6024b21fec8 ("um: extend fpstate to _xstate to support YMM registers") forced the use of the x86 FP _xstate and PTRACE_GETREGSET/SETREGSET. On older hosts, we would neither be able to build UML nor run it anymore with these two commits applied because we don't have definitions for struct _xstate nor these two ptrace requests. We can determine at build time which fp context structure to check against, just like we can keep using the old i387 fp save/restore if PTRACE_GETRESET/SETREGSET are not defined. Fixes: a78ff1112263 ("um: add extended processor state save/restore support") Fixes: b6024b21fec8 ("um: extend fpstate to _xstate to support YMM registers") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Avoid longjmp/setjmp symbol clashes with libpthread.a	Florian Fainelli	3	-16/+20
	Building a statically linked UML kernel on a Centos 6.9 host resulted in the following linking failure (GCC 4.4, glibc-2.12): /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o): In function `siglongjmp': (.text+0x8490): multiple definition of `longjmp' arch/x86/um/built-in.o:/local/users/fainelli/openwrt/trunk/build_dir/target-x86_64_musl/linux-uml/linux-4.4.69/arch/x86/um/setjmp_64.S:44: first defined here /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o): In function `sem_open': (.text+0x77cd): warning: the use of `mktemp' is dangerous, better use `mkstemp' collect2: ld returned 1 exit status make[4]: *** [vmlinux] Error 1 Adopt a solution similar to the one done for vmap where we define longjmp/setjmp to be kernel_longjmp/setjmp. In the process, make sure we do rename the functions in arch/x86/um/setjmp_*.S accordingly. Fixes: a7df4716d195 ("um: link with -lpthread") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: console: Ignore console= option	Masami Hiramatsu	1	-0/+3
	Ignore linux kernel's console= option at uml's console option handler. Since uml's con= option is only for setting up new console, and Linux kernel's console= option specify to which console kernel output its message, we can use both option for different purpose. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Use os_warn to print out pre-boot warning/error messages	Masami Hiramatsu	8	-24/+26
	Use os_warn() instead of printf/fprintf to print out pre-boot warning/error messages to stderr. Note that the help message and version message are kept to print out to stdout, because user explicitly specifies those options to get such information. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Add os_warn() for pre-boot warning/error messages	Masami Hiramatsu	2	-0/+11
	Add os_warn() for printing out pre-boot warning/error messages in stderr. The messages via os_warn() are not suppressed by quiet option. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Use os_info for the messages on normal path	Masami Hiramatsu	4	-27/+28
	Use os_info() for printing out the messages on the normal execution path. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Add os_info() for pre-boot information messages	Masami Hiramatsu	2	-0/+27
	Add os_info() for printing out pre-boot information level messages in stderr. The messages via os_info() are suppressed by "quiet" kernel command line. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	um: Use printk instead of printf in make_uml_dir	Masami Hiramatsu	1	-4/+7
	Since this function will be called after printk buffer initialized, use printk as other functions do. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2017-07-05	Merge branch 'work.sys_wait' of ↵	Linus Torvalds	1	-39/+14
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull wait syscall updates from Al Viro: "Consolidating sys_wait* and compat counterparts. Gets rid of set_fs()/double-copy mess, simplifies the whole thing (lifting the copyouts to the syscalls means less headache in the part that does actual work - fewer failure exits, to start with), gets rid of the overhead of field-by-field __put_user()" * 'work.sys_wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: osf_wait4: switch to kernel_wait4() waitid(): switch copyout of siginfo to unsafe_put_user() wait_task_zombie: consolidate info logics kill wait_noreap_copyout() lift getrusage() from wait_noreap_copyout() waitid(2): leave copyout of siginfo to syscall itself kernel_wait4()/kernel_waitid(): delay copying status to userland wait4(2)/waitid(2): separate copying rusage to userland move compat wait4 and waitid next to native variants
2017-07-05	parisc: ->mapping_error	Christoph Hellwig	1	-2/+0
	DMA_ERROR_CODE already went away in linux-next, but parisc unfortunately added a new instance of it without any review as far as I can tell. Move the two iommu drivers to report errors through ->mapping_error. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2017-07-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds	22	-150/+172
	Pull networking updates from David Miller: "Reasonably busy this cycle, but perhaps not as busy as in the 4.12 merge window: 1) Several optimizations for UDP processing under high load from Paolo Abeni. 2) Support pacing internally in TCP when using the sch_fq packet scheduler for this is not practical. From Eric Dumazet. 3) Support mutliple filter chains per qdisc, from Jiri Pirko. 4) Move to 1ms TCP timestamp clock, from Eric Dumazet. 5) Add batch dequeueing to vhost_net, from Jason Wang. 6) Flesh out more completely SCTP checksum offload support, from Davide Caratti. 7) More plumbing of extended netlink ACKs, from David Ahern, Pablo Neira Ayuso, and Matthias Schiffer. 8) Add devlink support to nfp driver, from Simon Horman. 9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa Prabhu. 10) Add stack depth tracking to BPF verifier and use this information in the various eBPF JITs. From Alexei Starovoitov. 11) Support XDP on qed device VFs, from Yuval Mintz. 12) Introduce BPF PROG ID for better introspection of installed BPF programs. From Martin KaFai Lau. 13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann. 14) For loads, allow narrower accesses in bpf verifier checking, from Yonghong Song. 15) Support MIPS in the BPF selftests and samples infrastructure, the MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David Daney. 16) Support kernel based TLS, from Dave Watson and others. 17) Remove completely DST garbage collection, from Wei Wang. 18) Allow installing TCP MD5 rules using prefixes, from Ivan Delalande. 19) Add XDP support to Intel i40e driver, from Björn Töpel 20) Add support for TC flower offload in nfp driver, from Simon Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub Kicinski, and Bert van Leeuwen. 21) IPSEC offloading support in mlx5, from Ilan Tayari. 22) Add HW PTP support to macb driver, from Rafal Ozieblo. 23) Networking refcount_t conversions, From Elena Reshetova. 24) Add sock_ops support to BPF, from Lawrence Brako. This is useful for tuning the TCP sockopt settings of a group of applications, currently via CGROUPs" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits) net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap cxgb4: Support for get_ts_info ethtool method cxgb4: Add PTP Hardware Clock (PHC) support cxgb4: time stamping interface for PTP nfp: default to chained metadata prepend format nfp: remove legacy MAC address lookup nfp: improve order of interfaces in breakout mode net: macb: remove extraneous return when MACB_EXT_DESC is defined bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case bpf: fix return in load_bpf_file mpls: fix rtm policy in mpls_getroute net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t net, ax25: convert ax25_route.refcount from atomic_t to refcount_t net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t ...
2017-07-05	Merge branch 'linus' of ↵	Linus Torvalds	15	-248/+595
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Algorithms: - add private key generation to ecdh Drivers: - add generic gcm(aes) to aesni-intel - add SafeXcel EIP197 crypto engine driver - add ecb(aes), cfb(aes) and ecb(des3_ede) to cavium - add support for CNN55XX adapters in cavium - add ctr mode to chcr - add support for gcm(aes) to omap" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (140 commits) crypto: testmgr - Reenable sha1/aes in FIPS mode crypto: ccp - Release locks before returning crypto: cavium/nitrox - dma_mapping_error() returns bool crypto: doc - fix typo in docs Documentation/bindings: Document the SafeXel cryptographic engine driver crypto: caam - fix gfp allocation flags (part II) crypto: caam - fix gfp allocation flags (part I) crypto: drbg - Fixes panic in wait_for_completion call crypto: caam - make of_device_ids const. crypto: vmx - remove unnecessary check crypto: n2 - make of_device_ids const crypto: inside-secure - use the base_end pointer in ring rollback crypto: inside-secure - increase the batch size crypto: inside-secure - only dequeue when needed crypto: inside-secure - get the backlog before dequeueing the request crypto: inside-secure - stop requeueing failed requests crypto: inside-secure - use one queue per hw ring crypto: inside-secure - update the context and request later crypto: inside-secure - align the cipher and hash send functions crypto: inside-secure - optimize DSE bufferability control ...
2017-07-05	Merge tag 'gcc-plugins-v4.13-rc1' of ↵	Linus Torvalds	4	-8/+50
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull GCC plugin updates from Kees Cook: "The big part is the randstruct plugin infrastructure. This is the first of two expected pull requests for randstruct since there are dependencies in other trees that would be easier to merge once those have landed. Notably, the IPC allocation refactoring in -mm, and many trivial merge conflicts across several trees when applying the __randomize_layout annotation. As a result, it seemed like I should send this now since it is relatively self-contained, and once the rest of the trees have landed, send the annotation patches. I'm expecting the final phase of randstruct (automatic struct selection) will land for v4.14, but if its other tree dependencies actually make it for v4.13, I can send that merge request too. Summary: - typo fix in Kconfig (Jean Delvare) - randstruct infrastructure" * tag 'gcc-plugins-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ARM: Prepare for randomized task_struct randstruct: Whitelist NIU struct page overloading randstruct: Whitelist big_key path struct overloading randstruct: Whitelist UNIXCB cast randstruct: Whitelist struct security_hook_heads cast gcc-plugins: Add the randstruct plugin Fix English in description of GCC_PLUGIN_STRUCTLEAK compiler: Add __designated_init annotation gcc-plugins: Detail c-common.h location for GCC 4.6
2017-07-05	Merge tag 'pstore-v4.13-rc1' of ↵	Linus Torvalds	1	-9/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "Various fixes and tweaks for the pstore subsystem. Highlights: - use memdup_user() instead of open-coded copies (Geliang Tang) - fix record memory leak during initialization (Douglas Anderson) - avoid confused compressed record warning (Ankit Kumar) - prepopulate record timestamp and remove redundant logic from backends" * tag 'pstore-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: powerpc/nvram: use memdup_user pstore: use memdup_user pstore: Fix format string to use %u for record id pstore: Populate pstore record->time field pstore: Create common record initializer efi-pstore: Refactor erase routine pstore: Avoid potential infinite loop pstore: Fix leaked pstore_record in pstore_get_backend_records() pstore: Don't warn if data is uncompressed and type is not PSTORE_TYPE_DMESG
2017-07-05	crypto: sha1-ssse3 - Disable avx2	Herbert Xu	1	-1/+1
	It has been reported that sha1-avx2 can cause page faults by reading beyond the end of the input. This patch disables it until it can be fixed. Cc: <[email protected]> Fixes: 7c1da8d0d046 ("crypto: sha - SHA1 transform x86_64 AVX2") Reported-by: Jan Stancek <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2017-07-05	MIPS: MIPS16e2: Subdecode extended LWSP/SWSP instructions	Maciej W. Rozycki	1	-2/+37
	Implement extended LWSP/SWSP instruction subdecoding for the purpose of unaligned GP-relative memory access emulation. With the introduction of the MIPS16e2 ASE[1] the previously must-be-zero 3-bit field at bits 7..5 of the extended encodings of the instructions selected with the LWSP and SWSP major opcodes has become a `sel' field, acting as an opcode extension for additional operations. In both cases the `sel' value of 0 has retained the original operation, that is: LW rx, offset(sp) and: SW rx, offset(sp) for LWSP and SWSP respectively. In hardware predating the MIPS16e2 ASE other values may or may not have been decoded, architecturally yielding unpredictable results, and in our unaligned memory access emulation we have treated the 3-bit field as a don't-care, that is effectively making all the possible encodings of the field alias to the architecturally defined encoding of 0. For the non-zero values of the `sel' field the MIPS16e2 ASE has in particular defined these GP-relative operations: LW rx, offset(gp) # sel = 1 LH rx, offset(gp) # sel = 2 LHU rx, offset(gp) # sel = 4 and SW rx, offset(gp) # sel = 1 SH rx, offset(gp) # sel = 2 for LWSP and SWSP respectively, which will trap with an Address Error exception if the effective address calculated is not naturally-aligned for the operation requested. These operations have been selected for unaligned access emulation, for consistency with the corresponding regular MIPS and microMIPS operations. For other non-zero values of the `sel' field the MIPS16e2 ASE has defined further operations, which however either never trap with an Address Error exception, such as LWL or GP-relative SB, or are not supposed to be emulated, such as LL or SC. These operations have been selected to exclude from unaligned access emulation, should an Address Error exception ever happen with them. Subdecode the `sel' field in unaligned access emulation then for the extended encodings of the instructions selected with the LWSP and SWSP major opcodes, whenever support for the MIPS16e2 ASE has been detected in hardware, and either emulate the operation requested or send SIGBUS to the originating process, according to the selection described above. For hardware implementing the MIPS16 ASE, however lacking MIPS16e2 ASE support retain the original interpretation of the `sel' field. The effects of this change are illustrated with the following user program: $ cat mips16e2-test.c #include <inttypes.h> #include <stdio.h> int main(void) { int64_t scratch[16] = { 0 }; int32_t tmp0, tmp1, tmp2; int i; scratch[0] = 0xc8c7c6c5c4c3c2c1; scratch[1] = 0xd0cfcecdcccbcac9; asm volatile( "move %0, $sp\n\t" "move %1, $gp\n\t" "move $sp, %4\n\t" "addiu %2, %4, 8\n\t" "move $gp, %2\n\t" "lw %2, 2($sp)\n\t" "sw %2, 16(%4)\n\t" "lw %2, 2($gp)\n\t" "sw %2, 24(%4)\n\t" "lw %2, 1($sp)\n\t" "sw %2, 32(%4)\n\t" "lh %2, 1($gp)\n\t" "sw %2, 40(%4)\n\t" "lw %2, 3($sp)\n\t" "sw %2, 48(%4)\n\t" "lhu %2, 3($gp)\n\t" "sw %2, 56(%4)\n\t" "lw %2, 0(%4)\n\t" "sw %2, 66($sp)\n\t" "lw %2, 8(%4)\n\t" "sw %2, 82($gp)\n\t" "lw %2, 0(%4)\n\t" "sw %2, 97($sp)\n\t" "lw %2, 8(%4)\n\t" "sh %2, 113($gp)\n\t" "move $gp, %1\n\t" "move $sp, %0" : "=&d" (tmp0), "=&d" (tmp1), "=&d" (tmp2), "=m" (scratch) : "d" (scratch)); for (i = 0; i < sizeof(scratch) / sizeof(scratch); i += 2) printf("%016" PRIx64 "\t%016" PRIx64 "\n", scratch[i], scratch[i + 1]); return 0; } $ to be compiled with: $ gcc -mips16 -mips32r2 -Wa,-mmips16e2 -o mips16e2-test mips16e2-test.c $ With 74Kf hardware, which does not implement the MIPS16e2 ASE, this program produces the following output: $ ./mips16e2-test c8c7c6c5c4c3c2c1 d0cfcecdcccbcac9 00000000c6c5c4c3 00000000c6c5c4c3 00000000c5c4c3c2 00000000c5c4c3c2 00000000c7c6c5c4 00000000c7c6c5c4 0000c4c3c2c10000 0000000000000000 0000cccbcac90000 0000000000000000 000000c4c3c2c100 0000000000000000 000000cccbcac900 0000000000000000 $ regardless of whether the change has been applied or not. With the change not applied and interAptive MR2 hardware[2], which does implement the MIPS16e2 ASE, it produces the following output: $ ./mips16e2-test c8c7c6c5c4c3c2c1 d0cfcecdcccbcac9 00000000c6c5c4c3 00000000cecdcccb 00000000c5c4c3c2 00000000cdcccbca 00000000c7c6c5c4 00000000cfcecdcc 0000c4c3c2c10000 0000000000000000 0000000000000000 0000cccbcac90000 000000c4c3c2c100 0000000000000000 0000000000000000 000000cccbcac900 $ which shows that for GP-relative operations the correct trapping address calculated from $gp has been obtained from the CP0 BadVAddr register and so has data from the source operand, however masking and extension has not been applied for halfword operations. With the change applied and interAptive MR2 hardware the program produces the following output: $ ./mips16e2-test c8c7c6c5c4c3c2c1 d0cfcecdcccbcac9 00000000c6c5c4c3 00000000cecdcccb 00000000c5c4c3c2 00000000ffffcbca 00000000c7c6c5c4 000000000000cdcc 0000c4c3c2c10000 0000000000000000 0000000000000000 0000cccbcac90000 000000c4c3c2c100 0000000000000000 0000000000000000 0000000000cac900 $ as expected. References: [1] "MIPS32 Architecture for Programmers: MIPS16e2 Application-Specific Extension Technical Reference Manual", Imagination Technologies Ltd., Document Number: MD01172, Revision 01.00, April 26, 2016 [2] "MIPS32 interAptiv Multiprocessing System Software User's Manual", Imagination Technologies Ltd., Document Number: MD00904, Revision 02.01, June 15, 2016, Chapter 24 "MIPS16e Application-Specific Extension to the MIPS32 Instruction Set", pp. 871-883 Signed-off-by: Maciej W. Rozycki <[email protected]> Reviewed-by: James Hogan <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/16095/ Signed-off-by: Ralf Baechle <[email protected]>
2017-07-05	MIPS: MIPS16e2: Identify ASE presence	Maciej W. Rozycki	4	-0/+7
	Identify the presence of the MIPS16e2 ASE as per the architecture specification[1], by checking for CP0 Config5.CA2 bit being 1[2]. References: [1] "MIPS32 Architecture for Programmers: MIPS16e2 Application-Specific Extension Technical Reference Manual", Imagination Technologies Ltd., Document Number: MD01172, Revision 01.00, April 26, 2016, Section 1.2 "Software Detection of the ASE", p. 5 [2] "MIPS32 interAptiv Multiprocessing System Software User's Manual", Imagination Technologies Ltd., Document Number: MD00904, Revision 02.01, June 15, 2016, Section 2.2.1.6 "Device Configuration 5 -- Config5 (CP0 Register 16, Select 5)", pp. 71-72 Signed-off-by: Maciej W. Rozycki <[email protected]> Reviewed-by: James Hogan <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/16094/ Signed-off-by: Ralf Baechle <[email protected]>
2017-07-05	Merge branches 'fixes' and 'misc' into for-linus	Russell King	10	-18/+181

2017-07-05	x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table	Chen Yu	3	-7/+29
	Add the real e820_tabel_firmware[] that will not be modified by the kernel or the EFI boot stub under any circumstance. In addition to that modify the code so that e820_table_firmwarep[] is exposed via sysfs to represent the real firmware memory layout, rather than exposing the e820_table_kexec[] table. This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check RAM layout consistency across hibernation/resume: The suspend kernel: [ 0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable The resume kernel: [ 0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable ... [ 15.752088] PM: Using 3 thread(s) for decompression. [ 15.752088] PM: Loading and decompressing image data (471870 pages)... [ 15.764971] Hibernate inconsistent memory map detected! [ 15.770833] PM: Image mismatch: architecture specific data Actually it is safe to restore these pages because E820_TYPE_RAM and E820_TYPE_RESERVED_KERN are treated the same during hibernation, so the original e820 table provided by the bootloader is used for hibernation MD5 fingerprint checking. The side effect is that, this newly introduced variable might increase the kernel size at compile time. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Chen Yu <[email protected]> Cc: Dave Young <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Xunlei Pang <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-05	x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec	Chen Yu	4	-26/+26
	Currently the e820_table_firmware[] table is mainly used by the kexec, and it is not what it's supposed to be - despite its name it might be modified by the kernel. So change its name to e820_table_kexec[]. In the next patch we will introduce the real e820_table_firmware[] table. No functional change. Signed-off-by: Chen Yu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Xunlei Pang <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-05	x86/boot/e820: Avoid overwriting e820_table_firmware	Chen Yu	1	-2/+4
	The following commit in 2013: 77ea8c948953 ("x86: Reserve setup_data ranges late after parsing memmap cmdline") has fixed the issue of losing setup_data information by deferring the e820_reserve_setup_data() call until the early params have been parsed. But this also introduced a new problem that, during early params parsing, the kexec kernel might fake a mptable and saves it into the e820_table_firmware[] table (without saving the mptable to the e820_table[]), however the subsequent invoking of e820_reserve_setup_data() will overwrite the e820_table_firmware[] according to the e820_table[], thus the fake mptable information is lost. Fix this issue by updating the e820_table_firmware[] according to the setup_data information, but without overwriting it. Signed-off-by: Chen Yu <[email protected]> Cc: Dave Young <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Xunlei Pang <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-05	x86/mm/pat: Don't report PAT on CPUs that don't support it	Mikulas Patocka	3	-16/+20
	The pat_enabled() logic is broken on CPUs which do not support PAT and where the initialization code fails to call pat_init(). Due to that the enabled flag stays true and pat_enabled() returns true wrongfully. As a consequence the mappings, e.g. for Xorg, are set up with the wrong caching mode and the required MTRR setups are omitted. To cure this the following changes are required: 1) Make pat_enabled() return true only if PAT initialization was invoked and successful. 2) Invoke init_cache_modes() unconditionally in setup_arch() and remove the extra callsites in pat_disable() and the pat disabled code path in pat_init(). Also rename __pat_enabled to pat_disabled to reflect the real purpose of this variable. Fixes: 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled") Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Bernhard Held <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Brian Gerst <[email protected]> Cc: "Luis R. Rodriguez" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
2017-07-05	s390/syscalls: Fix out of bounds arguments access	Jiri Olsa	1	-0/+6
	Zorro reported following crash while having enabled syscall tracing (CONFIG_FTRACE_SYSCALLS): Unable to handle kernel pointer dereference at virtual ... Oops: 0011 [#1] SMP DEBUG_PAGEALLOC SNIP Call Trace: ([<000000000024d79c>] ftrace_syscall_enter+0xec/0x1d8) [<00000000001099c6>] do_syscall_trace_enter+0x236/0x2f8 [<0000000000730f1c>] sysc_tracesys+0x1a/0x32 [<000003fffcf946a2>] 0x3fffcf946a2 INFO: lockdep is turned off. Last Breaking-Event-Address: [<000000000022dd44>] rb_event_data+0x34/0x40 ---[ end trace 8c795f86b1b3f7b9 ]--- The crash happens in syscall_get_arguments function for syscalls with zero arguments, that will try to access first argument (args[0]) in event entry, but it's not allocated. Bail out of there are no arguments. Cc: [email protected] Reported-by: Zorro Lang <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2017-07-05	s390/crash: Remove unused KEXEC_NOTE_BYTES	Michael Holzheu	1	-18/+0
	After commmit 692f66f26a4c19 ("crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE") the KEXEC_NOTE_BYTES macro is not used anymore and for s390 we create the ELF header in the new kernel anyway. Therefore remove the macro. Reported-by: Xunlei Pang <[email protected]> Reviewed-by: Mikhail Zaslonko <[email protected]> Signed-off-by: Michael Holzheu <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2017-07-04	Merge branch 'merge/randstruct' into for-next/gcc-plugins	Kees Cook	4	-8/+50