blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2018-12-11	sched/topology: Reference the Energy Model of CPUs when available	Quentin Perret	2	-4/+151
	The existing scheduling domain hierarchy is defined to map to the cache topology of the system. However, Energy Aware Scheduling (EAS) requires more knowledge about the platform, and specifically needs to know about the span of Performance Domains (PD), which do not always align with caches. To address this issue, use the Energy Model (EM) of the system to extend the scheduler topology code with a representation of the PDs, alongside the scheduling domains. More specifically, a linked list of PDs is attached to each root domain. When multiple root domains are in use, each list contains only the PDs covering the CPUs of its root domain. If a PD spans over CPUs of multiple different root domains, it will be duplicated in all lists. The lists are fully maintained by the scheduler from partition_sched_domains() in order to cope with hotplug and cpuset changes. As for scheduling domains, the list are protected by RCU to ensure safe concurrent updates. Signed-off-by: Quentin Perret <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-11	PM: Introduce an Energy Model management framework	Quentin Perret	4	-0/+405
	Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ \| Thermal (IPA) \| \| Scheduler (EAS) \| \| Other \| +---------------+ +-----------------+ +-------------+ \| \| em_pd_energy() \| \| \| em_cpu_get() \| +-----------+ \| +--------+ \| \| \| v v v +---------------------+ \| \| \| Energy Model \| \| \| \| Framework \| \| \| +---------------------+ ^ ^ ^ \| \| \| em_register_perf_domain() +----------+ \| +---------+ \| \| \| +---------------+ +---------------+ +--------------+ \| cpufreq-dt \| \| arm_scmi \| \| Other \| +---------------+ +---------------+ +--------------+ ^ ^ ^ \| \| \| +--------------+ +---------------+ +--------------+ \| Device Tree \| \| Firmware \| \| ? \| +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-11	sched/cpufreq: Prepare schedutil for Energy Aware Scheduling	Quentin Perret	3	-15/+74
	Schedutil requests frequency by aggregating utilization signals from the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of them. Since Energy Aware Scheduling (EAS) needs to be able to predict the frequency requests, it needs to forecast the decisions made by the governor. In order to prepare the introduction of EAS, introduce schedutil_freq_util() to centralize the aforementioned signal aggregation and make it available to both schedutil and EAS. Since frequency selection and energy estimation still need to deal with RT and DL signals slightly differently, schedutil_freq_util() is called with a different 'type' parameter in those two contexts, and returns an aggregated utilization signal accordingly. While at it, introduce the map_util_freq() function which is designed to make schedutil's 25% margin usable easily for both sugov and EAS. As EAS will be able to predict schedutil's frequency requests more accurately than any other governor by design, it'd be sensible to make sure EAS cannot be used without schedutil. This will be done later, once EAS has actually been introduced. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Quentin Perret <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-11	sched/topology: Relocate arch_scale_cpu_capacity() to the internal header	Quentin Perret	2	-18/+16
	By default, arch_scale_cpu_capacity() is only visible from within the kernel/sched folder. Relocate it to include/linux/sched/topology.h to make it visible to other clients needing to know about the capacity of CPUs, such as the Energy Model framework. This also shrinks the <linux/sched/topology.h> public header. Signed-off-by: Quentin Perret <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-11	sched/core: Remove unnecessary unlikely() in push_*_task()	Yangtao Li	2	-6/+2
	WARN_ON() already contains an unlikely(), so it's not necessary to use WARN_ON(1). Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-11	sched/topology: Remove the ::smt_gain field from 'struct sched_domain'	Vincent Guittot	3	-6/+0
	::smt_gain is used to compute the capacity of CPUs of a SMT core with the constraint 1 < ::smt_gain < 2 in order to be able to compute number of CPUs per core. The field has_free_capacity of struct numa_stat, which was the last user of this computation of number of CPUs per core, has been removed by: 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()") We can now remove this constraint on core capacity and use the defautl value SCHED_CAPACITY_SCALE for SMT CPUs. With this remove, SCHED_CAPACITY_SCALE becomes the maximum compute capacity of CPUs on every systems. This should help to simplify some code and remove fields like rd->max_cpu_capacity Furthermore, arch_scale_cpu_capacity() is used with a NULL sd in several other places in the code when it wants the capacity of a CPUs to scale some metrics like in pelt, deadline or schedutil. In case on SMT, the value returned is not the capacity of SMT CPUs but default SCHED_CAPACITY_SCALE. So remove it. Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-03	sched: Fix various typos in comments	Ingo Molnar	10	-22/+22
	Go over the scheduler source code and fix common typos in comments - and a typo in an actual variable name. No change in functionality intended. Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-03	Merge tag 'v4.20-rc5' into sched/core, to pick up fixes	Ingo Molnar	5490	-85032/+296518
	Signed-off-by: Ingo Molnar <[email protected]>
2018-12-02	Linux 4.20-rc5	Linus Torvalds	1	-1/+1

2018-12-02	Merge tag 'armsoc-fixes' of ↵	Linus Torvalds	24	-33/+176
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Volume is a little higher than usual due to a set of gpio fixes for Davinci platforms that's been around a while, still seemed appropriate to not hold off until next merge window. Besides that it's the usual mix of minor fixes, mostly corrections of small stuff in device trees. Major stability-related one is the removal of a regulator from DT on Rock960, since DVFS caused undervoltage. I expect it'll be restored once they figure out the underlying issue" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits) MAINTAINERS: Remove unused Qualcomm SoC mailing list ARM: davinci: dm644x: set the GPIO base to 0 ARM: davinci: da830: set the GPIO base to 0 ARM: davinci: dm355: set the GPIO base to 0 ARM: davinci: dm646x: set the GPIO base to 0 ARM: davinci: dm365: set the GPIO base to 0 ARM: davinci: da850: set the GPIO base to 0 gpio: davinci: restore a way to manually specify the GPIO base ARM: davinci: dm644x: define gpio interrupts as separate resources ARM: davinci: dm355: define gpio interrupts as separate resources ARM: davinci: dm646x: define gpio interrupts as separate resources ARM: davinci: dm365: define gpio interrupts as separate resources ARM: davinci: da8xx: define gpio interrupts as separate resources ARM: dts: at91: sama5d2: use the divided clock for SMC ARM: dts: imx51-zii-rdu1: Remove EEPROM node ARM: dts: rockchip: Remove @0 from the veyron memory node arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou. arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP arm64: dts: sdm845-mtp: Reserve reserved gpios arm64: dts: ti: k3-am654: Fix wakeup_uart reg address ...
2018-12-02	Merge tag 'for-linus-4.20a-rc5-tag' of ↵	Linus Torvalds	8	-164/+37
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A revert of a previous commit as it is no longer necessary and has shown to cause problems in some memory hotplug cases. - Some small fixes and a minor cleanup. - A patch for adding better diagnostic data in a very rare failure case. * tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: pvcalls-front: fixes incorrect error handling Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" xen: xlate_mmu: add missing header to fix 'W=1' warning xen/x86: add diagnostic printout to xen_mc_flush() in case of error x86/xen: cleanup includes in arch/x86/xen/spinlock.c
2018-12-02	Merge tag 'dmaengine-fix-4.20-rc5' of ↵	Linus Torvalds	1	-1/+9
	git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "This contains two fixes to at_hdmac which fixes long standing bus reported recently on serial transfers causing memory leak. These fixes were done by Richard Genoud" * tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: at_hdmac: fix module unloading dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
2018-12-01	Merge branch 'x86-pti-for-linus' of ↵	Linus Torvalds	26	-282/+780
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull STIBP fallout fixes from Thomas Gleixner: "The performance destruction department finally got it's act together and came up with a cure for the STIPB regression: - Provide a command line option to control the spectre v2 user space mitigations. Default is either seccomp or prctl (if seccomp is disabled in Kconfig). prctl allows mitigation opt-in, seccomp enables the migitation for sandboxed processes. - Rework the code to handle the conditional STIBP/IBPB control and remove the now unused ptrace_may_access_sched() optimization attempt - Disable STIBP automatically when SMT is disabled - Optimize the switch_to() logic to avoid MSR writes and invocations of __switch_to_xtra(). - Make the asynchronous speculation TIF updates synchronous to prevent stale mitigation state. As a general cleanup this also makes retpoline directly depend on compiler support and removes the 'minimal retpoline' option which just pretended to provide some form of security while providing none" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/speculation: Provide IBPB always command line options x86/speculation: Add seccomp Spectre v2 user space protection mode x86/speculation: Enable prctl mode for spectre_v2_user x86/speculation: Add prctl() control for indirect branch speculation x86/speculation: Prepare arch_smt_update() for PRCTL mode x86/speculation: Prevent stale SPEC_CTRL msr content x86/speculation: Split out TIF update ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS x86/speculation: Prepare for conditional IBPB in switch_mm() x86/speculation: Avoid __switch_to_xtra() calls x86/process: Consolidate and simplify switch_to_xtra() code x86/speculation: Prepare for per task indirect branch speculation control x86/speculation: Add command line control for indirect branch speculation x86/speculation: Unify conditional spectre v2 print functions x86/speculataion: Mark command line parser data __initdata x86/speculation: Mark string arrays const correctly x86/speculation: Reorder the spec_v2 code x86/l1tf: Show actual SMT state x86/speculation: Rework SMT state change sched/smt: Expose sched_smt_present static key ...
2018-12-01	Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-block	Linus Torvalds	6	-7/+14
	Pull block layer fixes from Jens Axboe: - Single range elevator discard merge fix, that caused crashes (Ming) - Fix for a regression in O_DIRECT, where we could potentially lose the error value (Maximilian Heyne) - NVMe pull request from Christoph, with little fixes all over the map for NVMe. * tag 'for-linus-20181201' of git://git.kernel.dk/linux-block: block: fix single range discard merge nvme-rdma: fix double freeing of async event data nvme: flush namespace scanning work just before removing namespaces nvme: warn when finding multi-port subsystems without multipathing enabled fs: fix lost error code in dio_complete nvme-pci: fix surprise removal nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() nvme: Free ctrl device name on init failure
2018-12-01	Merge tag 'pci-v4.20-fixes-2' of ↵	Linus Torvalds	4	-24/+13
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Fix a link speed checking interface that broke PCIe gen3 cards in gen1 slots (Mikulas Patocka) - Fix an imx6 link training error (Trent Piepho) - Fix a layerscape outbound window accessor calling error (Hou Zhiqiang) - Fix a DesignWare endpoint MSI-X address calculation error (Gustavo Pimentel) * tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix incorrect value returned from pcie_get_speed_cap() PCI: dwc: Fix MSI-X EP framework address calculation bug PCI: layerscape: Fix wrong invocation of outbound window disable accessor PCI: imx6: Fix link training status detection in link up check
2018-11-30	Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus	Bjorn Helgaas	3	-11/+2
	- Fix DesignWare endpoint MSI-X address calculation bug (Gustavo Pimentel) - Fix Layerscape outbound window disable usage (Hou Zhiqiang) - Fix imx6 link up detection (Trent Piepho) * lorenzo/pci/controller-fixes: PCI: dwc: Fix MSI-X EP framework address calculation bug PCI: layerscape: Fix wrong invocation of outbound window disable accessor PCI: imx6: Fix link training status detection in link up check
2018-11-30	PCI: Fix incorrect value returned from pcie_get_speed_cap()	Mikulas Patocka	1	-13/+11
	The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask the register and compare it against them. This fixes errors like this: amdgpu: [powerplay] failed to send message 261 ret is 0 when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is being incorrectly reported as PCIe-v3 capable. 6cf57be0f78e, which appeared in v4.17, added pcie_get_speed_cap() with the incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask. 5d9a63304032, which appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the amdgpu bug reports below are regressions in v4.19. Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") Fixes: 5d9a63304032 ("drm/amdgpu: use pcie functions for link width and speed") Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 Signed-off-by: Mikulas Patocka <[email protected]> [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2, remove test of PCI_EXP_LNKCAP for zero, since that register is required] Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Alex Deucher <[email protected]> Cc: [email protected] # v4.17+
2018-11-30	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	25	-177/+319
	Merge misc fixes from Andrew Morton: "31 fixes" * emailed patches from Andrew Morton <[email protected]>: (31 commits) ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set userfaultfd: shmem: add i_size checks userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem ...
2018-11-30	Merge tag 'mips_fixes_4.20_4' of ↵	Linus Torvalds	3	-25/+25
	git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull few more MIPS fixes from Paul Burton: - Fix mips_get_syscall_arg() to operate on the task specified when detecting o32 tasks running on MIPS64 kernels. - Fix some incorrect GPIO pin muxing for the MT7620 SoC. - Update the linux-mips mailing list address. * tag 'mips_fixes_4.20_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MAINTAINERS: Update linux-mips mailing list address MIPS: ralink: Fix mt7620 nd_sd pinmux mips: fix mips_get_syscall_arg o32 check
2018-11-30	Merge tag 'arm64-fixes' of ↵	Linus Torvalds	6	-6/+59
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Cortex-A76 erratum workaround - ftrace fix to enable syscall events on arm64 - Fix uninitialised pointer in iort_get_platform_device_domain() * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value arm64: ftrace: Fix to enable syscall events on arm64 arm64: Add workaround for Cortex-A76 erratum 1286807
2018-11-30	Merge tag 'gcc-plugins-v4.20-rc5' of ↵	Linus Torvalds	1	-1/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull stackleak plugin fix from Kees Cook: "Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov)" * tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: stackleak: Disable function tracing and kprobes for stackleak_erase()
2018-11-30	Merge tag 'fscache-fixes-20181130' of ↵	Linus Torvalds	5	-9/+17
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache and cachefiles fixes from David Howells: "Misc fixes: - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a race between a cache object lookup failing and someone attempting to reenable that object, thereby triggering an update of the object's attributes. - Fix an assertion failure at fs/fscache/operation.c:449 caused by a split atomic subtract and atomic read that allows a race to happen. - Fix a leak of backing pages when simultaneously reading the same page from the same object from two or more threads. - Fix a hang due to a race between a cache object being discarded and the corresponding cookie being reenabled. There are also some minor cleanups: - Cast an enum value to a different enum type to prevent clang from generating a warning. This shouldn't cause any sort of change in the emitted code. - Use ktime_get_real_seconds() instead of get_seconds(). This is just used to uniquify a filename for an object to be placed in the graveyard. Objects placed there are deleted by cachfilesd in userspace immediately thereafter. - Remove an initialised, but otherwise unused variable. This should have been entirely optimised away anyway" * tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache, cachefiles: remove redundant variable 'cache' cachefiles: avoid deprecated get_seconds() cachefiles: Explicitly cast enumerated type in put_object fscache: fix race between enablement and dropping of object cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active fscache: Fix race in fscache_op_complete() due to split atomic_sub & read cachefiles: Fix an assertion failure when trying to update a failed object
2018-11-30	MAINTAINERS: Update linux-mips mailing list address	Paul Burton	1	-23/+23
	The linux-mips.org infrastructure has been unreliable recently & nobody with sufficient access to fix it is around to do so. As a result we're moving away from it, and part of this is migrating our mailing list to kernel.org. Replace all instances of [email protected] in MAINTAINERS with the shiny new [email protected] address. The new list is now being archived on kernel.org at https://lore.kernel.org/linux-mips/ which also holds the history of the old linux-mips.org list. Signed-off-by: Paul Burton <[email protected]> Cc: [email protected] Cc: [email protected]
2018-11-30	ocfs2: fix potential use after free	Pan Bian	1	-1/+1
	ocfs2_get_dentry() calls iput(inode) to drop the reference count of inode, and if the reference count hits 0, inode is freed. However, in this function, it then reads inode->i_generation, which may result in a use after free bug. Move the put operation later. Link: http://lkml.kernel.org/r/[email protected] Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.") Signed-off-by: Pan Bian <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: fix the xas_create_range() error path	Hugh Dickins	1	-11/+14
	collapse_shmem()'s xas_nomem() is very unlikely to fail, but it is rightly given a failure path, so move the whole xas_create_range() block up before __SetPageLocked(new_page): so that it does not need to remember to unlock_page(new_page). Add the missing mem_cgroup_cancel_charge(), and set (currently unused) result to SCAN_FAIL rather than SCAN_SUCCEED. Link: http://lkml.kernel.org/r/[email protected] Fixes: 77da9389b9d5 ("mm: Convert collapse_shmem to XArray") Signed-off-by: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: collapse_shmem() do not crash on Compound	Hugh Dickins	1	-1/+9
	collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before it holds page lock of the first page, racing truncation then extension might conceivably have inserted a hugepage there already. Fail with the SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or otherwise mishandling the unexpected hugepage - though later we might code up a more constructive way of handling it, with SCAN_SUCCESS. Link: http://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: collapse_shmem() without freezing new_page	Hugh Dickins	1	-12/+7
	khugepaged's collapse_shmem() does almost all of its work, to assemble the huge new_page from 512 scattered old pages, with the new_page's refcount frozen to 0 (and refcounts of all old pages so far also frozen to 0). Including shmem_getpage() to read in any which were out on swap, memory reclaim if necessary to allocate their intermediate pages, and copying over all the data from old to new. Imagine the frozen refcount as a spinlock held, but without any lock debugging to highlight the abuse: it's not good, and under serious load heads into lockups - speculative getters of the page are not expecting to spin while khugepaged is rescheduled. One can get a little further under load by hacking around elsewhere; but fortunately, freezing the new_page turns out to have been entirely unnecessary, with no hacks needed elsewhere. The huge new_page lock is already held throughout, and guards all its subpages as they are brought one by one into the page cache tree; and anything reading the data in that page, without the lock, before it has been marked PageUptodate, would already be in the wrong. So simply eliminate the freezing of the new_page. Each of the old pages remains frozen with refcount 0 after it has been replaced by a new_page subpage in the page cache tree, until they are all unfrozen on success or failure: just as before. They could be unfrozen sooner, but cause no problem once no longer visible to find_get_entry(), filemap_map_pages() and other speculative lookups. Link: http://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: minor reorderings in collapse_shmem()	Hugh Dickins	1	-40/+32
	Several cleanups in collapse_shmem(): most of which probably do not really matter, beyond doing things in a more familiar and reassuring order. Simplify the failure gotos in the main loop, and on success update stats while interrupts still disabled from the last iteration. Link: http://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: collapse_shmem() remember to clear holes	Hugh Dickins	1	-0/+10
	Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp flags khugepaged uses to allocate a huge page - in all common cases it would just be a waste of effort - so collapse_shmem() must remember to clear out any holes that it instantiates. The obvious place to do so, where they are put into the page cache tree, is not a good choice: because interrupts are disabled there. Leave it until further down, once success is assured, where the other pages are copied (before setting PageUptodate). Link: http://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: fix crashes due to misaccounted holes	Hugh Dickins	2	-2/+9
	Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later closed and unlinked. khugepaged's collapse_shmem() was forgetting to update mapping->nrpages on the rollback path, after it had added but then needs to undo some holes. There is indeed an irritating asymmetry between shmem_charge(), whose callers want it to increment nrpages after successfully accounting blocks, and shmem_uncharge(), when __delete_from_page_cache() already decremented nrpages itself: oh well, just add a comment on that to them both. And shmem_recalc_inode() is supposed to be called when the accounting is expected to be in balance (so it can deduce from imbalance that reclaim discarded some pages): so change shmem_charge() to update nrpages earlier (though it's rare for the difference to matter at all). Link: http://lkml.kernel.org/r/[email protected] Fixes: 800d8c63b2e98 ("shmem: add huge pages support") Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/khugepaged: collapse_shmem() stop if punched or truncated	Hugh Dickins	1	-0/+11
	Huge tmpfs testing showed that although collapse_shmem() recognizes a concurrently truncated or hole-punched page correctly, its handling of holes was liable to refill an emptied extent. Add check to stop that. Link: http://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()	Hugh Dickins	1	-6/+13
	Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that __split_huge_page() was using i_size_read() while holding the irq-safe lru_lock and page tree lock, but the 32-bit i_size_read() uses an irq-unsafe seqlock which should not be nested inside them. Instead, read the i_size earlier in split_huge_page_to_list(), and pass the end offset down to __split_huge_page(): all while holding head page lock, which is enough to prevent truncation of that extent before the page tree lock has been taken. Link: http://lkml.kernel.org/r/[email protected] Fixes: baa355fd33142 ("thp: file pages support for split_huge_page()") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/huge_memory: splitting set mapping+index before unfreeze	Hugh Dickins	1	-6/+6
	Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page). Move the setting of mapping and index up before the page_ref_unfreeze() in __split_huge_page_tail() to fix this: so that a page cache lookup cannot get a reference while the tail's mapping and index are unstable. In fact, might as well move them up before the smp_wmb(): I don't see an actual need for that, but if I'm missing something, this way round is safer than the other, and no less efficient. You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is misplaced, and should be left until after the trylock_page(); but left as is has not crashed since, and gives more stringent assurance. Link: http://lkml.kernel.org/r/[email protected] Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()") Requires: 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/huge_memory: rename freeze_page() to unmap_page()	Hugh Dickins	2	-16/+9
	The term "freeze" is used in several ways in the kernel, and in mm it has the particular meaning of forcing page refcount temporarily to 0. freeze_page() is just too confusing a name for a function that unmaps a page: rename it unmap_page(), and rename unfreeze_page() remap_page(). Went to change the mention of freeze_page() added later in mm/rmap.c, but found it to be incorrect: ordinary page reclaim reaches there too; but the substance of the comment still seems correct, so edit it down. Link: http://lkml.kernel.org/r/[email protected] Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	initramfs: clean old path before creating a hardlink	Li Zhijian	1	-10/+12
	sys_link() can fail due to the new path already existing. This case ofen occurs when we use a concated initrd, for example: 1) prepare a basic rootfs, it contains a regular files rc.local lizhijian@:~/yocto-tiny-i386-2016-04-22$ cat etc/rc.local #!/bin/sh echo "Running /etc/rc.local..." yocto-tiny-i386-2016-04-22$ find . \| sed 's,^\./,,' \| cpio -o -H newc \| gzip -n -9 >../rootfs.cgz 2) create a extra initrd which also includes a etc/rc.local lizhijian@:~/lkp-x86_64/etc$ echo "append initrd" >rc.local lizhijian@:~/lkp/lkp-x86_64/etc$ cat rc.local append initrd lizhijian@:~/lkp/lkp-x86_64/etc$ ln rc.local rc.local.hardlink append initrd lizhijian@:~/lkp/lkp-x86_64/etc$ stat rc.local rc.local.hardlink File: 'rc.local' Size: 14 Blocks: 8 IO Block: 4096 regular file Device: 801h/2049d Inode: 11296086 Links: 2 Access: (0664/-rw-rw-r--) Uid: ( 1002/lizhijian) Gid: ( 1002/lizhijian) Access: 2018-11-15 16:08:28.654464815 +0800 Modify: 2018-11-15 16:07:57.514903210 +0800 Change: 2018-11-15 16:08:24.180228872 +0800 Birth: - File: 'rc.local.hardlink' Size: 14 Blocks: 8 IO Block: 4096 regular file Device: 801h/2049d Inode: 11296086 Links: 2 Access: (0664/-rw-rw-r--) Uid: ( 1002/lizhijian) Gid: ( 1002/lizhijian) Access: 2018-11-15 16:08:28.654464815 +0800 Modify: 2018-11-15 16:07:57.514903210 +0800 Change: 2018-11-15 16:08:24.180228872 +0800 Birth: - lizhijian@:~/lkp/lkp-x86_64$ find . \| sed 's,^\./,,' \| cpio -o -H newc \| gzip -n -9 >../rc-local.cgz lizhijian@:~/lkp/lkp-x86_64$ gzip -dc ../rc-local.cgz \| cpio -t . etc etc/rc.local.hardlink <<< it will be extracted first at this initrd etc/rc.local 3) concate 2 initrds and boot lizhijian@:~/lkp$ cat rootfs.cgz rc-local.cgz >concate-initrd.cgz lizhijian@:~/lkp$ qemu-system-x86_64 -nographic -enable-kvm -cpu host -smp 1 -m 1024 -kernel ~/lkp/linux/arch/x86/boot/bzImage -append "console=ttyS0 earlyprint=ttyS0 ignore_loglevel" -initrd ./concate-initr.cgz -serial stdio -nodefaults In this case, sys_link(2) will fail and return -EEXIST, so we can only get the rc.local at rootfs.cgz instead of rc-local.cgz [[email protected]: move code to avoid forward declaration] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Li Zhijian <[email protected]> Cc: Philip Li <[email protected]> Cc: Dominik Brodowski <[email protected]> Cc: Li Zhijian <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace	Anders Roxell	1	-2/+2
	Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in __sanitizer_cov_trace_pc() shouldn't be traced either. ftrace_graph_caller() gets called for each function that isn't marked 'notrace', like canonicalize_ip(). This is the call trace from a run: [ 139.644550] ftrace_graph_caller+0x1c/0x24 [ 139.648352] canonicalize_ip+0x18/0x28 [ 139.652313] __sanitizer_cov_trace_pc+0x14/0x58 [ 139.656184] sched_clock+0x34/0x1e8 [ 139.659759] trace_clock_local+0x40/0x88 [ 139.663722] ftrace_push_return_trace+0x8c/0x1f0 [ 139.667767] prepare_ftrace_return+0xa8/0x100 [ 139.671709] ftrace_graph_caller+0x1c/0x24 Rework so that check_kcov_mode() and canonicalize_ip() that are called from __sanitizer_cov_trace_pc() are also marked as notrace. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Signen-off-by: Anders Roxell <[email protected]> Co-developed-by: Arnd Bergmann <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	psi: make disabling/enabling easier for vendor kernels	Johannes Weiner	5	-14/+40
	Mel Gorman reports a hackbench regression with psi that would prohibit shipping the suse kernel with it default-enabled, but he'd still like users to be able to opt in at little to no cost to others. With the current combination of CONFIG_PSI and the psi_disabled bool set from the commandline, this is a challenge. Do the following things to make it easier: 1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros to enable CONFIG_PSI in their kernel but leave the feature disabled unless a user requests it at boot-time. To avoid double negatives, rename psi_disabled= to psi=. 2. Make psi_disabled a static branch to eliminate any branch costs when the feature is disabled. In terms of numbers before and after this patch, Mel says: : The following is a comparision using CONFIG_PSI=n as a baseline against : your patch and a vanilla kernel : : 4.20.0-rc4 4.20.0-rc4 4.20.0-rc4 : kconfigdisable-v1r1 vanilla psidisable-v1r1 : Amean 1 1.3100 ( 0.00%) 1.3923 ( -6.28%) 1.3427 ( -2.49%) : Amean 3 3.8860 ( 0.00%) 4.1230 * -6.10%* 3.8860 ( -0.00%) : Amean 5 6.8847 ( 0.00%) 8.0390 * -16.77%* 6.7727 ( 1.63%) : Amean 7 9.9310 ( 0.00%) 10.8367 * -9.12%* 9.9910 ( -0.60%) : Amean 12 16.6577 ( 0.00%) 18.2363 * -9.48%* 17.1083 ( -2.71%) : Amean 18 26.5133 ( 0.00%) 27.8833 * -5.17%* 25.7663 ( 2.82%) : Amean 24 34.3003 ( 0.00%) 34.6830 ( -1.12%) 32.0450 ( 6.58%) : Amean 30 40.0063 ( 0.00%) 40.5800 ( -1.43%) 41.5087 ( -3.76%) : Amean 32 40.1407 ( 0.00%) 41.2273 ( -2.71%) 39.9417 ( 0.50%) : : It's showing that the vanilla kernel takes a hit (as the bisection : indicated it would) and that disabling PSI by default is reasonably : close in terms of performance for this particular workload on this : particular machine so; Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Tested-by: Mel Gorman <[email protected]> Reported-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	proc: fixup map_files test on arm	Alexey Dobriyan	1	-2/+7
	https://bugs.linaro.org/show_bug.cgi?id=3782 Turns out arm doesn't permit mapping address 0, so try minimum virtual address instead. Link: http://lkml.kernel.org/r/20181113165446.GA28157@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reported-by: Rafael David Tinoco <[email protected]> Tested-by: Rafael David Tinoco <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	debugobjects: avoid recursive calls with kmemleak	Qian Cai	1	-3/+2
	CONFIG_DEBUG_OBJECTS_RCU_HEAD does not play well with kmemleak due to recursive calls. fill_pool kmemleak_ignore make_black_object put_object __call_rcu (kernel/rcu/tree.c) debug_rcu_head_queue debug_object_activate debug_object_init fill_pool kmemleak_ignore make_black_object ... So add SLAB_NOLEAKTRACE to kmem_cache_create() to not register newly allocated debug objects at all. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Suggested-by: Catalin Marinas <[email protected]> Acked-by: Waiman Long <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yang Shi <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set	Andrea Arcangeli	1	-0/+11
	Set the page dirty if VM_WRITE is not set because in such case the pte won't be marked dirty and the page would be reclaimed without writepage (i.e. swapout in the shmem case). This was found by source review. Most apps (certainly including QEMU) only use UFFDIO_COPY on PROT_READ\|PROT_WRITE mappings or the app can't modify the memory in the first place. This is for correctness and it could help the non cooperative use case to avoid unexpected data loss. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Hugh Dickins <[email protected]> Cc: [email protected] Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Reported-by: Hugh Dickins <[email protected]> Signed-off-by: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Jann Horn <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	userfaultfd: shmem: add i_size checks	Andrea Arcangeli	2	-4/+40
	With MAP_SHARED: recheck the i_size after taking the PT lock, to serialize against truncate with the PT lock. Delete the page from the pagecache if the i_size_read check fails. With MAP_PRIVATE: check the i_size after the PT lock before mapping anonymous memory or zeropages into the MAP_PRIVATE shmem mapping. A mostly irrelevant cleanup: like we do the delete_from_page_cache() pagecache removal after dropping the PT lock, the PT lock is a spinlock so drop it before the sleepable page lock. Link: http://lkml.kernel.org/r/[email protected] Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Andrea Arcangeli <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: Hugh Dickins <[email protected]> Reported-by: Jann Horn <[email protected]> Cc: <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Cc: [email protected] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas	Andrea Arcangeli	2	-9/+21
	After the VMA to register the uffd onto is found, check that it has VM_MAYWRITE set before allowing registration. This way we inherit all common code checks before allowing to fill file holes in shmem and hugetlbfs with UFFDIO_COPY. The userfaultfd memory model is not applicable for readonly files unless it's a MAP_PRIVATE. Link: http://lkml.kernel.org/r/[email protected] Fixes: ff62a3421044 ("hugetlb: implement memfd sealing") Signed-off-by: Andrea Arcangeli <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: Hugh Dickins <[email protected]> Reported-by: Jann Horn <[email protected]> Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Cc: <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Cc: [email protected] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem	Andrea Arcangeli	1	-2/+13
	Userfaultfd did not create private memory when UFFDIO_COPY was invoked on a MAP_PRIVATE shmem mapping. Instead it wrote to the shmem file, even when that had not been opened for writing. Though, fortunately, that could only happen where there was a hole in the file. Fix the shmem-backed implementation of UFFDIO_COPY to create private memory for MAP_PRIVATE mappings. The hugetlbfs-backed implementation was already correct. This change is visible to userland, if userfaultfd has been used in unintended ways: so it introduces a small risk of incompatibility, but is necessary in order to respect file permissions. An app that uses UFFDIO_COPY for anything like postcopy live migration won't notice the difference, and in fact it'll run faster because there will be no copy-on-write and memory waste in the tmpfs pagecache anymore. Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like before. The real zeropage can also be built on a MAP_PRIVATE shmem mapping through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is never dirty, in turn even an mprotect upgrading the vma permission from PROT_READ to PROT_READ\|PROT_WRITE won't make the zeropage pte writable. Link: http://lkml.kernel.org/r/[email protected] Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Andrea Arcangeli <[email protected]> Reported-by: Mike Rapoport <[email protected]> Reviewed-by: Hugh Dickins <[email protected]> Cc: <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Jann Horn <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Cc: [email protected] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails	Andrea Arcangeli	3	-5/+5
	Patch series "userfaultfd shmem updates". Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the lack of the VM_MAYWRITE check and the lack of i_size checks. Then looking into the above we also fixed the MAP_PRIVATE case. Hugh by source review also found a data loss source if UFFDIO_COPY is used on shmem MAP_SHARED PROT_READ mappings (the production usages incidentally run with PROT_READ\|PROT_WRITE, so the data loss couldn't happen in those production usages like with QEMU). The whole patchset is marked for stable. We verified QEMU postcopy live migration with guest running on shmem MAP_PRIVATE run as well as before after the fix of shmem MAP_PRIVATE. Regardless if it's shmem or hugetlbfs or MAP_PRIVATE or MAP_SHARED, QEMU unconditionally invokes a punch hole if the guest mapping is filebacked and a MADV_DONTNEED too (needed to get rid of the MAP_PRIVATE COWs and for the anon backend). This patch (of 5): We internally used EFAULT to communicate with the caller, switch to ENOENT, so EFAULT can be used as a non internal retval. Link: http://lkml.kernel.org/r/[email protected] Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Andrea Arcangeli <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: Hugh Dickins <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Jann Horn <[email protected]> Cc: Peter Xu <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: <[email protected]> Cc: [email protected] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	lib/test_kmod.c: fix rmmod double free	Luis Chamberlain	1	-1/+0
	We free the misc device string twice on rmmod; fix this. Without this we cannot remove the module without crashing. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Reported-by: Randy Dunlap <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: <[email protected]> [4.12+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	hfsplus: do not free node before using	Pan Bian	1	-1/+2
	hfs_bmap_free() frees node via hfs_bnode_put(node). However it then reads node->this when dumping error message on an error path, which may result in a use-after-free bug. This patch frees node only when it is never used. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pan Bian <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Ernesto A. Fernandez <[email protected]> Cc: Joe Perches <[email protected]> Cc: Viacheslav Dubeyko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	hfs: do not free node before using	Pan Bian	1	-1/+2
	hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it then reads node->this when dumping error message on an error path, which may result in a use-after-free bug. This patch frees the node only when it is never again used. Link: http://lkml.kernel.org/r/[email protected] Fixes: a1185ffa2fc ("HFS rewrite") Signed-off-by: Pan Bian <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Joe Perches <[email protected]> Cc: Ernesto A. Fernandez <[email protected]> Cc: Viacheslav Dubeyko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	proc: update MAINTAINERS with proc.txt	Alexey Dobriyan	1	-0/+1
	Turns out that /proc has official documentation and people even trying to keep it uptodate. Link: http://lkml.kernel.org/r/20181116134630.GA8004@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm/page_alloc.c: fix calculation of pgdat->nr_zones	Wei Yang	1	-1/+3
	init_currently_empty_zone() will adjust pgdat->nr_zones and set it to 'zone_idx(zone) + 1' unconditionally. This is correct in the normal case, while not exact in hot-plug situation. This function is used in two places: * free_area_init_core() * move_pfn_range_to_zone() In the first case, we are sure zone index increase monotonically. While in the second one, this is under users control. One way to reproduce this is: ---------------------------- 1. create a virtual machine with empty node1 -m 4G,slots=32,maxmem=32G \ -smp 4,maxcpus=8 \ -numa node,nodeid=0,mem=4G,cpus=0-3 \ -numa node,nodeid=1,mem=0G,cpus=4-7 2. hot-add cpu 3-7 cpu-add [3-7] 2. hot-add memory to nod1 object_add memory-backend-ram,id=ram0,size=1G device_add pc-dimm,id=dimm0,memdev=ram0,node=1 3. online memory with following order echo online_movable > memory47/state echo online > memory40/state After this, node1 will have its nr_zones equals to (ZONE_NORMAL + 1) instead of (ZONE_MOVABLE + 1). Michal said: "Having an incorrect nr_zones might result in all sorts of problems which would be quite hard to debug (e.g. reclaim not considering the movable zone). I do not expect many users would suffer from this it but still this is trivial and obviously right thing to do so backporting to the stable tree shouldn't be harmful (last famous words)" Link: http://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") Signed-off-by: Wei Yang <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-11-30	mm: use swp_offset as key in shmem_replace_page()	Yu Zhao	1	-2/+4
	We changed the key of swap cache tree from swp_entry_t.val to swp_offset. We need to do so in shmem_replace_page() as well. Hugh said: "shmem_replace_page() has been wrong since the day I wrote it: good enough to work on swap "type" 0, which is all most people ever use (especially those few who need shmem_replace_page() at all), but broken once there are any non-0 swp_type bits set in the higher order bits" Link: http://lkml.kernel.org/r/[email protected] Fixes: f6ab1f7f6b2d ("mm, swap: use offset of swap entry as key of swap cache") Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: <[email protected]> [4.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>