Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently the copy_to_user of data in the gentry struct is copying
uninitiaized data in field _pad from the stack to userspace.
Fix this by explicitly memset'ing gentry to zero, this also will zero any
compiler added padding fields that may be in struct (currently there are
none).
Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable")
Fixes: b263b31e8ad6 ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Andy spotted a regression in the fs/gs base helpers after the patch series
was committed. The helper functions which write fs/gs base are not just
writing the base, they are also changing the index. That's wrong and needs
to be separated because writing the base has not to modify the index.
While the regression is not causing any harm right now because the only
caller depends on that behaviour, it's a guarantee for subtle breakage down
the road.
Make the index explicitly changed from the caller, instead of including
the code in the helpers.
Subsequently, the task write helpers do not handle for the current task
anymore. The range check for a base value is also factored out, to minimize
code redundancy from the caller.
Fixes: b1378a561fd1 ("x86/fsgsbase/64: Introduce FS/GS base helper functions")
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In commit:
a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
I misread the CAP array code and incorrectly used
tlb_flush_kernel_range(), resulting in missing TLB flushes and
consequent failures.
Instead do a full invalidate in this case -- for now.
Reported-by: StDenis, Tom <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit
379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.
Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)
Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.
See relevant bug reports for additional info:
https://bugzilla.kernel.org/show_bug.cgi?id=201741
https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Reported-by: Florian Weimer <[email protected]>
Reported-by: Carlos O'Donell <[email protected]>
Reported-by: "H. J. Lu" <[email protected]>
Signed-off-by: Alistair Strachan <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Laura Abbott <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Carlos O'Donell <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: [email protected]
Cc: Laura Abbott <[email protected]>
Cc: stable <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: X86 ML <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
A decoy address is used by set_mce_nospec() to update the cache attributes
for a page that may contain poison (multi-bit ECC error) while attempting
to minimize the possibility of triggering a speculative access to that
page.
When reserve_memtype() is handling a decoy address it needs to convert it
to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK,
is broken for a 32-bit physical mask and reserve_memtype() is passed the
last physical page. Gert reports triggering the:
BUG_ON(start >= end);
...assertion when running a 32-bit non-PAE build on a platform that has
a driver resource at the top of physical memory:
BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
Given that the decoy address scheme is only targeted at 64-bit builds and
assumes that the top of physical address space is free for use as a decoy
address range, simply bypass address sanitization in the 32-bit case.
Lastly, there was no need to crash the system when this failure occurred,
and no need to crash future systems if the assumptions of decoy addresses
are ever violated. Change the BUG_ON() to a WARN() with an error return.
Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...")
Reported-by: Gert Robben <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Gert Robben <[email protected]>
Cc: [email protected]
Cc: Andy Shevchenko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com
|
|
sequence
The user triggers the creation of a pseudo-locked region when writing
the requested schemata to the schemata resctrl file. The pseudo-locking
of a region is required to be done on a CPU that is associated with the
cache on which the pseudo-locked region will reside. In order to run the
locking code on a specific CPU, the needed CPU has to be selected and
ensured to remain online during the entire locking sequence.
At this time, the cpu_hotplug_lock is not taken during the pseudo-lock
region creation and it is thus possible for a CPU to be selected to run
the pseudo-locking code and then that CPU to go offline before the
thread is able to run on it.
Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on
which code has to run needs to be controlled. Since the cpu_hotplug_lock
is always taken before rdtgroup_mutex the lock order is maintained.
Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: stable <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
|
|
The LDT remap placement has been changed. It's now placed before the direct
mapping in the kernel virtual address space for both paging modes.
Change address markers order accordingly.
Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
There is a guard hole at the beginning of the kernel address space, also
used by hypervisors. It occupies 16 PGD entries.
This reserved range is not defined explicitely, it is calculated relative
to other entities: direct mapping and user space ranges.
The calculation got broken by recent changes of the kernel memory layout:
LDT remap range is now mapped before direct mapping and makes the
calculation invalid.
The breakage leads to crash on Xen dom0 boot[1].
Define the reserved range explicitely. It's part of kernel ABI (hypervisors
expect it to be stable) and must not depend on changes in the rest of
kernel memory layout.
[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html
Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Reported-by: Hans van Kranenburg <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Hans van Kranenburg <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
GNU linker's -z common-page-size's default value is based on the target
architecture. arch/x86/entry/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Reported-by: Dmitry Golovin <[email protected]>
Reported-by: Bill Wendling <[email protected]>
Suggested-by: Dmitry Golovin <[email protected]>
Suggested-by: Rui Ueyama <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://bugs.llvm.org/show_bug.cgi?id=38774
Link: https://github.com/ClangBuiltLinux/linux/issues/31
|
|
It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.
Once you are hit by the error about non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.
The easiest fix is to move this check to the "archprepare" like
this commit did:
829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")
Reported-by: Meelis Roos <[email protected]>
Tested-by: Meelis Roos <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Zhenzhong Duan <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Zhenzhong Duan <[email protected]>
Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.org/lkml/2018/12/4/206
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Gunnar Krueger reported a systemd-boot failure and bisected it down to:
e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")
In case a broken boot loader doesn't clear its 'struct boot_params', clear
rsdp_addr in sanitize_boot_params().
Reported-by: Gunnar Krueger <[email protected]>
Tested-by: Gunnar Krueger <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Volume is a little higher than usual due to a set of gpio fixes for
Davinci platforms that's been around a while, still seemed appropriate
to not hold off until next merge window.
Besides that it's the usual mix of minor fixes, mostly corrections of
small stuff in device trees.
Major stability-related one is the removal of a regulator from DT on
Rock960, since DVFS caused undervoltage. I expect it'll be restored
once they figure out the underlying issue"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
MAINTAINERS: Remove unused Qualcomm SoC mailing list
ARM: davinci: dm644x: set the GPIO base to 0
ARM: davinci: da830: set the GPIO base to 0
ARM: davinci: dm355: set the GPIO base to 0
ARM: davinci: dm646x: set the GPIO base to 0
ARM: davinci: dm365: set the GPIO base to 0
ARM: davinci: da850: set the GPIO base to 0
gpio: davinci: restore a way to manually specify the GPIO base
ARM: davinci: dm644x: define gpio interrupts as separate resources
ARM: davinci: dm355: define gpio interrupts as separate resources
ARM: davinci: dm646x: define gpio interrupts as separate resources
ARM: davinci: dm365: define gpio interrupts as separate resources
ARM: davinci: da8xx: define gpio interrupts as separate resources
ARM: dts: at91: sama5d2: use the divided clock for SMC
ARM: dts: imx51-zii-rdu1: Remove EEPROM node
ARM: dts: rockchip: Remove @0 from the veyron memory node
arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP
arm64: dts: sdm845-mtp: Reserve reserved gpios
arm64: dts: ti: k3-am654: Fix wakeup_uart reg address
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A revert of a previous commit as it is no longer necessary and has
shown to cause problems in some memory hotplug cases.
- Some small fixes and a minor cleanup.
- A patch for adding better diagnostic data in a very rare failure
case.
* tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
pvcalls-front: fixes incorrect error handling
Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
xen: xlate_mmu: add missing header to fix 'W=1' warning
xen/x86: add diagnostic printout to xen_mc_flush() in case of error
x86/xen: cleanup includes in arch/x86/xen/spinlock.c
|
|
git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"This contains two fixes to at_hdmac which fixes long standing bus
reported recently on serial transfers causing memory leak. These fixes
were done by Richard Genoud"
* tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: at_hdmac: fix module unloading
dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull STIBP fallout fixes from Thomas Gleixner:
"The performance destruction department finally got it's act together
and came up with a cure for the STIPB regression:
- Provide a command line option to control the spectre v2 user space
mitigations. Default is either seccomp or prctl (if seccomp is
disabled in Kconfig). prctl allows mitigation opt-in, seccomp
enables the migitation for sandboxed processes.
- Rework the code to handle the conditional STIBP/IBPB control and
remove the now unused ptrace_may_access_sched() optimization
attempt
- Disable STIBP automatically when SMT is disabled
- Optimize the switch_to() logic to avoid MSR writes and invocations
of __switch_to_xtra().
- Make the asynchronous speculation TIF updates synchronous to
prevent stale mitigation state.
As a general cleanup this also makes retpoline directly depend on
compiler support and removes the 'minimal retpoline' option which just
pretended to provide some form of security while providing none"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/speculation: Provide IBPB always command line options
x86/speculation: Add seccomp Spectre v2 user space protection mode
x86/speculation: Enable prctl mode for spectre_v2_user
x86/speculation: Add prctl() control for indirect branch speculation
x86/speculation: Prepare arch_smt_update() for PRCTL mode
x86/speculation: Prevent stale SPEC_CTRL msr content
x86/speculation: Split out TIF update
ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
x86/speculation: Prepare for conditional IBPB in switch_mm()
x86/speculation: Avoid __switch_to_xtra() calls
x86/process: Consolidate and simplify switch_to_xtra() code
x86/speculation: Prepare for per task indirect branch speculation control
x86/speculation: Add command line control for indirect branch speculation
x86/speculation: Unify conditional spectre v2 print functions
x86/speculataion: Mark command line parser data __initdata
x86/speculation: Mark string arrays const correctly
x86/speculation: Reorder the spec_v2 code
x86/l1tf: Show actual SMT state
x86/speculation: Rework SMT state change
sched/smt: Expose sched_smt_present static key
...
|
|
Pull block layer fixes from Jens Axboe:
- Single range elevator discard merge fix, that caused crashes (Ming)
- Fix for a regression in O_DIRECT, where we could potentially lose the
error value (Maximilian Heyne)
- NVMe pull request from Christoph, with little fixes all over the map
for NVMe.
* tag 'for-linus-20181201' of git://git.kernel.dk/linux-block:
block: fix single range discard merge
nvme-rdma: fix double freeing of async event data
nvme: flush namespace scanning work just before removing namespaces
nvme: warn when finding multi-port subsystems without multipathing enabled
fs: fix lost error code in dio_complete
nvme-pci: fix surprise removal
nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
nvme: Free ctrl device name on init failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix a link speed checking interface that broke PCIe gen3 cards in
gen1 slots (Mikulas Patocka)
- Fix an imx6 link training error (Trent Piepho)
- Fix a layerscape outbound window accessor calling error (Hou
Zhiqiang)
- Fix a DesignWare endpoint MSI-X address calculation error (Gustavo
Pimentel)
* tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix incorrect value returned from pcie_get_speed_cap()
PCI: dwc: Fix MSI-X EP framework address calculation bug
PCI: layerscape: Fix wrong invocation of outbound window disable accessor
PCI: imx6: Fix link training status detection in link up check
|
|
- Fix DesignWare endpoint MSI-X address calculation bug (Gustavo
Pimentel)
- Fix Layerscape outbound window disable usage (Hou Zhiqiang)
- Fix imx6 link up detection (Trent Piepho)
* lorenzo/pci/controller-fixes:
PCI: dwc: Fix MSI-X EP framework address calculation bug
PCI: layerscape: Fix wrong invocation of outbound window disable accessor
PCI: imx6: Fix link training status detection in link up check
|
|
The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask
the register and compare it against them.
This fixes errors like this:
amdgpu: [powerplay] failed to send message 261 ret is 0
when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is
being incorrectly reported as PCIe-v3 capable.
6cf57be0f78e, which appeared in v4.17, added pcie_get_speed_cap() with the
incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask. 5d9a63304032, which
appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the
amdgpu bug reports below are regressions in v4.19.
Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
Fixes: 5d9a63304032 ("drm/amdgpu: use pcie functions for link width and speed")
Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
Signed-off-by: Mikulas Patocka <[email protected]>
[bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2,
remove test of PCI_EXP_LNKCAP for zero, since that register is required]
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Cc: [email protected] # v4.17+
|
|
Merge misc fixes from Andrew Morton:
"31 fixes"
* emailed patches from Andrew Morton <[email protected]>: (31 commits)
ocfs2: fix potential use after free
mm/khugepaged: fix the xas_create_range() error path
mm/khugepaged: collapse_shmem() do not crash on Compound
mm/khugepaged: collapse_shmem() without freezing new_page
mm/khugepaged: minor reorderings in collapse_shmem()
mm/khugepaged: collapse_shmem() remember to clear holes
mm/khugepaged: fix crashes due to misaccounted holes
mm/khugepaged: collapse_shmem() stop if punched or truncated
mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
mm/huge_memory: splitting set mapping+index before unfreeze
mm/huge_memory: rename freeze_page() to unmap_page()
initramfs: clean old path before creating a hardlink
kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
psi: make disabling/enabling easier for vendor kernels
proc: fixup map_files test on arm
debugobjects: avoid recursive calls with kmemleak
userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
userfaultfd: shmem: add i_size checks
userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull few more MIPS fixes from Paul Burton:
- Fix mips_get_syscall_arg() to operate on the task specified when
detecting o32 tasks running on MIPS64 kernels.
- Fix some incorrect GPIO pin muxing for the MT7620 SoC.
- Update the linux-mips mailing list address.
* tag 'mips_fixes_4.20_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MAINTAINERS: Update linux-mips mailing list address
MIPS: ralink: Fix mt7620 nd_sd pinmux
mips: fix mips_get_syscall_arg o32 check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Cortex-A76 erratum workaround
- ftrace fix to enable syscall events on arm64
- Fix uninitialised pointer in iort_get_platform_device_domain()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value
arm64: ftrace: Fix to enable syscall events on arm64
arm64: Add workaround for Cortex-A76 erratum 1286807
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull stackleak plugin fix from Kees Cook:
"Fix crash by not allowing kprobing of stackleak_erase() (Alexander
Popov)"
* tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
stackleak: Disable function tracing and kprobes for stackleak_erase()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache and cachefiles fixes from David Howells:
"Misc fixes:
- Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a
race between a cache object lookup failing and someone attempting
to reenable that object, thereby triggering an update of the
object's attributes.
- Fix an assertion failure at fs/fscache/operation.c:449 caused by a
split atomic subtract and atomic read that allows a race to happen.
- Fix a leak of backing pages when simultaneously reading the same
page from the same object from two or more threads.
- Fix a hang due to a race between a cache object being discarded and
the corresponding cookie being reenabled.
There are also some minor cleanups:
- Cast an enum value to a different enum type to prevent clang from
generating a warning. This shouldn't cause any sort of change in
the emitted code.
- Use ktime_get_real_seconds() instead of get_seconds(). This is just
used to uniquify a filename for an object to be placed in the
graveyard. Objects placed there are deleted by cachfilesd in
userspace immediately thereafter.
- Remove an initialised, but otherwise unused variable. This should
have been entirely optimised away anyway"
* tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache, cachefiles: remove redundant variable 'cache'
cachefiles: avoid deprecated get_seconds()
cachefiles: Explicitly cast enumerated type in put_object
fscache: fix race between enablement and dropping of object
cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active
fscache: Fix race in fscache_op_complete() due to split atomic_sub & read
cachefiles: Fix an assertion failure when trying to update a failed object
|
|
The linux-mips.org infrastructure has been unreliable recently & nobody
with sufficient access to fix it is around to do so. As a result we're
moving away from it, and part of this is migrating our mailing list to
kernel.org.
Replace all instances of [email protected] in MAINTAINERS with
the shiny new [email protected] address.
The new list is now being archived on kernel.org at
https://lore.kernel.org/linux-mips/ which also holds the history of the
old linux-mips.org list.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
ocfs2_get_dentry() calls iput(inode) to drop the reference count of
inode, and if the reference count hits 0, inode is freed. However, in
this function, it then reads inode->i_generation, which may result in a
use after free bug. Move the put operation later.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
collapse_shmem()'s xas_nomem() is very unlikely to fail, but it is
rightly given a failure path, so move the whole xas_create_range() block
up before __SetPageLocked(new_page): so that it does not need to
remember to unlock_page(new_page).
Add the missing mem_cgroup_cancel_charge(), and set (currently unused)
result to SCAN_FAIL rather than SCAN_SUCCEED.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 77da9389b9d5 ("mm: Convert collapse_shmem to XArray")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
it holds page lock of the first page, racing truncation then extension
might conceivably have inserted a hugepage there already. Fail with the
SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
otherwise mishandling the unexpected hugepage - though later we might
code up a more constructive way of handling it, with SCAN_SUCCESS.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0). Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.
Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.
One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.
The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong. So simply
eliminate the freezing of the new_page.
Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before. They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Several cleanups in collapse_shmem(): most of which probably do not
really matter, beyond doing things in a more familiar and reassuring
order. Simplify the failure gotos in the main loop, and on success
update stats while interrupts still disabled from the last iteration.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp
flags khugepaged uses to allocate a huge page - in all common cases it
would just be a waste of effort - so collapse_shmem() must remember to
clear out any holes that it instantiates.
The obvious place to do so, where they are put into the page cache tree,
is not a good choice: because interrupts are disabled there. Leave it
until further down, once success is assured, where the other pages are
copied (before setting PageUptodate).
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent
hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by
clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later
closed and unlinked.
khugepaged's collapse_shmem() was forgetting to update mapping->nrpages
on the rollback path, after it had added but then needs to undo some
holes.
There is indeed an irritating asymmetry between shmem_charge(), whose
callers want it to increment nrpages after successfully accounting
blocks, and shmem_uncharge(), when __delete_from_page_cache() already
decremented nrpages itself: oh well, just add a comment on that to them
both.
And shmem_recalc_inode() is supposed to be called when the accounting is
expected to be in balance (so it can deduce from imbalance that reclaim
discarded some pages): so change shmem_charge() to update nrpages
earlier (though it's rare for the difference to matter at all).
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Huge tmpfs testing showed that although collapse_shmem() recognizes a
concurrently truncated or hole-punched page correctly, its handling of
holes was liable to refill an emptied extent. Add check to stop that.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
__split_huge_page() was using i_size_read() while holding the irq-safe
lru_lock and page tree lock, but the 32-bit i_size_read() uses an
irq-unsafe seqlock which should not be nested inside them.
Instead, read the i_size earlier in split_huge_page_to_list(), and pass
the end offset down to __split_huge_page(): all while holding head page
lock, which is enough to prevent truncation of that extent before the
page tree lock has been taken.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: baa355fd33142 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).
Move the setting of mapping and index up before the page_ref_unfreeze()
in __split_huge_page_tail() to fix this: so that a page cache lookup
cannot get a reference while the tail's mapping and index are unstable.
In fact, might as well move them up before the smp_wmb(): I don't see an
actual need for that, but if I'm missing something, this way round is
safer than the other, and no less efficient.
You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
misplaced, and should be left until after the trylock_page(); but left as
is has not crashed since, and gives more stringent assurance.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Requires: 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The term "freeze" is used in several ways in the kernel, and in mm it
has the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().
Went to change the mention of freeze_page() added later in mm/rmap.c,
but found it to be incorrect: ordinary page reclaim reaches there too;
but the substance of the comment still seems correct, so edit it down.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
sys_link() can fail due to the new path already existing. This case
ofen occurs when we use a concated initrd, for example:
1) prepare a basic rootfs, it contains a regular files rc.local
lizhijian@:~/yocto-tiny-i386-2016-04-22$ cat etc/rc.local
#!/bin/sh
echo "Running /etc/rc.local..."
yocto-tiny-i386-2016-04-22$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rootfs.cgz
2) create a extra initrd which also includes a etc/rc.local
lizhijian@:~/lkp-x86_64/etc$ echo "append initrd" >rc.local
lizhijian@:~/lkp/lkp-x86_64/etc$ cat rc.local
append initrd
lizhijian@:~/lkp/lkp-x86_64/etc$ ln rc.local rc.local.hardlink
append initrd
lizhijian@:~/lkp/lkp-x86_64/etc$ stat rc.local rc.local.hardlink
File: 'rc.local'
Size: 14 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 11296086 Links: 2
Access: (0664/-rw-rw-r--) Uid: ( 1002/lizhijian) Gid: ( 1002/lizhijian)
Access: 2018-11-15 16:08:28.654464815 +0800
Modify: 2018-11-15 16:07:57.514903210 +0800
Change: 2018-11-15 16:08:24.180228872 +0800
Birth: -
File: 'rc.local.hardlink'
Size: 14 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 11296086 Links: 2
Access: (0664/-rw-rw-r--) Uid: ( 1002/lizhijian) Gid: ( 1002/lizhijian)
Access: 2018-11-15 16:08:28.654464815 +0800
Modify: 2018-11-15 16:07:57.514903210 +0800
Change: 2018-11-15 16:08:24.180228872 +0800
Birth: -
lizhijian@:~/lkp/lkp-x86_64$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rc-local.cgz
lizhijian@:~/lkp/lkp-x86_64$ gzip -dc ../rc-local.cgz | cpio -t
.
etc
etc/rc.local.hardlink <<< it will be extracted first at this initrd
etc/rc.local
3) concate 2 initrds and boot
lizhijian@:~/lkp$ cat rootfs.cgz rc-local.cgz >concate-initrd.cgz
lizhijian@:~/lkp$ qemu-system-x86_64 -nographic -enable-kvm -cpu host -smp 1 -m 1024 -kernel ~/lkp/linux/arch/x86/boot/bzImage -append "console=ttyS0 earlyprint=ttyS0 ignore_loglevel" -initrd ./concate-initr.cgz -serial stdio -nodefaults
In this case, sys_link(2) will fail and return -EEXIST, so we can only get
the rc.local at rootfs.cgz instead of rc-local.cgz
[[email protected]: move code to avoid forward declaration]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Li Zhijian <[email protected]>
Cc: Philip Li <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Li Zhijian <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in
__sanitizer_cov_trace_pc() shouldn't be traced either.
ftrace_graph_caller() gets called for each function that isn't marked
'notrace', like canonicalize_ip(). This is the call trace from a run:
[ 139.644550] ftrace_graph_caller+0x1c/0x24
[ 139.648352] canonicalize_ip+0x18/0x28
[ 139.652313] __sanitizer_cov_trace_pc+0x14/0x58
[ 139.656184] sched_clock+0x34/0x1e8
[ 139.659759] trace_clock_local+0x40/0x88
[ 139.663722] ftrace_push_return_trace+0x8c/0x1f0
[ 139.667767] prepare_ftrace_return+0xa8/0x100
[ 139.671709] ftrace_graph_caller+0x1c/0x24
Rework so that check_kcov_mode() and canonicalize_ip() that are called
from __sanitizer_cov_trace_pc() are also marked as notrace.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signen-off-by: Anders Roxell <[email protected]>
Co-developed-by: Arnd Bergmann <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.
With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge. Do the following things to
make it easier:
1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
to enable CONFIG_PSI in their kernel but leave the feature disabled
unless a user requests it at boot-time.
To avoid double negatives, rename psi_disabled= to psi=.
2. Make psi_disabled a static branch to eliminate any branch costs
when the feature is disabled.
In terms of numbers before and after this patch, Mel says:
: The following is a comparision using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
: 4.20.0-rc4 4.20.0-rc4 4.20.0-rc4
: kconfigdisable-v1r1 vanilla psidisable-v1r1
: Amean 1 1.3100 ( 0.00%) 1.3923 ( -6.28%) 1.3427 ( -2.49%)
: Amean 3 3.8860 ( 0.00%) 4.1230 * -6.10%* 3.8860 ( -0.00%)
: Amean 5 6.8847 ( 0.00%) 8.0390 * -16.77%* 6.7727 ( 1.63%)
: Amean 7 9.9310 ( 0.00%) 10.8367 * -9.12%* 9.9910 ( -0.60%)
: Amean 12 16.6577 ( 0.00%) 18.2363 * -9.48%* 17.1083 ( -2.71%)
: Amean 18 26.5133 ( 0.00%) 27.8833 * -5.17%* 25.7663 ( 2.82%)
: Amean 24 34.3003 ( 0.00%) 34.6830 ( -1.12%) 32.0450 ( 6.58%)
: Amean 30 40.0063 ( 0.00%) 40.5800 ( -1.43%) 41.5087 ( -3.76%)
: Amean 32 40.1407 ( 0.00%) 41.2273 ( -2.71%) 39.9417 ( 0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Tested-by: Mel Gorman <[email protected]>
Reported-by: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
https://bugs.linaro.org/show_bug.cgi?id=3782
Turns out arm doesn't permit mapping address 0, so try minimum virtual
address instead.
Link: http://lkml.kernel.org/r/20181113165446.GA28157@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Reported-by: Rafael David Tinoco <[email protected]>
Tested-by: Rafael David Tinoco <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_DEBUG_OBJECTS_RCU_HEAD does not play well with kmemleak due to
recursive calls.
fill_pool
kmemleak_ignore
make_black_object
put_object
__call_rcu (kernel/rcu/tree.c)
debug_rcu_head_queue
debug_object_activate
debug_object_init
fill_pool
kmemleak_ignore
make_black_object
...
So add SLAB_NOLEAKTRACE to kmem_cache_create() to not register newly
allocated debug objects at all.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Acked-by: Waiman Long <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Set the page dirty if VM_WRITE is not set because in such case the pte
won't be marked dirty and the page would be reclaimed without writepage
(i.e. swapout in the shmem case).
This was found by source review. Most apps (certainly including QEMU)
only use UFFDIO_COPY on PROT_READ|PROT_WRITE mappings or the app can't
modify the memory in the first place. This is for correctness and it
could help the non cooperative use case to avoid unexpected data loss.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Hugh Dickins <[email protected]>
Cc: [email protected]
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrea Arcangeli <[email protected]>
Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With MAP_SHARED: recheck the i_size after taking the PT lock, to
serialize against truncate with the PT lock. Delete the page from the
pagecache if the i_size_read check fails.
With MAP_PRIVATE: check the i_size after the PT lock before mapping
anonymous memory or zeropages into the MAP_PRIVATE shmem mapping.
A mostly irrelevant cleanup: like we do the delete_from_page_cache()
pagecache removal after dropping the PT lock, the PT lock is a spinlock
so drop it before the sleepable page lock.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Hugh Dickins <[email protected]>
Reported-by: Jann Horn <[email protected]>
Cc: <[email protected]>
Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: [email protected]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration. This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.
The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Hugh Dickins <[email protected]>
Reported-by: Jann Horn <[email protected]>
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <[email protected]>
Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: [email protected]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Userfaultfd did not create private memory when UFFDIO_COPY was invoked
on a MAP_PRIVATE shmem mapping. Instead it wrote to the shmem file,
even when that had not been opened for writing. Though, fortunately,
that could only happen where there was a hole in the file.
Fix the shmem-backed implementation of UFFDIO_COPY to create private
memory for MAP_PRIVATE mappings. The hugetlbfs-backed implementation
was already correct.
This change is visible to userland, if userfaultfd has been used in
unintended ways: so it introduces a small risk of incompatibility, but
is necessary in order to respect file permissions.
An app that uses UFFDIO_COPY for anything like postcopy live migration
won't notice the difference, and in fact it'll run faster because there
will be no copy-on-write and memory waste in the tmpfs pagecache
anymore.
Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
before.
The real zeropage can also be built on a MAP_PRIVATE shmem mapping
through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is
never dirty, in turn even an mprotect upgrading the vma permission from
PROT_READ to PROT_READ|PROT_WRITE won't make the zeropage pte writable.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <[email protected]>
Reported-by: Mike Rapoport <[email protected]>
Reviewed-by: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: [email protected]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "userfaultfd shmem updates".
Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the
lack of the VM_MAYWRITE check and the lack of i_size checks.
Then looking into the above we also fixed the MAP_PRIVATE case.
Hugh by source review also found a data loss source if UFFDIO_COPY is
used on shmem MAP_SHARED PROT_READ mappings (the production usages
incidentally run with PROT_READ|PROT_WRITE, so the data loss couldn't
happen in those production usages like with QEMU).
The whole patchset is marked for stable.
We verified QEMU postcopy live migration with guest running on shmem
MAP_PRIVATE run as well as before after the fix of shmem MAP_PRIVATE.
Regardless if it's shmem or hugetlbfs or MAP_PRIVATE or MAP_SHARED, QEMU
unconditionally invokes a punch hole if the guest mapping is filebacked
and a MADV_DONTNEED too (needed to get rid of the MAP_PRIVATE COWs and
for the anon backend).
This patch (of 5):
We internally used EFAULT to communicate with the caller, switch to
ENOENT, so EFAULT can be used as a non internal retval.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Hugh Dickins <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: <[email protected]>
Cc: [email protected]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We free the misc device string twice on rmmod; fix this. Without this
we cannot remove the module without crashing.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis Chamberlain <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: <[email protected]> [4.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hfs_bmap_free() frees node via hfs_bnode_put(node). However it then
reads node->this when dumping error message on an error path, which may
result in a use-after-free bug. This patch frees node only when it is
never used.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Ernesto A. Fernandez <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Viacheslav Dubeyko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it
then reads node->this when dumping error message on an error path, which
may result in a use-after-free bug. This patch frees the node only when
it is never again used.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: a1185ffa2fc ("HFS rewrite")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Ernesto A. Fernandez <[email protected]>
Cc: Viacheslav Dubeyko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|