linux-IllusionX/kernel
Jason Gunthorpe 57efa1fe59 mm/gup: prevent gup_fast from racing with COW during fork
Since commit 70e806e4e6 ("mm: Do early cow for pinned pages during
fork() for ptes") pages under a FOLL_PIN will not be write protected
during COW for fork.  This means that pages returned from
pin_user_pages(FOLL_WRITE) should not become write protected while the pin
is active.

However, there is a small race where get_user_pages_fast(FOLL_PIN) can
establish a FOLL_PIN at the same time copy_present_page() is write
protecting it:

        CPU 0                             CPU 1
   get_user_pages_fast()
    internal_get_user_pages_fast()
                                       copy_page_range()
                                         pte_alloc_map_lock()
                                           copy_present_page()
                                             atomic_read(has_pinned) == 0
					     page_maybe_dma_pinned() == false
     atomic_set(has_pinned, 1);
     gup_pgd_range()
      gup_pte_range()
       pte_t pte = gup_get_pte(ptep)
       pte_access_permitted(pte)
       try_grab_compound_head()
                                             pte = pte_wrprotect(pte)
	                                     set_pte_at();
                                         pte_unmap_unlock()
      // GUP now returns with a write protected page

The first attempt to resolve this by using the write protect caused
problems (and was missing a barrrier), see commit f3c64eda3e ("mm: avoid
early COW write protect games during fork()")

Instead wrap copy_p4d_range() with the write side of a seqcount and check
the read side around gup_pgd_range().  If there is a collision then
get_user_pages_fast() fails and falls back to slow GUP.

Slow GUP is safe against this race because copy_page_range() is only
called while holding the exclusive side of the mmap_lock on the src
mm_struct.

[akpm@linux-foundation.org: coding style fixes]
  Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com

Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com
Fixes: f3c64eda3e ("mm: avoid early COW write protect games during fork()")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: "Ahmed S. Darwish" <a.darwish@linutronix.de>	[seqcount_t parts]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:39 -08:00
..
bpf bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers 2020-12-11 14:19:07 -08:00
cgroup
configs
debug
dma
entry
events perf: Tweak perf_event_attr::exclusive semantics 2020-11-09 18:12:36 +01:00
gcov
irq genirq/irqdomain: Add an irq_create_mapping_affinity() function 2020-11-30 12:21:31 +01:00
kcsan
livepatch
locking lockdep: Put graph lock/unlock under lock_recursion protection 2020-11-17 13:15:35 +01:00
power
printk Urgent printk fix for 5.10 2020-11-27 10:38:36 -08:00
rcu Merge branch 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2020-11-17 10:31:56 -08:00
sched membarrier: Execute SYNC_CORE on the calling thread 2020-12-09 09:37:43 +01:00
time
trace bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers 2020-12-11 14:19:07 -08:00
.gitignore
acct.c
async.c
audit.c
audit.h
audit_fsnotify.c
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c
cpu.c kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling 2020-11-27 00:10:39 +11:00
cpu_pm.c
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c
extable.c
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-19 11:58:16 -08:00
fork.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
freezer.c
futex.c futex: Don't enable IRQs unconditionally in put_pi_state() 2020-11-09 14:30:30 +01:00
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec.c
kexec_core.c
kexec_elf.c
kexec_file.c
kexec_internal.h
kheaders.c
kmod.c
kprobes.c
ksysfs.c
kthread.c kthread_worker: document CPU hotplug handling 2020-12-15 12:13:36 -08:00
latencytop.c
Makefile elfcore: fix building with clang 2020-12-11 14:02:14 -08:00
module-internal.h
module.c
module_signature.c
module_signing.c
notifier.c
nsproxy.c
padata.c
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c
pid.c
pid_namespace.c
profile.c
ptrace.c ptrace: Set PF_SUPERPRIV when checking capability 2020-11-17 12:53:22 -08:00
range.c
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-14 11:26:03 -08:00
regset.c
relay.c
resource.c
rseq.c
scftorture.c
scs.c
seccomp.c seccomp: Set PF_SUPERPRIV when checking capability 2020-11-17 12:53:22 -08:00
signal.c
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c
stop_machine.c
sys.c
sys_ni.c
sysctl-test.c
sysctl.c
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c
utsname.c
utsname_sysctl.c
watch_queue.c
watchdog.c kernel/watchdog: fix watchdog_allowed_mask not used warning 2020-11-14 11:26:03 -08:00
watchdog_hld.c
workqueue.c
workqueue_internal.h