linux-IllusionX/fs
Mike Kravetz c86aa7bbfd hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race.  One obvious omission in the
current code is removing a page newly added to the page cache.  This is
pretty straightforward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations.  To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out.  There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv.  Backing out a reservation may
require memory allocation which could fail so that needs to be taken into
account as well.

Instead of writing the required complicated code for this rare occurrence,
just eliminate the race.  i_mmap_rwsem is now held in read mode for the
duration of page fault processing.  Hold i_mmap_rwsem longer in truncation
and hole punch code to cover the call to remove_inode_hugepages.

With this modification, code in remove_inode_hugepages checking for races
becomes 'dead' as it can no longer happen.  Remove the dead code and
expand comments to explain reasoning.  Similarly, checks for races with
truncation in the page fault path can be simplified and removed.

[mike.kravetz@oracle.com: incorporate suggestions from Kirill]
Link: http://lkml.kernel.org/r/20181222223013.22193-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20181218223557.5202-3-mike.kravetz@oracle.com
Fixes: ebed4bfc8d ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:52 -08:00
..
9p
adfs
affs
afs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-30 10:47:50 -08:00
autofs
befs
bfs
btrfs btrfs: Fix typos in comments and strings 2018-12-17 14:51:50 +01:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-11-30 16:00:58 +00:00
ceph mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
cifs File locking changes for v4.21 2018-12-27 17:12:30 -08:00
coda
configfs
cramfs
crypto
debugfs
devpts
dlm dlm: fix invalid cluster name warning 2018-12-03 15:30:24 -06:00
ecryptfs
efivarfs
efs
exofs
exportfs exportfs: do not read dentry after free 2018-11-23 09:08:17 -05:00
ext2 \n 2018-12-27 17:00:35 -08:00
ext4 ext4: check for shutdown and r/o file system in ext4_write_inode() 2018-12-19 14:36:58 -05:00
f2fs mm: migrate: drop unused argument of migrate_page_move_mapping() 2018-12-28 12:11:51 -08:00
fat
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-11-30 15:57:31 +00:00
fuse mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
gfs2 File locking changes for v4.21 2018-12-27 17:12:30 -08:00
hfs hfs: do not free node before using 2018-11-30 14:56:14 -08:00
hfsplus hfsplus: do not free node before using 2018-11-30 14:56:14 -08:00
hostfs
hpfs
hugetlbfs hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race 2018-12-28 12:11:52 -08:00
isofs
jbd2 jbd2: clean up indentation issue, replace spaces with tab 2018-12-04 00:20:10 -05:00
jffs2 jffs2: Fix use of uninitialized delayed_work, lockdep breakage 2018-12-02 09:20:34 +01:00
jfs
kernfs
lockd fs/locks: merge posix_unblock_lock() and locks_delete_block() 2018-12-07 06:50:56 -05:00
minix
nfs mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
nfs_common
nfsd mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
nilfs2 nilfs2: Use xa_erase_irq 2018-11-05 14:57:05 -05:00
nls
notify fanotify: Use inode_is_open_for_write 2018-12-11 10:55:45 +01:00
ntfs mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
ocfs2 ocfs2: don't clear bh uptodate for block read 2018-12-28 12:11:46 -08:00
omfs
openpromfs fs/openpromfs: Use of_node_name_eq for node name comparisons 2018-11-18 13:35:19 -08:00
orangefs
overlayfs Revert "ovl: relax permission checking on underlying layers" 2018-12-04 11:31:30 +01:00
proc mm, proc: report PR_SET_THP_DISABLE in proc 2018-12-28 12:11:50 -08:00
pstore pstore/ram: Avoid NULL deref in ftrace merging failure path 2018-12-03 17:11:02 -08:00
qnx4
qnx6
quota quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. 2018-12-18 18:29:15 +01:00
ramfs
reiserfs
romfs
squashfs
sysfs sysfs: constify sysfs create/remove files harder 2018-12-03 18:18:19 +02:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-11-10 08:02:40 -05:00
tracefs
ubifs mm: migrate: drop unused argument of migrate_page_move_mapping() 2018-12-28 12:11:51 -08:00
udf \n 2018-12-27 17:00:35 -08:00
ufs
xfs xfs: reallocate realtime summary cache on growfs 2018-12-21 18:45:18 -08:00
aio.c mm: migrate: drop unused argument of migrate_page_move_mapping() 2018-12-28 12:11:51 -08:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c blkdev: avoid migration stalls for blkdev pages 2018-12-28 12:11:51 -08:00
buffer.c
char_dev.c
compat.c
compat_binfmt_elf.c
compat_ioctl.c
coredump.c
d_path.c
dax.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
dcache.c
dcookies.c
direct-io.c fs: fix lost error code in dio_complete 2018-11-30 08:35:14 -07:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c Revert "exec: make de_thread() freezable" 2018-12-04 16:04:20 +01:00
fcntl.c
fhandle.c
file.c fs/file: Replace synchronize_sched() with synchronize_rcu() 2018-11-27 09:21:39 -08:00
file_table.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
filesystems.c
fs-writeback.c
fs_pin.c
fs_struct.c
inode.c mm: don't reclaim inodes with many attached pages 2018-11-18 10:15:09 -08:00
internal.h
ioctl.c
iomap.c mm: migrate: drop unused argument of migrate_page_move_mapping() 2018-12-28 12:11:51 -08:00
Kconfig
Kconfig.binfmt
libfs.c
locks.c locks: Use inode_is_open_for_write 2018-12-17 07:19:46 -05:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c Revert "vfs: Allow userns root to call mknod on owned filesystems." 2018-12-22 14:18:34 -08:00
namespace.c mnt: fix __detach_mounts infinite loop 2018-11-12 01:02:34 -06:00
no-block.c
nsfs.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: allow some remap flags to be passed to vfs_clone_file_range 2018-12-04 08:50:49 -08:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c splice: don't read more than available pipe space 2018-12-04 08:50:49 -08:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c userfaultfd: clear flag if remap event not enabled 2018-12-28 12:11:51 -08:00
utimes.c
xattr.c