Age | Commit message | Author | Files | Lines
2020-06-02 | mm/gup: introduce pin_user_pages_unlocked | John Hubbard | 2 | -0/+19
Introduce pin_user_pages_unlocked(), which is nearly identical to the get_user_pages_unlocked() that it wraps, except that it sets FOLL_PIN and rejects FOLL_GET. Signed-off-by: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Andy Walls <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
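The wrapper pattern described above can be sketched in plain userspace C. This is only a model under stated assumptions: the `demo_` names and the flag values are hypothetical, and `demo_gup_unlocked()` is a stand-in for get_user_pages_unlocked() that merely records the flags it was given. It shows the two properties the commit message names: FOLL_GET is rejected and FOLL_PIN is added before the wrapped call.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values only; the kernel's real FOLL_* constants differ. */
#define DEMO_FOLL_WRITE 0x01u
#define DEMO_FOLL_GET   0x04u
#define DEMO_FOLL_PIN   0x40u

/* Records the flags seen by the wrapped call so the effect is observable. */
static unsigned int demo_last_flags;

/* Hypothetical stand-in for get_user_pages_unlocked(): pretends all pages were found. */
static long demo_gup_unlocked(unsigned long nr_pages, unsigned int gup_flags)
{
	/* The two reference-taking modes must never be requested together. */
	assert(!((gup_flags & DEMO_FOLL_GET) && (gup_flags & DEMO_FOLL_PIN)));
	demo_last_flags = gup_flags;
	return (long)nr_pages;
}

/* Model of the wrapper: reject FOLL_GET, force FOLL_PIN, then delegate. */
static long demo_pin_unlocked(unsigned long nr_pages, unsigned int gup_flags)
{
	if (gup_flags & DEMO_FOLL_GET)
		return -EINVAL;
	return demo_gup_unlocked(nr_pages, gup_flags | DEMO_FOLL_PIN);
}
```

Calling `demo_pin_unlocked(4, DEMO_FOLL_WRITE)` reaches the stand-in with both the write and pin bits set, while passing `DEMO_FOLL_GET` fails before the wrapped call is made.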
2020-06-02 | mm/gup.c: update the documentation | Souptick Joarder | 1 | -18/+39
This patch is an attempt to update the documentation.
- Add or remove the extra * depending on whether the function is static or global.
- Add descriptions for functions and their input arguments.
[[email protected]: s@/*@/**@] Signed-off-by: Souptick Joarder <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead | NeilBrown | 12 | -36/+28
After an NFS page has been written it is considered "unstable" until a COMMIT request succeeds. If the COMMIT fails, the page will be re-written. These "unstable" pages are currently accounted as "reclaimable", either in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a 'reclaimable' count. This might have made sense when sending the COMMIT required a separate action by the VFS/MM (e.g. releasepage() used to send a COMMIT). However, now that all writes generated by ->writepages() will automatically be followed by a COMMIT (since commit 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")), it makes more sense to treat them as writeback pages. So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in NR_WRITEBACK and WB_WRITEBACK.
A particular effect of this change is that when wb_check_background_flush() calls wb_over_bg_threshold(), the latter will report 'true' a lot less often, as the 'unstable' pages are no longer considered 'dirty' (there is nothing that writeback can do about them anyway). Currently wb_check_background_flush() will trigger writeback to NFS even when there are relatively few dirty pages (if there are lots of unstable pages); this can result in small writes going to the server (tens of kilobytes rather than a megabyte), which hurts throughput. With this patch, there are fewer writes, each of which is larger on average.
Where the NR_UNSTABLE_NFS count was included in statistics virtual files, the entry is retained, but the value is hard-coded as zero. Static trace points and warning printks that mentioned this counter no longer report it.
[[email protected]: re-layout comment] [[email protected]: fix printk warning] Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Trond Myklebust <[email protected]> Acked-by: Michal Hocko <[email protected]> [mm] Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE | NeilBrown | 6 | -17/+44
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi). The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processes have been throttled. The purpose of this is to avoid the deadlock that happens when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being throttled and cannot write.
This approach was designed when all threads were blocked equally, independently of which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially, with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable.
The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi.
This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock.
In a steady state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by the global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag.
This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), where it causes attention to be focussed only on the target bdi.
So this patch
- renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
- removes the 25% bonus that that flag gives, and
- if PF_LOCAL_THROTTLE is set, doesn't delay at all unless the global and the local free-run thresholds are exceeded.
Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behaviour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime.
[[email protected]: coding style fixes] Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Chuck Lever <[email protected]> [nfsd] Cc: Christoph Hellwig <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Trond Myklebust <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
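The decision rule described in this commit message can be illustrated with a small userspace sketch. It is a deliberate simplification, not the kernel's writeback code: the struct, its field names and `demo_may_skip_throttle()` are hypothetical, and the real heuristics involve far more state. The sketch captures only the stated rule that a task with the local-throttle flag is delayed only when both the global and the per-bdi free-run thresholds are exceeded.

```c
#include <stdbool.h>

/* Hypothetical snapshot of dirty-page counters, in pages. */
struct demo_dirty_state {
	unsigned long global_dirty;   /* dirty pages system-wide */
	unsigned long global_freerun; /* global free-run threshold */
	unsigned long bdi_dirty;      /* dirty pages for the bdi being written to */
	unsigned long bdi_freerun;    /* free-run threshold for that bdi */
};

/*
 * Returns true when the writer may proceed without being delayed.
 * Any task runs freely while the global count is at or below the
 * global free-run threshold. A task with the local-throttle flag also
 * runs freely while its own target bdi is at or below that bdi's
 * free-run threshold, regardless of dirty pages queued elsewhere.
 */
static bool demo_may_skip_throttle(const struct demo_dirty_state *s,
				   bool local_throttle)
{
	if (s->global_dirty <= s->global_freerun)
		return true;
	if (local_throttle && s->bdi_dirty <= s->bdi_freerun)
		return true;
	return false;
}
```

With the global count over its threshold but the target bdi nearly idle, an ordinary task is throttled while a flagged daemon is not, which is what lets the daemon keep some pages in flight and avoid the deadlock.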
2020-06-02 | mm/page-writeback.c: remove unused variable | Chao Yu | 1 | -3/+1
Commit 64081362e8ff ("mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock") left an unused variable; remove it. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm/filemap.c: remove misleading comment | Matthew Wilcox (Oracle) | 1 | -1/+0
We no longer return 0 here and the comment doesn't tell us anything that we don't already know (SIGBUS is a pretty good indicator that things didn't work out). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm_types.h: change set_page_private to inline function | Guoqing Jiang | 1 | -1/+5
Change it to an inline function to make callers pass the proper argument. There is no need for it to be a macro, per Andrew's comment [1]. [1] https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
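The difference between the macro form and the inline-function form can be seen in a small userspace sketch. The `demo_` types and names below are hypothetical stand-ins, not the kernel definitions. The macro compiles for any object that happens to have a `private` member, whereas the inline function only accepts a pointer to the intended page type, so a wrong argument is diagnosed at compile time.

```c
/* Hypothetical stand-in for the page structure. */
struct demo_page {
	unsigned long private;
};

/* An unrelated type that merely shares a member name. */
struct demo_other {
	unsigned long private;
};

/* Macro form: no type checking, works on anything with ->private. */
#define DEMO_SET_PRIVATE_MACRO(page, v) ((page)->private = (v))

/* Inline-function form: the compiler checks both argument types. */
static inline void demo_set_page_private(struct demo_page *page,
					  unsigned long private_data)
{
	page->private = private_data;
}
```

`DEMO_SET_PRIVATE_MACRO(&other, 1UL)` compiles silently for a `struct demo_other`, while `demo_set_page_private(&other, 1UL)` draws an incompatible-pointer diagnostic from the compiler.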
2020-06-02 | mm/migrate.c: call detach_page_private to cleanup code | Guoqing Jiang | 1 | -6/+1
We can clean up the code a little by calling detach_page_private() here. [[email protected]: use attach_page_private(), per Dave] http://lkml.kernel.org/r/[email protected] [[email protected]: clear PagePrivate] Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | buffer_head.h: remove attach_page_buffers | Guoqing Jiang | 1 | -8/+0
All the callers have replaced attach_page_buffers with the new function attach_page_private, so remove it. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Andreas Dilger <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | orangefs: use attach/detach_page_private | Guoqing Jiang | 1 | -26/+6
Now that the new pair of functions has been introduced, we can call them to clean up the code in orangefs. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Mike Marshall <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Martin Brandenburg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | ntfs: replace attach_page_buffers with attach_page_private | Guoqing Jiang | 2 | -2/+2
Call the new function since attach_page_buffers will be removed. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Anton Altaparmakov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | iomap: use attach/detach_page_private | Guoqing Jiang | 1 | -15/+4
Now that the new pair of functions has been introduced, we can call them to clean up the code in iomap. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | f2fs: use attach/detach_page_private | Guoqing Jiang | 1 | -9/+2
Now that the new pair of functions has been introduced, we can call them to clean up the code in f2fs.h. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Chao Yu <[email protected]> Cc: Jaegeuk Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | fs/buffer.c: use attach/detach_page_private | Guoqing Jiang | 1 | -12/+4
Now that the new pair of functions has been introduced, we can call them to clean up the code in buffer.c. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | btrfs: use attach/detach_page_private | Guoqing Jiang | 3 | -36/+12
Now that the new pair of functions has been introduced, we can call them to clean up the code in btrfs. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | md: remove __clear_page_buffers and use attach/detach_page_private | Guoqing Jiang | 1 | -10/+2
After the introduction of attach/detach_page_private in pagemap.h, we can remove the duplicated code and call the new functions. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Song Liu <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | include/linux/pagemap.h: introduce attach/detach_page_private | Guoqing Jiang | 1 | -0/+37
Patch series "Introduce attach/detach_page_private to cleanup code". This patch (of 10): The logic in attach_page_buffers and __clear_page_buffers is closely paired, but 1. they are located in different files, and 2. attach_page_buffers is implemented in buffer_head.h, so it can be used by other files, whereas __clear_page_buffers is a static function in buffer.c that other potential users can't call; md-bitmap even copied the function. So, introduce the new attach/detach_page_private to replace them. With the new pair of functions, we will remove the usage of attach_page_buffers and __clear_page_buffers in the next patches. Thanks for suggestions about the function name from Alexander Viro, Andreas Grünbacher, Christoph Hellwig and Matthew Wilcox. Suggested-by: Matthew Wilcox <[email protected]> Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: William Kucharski <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yafang Shao <[email protected]> Cc: Song Liu <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: David Sterba <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Anton Altaparmakov <[email protected]> Cc: Mike Marshall <[email protected]> Cc: Martin Brandenburg <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Chao Yu <[email protected]> Cc: Dave Chinner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
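A paired attach/detach helper of this kind can be modelled in userspace as follows. The commit message does not spell out the individual steps, so this sketch rests on assumptions: the `demo_` struct and function names are hypothetical, and the choice to take a reference on attach and drop it on detach reflects my understanding of how such helpers behave, not text from the patch. The point is that the reference, the flag and the pointer always change together.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the page structure. */
struct demo_page {
	int refcount;           /* number of references held on the page */
	bool has_private;       /* models the "page has private data" flag */
	uintptr_t private_data; /* opaque per-page pointer, stored as an integer */
};

/* Attach: take a reference, store the pointer, set the flag. */
static void demo_attach_private(struct demo_page *page, void *data)
{
	page->refcount++;
	page->private_data = (uintptr_t)data;
	page->has_private = true;
}

/* Detach: undo all three steps and hand the pointer back to the caller. */
static void *demo_detach_private(struct demo_page *page)
{
	void *data = (void *)page->private_data;

	if (!page->has_private)
		return NULL; /* nothing attached, so nothing to drop */
	page->has_private = false;
	page->private_data = 0;
	page->refcount--;
	return data;
}
```

Because detach returns the stored pointer, a caller can free or reuse its private structure without open-coding the flag and refcount handling at every call site.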
2020-06-02 | iomap: convert from readpages to readahead | Matthew Wilcox (Oracle) | 5 | -74/+41
Use the new readahead operation in iomap. Convert XFS and ZoneFS to use it. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | fuse: convert from readpages to readahead | Matthew Wilcox (Oracle) | 1 | -72/+28
Implement the new readahead operation in fuse by using __readahead_batch() to fill the array of pages in fuse_args_pages directly. This lets us inline fuse_readpages_fill() into fuse_readahead(). [[email protected]: build fix] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: William Kucharski <[email protected]> Acked-by: Miklos Szeredi <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | f2fs: pass the inode to f2fs_mpage_readpages | Matthew Wilcox (Oracle) | 1 | -4/+3
This function now only uses the mapping argument to look up the inode, and both callers already have the inode, so just pass the inode instead of the mapping. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Reviewed-by: Chao Yu <[email protected]> Acked-by: Jaegeuk Kim <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Gao Xiang <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | f2fs: convert from readpages to readahead | Matthew Wilcox (Oracle) | 2 | -31/+22
Use the new readahead operation in f2fs Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Reviewed-by: Chao Yu <[email protected]> Acked-by: Jaegeuk Kim <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Gao Xiang <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | ext4: pass the inode to ext4_mpage_readpages | Matthew Wilcox (Oracle) | 3 | -5/+4
This function now only uses the mapping argument to look up the inode, and both callers already have the inode, so just pass the inode instead of the mapping. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | ext4: convert from readpages to readahead | Matthew Wilcox (Oracle) | 3 | -28/+18
Use the new readahead operation in ext4 Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | erofs: convert compressed files from readpages to readahead | Matthew Wilcox (Oracle) | 1 | -20/+9
Use the new readahead operation in erofs. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Chao Yu <[email protected]> Acked-by: Gao Xiang <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | erofs: convert uncompressed files from readpages to readahead | Matthew Wilcox (Oracle) | 3 | -29/+18
Use the new readahead operation in erofs Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Chao Yu <[email protected]> Acked-by: Gao Xiang <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | btrfs: convert from readpages to readahead | Matthew Wilcox (Oracle) | 3 | -42/+20
Implement the new readahead method in btrfs using the new readahead_page_batch() function. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | fs: convert mpage_readpages to mpage_readahead | Matthew Wilcox (Oracle) | 18 | -126/+73
Implement the new readahead aop and convert all callers (block_dev, exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6, reiserfs & udf). The callers are all trivial except for GFS2 & OCFS2. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> # ocfs2 Reviewed-by: Joseph Qi <[email protected]> # ocfs2 Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm: use memalloc_nofs_save in readahead path | Matthew Wilcox (Oracle) | 1 | -0/+14
Ensure that memory allocations in the readahead path do not attempt to reclaim file-backed pages, which could lead to a deadlock. It is possible, though unlikely, that this is the root cause of a problem observed by Cong Wang. Reported-by: Cong Wang <[email protected]> Suggested-by: Michal Hocko <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
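The scoped save/restore idiom that the commit title refers to can be modelled in userspace like this. It is a sketch under assumptions: the flag value, the global `demo_task_flags` (standing in for per-task state) and all `demo_` function names are hypothetical. It shows why the save call returns the previous state: nested scopes then restore correctly, and the restriction is lifted only when the outermost scope ends.

```c
#include <stdbool.h>

/* Hypothetical per-task flag bit meaning "allocations must not enter the fs". */
#define DEMO_PF_MEMALLOC_NOFS 0x1u

/* Stands in for the current task's flag word. */
static unsigned int demo_task_flags;

/* Enter a no-fs scope and return the previous state of the bit. */
static unsigned int demo_nofs_save(void)
{
	unsigned int old = demo_task_flags & DEMO_PF_MEMALLOC_NOFS;

	demo_task_flags |= DEMO_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope by putting the bit back exactly as it was before the save. */
static void demo_nofs_restore(unsigned int old)
{
	demo_task_flags = (demo_task_flags & ~DEMO_PF_MEMALLOC_NOFS) | old;
}

/* What an allocator would consult before recursing into filesystem reclaim. */
static bool demo_alloc_may_enter_fs(void)
{
	return !(demo_task_flags & DEMO_PF_MEMALLOC_NOFS);
}
```

Wrapping a region in a save/restore pair constrains every allocation made inside it, including those in callees, without having to thread an allocation-flags argument through each function.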
2020-06-02 | mm: document why we don't set PageReadahead | Matthew Wilcox (Oracle) | 1 | -3/+6
If the page is already in cache, we don't set PageReadahead on it. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm: add page_cache_readahead_unbounded | Matthew Wilcox (Oracle) | 6 | -91/+55
ext4 and f2fs have duplicated the guts of the readahead code so they can read past i_size. Instead, separate out the guts of the readahead code so they can call it directly. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Eric Biggers <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02 | mm: move end_index check out of readahead loop | Matthew Wilcox (Oracle) | 1 | -6/+8
By reducing nr_to_read, we can eliminate this check from inside the loop. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
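The transformation can be illustrated with two small userspace functions that count how many page indices a loop would visit. Both functions and their `demo_` names are hypothetical illustrations, not the kernel code; only `index`, `nr_to_read` and `end_index` come from the commit text. The first checks the end index on every iteration; the second clamps `nr_to_read` once up front so the loop body needs no check.

```c
/* Original shape: test against end_index inside the loop. */
static unsigned long demo_count_checked(unsigned long index,
					unsigned long nr_to_read,
					unsigned long end_index)
{
	unsigned long i, visited = 0;

	for (i = 0; i < nr_to_read; i++) {
		if (index + i > end_index)
			break;
		visited++;
	}
	return visited;
}

/* Reworked shape: reduce nr_to_read before the loop, then iterate freely. */
static unsigned long demo_count_clamped(unsigned long index,
					unsigned long nr_to_read,
					unsigned long end_index)
{
	unsigned long i, visited = 0;

	if (index > end_index)
		return 0;
	/* Pages index..end_index inclusive: end_index - index + 1 of them. */
	if (nr_to_read > end_index - index)
		nr_to_read = end_index - index + 1;
	for (i = 0; i < nr_to_read; i++)
		visited++;
	return visited;
}
```

For a request of 5 pages starting at index 10 with `end_index` 12, both versions visit exactly 3 pages (10, 11 and 12).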
2020-06-02 | mm: add readahead address space operation | Matthew Wilcox (Oracle) | 4 | -3/+32
This replaces ->readpages with a saner interface:
- Return void instead of an ignored error code.
- Page cache is already populated with locked pages when ->readahead is called.
- New arguments can be passed to the implementation without changing all the filesystems that use a common helper function like mpage_readahead().
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
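The shape of such an interface can be sketched in userspace. Everything named `demo_` here is a hypothetical stand-in: the control struct, the page struct, the iterator and the operations table are illustrations of the three listed properties, not the kernel's definitions. The operation returns void, it receives pages that are already present and locked, and it takes a single control structure, so new fields can be added later without changing the callback's signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page already inserted into the cache. */
struct demo_page {
	unsigned long index;
	bool locked;
	bool uptodate;
};

/* Hypothetical control structure: the only argument the callback takes. */
struct demo_rac {
	struct demo_page *pages;
	unsigned int nr_pages;
	unsigned int next; /* cursor used by the iterator below */
};

/* Hand out the next page of the batch, or NULL when the batch is consumed. */
static struct demo_page *demo_readahead_page(struct demo_rac *rac)
{
	if (rac->next >= rac->nr_pages)
		return NULL;
	return &rac->pages[rac->next++];
}

/* A filesystem's callback: fill each page, unlock it, report nothing. */
static void demo_fs_readahead(struct demo_rac *rac)
{
	struct demo_page *page;

	while ((page = demo_readahead_page(rac)) != NULL) {
		assert(page->locked); /* the caller locked it before calling us */
		page->uptodate = true; /* pretend the I/O completed */
		page->locked = false;
	}
}

/* Operations table with a void-returning readahead hook. */
struct demo_aops {
	void (*readahead)(struct demo_rac *rac);
};

static const struct demo_aops demo_fs_aops = {
	.readahead = demo_fs_readahead,
};
```

A caller builds a `struct demo_rac` over a batch of locked pages and invokes `demo_fs_aops.readahead(&rac)`; there is no return value to check or ignore.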
2020-06-02 | mm: put readahead pages in cache earlier | Matthew Wilcox (Oracle) | 1 | -18/+28
When populating the page cache for readahead, mappings that use ->readpages must populate the page cache themselves as the pages are passed on a linked list which would normally be used for the page cache's LRU. For mappings that use ->readpage or the upcoming ->readahead method, we can put the pages into the page cache as soon as they're allocated, which solves a race between readahead and direct IO. It also lets us remove the gfp argument from read_pages(). Use the new readahead_page() API to implement the repeated calls to ->readpage(), just like most filesystems will. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: remove 'page_offset' from readahead loopMatthew Wilcox (Oracle)1-5/+3
Replace the page_offset variable with 'index + i'. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: rename readahead loop variable to 'i'Matthew Wilcox (Oracle)1-4/+4
Change the type of page_idx to unsigned long, and rename it -- it's just a loop counter, not a page index. Suggested-by: John Hubbard <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: rename various 'offset' parameters to 'index'Matthew Wilcox (Oracle)1-44/+42
The word 'offset' is used ambiguously to mean 'byte offset within a page', 'byte offset from the start of the file' and 'page offset from the start of the file'. Use 'index' to mean 'page offset from the start of the file' throughout the readahead code. [ We should probably rename the 'pgoff_t' type to 'pgidx_t' too - Linus ] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
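The three meanings of 'offset' that this commit separates can be sketched in plain C. This is only an illustration, not kernel code: the `ex_` names are hypothetical, and the 4 KiB page size is an assumption (the real value depends on the architecture).

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the real page size is architecture dependent. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)

/* 'index': page offset from the start of the file. */
static inline unsigned long ex_pos_to_index(uint64_t pos)
{
	return (unsigned long)(pos >> EX_PAGE_SHIFT);
}

/* Byte offset within the page that contains file position 'pos'. */
static inline unsigned long ex_offset_in_page(uint64_t pos)
{
	return (unsigned long)(pos & (EX_PAGE_SIZE - 1));
}

/* Byte offset from the start of the file of the first byte of page 'index'. */
static inline uint64_t ex_index_to_pos(unsigned long index)
{
	return (uint64_t)index << EX_PAGE_SHIFT;
}
```

Under that assumption, byte position 8200 lies in the page with index 2, at byte offset 8 within that page.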
2020-06-02mm: use readahead_control to pass argumentsMatthew Wilcox (Oracle)1-14/+19
In this patch, only between __do_page_cache_readahead() and read_pages(), but it will be extended in upcoming patches. The read_pages() function becomes aops centric, as this makes the most sense by the end of the patchset. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: add new readahead_control APIMatthew Wilcox (Oracle)1-0/+140
Filesystems which implement the upcoming ->readahead method will get their pages by calling readahead_page() or readahead_page_batch(). These functions support large pages, even though none of the filesystems to be converted do yet. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
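The calling pattern described here, where a filesystem repeatedly asks for the next page until none are left, can be modelled in userspace roughly as follows. Every `toy_` name is hypothetical, an array stands in for the page cache, and setting a flag stands in for starting I/O; this is a sketch of the iteration idea only, not the kernel's readahead_control implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a page already sitting in the page cache. */
struct toy_page {
	unsigned long index;
	int uptodate;
};

/* Hypothetical stand-in for the readahead window handed to a filesystem. */
struct toy_readahead_control {
	struct toy_page *pages;   /* array modelling the page cache */
	unsigned long index;      /* index of the next page to hand out */
	unsigned int nr_pages;    /* pages remaining in the window */
};

/* Return the next page of the window, or NULL once it is exhausted. */
static struct toy_page *toy_readahead_page(struct toy_readahead_control *rac)
{
	struct toy_page *page;

	if (rac->nr_pages == 0)
		return NULL;
	page = &rac->pages[rac->index];
	page->index = rac->index;
	rac->index++;
	rac->nr_pages--;
	return page;
}

/* A toy ->readahead: returns void and "reads" every page in the window. */
static void toy_readahead(struct toy_readahead_control *rac)
{
	struct toy_page *page;

	while ((page = toy_readahead_page(rac)) != NULL)
		page->uptodate = 1;   /* stands in for submitting the read */
}
```

The caller sets up the window (start index and page count) and the implementation consumes it; nothing is returned, matching the void-returning interface described in the series.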
2020-06-02mm: move readahead nr_pages check into read_pagesMatthew Wilcox (Oracle)1-5/+7
Simplify the callers by moving the check for nr_pages and the BUG_ON into read_pages(). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: ignore return value of ->readpagesMatthew Wilcox (Oracle)1-6/+2
We used to assign the return value to a variable, which we then ignored. Remove the pretence of caring. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: return void from various readahead functionsMatthew Wilcox (Oracle)3-28/+19
ondemand_readahead has two callers, neither of which use the return value. That means that both ra_submit and __do_page_cache_readahead() can return void, and we don't need to worry that a present page in the readahead window causes us to return a smaller nr_pages than we ought to have. Similarly, no caller uses the return value from force_page_cache_readahead(). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm: move readahead prototypes from mm.hMatthew Wilcox (Oracle)5-19/+13
Patch series "Change readahead API", v11. This series adds a readahead address_space operation to replace the readpages operation. The key difference is that pages are added to the page cache as they are allocated (and then looked up by the filesystem) instead of passing them on a list to the readpages operation and having the filesystem add them to the page cache. It's a net reduction in code for each implementation, more efficient than walking a list, and solves the direct-write vs buffered-read problem reported by yu kuai at http://lkml.kernel.org/r/[email protected] The only unconverted filesystems are those which use fscache. Their conversion is pending Dave Howells' rewrite which will make the conversion substantially easier. This should be completed by the end of the year. I want to thank the reviewers/testers; Dave Chinner, John Hubbard, Eric Biggers, Johannes Thumshirn, Dave Sterba, Zi Yan, Christoph Hellwig and Miklos Szeredi have done a marvellous job of providing constructive criticism. These patches pass an xfstests run on ext4, xfs & btrfs with no regressions that I can tell (some of the tests seem a little flaky before and remain flaky afterwards). This patch (of 25): The readahead code is part of the page cache so should be found in the pagemap.h file. force_page_cache_readahead is only used within mm, so move it to mm/internal.h instead. Remove the parameter names where they add no value, and rename the ones which were actively misleading. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Cc: Chao Yu <[email protected]> Cc: Cong Wang <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Zi Yan <[email protected]> Cc: Miklos Szeredi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm, dump_page(): do not crash with invalid mapping pointerVlastimil Babka1-6/+50
We have seen the following problem on a RPi4 with 1G RAM: BUG: Bad page state in process systemd-hwdb pfn:35601 page:ffff7e0000d58040 refcount:15 mapcount:131221 mapping:efd8fe765bc80080 index:0x1 compound_mapcount: -32767 Unable to handle kernel paging request at virtual address efd8fe765bc80080 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [efd8fe765bc80080] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP Modules linked in: btrfs libcrc32c xor xor_neon zlib_deflate raid6_pq mmc_block xhci_pci xhci_hcd usbcore sdhci_iproc sdhci_pltfm sdhci mmc_core clk_raspberrypi gpio_raspberrypi_exp pcie_brcmstb bcm2835_dma gpio_regulator phy_generic fixed sg scsi_mod efivarfs Supported: No, Unreleased kernel CPU: 3 PID: 408 Comm: systemd-hwdb Not tainted 5.3.18-8-default #1 SLE15-SP2 (unreleased) Hardware name: raspberrypi rpi/rpi, BIOS 2020.01 02/21/2020 pstate: 40000085 (nZcv daIf -PAN -UAO) pc : __dump_page+0x268/0x368 lr : __dump_page+0xc4/0x368 sp : ffff000012563860 x29: ffff000012563860 x28: ffff80003ddc4300 x27: 0000000000000010 x26: 000000000000003f x25: ffff7e0000d58040 x24: 000000000000000f x23: efd8fe765bc80080 x22: 0000000000020095 x21: efd8fe765bc80080 x20: ffff000010ede8b0 x19: ffff7e0000d58040 x18: ffffffffffffffff x17: 0000000000000001 x16: 0000000000000007 x15: ffff000011689708 x14: 3030386362353637 x13: 6566386466653a67 x12: 6e697070616d2031 x11: 32323133313a746e x10: 756f6370616d2035 x9 : ffff00001168a840 x8 : ffff00001077a670 x7 : 000000000000013d x6 : ffff0000118a43b5 x5 : 0000000000000001 x4 : ffff80003dd9e2c8 x3 : ffff80003dd9e2c8 x2 : 911c8d7c2f483500 x1 : dead000000000100 x0 : efd8fe765bc80080 Call trace: __dump_page+0x268/0x368 bad_page+0xd4/0x168 check_new_page_bad+0x80/0xb8 rmqueue_bulk.constprop.26+0x4d8/0x788 get_page_from_freelist+0x4d4/0x1228 __alloc_pages_nodemask+0x134/0xe48 alloc_pages_vma+0x198/0x1c0 do_anonymous_page+0x1a4/0x4d8 __handle_mm_fault+0x4e8/0x560 handle_mm_fault+0x104/0x1e0 do_page_fault+0x1e8/0x4c0 do_translation_fault+0xb0/0xc0 do_mem_abort+0x50/0xb0 el0_da+0x24/0x28 Code: f9401025 8b8018a0 9a851005 17ffffca (f94002a0) Besides the underlying issue with page->mapping containing a bogus value for some reason, we can see that __dump_page() crashed by trying to read the pointer at mapping->host, turning a recoverable warning into a full Oops. It can be expected that when a page is reported as being in a bad state for some reason, the pointers there should not be trusted blindly. So this patch treats all data in __dump_page() that depends on page->mapping as lava, using probe_kernel_read_strict(). Ideally this would include the dentry->d_parent recursively, but that would mean changing the printk handler for %pd. Chances of reaching the dentry printing part with an initially bogus mapping pointer should be rather low, though. Also prefix printing mapping->a_ops with a description of what is being printed. In case the value is bogus, %ps will print the raw value instead of the symbol name and then it's not obvious at all that it's printing a_ops. Reported-by: Petr Tesarik <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: John Hubbard <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02Documentation/vm/slub.rst: s/Toggle/Enable/Andrew Morton1-1/+1
"toggle" means to change a boolean thing's state. This operation doesn't do that - it sets it to "true". Signed-off-by: Andrew Morton <[email protected]> Acked-by: Rafael Aquini <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm/slub: fix stack overruns with SLUB_STATSQian Cai1-1/+2
There is no need to copy SLUB_STATS items from root memcg cache to new memcg cache copies. Doing so could result in stack overruns because the store function only accepts 0 to clear the stat and returns an error for everything else while the show method would print out the whole stat. Then, the mismatch between the lengths returned from the show and store methods shows up in memcg_propagate_slab_attrs(): else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) buf = mbuf; max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64] in show_stat() later where a bunch of sprintf() calls would overrun the stack variable. Fix it by always allocating a page of buffer to be used in show_stat() if SLUB_STATS=y which should only be used for debugging purposes. # echo 1 > /sys/kernel/slab/fs_cache/shrink BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0 Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: number+0x421/0x6e0 vsnprintf+0x451/0x8e0 sprintf+0x9e/0xd0 show_stat+0x124/0x1d0 alloc_slowpath_show+0x13/0x20 __kmem_cache_create+0x47a/0x6b0 addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame: process_one_work+0x0/0xb90 this frame has 1 object: [32, 72) 'lockdep_map' Memory state around the buggy address: ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ^ ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: __kmem_cache_create+0x6ac/0x6b0 Fixes: 107dab5c92d5 ("slub: slub-specific propagation changes") Signed-off-by: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
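The class of bug fixed here, a run of unbounded sprintf() calls writing a per-CPU stat line into a buffer that is too small, can be illustrated with a userspace sketch. The `ex_` helpers are hypothetical and the output layout (a sum followed by per-CPU values) is only an assumption about what such a stat line might look like. The sketch shows bounded appends; the actual patch instead gives show_stat() a page-sized buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append formatted text at buf + len without ever writing past buf + size.
 * Returns the new string length; on truncation the buffer is full and
 * size - 1 is returned.
 */
static size_t ex_append(char *buf, size_t size, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (len >= size)
		return len;
	va_start(ap, fmt);
	n = vsnprintf(buf + len, size - len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return len;                 /* encoding error: keep old length */
	if ((size_t)n >= size - len)
		return size - 1;            /* output was truncated */
	return len + (size_t)n;
}

/* Format "sum C0=x C1=y ..." into a caller-supplied buffer of 'size' bytes. */
static size_t ex_show_stat(char *buf, size_t size,
			   const unsigned int *per_cpu, int ncpus)
{
	unsigned long sum = 0;
	size_t len = 0;
	int i;

	if (size == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < ncpus; i++)
		sum += per_cpu[i];
	len = ex_append(buf, size, len, "%lu", sum);
	for (i = 0; i < ncpus; i++)
		len = ex_append(buf, size, len, " C%d=%u", i, per_cpu[i]);
	return len;
}
```

With a page-sized buffer the whole line fits; with a deliberately small buffer the output is cut short but memory beyond the buffer is never written, which is the property the 64-byte stack buffer in the report lacked.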
2020-06-02slub: remove kmalloc under list_lock from list_slab_objects() V2Christopher Lameter1-5/+15
list_slab_objects() is called when a slab is destroyed while objects are still left, in order to list those objects in the syslog. This is a pretty rare event. However, it seems that in this path we take the list_lock and call kmalloc while holding that lock. Perform the allocation in free_partial() before the list_lock is taken. Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow") Signed-off-by: Christopher Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Yu Zhao <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02slub: Remove userspace notifier for cache add/removeChristoph Lameter1-16/+1
I came across some unnecessary uevents once again which reminded me of this. The patch seems to be lost in the leaves of the original discussion [1], so resending. [1] https://lore.kernel.org/r/[email protected] Kmem caches are internal kernel structures so it is strange that userspace notifiers would be needed. And I am not aware of any use of these notifiers. These notifiers may just exist because in the initial slub release the sysfs code was copied from another subsystem. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Koutný <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02mm/slub.c: fix corrupted freechain in deactivate_slab()Dongli Zhang1-0/+27
The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of the current and next freepointer in the allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to a page fault. The output below is from a test kernel module run with 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain. BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, this patch adds an extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated. [[email protected]: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Joe Jin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
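The idea of checking each freepointer while walking the chain, and isolating everything from the first object whose freepointer is corrupted, can be sketched in userspace as below. All `toy_` names are hypothetical; the layout (freepointer stored at the start of each object, objects packed in one contiguous region) is a simplifying assumption, not the SLUB implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one slab: nr_objs objects of obj_size bytes at base. */
struct toy_slab {
	unsigned char *base;
	size_t obj_size;
	size_t nr_objs;
};

/* A freepointer is valid if it is NULL or points at the start of an object. */
static int toy_valid_pointer(const struct toy_slab *s, const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t base = (uintptr_t)s->base;

	if (!p)
		return 1;
	if (addr < base || addr >= base + s->obj_size * s->nr_objs)
		return 0;
	return (addr - base) % s->obj_size == 0;
}

/* Assumption: the freepointer lives in the first bytes of a free object. */
static void *toy_get_freepointer(const void *obj)
{
	void *next;

	memcpy(&next, obj, sizeof(next));
	return next;
}

static void toy_set_freepointer(void *obj, void *next)
{
	memcpy(obj, &next, sizeof(next));
}

/*
 * Walk the freelist, validating every freepointer before following it.
 * The chain is cut in front of the first object whose freepointer is
 * corrupted, so that object and everything after it are isolated.
 * Returns the number of objects kept on the list.
 */
static size_t toy_walk_freelist(const struct toy_slab *s, void **freelist)
{
	void *obj = *freelist;
	void *prev = NULL;
	size_t count = 0;

	if (!toy_valid_pointer(s, obj)) {
		*freelist = NULL;
		return 0;
	}
	/* The count bound also stops the walk if the chain contains a cycle. */
	while (obj && count < s->nr_objs) {
		void *next = toy_get_freepointer(obj);

		if (!toy_valid_pointer(s, next)) {
			if (prev)
				toy_set_freepointer(prev, NULL);
			else
				*freelist = NULL;
			break;
		}
		count++;
		prev = obj;
		obj = next;
	}
	return count;
}
```

Without the validity check, the walk would dereference the bogus pointer on the next iteration, which is the userspace analogue of the page fault shown in the report.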
2020-06-02usercopy: mark dma-kmalloc caches as usercopy cachesVlastimil Babka1-1/+2
We have seen a "usercopy: Kernel memory overwrite attempt detected to SLUB object 'dma-kmalloc-1 k' (offset 0, size 11)!" error on s390x, as IUCV uses kmalloc() with __GFP_DMA because of memory address restrictions. The issue has been discussed [2] and it has been noted that if all the kmalloc caches are marked as usercopy, there's little reason not to mark dma-kmalloc caches too. The 'dma' part merely means that __GFP_DMA is used to restrict memory address range. As Jann Horn put it [3]: "I think dma-kmalloc slabs should be handled the same way as normal kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is just normal kernel memory - even if it might later be used for DMA -, and it should be perfectly fine to copy_from_user() into such allocations at that point, and to copy_to_user() out of them at the end. If you look at the places where such allocations are created, you can see things like kmemdup(), memcpy() and so on - all normal operations that shouldn't conceptually be different from usercopy in any relevant way." Thus this patch marks the dma-kmalloc-* caches as usercopy. [1] https://bugzilla.suse.com/show_bug.cgi?id=1156053 [2] https://lore.kernel.org/kernel-hardening/[email protected]/ [3] https://lore.kernel.org/kernel-hardening/CAG48ez1a4waGk9kB0WLaSbs4muSoK0AYAVk8=XYaKj4_+6e6Hg@mail.gmail.com/ Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Christian Borntraeger <[email protected]> Acked-by: Jiri Slaby <[email protected]> Cc: Jann Horn <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Julian Wiedmann <[email protected]> Cc: Ursula Braun <[email protected]> Cc: Alexander Viro <[email protected]> Cc: David Windsor <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Mark Rutland <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Christoffer Dall <[email protected]> Cc: Dave Kleikamp <[email protected]> Cc: Jan Kara <[email protected]> Cc: Luis de Bethencourt <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Matthew Garrett <[email protected]> Cc: Michal Kubecek <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02fs/buffer.c: record blockdev write errors in super_block that it backsJeff Layton1-0/+7
When syncing out a block device (a'la __sync_blockdev), any error encountered will only be recorded in the bd_inode's mapping. When the blockdev contains a filesystem however, we'd like to also record the error in the super_block that's stored there. Make mark_buffer_write_io_error also record the error in the corresponding super_block when a writeback error occurs and the block device contains a mounted superblock. Since superblocks are RCU freed, hold the rcu_read_lock to ensure that the superblock doesn't go away while we're marking it. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Al Viro <[email protected]> Cc: Andres Freund <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Howells <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>