path: root/include
Age | Commit message | Author | Files | Lines
2018-06-08Merge tag 'gpio-v4.18-1' of ↵Linus Torvalds3-5/+39
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v4.18 development cycle. Core changes: - We have killed off VLA from the core library and all drivers. The background should be clear for everyone at this point: https://lwn.net/Articles/749064/ Also I just don't like VLA's, kernel developers hate it when compilers do things behind their back. It's as simple as that. I'm sorry that they even slipped in to begin with. Kudos to Laura Abbott for exorcising them. - Support GPIO hogs in machines/board files. New drivers and chip support: - R-Car r8a77470 (RZ/G1C) - R-Car r8a77965 (M3-N) - R-Car r8a77990 (E3) - PCA953x driver improvements to accommodate more variants. Improvements and new features: - Support one interrupt per line on port A in the DesignWare dwapb driver. Misc: - Random cleanups, right header files in the drivers, some size optimizations etc" * tag 'gpio-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (73 commits) gpio: davinci: fix build warning when !CONFIG_OF gpio: dwapb: Fix rework support for 1 interrupt per port A GPIO gpio: pxa: Include the right header gpio: pl061: Include the right header gpio: pch: Include the right header gpio: pcf857x: Include the right header gpio: pca953x: Include the right header gpio: palmas: Include the right header gpio: omap: Include the right header gpio: octeon: Include the right header gpio: mxs: Switch to SPDX identifier gpio: Remove VLA from stmpe driver gpio: mxc: Switch to SPDX identifier gpio: mxc: add clock operation gpio: Remove VLA from gpiolib gpio: aspeed: Use a cache of output data registers gpio: aspeed: Set output latch before changing direction gpio: pca953x: fix address calculation for pcal6524 gpio: pca953x: define masks for addressing common and extended registers gpio: pca953x: set the PCA_PCAL flag also when matching by DT ...
2018-06-08Merge branch 'for-linus' of ↵Linus Torvalds1-3/+16
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - Valve Steam Controller support from Rodrigo Rivas Costa - Redragon Asura support from Robert Munteanu - improvement of duplicate usage handling in generic hid-input from Benjamin Tissoires - Win 8.1 precision touchpad spec implementation from Benjamin Tissoires - Support for "In Range" flag for Wacom Intuos/Bamboo devices from Jason Gerecke - other various assorted smaller fixes and improvements * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (27 commits) HID: rmi: use HID_QUIRK_NO_INPUT_SYNC HID: multitouch: fix calculation of last slot field in multi-touch reports HID: quirks: remove Delcom Visual Signal Indicator from hid_have_special_driver[] HID: steam: select CONFIG_POWER_SUPPLY HID: i2c-hid: remove i2c_hid_open_mut HID: wacom: Support "in range" for Intuos/Bamboo tablets where possible HID: core: fix hid_hw_open() comment HID: hid-plantronics: Re-resend Update to map button for PTT products HID: multitouch: fix types returned from mt_need_to_apply_feature() HID: i2c-hid: check if device is there before really probing HID: steam: add missing fields in client initialization HID: steam: add battery device. HID: add driver for Valve Steam Controller HID: alps: Fix some style in 't4_read_write_register()' HID: alps: Check errors returned by 't4_read_write_register()' HID: alps: Save a memory allocation in 't4_read_write_register()' when writing data HID: alps: Report an error if we receive invalid data in 't4_read_write_register()' HID: multitouch: implement precision touchpad latency and switches HID: multitouch: simplify the settings of the various features HID: multitouch: make use of HID_QUIRK_INPUT_PER_APP ...
2018-06-08Merge branch 'work.aio' of ↵Linus Torvalds3-2/+24
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull aio iopriority support from Al Viro: "The rest of aio stuff for this cycle - Adam's aio ioprio series" * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: aio ioprio use ioprio_check_cap ret val fs: aio ioprio add explicit block layer dependence fs: iomap dio set bio prio from kiocb prio fs: blkdev set bio prio from kiocb prio fs: Add aio iopriority support fs: Convert kiocb rw_hint from enum to u16 block: add ioprio_check_cap function
2018-06-08Merge tag 'for-linus-4.18-rc1-tag' of ↵Linus Torvalds4-4/+104
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "This contains some minor code cleanups (fixing return types of functions), some fixes for Linux running as Xen PVH guest, and adding of a new guest resource mapping feature for Xen tools" * tag 'for-linus-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/PVH: Make GDT selectors PVH-specific xen/PVH: Set up GS segment for stack canary xen/store: do not store local values in xen_start_info xen-netfront: fix xennet_start_xmit()'s return type xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE xen: Change return type to vm_fault_t
2018-06-08block: add bioset_init_from_src() helperJens Axboe1-0/+1
Add a helper that allows a caller to initialize a new bio_set, using the settings from an existing bio_set. Reported-by: Venkat R.B <[email protected]> Tested-by: Venkat R.B <[email protected]> Tested-by: Li Wang <[email protected]> Reviewed-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
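For illustration, a minimal driver-side sketch of how such a helper is meant to be used; the example_clone_bioset() wrapper and its error handling are hypothetical, only bioset_init_from_src() and bioset_exit() are block-layer API:

/*
 * Hypothetical sketch: initialize a fresh bio_set with the settings of an
 * existing one, e.g. when a stacking driver wants its own bio_set that is
 * configured like the one it was handed.
 */
#include <linux/bio.h>

static int example_clone_bioset(struct bio_set *dst, struct bio_set *src)
{
	int ret;

	/* dst is set up using the pool sizes/flags copied from src */
	ret = bioset_init_from_src(dst, src);
	if (ret)
		return ret;	/* typically -ENOMEM */

	/* ... allocate bios from dst; tear down later with bioset_exit(dst) ... */
	return 0;
}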
2018-06-08netfilter: remove include/net/netfilter/nft_dup.hCorentin Labbe1-10/+0
include/net/netfilter/nft_dup.h was introduced in d877f07112f1 ("netfilter: nf_tables: add nft_dup expression") but has never been used since then. Furthermore, the only struct in this file is unused elsewhere. Signed-off-by: Corentin Labbe <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-06-08Merge branch 'for-4.18/multitouch' into for-linusJiri Kosina1-3/+15
- improvement of duplicate usage handling in hid-input from Benjamin Tissoires - Win 8.1 precision touchpad spec implementation from Benjamin Tissoires
2018-06-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds24-476/+437
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - v9fs updates - MM - procfs updates - lib/ updates - autofs updates * emailed patches from Andrew Morton <[email protected]>: (118 commits) autofs: small cleanup in autofs_getpath() autofs: clean up includes autofs: comment on selinux changes needed for module autoload autofs: update MAINTAINERS entry for autofs autofs: use autofs instead of autofs4 in documentation autofs: rename autofs documentation files autofs: create autofs Kconfig and Makefile autofs: delete fs/autofs4 source files autofs: update fs/autofs4/Makefile autofs: update fs/autofs4/Kconfig autofs: copy autofs4 to autofs autofs4: use autofs instead of autofs4 everywhere autofs4: merge auto_fs.h and auto_fs4.h fs/binfmt_misc.c: do not allow offset overflow checkpatch: improve patch recognition lib/ucs2_string.c: add MODULE_LICENSE() lib/mpi: headers cleanup lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock lib/idr.c: remove simple_ida_lock lib/bitmap.c: micro-optimization for __bitmap_complement() ...
2018-06-07autofs4: merge auto_fs.h and auto_fs4.hIan Kent2-162/+160
The autofs module has long since been removed so there's no need to have two separate include files for autofs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ian Kent <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07lib/mpi: headers cleanupVasily Averin1-61/+0
The MPI headers contain definitions for a huge number of non-existent functions. Most of these functions were removed in 2012 by Dmitry Kasatkin - 7cf4206a99d1 ("Remove unused code from MPI library") - 9e235dcaf4f6 ("Revert "crypto: GnuPG based MPI lib - additional ...") - bc95eeadf5c6 ("lib/mpi: removed unused functions") however the headers were not updated properly. Also I deleted some unused macros. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vasily Averin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Dmitry Kasatkin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07include/linux/types.h: use fixed width types without double-underscore prefixMasahiro Yamada1-14/+14
This header file is not exported. It is safe to reference types without double-underscore prefix. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Lihao Liang <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07include/linux/types.h: define aligned_ types based on uapi headerMasahiro Yamada1-3/+3
<uapi/linux/types.h> has the same typedefs except that it prefixes them with double-underscore for user space. Use them for the kernel space typedefs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Lihao Liang <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07int-ll64.h: define u{8,16,32,64} and s{8,16,32,64} based on uapi headerMasahiro Yamada1-11/+8
<uapi/asm-generic/int-ll64.h> has the same typedefs except that it prefixes them with double-underscore for user space. Use them for the kernel space typedefs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Lihao Liang <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
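The change essentially boils down to the following shape (an illustrative sketch, not the verbatim kernel header): the kernel-space names are built on top of the already-exported double-underscore UAPI typedefs instead of repeating the underlying base types.

#include <uapi/asm-generic/int-ll64.h>

typedef __s8  s8;
typedef __u8  u8;
typedef __s16 s16;
typedef __u16 u16;
typedef __s32 s32;
typedef __u32 u32;
typedef __s64 s64;
typedef __u64 u64;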
2018-06-07mm: remove page_is_poisoned() from linux/mm.hSahara1-2/+0
When commit bd33ef368135 ("mm: enable page poisoning early at boot") got rid of PAGE_EXT_DEBUG_POISON, page_is_poisoned() was left behind in the header. This patch cleans up that leftover. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sahara <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the ↵Aaron Lu1-4/+19
same cacheline
The LKP robot found a 27% will-it-scale/page_fault3 performance regression regarding commit e27be240df53 ("mm: memcg: make sure memory.events is uptodate when waking pollers"). What the test does is:
1 mkstemp() a 128M file on a tmpfs;
2 start $nr_cpu processes, each to loop the following:
  2.1 mmap() this file in shared write mode;
  2.2 write 0 to this file in a PAGE_SIZE step till the end of the file;
  2.3 unmap() this file and repeat this process.
3 After 5 minutes, check how many loops they managed to complete, the higher the better.
The commit itself looks innocent enough as it merely changed some event counting mechanism and this test didn't trigger those events at all. Perf shows increased cycles spent on accessing root_mem_cgroup->stat_cpu in count_memcg_event_mm() (called by handle_mm_fault()) and in __mod_memcg_state() called by page_add_file_rmap(). So it's likely due to the changed layout of 'struct mem_cgroup' that either makes stat_cpu fall into a constantly modified cacheline or stops some hot fields from sharing the same cacheline. I verified this by moving memory_events[] back to where it was:
: --- a/include/linux/memcontrol.h
: +++ b/include/linux/memcontrol.h
: @@ -205,7 +205,6 @@ struct mem_cgroup {
:   int oom_kill_disable;
:
:   /* memory.events */
: - atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
:   struct cgroup_file events_file;
:
:   /* protect arrays of thresholds */
: @@ -238,6 +237,7 @@ struct mem_cgroup {
:   struct mem_cgroup_stat_cpu __percpu *stat_cpu;
:   atomic_long_t stat[MEMCG_NR_STAT];
:   atomic_long_t events[NR_VM_EVENT_ITEMS];
: + atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
:
:   unsigned long socket_pressure;
And performance restored. Later investigation found that as long as the following 3 fields moving_account, move_lock_task and stat_cpu are in the same cacheline, performance will be good. To avoid future performance surprises from other commits changing the layout of 'struct mem_cgroup', this patch makes sure the 3 fields stay in the same cacheline. One concern of this approach is that moving_account and move_lock_task could be modified when a process changes memory cgroup, while stat_cpu is an always-read field; it might hurt to place them in the same cacheline. I assume it is rare for a process to change memory cgroup so this should be OK. Link: https://lkml.kernel.org/r/20180528114019.GF9904@yexl-desktop Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aaron Lu <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
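As a stand-alone illustration of the cacheline-grouping technique (not the actual memcontrol.h layout; the field grouping, 64-byte line size and struct names are assumptions for demonstration):

/*
 * Userspace sketch: keep the fields that are read together on the hot path
 * in one aligned group, so changes elsewhere in the struct cannot split
 * them across cache lines.
 */
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define CACHELINE 64

struct example_cgroup {
	/* rarely-touched configuration fields */
	long low, high, max;

	/* hot-path fields, grouped and aligned so they share one cache line */
	struct {
		_Bool        moving_account;
		void        *move_lock_task;
		atomic_long *stat_cpu;          /* per-cpu stats base pointer */
	} hot __attribute__((aligned(CACHELINE)));

	/* frequently written counters live after the hot group */
	atomic_long events[8];
};

int main(void)
{
	size_t start = offsetof(struct example_cgroup, hot);
	size_t end = offsetof(struct example_cgroup, hot.stat_cpu) + sizeof(atomic_long *);

	printf("hot group spans bytes %zu..%zu -> %s cache line\n", start, end,
	       (end - 1) / CACHELINE == start / CACHELINE ? "one" : "multiple");
	return 0;
}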
2018-06-07include/linux/gfp.h: fix the annotation of GFP_ZONE_TABLEHuaisheng Ye1-2/+2
When bit is equal to 0x4, it means OPT_ZONE_DMA32 should be obtained from GFP_ZONE_TABLE. OPT_ZONE_DMA32 shall be equal to ZONE_DMA32 or ZONE_NORMAL according to the status of CONFIG_ZONE_DMA32. Similarly, when bit is equal to 0xc, that means OPT_ZONE_DMA32 should be obtained with an allocation policy GFP_MOVABLE. So ZONE_DMA32 or ZONE_NORMAL is the possible result value. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Huaisheng Ye <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Kate Stewart <[email protected]> Cc: "Levin, Alexander (Sasha Levin)" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
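A toy decoder for a packed lookup table of this kind; the shift width and zone numbers below are made up for illustration, since the kernel's real table depends on CONFIG_ZONE_DMA32 and related options:

/* Stand-alone demo of reading a zone out of a bit-packed lookup table. */
#include <stdio.h>

#define ZONES_SHIFT 2                   /* bits per table entry (assumed) */
#define ZONE_NORMAL 0
#define ZONE_DMA32  1

/* entry at index 0x4 -> ZONE_DMA32, entry at index 0xc (0x4 | movable) too */
#define ZONE_TABLE (((unsigned long)ZONE_DMA32 << (0x4 * ZONES_SHIFT)) | \
		    ((unsigned long)ZONE_DMA32 << (0xc * ZONES_SHIFT)))

static int zone_for(unsigned int gfp_bits)
{
	return (ZONE_TABLE >> (gfp_bits * ZONES_SHIFT)) & ((1 << ZONES_SHIFT) - 1);
}

int main(void)
{
	printf("bits 0x4 -> zone %d\n", zone_for(0x4));  /* DMA32 */
	printf("bits 0xc -> zone %d\n", zone_for(0xc));  /* DMA32 + movable policy */
	return 0;
}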
2018-06-07userfaultfd: prevent non-cooperative events vs mcopy_atomic racesMike Rapoport1-2/+4
If a process monitored with userfaultfd changes its memory mappings or forks() at the same time as the uffd monitor fills the process memory with UFFDIO_COPY, the actual creation of page table entries and copying of the data in mcopy_atomic may happen either before or after the memory mapping modifications and there is no way for the uffd monitor to maintain a consistent view of the process memory layout. For instance, let's consider fork() running in parallel with userfaultfd_copy():
        process                  |  uffd monitor
---------------------------------+------------------------------
fork()                           | userfaultfd_copy()
...                              | ...
    dup_mmap()                   |    down_read(mmap_sem)
    down_write(mmap_sem)         |    /* create PTEs, copy data */
        dup_uffd()               |    up_read(mmap_sem)
        copy_page_range()        |
        up_write(mmap_sem)       |
        dup_uffd_complete()      |
            /* notify monitor */ |
If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will be present by the time copy_page_range() is called and they will appear in the child's memory mappings. However, if the fork() is the first to take the mmap_sem, the new pages won't be mapped in the child's address space. If the pages are not present and the child tries to access them, the monitor will get a page fault notification and everything is fine. However, if the pages *are present*, the child can access them without uffd noticing. And if we copy them into the child it'll see the wrong data. Since we are talking about background copy, we'd need to decide whether the pages should be copied or not regardless of #PF notifications. Since the userfaultfd monitor has no way to determine what the order was, let's disallow userfaultfd_copy in parallel with the non-cooperative events. In that case we return -EAGAIN and the uffd monitor can understand that userfaultfd_copy() clashed with a non-cooperative event and take an appropriate action. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Andrei Vagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07slub: remove kmem_cache->reservedMatthew Wilcox1-1/+0
The reserved field was only used for embedding an rcu_head in the data structure. With the previous commit, we no longer need it. That lets us remove the 'reserved' argument to a lot of functions. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: add hmm_data to struct pageMatthew Wilcox2-12/+8
Make hmm_data an explicit member of the struct page union. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: add pt_mm to struct pageMatthew Wilcox1-1/+1
For pgd page table pages, x86 overloads the page->index field to store a pointer to the mm_struct. Rename this to pt_mm so it's visible to other users. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: improve struct page documentationMatthew Wilcox1-21/+19
Rewrite the documentation to describe what you can use in struct page rather than what you can't. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: combine LRU and main union in struct pageMatthew Wilcox1-51/+46
This gives us five words of space in a single union in struct page. The compound_mapcount moves position (from offset 24 to offset 20) on 64-bit systems, but that does not seem likely to cause any trouble. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: move lru union within struct pageMatthew Wilcox1-51/+51
Since the LRU is two words, this does not affect the double-word alignment of SLUB's freelist. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: combine first three unions in struct pageMatthew Wilcox1-33/+33
By combining these three one-word unions into one three-word union, we make it easier for users to add their own multi-word fields to struct page, as well as making it obvious that SLUB needs to keep its double-word alignment for its freelist & counters. No field moves position; verified with pahole. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: move _refcount out of struct page unionMatthew Wilcox1-15/+10
Keeping the refcount in the union only encourages people to put something else in the union which will overlap with _refcount and eventually explode messily. pahole reports no fields change location. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: move 'private' union within struct pageMatthew Wilcox1-31/+25
By moving page->private to the fourth word of struct page, we can put the SLUB counters in the same word as SLAB's s_mem and still do the cmpxchg_double trick. Now the SLUB counters no longer overlap with the mapcount or refcount so we can drop the call to page_mapcount_reset() and simplify set_page_slub_counters() to a single line. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: switch s_mem and slab_cache in struct pageMatthew Wilcox1-2/+2
This will allow us to store slub's counters in the same bits as slab's s_mem. slub now needs to set page->mapping to NULL as it frees the page, just like slab does. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: mark pages in use for page tablesMatthew Wilcox3-1/+9
Define a new PageTable bit in the page_type and use it to mark pages in use as page tables. This can be helpful when debugging crashdumps or analysing memory fragmentation. Add a KPF flag to report these pages to userspace and update page-types.c to interpret that flag. Note that only pages currently accounted as NR_PAGETABLES are tracked as PageTable; this does not include pgd/p4d/pud/pmd pages. Those will be the subject of a later patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: split page_type out from _mapcountMatthew Wilcox2-24/+34
We're already using a union of many fields here, so stop abusing the _mapcount and make page_type its own field. That implies renaming some of the machinery that creates PageBuddy, PageBalloon and PageKmemcg; bring back the PG_buddy, PG_balloon and PG_kmemcg names. As suggested by Kirill, make page_type a bitmask. Because it starts out life as -1 (thanks to sharing the storage with _mapcount), setting a page flag means clearing the appropriate bit. This gives us space for probably twenty or so extra bits (depending how paranoid we want to be about _mapcount underflow). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
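A stand-alone sketch of the inverted-bitmask idea described above; the constants mirror the concept rather than the kernel's exact values. page_type starts out as all-ones (the old -1 _mapcount), so "setting" a page type means clearing its bit while the high marker bits stay set:

#include <assert.h>
#include <stdio.h>

#define PAGE_TYPE_BASE 0xf0000000u   /* assumed marker bits */
#define PG_buddy       0x00000080u
#define PG_table       0x00000200u

static unsigned int page_type = ~0u;            /* fresh page */

static int page_has_type(unsigned int flag)
{
	return (page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE;
}

int main(void)
{
	assert(!page_has_type(PG_buddy));

	page_type &= ~PG_buddy;                  /* "SetPageBuddy" clears the bit */
	assert(page_has_type(PG_buddy));
	assert(!page_has_type(PG_table));

	page_type |= PG_buddy;                   /* "ClearPageBuddy" sets it back */
	assert(!page_has_type(PG_buddy));

	printf("inverted page_type bitmask behaves as expected\n");
	return 0;
}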
2018-06-07mm: save two stranded bits in gfp_maskShakeel Butt1-5/+5
___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were stranded. Fill the gaps by moving the existing gfp masks around. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: change return type to vm_fault_tSouptick Joarder1-3/+3
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See commit 1c8f422059ae ("mm: change return type to vm_fault_t") Link: http://lkml.kernel.org/r/20180512063745.GA26866@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Joe Perches <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Rientjes <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
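A hedged sketch of what the conversion looks like for a fault handler; the device, handler name and page lookup are hypothetical, only the vm_fault_t return type and the .fault hook are what the patch is about:

#include <linux/mm.h>

static vm_fault_t example_dev_fault(struct vm_fault *vmf)
{
	struct page *page = /* look up the backing page here ... */ NULL;

	if (!page)
		return VM_FAULT_SIGBUS;   /* a VM_FAULT_* code, not an errno */

	get_page(page);
	vmf->page = page;
	return 0;
}

static const struct vm_operations_struct example_dev_vm_ops = {
	.fault = example_dev_fault,
};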
2018-06-07mm: use new return type vm_fault_tSouptick Joarder1-2/+2
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Link: http://lkml.kernel.org/r/20180511190542.GA2412@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07memcg: introduce memory.minRoman Gushchin2-6/+20
Memory controller implements the memory.low best-effort memory protection mechanism, which works perfectly in many cases and allows protecting working sets of important workloads from sudden reclaim. But its semantics has a significant limitation: it works only as long as there is a supply of reclaimable memory. This makes it pretty useless against any sort of slow memory leaks or memory usage increases. This is especially true for swapless systems. If swap is enabled, memory soft protection effectively postpones problems, allowing a leaking application to fill all swap area, which makes no sense. The only effective way to guarantee the memory protection in this case is to invoke the OOM killer. It's possible to handle this case in userspace by reacting on MEMCG_LOW events; but there is still a place for a fail-safe in-kernel mechanism to provide stronger guarantees. This patch introduces the memory.min interface for cgroup v2 memory controller. It works very similarly to memory.low (sharing the same hierarchical behavior), except that it's not disabled if there is no more reclaimable memory in the system. If cgroup is not populated, its memory.min is ignored, because otherwise even the OOM killer wouldn't be able to reclaim the protected memory, and the system can stall. [[email protected]: s/low/min/ in docs] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: move is_pageblock_removable_nolock() to mm/memory_hotplug.cMathieu Malaterre1-1/+0
is_pageblock_removable_nolock() is not used outside of mm/memory_hotplug.c. Move it next to unique caller is_mem_section_removable() and make it static. Remove prototype in <linux/memory_hotplug.h> to silence gcc warning (W=1): mm/page_alloc.c:7704:6: warning: no previous prototype for `is_pageblock_removable_nolock' [-Wmissing-prototypes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mathieu Malaterre <[email protected]> Suggested-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07memcg: writeback: use memcg->cgwb_list directlyWang Long1-1/+0
mem_cgroup_cgwb_list is a very simple wrapper and it will never be used outside of code under CONFIG_CGROUP_WRITEBACK. so use memcg->cgwb_list directly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wang Long <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm/ksm: move [set_]page_stable_node from ksm.h to ksm.cMike Rapoport1-11/+0
page_stable_node() and set_page_stable_node() are only used in mm/ksm.c and there is no point to keep them in the include/linux/ksm.h [[email protected]: fix SYSFS=n build] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm/ksm: remove unused page_referenced_ksm declarationMike Rapoport1-6/+0
Commit 9f32624be943 ("mm/rmap: use rmap_walk() in page_referenced()") removed the declaration of page_referenced_ksm for the case CONFIG_KSM=y, but left one for CONFIG_KSM=n. Remove the unused leftover. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07lockdep: fix fs_reclaim annotationOmar Sandoval1-0/+4
While revisiting my Btrfs swapfile series [1], I introduced a situation in which reclaim would lock i_rwsem, and even though the swapon() path clearly made GFP_KERNEL allocations while holding i_rwsem, I got no complaints from lockdep. It turns out that the rework of the fs_reclaim annotation was broken: if the current task has PF_MEMALLOC set, we don't acquire the dummy fs_reclaim lock, but when reclaiming we always check this _after_ we've just set the PF_MEMALLOC flag. In most cases, we can fix this by moving the fs_reclaim_{acquire,release}() outside of the memalloc_noreclaim_{save,restore}(), although kswapd is slightly different. After applying this, I got the expected lockdep splats. 1: https://lwn.net/Articles/625412/ Link: http://lkml.kernel.org/r/9f8aa70652a98e98d7c4de0fc96a4addcee13efe.1523778026.git.osandov@fb.com Fixes: d92a8cfcb37e ("locking/lockdep: Rework FS_RECLAIM annotation") Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
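A simplified sketch of the corrected ordering (not the actual vmscan.c code; the wrapper function is hypothetical): the lockdep fs_reclaim map is taken before PF_MEMALLOC is set via memalloc_noreclaim_save(), otherwise the annotation is skipped for the very context it is meant to check.

#include <linux/sched/mm.h>

static unsigned long example_perform_reclaim(gfp_t gfp_mask)
{
	unsigned int noreclaim_flag;
	unsigned long progress;

	fs_reclaim_acquire(gfp_mask);           /* lockdep sees reclaim first */
	noreclaim_flag = memalloc_noreclaim_save();

	progress = 0; /* ... the actual reclaim work would run here ... */

	memalloc_noreclaim_restore(noreclaim_flag);
	fs_reclaim_release(gfp_mask);

	return progress;
}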
2018-06-07mm: memory.low hierarchical behaviorRoman Gushchin2-2/+8
This patch aims to address an issue in current memory.low semantics, which makes it hard to use it in a hierarchy, where some leaf memory cgroups are more valuable than others. For example, there are memcgs A, A/B, A/C, A/D and A/E:

  A      A/memory.low = 2G, A/memory.current = 6G
 //\\
BC  DE   B/memory.low = 3G  B/memory.current = 2G
         C/memory.low = 1G  C/memory.current = 2G
         D/memory.low = 0   D/memory.current = 2G
         E/memory.low = 10G E/memory.current = 0

If we apply memory pressure, B, C and D are reclaimed at the same pace while A's usage exceeds 2G. This is obviously wrong, as B's usage is fully below B's memory.low, and C has 1G of protection as well. Also, A is pushed to the size, which is less than A's 2G memory.low, which is also wrong. A simple bash script (provided below) can be used to reproduce the problem. Current results are:
  A:   1430097920
  A/B: 711929856
  A/C: 717426688
  A/D: 741376
  A/E: 0
To address the issue a concept of effective memory.low is introduced. Effective memory.low is always equal to or less than the original memory.low. In the case when there is no memory.low overcommitment (and also for top-level cgroups), these two values are equal. Otherwise it's a part of the parent's effective memory.low, calculated as a cgroup's memory.low usage divided by the sum of the siblings' memory.low usages (under memory.low usage I mean the size of actually protected memory: memory.current if memory.current < memory.low, 0 otherwise). It's necessary to track the actual usage, because otherwise an empty cgroup with memory.low set (A/E in my example) will affect the actual memory distribution, which makes no sense. To avoid traversing the cgroup tree twice, page_counters code is reused. Calculating effective memory.low can be done in the reclaim path, as we are conveniently traversing the cgroup tree from top to bottom and checking memory.low on each level. So, it's a perfect place to calculate effective memory.low and save it for use by children cgroups. This also eliminates the need to traverse the cgroup tree from bottom to top each time to check whether the parent's guarantee is exceeded. Setting/resetting effective memory.low is intentionally racy, but it's fine and shouldn't lead to any significant differences in actual memory distribution.
With this patch applied the results match the expectations:
  A:   2147930112
  A/B: 1428721664
  A/C: 718393344
  A/D: 815104
  A/E: 0
Test script:
  #!/bin/bash
  CGPATH="/sys/fs/cgroup"
  truncate /file1 --size 2G
  truncate /file2 --size 2G
  truncate /file3 --size 2G
  truncate /file4 --size 50G
  mkdir "${CGPATH}/A"
  echo "+memory" > "${CGPATH}/A/cgroup.subtree_control"
  mkdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
  echo 2G > "${CGPATH}/A/memory.low"
  echo 3G > "${CGPATH}/A/B/memory.low"
  echo 1G > "${CGPATH}/A/C/memory.low"
  echo 0 > "${CGPATH}/A/D/memory.low"
  echo 10G > "${CGPATH}/A/E/memory.low"
  echo $$ > "${CGPATH}/A/B/cgroup.procs" && vmtouch -qt /file1
  echo $$ > "${CGPATH}/A/C/cgroup.procs" && vmtouch -qt /file2
  echo $$ > "${CGPATH}/A/D/cgroup.procs" && vmtouch -qt /file3
  echo $$ > "${CGPATH}/cgroup.procs" && vmtouch -qt /file4
  echo "A: " `cat "${CGPATH}/A/memory.current"`
  echo "A/B: " `cat "${CGPATH}/A/B/memory.current"`
  echo "A/C: " `cat "${CGPATH}/A/C/memory.current"`
  echo "A/D: " `cat "${CGPATH}/A/D/memory.current"`
  echo "A/E: " `cat "${CGPATH}/A/E/memory.current"`
  rmdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
  rmdir "${CGPATH}/A"
  rm /file1 /file2 /file3 /file4
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: rename page_counter's count/limit into usage/maxRoman Gushchin2-8/+8
This patch renames struct page_counter fields: count -> usage limit -> max and the corresponding functions: page_counter_limit() -> page_counter_set_max() mem_cgroup_get_limit() -> mem_cgroup_get_max() mem_cgroup_resize_limit() -> mem_cgroup_resize_max() memcg_update_kmem_limit() -> memcg_update_kmem_max() memcg_update_tcp_limit() -> memcg_update_tcp_max() The idea behind this renaming is to have the direct matching between memory cgroup knobs (low, high, max) and page_counters API. This is pure renaming, this patch doesn't bring any functional change. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm/memblock: introduce PHYS_ADDR_MAXStefan Agner1-0/+1
So far code was using ULLONG_MAX and type casting to obtain a phys_addr_t with all bits set. The typecast is necessary to silence compiler warnings on 32-bit platforms. Use the simpler but still type safe approach "~(phys_addr_t)0" to create a preprocessor define for all bits set. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Stefan Agner <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Ard Biesheuvel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
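A small stand-alone illustration of why "~(phys_addr_t)0" is the cleaner way to get an all-bits-set value of a possibly-32-bit type; phys_addr_t is emulated with a 32-bit typedef here purely for demonstration:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t phys_addr_t;            /* pretend a 32-bit phys_addr_t */

#define PHYS_ADDR_MAX (~(phys_addr_t)0)  /* already has the right type and width */

int main(void)
{
	/* No truncating cast of ULLONG_MAX needed, so no compiler warning */
	phys_addr_t limit = PHYS_ADDR_MAX;

	printf("PHYS_ADDR_MAX = 0x%jx\n", (uintmax_t)limit);  /* 0xffffffff here */
	return 0;
}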
2018-06-07mm: introduce ARCH_HAS_PTE_SPECIALLaurent Dufour1-2/+2
Currently PTE special support is turned on in per-architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending (or not) on some other per-architecture static definition. This patch introduces a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here are notes for some architectures where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm: __HAVE_ARCH_PTE_SPECIAL is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc: __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__), which are defined through the compiler in sparc/Makefile if !SPARC32, which I assume to mean if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64. There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Laurent Dufour <[email protected]> Suggested-by: Jerome Glisse <[email protected]> Reviewed-by: Jerome Glisse <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: David S. Miller <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: David Rientjes <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Christophe LEROY <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: restructure memfd codeMike Kravetz2-13/+16
With the addition of memfd hugetlbfs support, we now have the situation where memfd depends on TMPFS -or- HUGETLBFS. Previously, memfd was only supported on tmpfs, so it made sense that the code resided in shmem.c. In the current code, memfd is only functional if TMPFS is defined. If HUGETLBFS is defined and TMPFS is not defined, then memfd functionality will not be available for hugetlbfs. This does not cause BUGs, just a lack of potentially desired functionality. Code is restructured in the following way: - include/linux/memfd.h is a new file containing memfd specific definitions previously contained in shmem_fs.h. - mm/memfd.c is a new file containing memfd specific code previously contained in shmem.c. - memfd specific code is removed from shmem_fs.h and shmem.c. - A new config option MEMFD_CREATE is added that is defined if TMPFS or HUGETLBFS is defined. No functional changes are made to the code: restructuring only. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: David Herrmann <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marc-André Lureau <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm, memcontrol: implement memory.swap.eventsTejun Heo1-0/+5
Add swap max and fail events so that userland can monitor and respond to running out of swap. I'm not too sure about the fail event. Right now, it's a bit confusing which stats / events are recursive and which aren't and also which ones reflect events which originate from a given cgroup and which targets the cgroup. No idea what the right long term solution is and it could just be that growing them organically is actually the only right thing to do. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Rik van Riel <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_structYang Shi1-0/+2
mmap_sem is on the kernel's hot path, and it is very contended, but it is abused too. It is used to protect arg_start|end and env_start|end when reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make sense since those proc files just expect to read 4 values atomically that are not related to VM; they could be set to arbitrary values by C/R. And the mmap_sem contention may cause unexpected issues like below:
INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ps D 0 14018 1 0x00000004
Call Trace:
  schedule+0x36/0x80
  rwsem_down_read_failed+0xf0/0x150
  call_rwsem_down_read_failed+0x18/0x30
  down_read+0x20/0x40
  proc_pid_cmdline_read+0xd9/0x4e0
  __vfs_read+0x37/0x150
  vfs_read+0x96/0x130
  SyS_read+0x55/0xc0
  entry_SYSCALL_64_fastpath+0x1a/0xc5
Both Alexey Dobriyan and Michal Hocko suggested using a dedicated lock for them to mitigate the abuse of mmap_sem. So, introduce a new spinlock in mm_struct to protect the concurrent access to arg_start|end, env_start|end and others, as well as replacing the write mmap_sem with a read lock to protect the race condition between prctl and sys_brk which might break check_data_rlimit(), which makes prctl more friendly to other VM operations. This patch just eliminates the abuse of mmap_sem, but it can't resolve the above hung task warning completely since the later access_remote_vm() call needs to acquire mmap_sem. The mmap_sem scalability issue will be solved in the future. [[email protected]: add comment about mmap_sem and arg_lock] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Kirill Tkhai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
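A simplified sketch of the new locking pattern for the proc readers; the helper function is hypothetical, only mm->arg_lock and the arg_start/arg_end fields come from the patch:

#include <linux/mm_types.h>
#include <linux/spinlock.h>

static void example_snapshot_args(struct mm_struct *mm,
				  unsigned long *arg_start, unsigned long *arg_end)
{
	/* cheap spinlock instead of down_read(&mm->mmap_sem) just to snapshot */
	spin_lock(&mm->arg_lock);
	*arg_start = mm->arg_start;
	*arg_end   = mm->arg_end;
	spin_unlock(&mm->arg_lock);
}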
2018-06-07slab: clean up the code comment in slab kmem_cache structBaoquan He1-3/+4
In commit 3b0efdfa1e7 ("mm, sl[aou]b: Extract common fields from struct kmem_cache") the variable 'obj_size' was moved above, however the related code comment is not updated accordingly. Do it here. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07fs/dax.c: use new return type vm_fault_tSouptick Joarder2-4/+4
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See commit 1c8f422059ae ("mm: change return type to vm_fault_t"). There was an existing bug inside dax_load_hole(): if vm_insert_mixed had failed to allocate a page table, we'd return VM_FAULT_NOPAGE instead of VM_FAULT_OOM. With the new vmf_insert_mixed() this issue is addressed. vm_insert_mixed_mkwrite is inefficient when it returns an error value, as the driver has to convert it to the vm_fault_t type. With the new vmf_insert_mixed_mkwrite() this limitation will be addressed. Link: http://lkml.kernel.org/r/20180510181121.GA15239@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Dan Williams <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-2/+2
Daniel Borkmann says: ==================== pull-request: bpf 2018-06-08 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix in the BPF verifier to reject modified ctx pointers on helper functions, from Daniel. 2) Fix in BPF kselftests for get_cgroup_id_user() helper to only record the cgroup id for a provided pid in order to reduce test failures from processes interfering with the test, from Yonghong. 3) Fix a crash in AF_XDP's mem accounting when the process owning the sock has CAP_IPC_LOCK capabilities set, from Daniel. 4) Fix an issue for AF_XDP on 32 bit machines where XDP_UMEM_PGOFF_*_RING defines need ULL suffixes and use loff_t type as they are otherwise truncated, from Geert. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-06-07Merge branch 'next-integrity' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "From Mimi: - add run time support for specifying additional security xattrs included in the security.evm HMAC/signature - some code clean up and bug fixes" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: EVM: unlock on error path in evm_read_xattrs() EVM: prevent array underflow in evm_write_xattrs() EVM: Fix null dereference on xattr when xattr fails to allocate EVM: fix memory leak of temporary buffer 'temp' IMA: use list_splice_tail_init_rcu() instead of its open coded variant ima: use match_string() helper ima: fix updating the ima_appraise flag ima: based on policy verify firmware signatures (pre-allocated buffer) ima: define a new policy condition based on the filesystem name EVM: Allow runtime modification of the set of verified xattrs EVM: turn evm_config_xattrnames into a list integrity: Add an integrity directory in securityfs ima: Remove unused variable ima_initialized ima: Unify logging ima: Reflect correct permissions for policy
2018-06-08xsk: Fix umem fill/completion queue mmap on 32-bitGeert Uytterhoeven1-2/+2
With gcc-4.1.2 on 32-bit: net/xdp/xsk.c:663: warning: integer constant is too large for ‘long’ type net/xdp/xsk.c:665: warning: integer constant is too large for ‘long’ type Add the missing "ULL" suffixes to the large XDP_UMEM_PGOFF_*_RING values to fix this. net/xdp/xsk.c:663: warning: comparison is always false due to limited range of data type net/xdp/xsk.c:665: warning: comparison is always false due to limited range of data type "unsigned long" is 32-bit on 32-bit systems, hence the offset is truncated, and can never be equal to any of the XDP_UMEM_PGOFF_*_RING values. Use loff_t (and the required cast) to fix this. Fixes: 423f38329d267969 ("xsk: add umem fill queue support and mmap") Fixes: fe2308328cd2f26e ("xsk: add umem completion queue support and mmap") Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
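A stand-alone demonstration of the truncation being fixed; the ring offset constant below follows the description above (a value at or above 2^32), and the exact kernel values may differ:

#include <stdint.h>
#include <stdio.h>

#define XDP_UMEM_PGOFF_FILL_RING 0x100000000ULL  /* needs the ULL suffix */

int main(void)
{
	uint32_t as_ulong_32bit = (uint32_t)XDP_UMEM_PGOFF_FILL_RING; /* "unsigned long" on 32-bit */
	int64_t  as_loff_t      = (int64_t)XDP_UMEM_PGOFF_FILL_RING;  /* loff_t is always 64-bit */

	/* The 32-bit variable can never compare equal to the ring offset ... */
	printf("truncated: 0x%x, comparison %s\n", (unsigned int)as_ulong_32bit,
	       as_ulong_32bit == XDP_UMEM_PGOFF_FILL_RING ? "matches" : "always false");

	/* ... while a 64-bit offset type keeps the value intact. */
	printf("as loff_t: 0x%llx\n", (unsigned long long)as_loff_t);
	return 0;
}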