aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-05-03lockdep: allow to disable reclaim lockup detectionMichal Hocko3-1/+15
The current implementation of the reclaim lockup detection can lead to false positives and those even happen and usually lead to tweak the code to silence the lockdep by using GFP_NOFS even though the context can use __GFP_FS just fine. See http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example. ================================= [ INFO: inconsistent lock state ] 4.5.0-rc2+ #4 Tainted: G O --------------------------------- inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage. kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs] {RECLAIM_FS-ON-R} state was registered at: mark_held_locks+0x79/0xa0 lockdep_trace_alloc+0xb3/0x100 kmem_cache_alloc+0x33/0x230 kmem_zone_alloc+0x81/0x120 [xfs] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs] __xfs_refcount_find_shared+0x75/0x580 [xfs] xfs_refcount_find_shared+0x84/0xb0 [xfs] xfs_getbmap+0x608/0x8c0 [xfs] xfs_vn_fiemap+0xab/0xc0 [xfs] do_vfs_ioctl+0x498/0x670 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x12/0x6f CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 3 locks held by kswapd0/543: stack backtrace: CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4 Call Trace: lock_acquire+0xd8/0x1e0 down_write_nested+0x5e/0xc0 xfs_ilock+0x177/0x200 [xfs] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs] xfs_fs_evict_inode+0xdc/0x1e0 [xfs] evict+0xc5/0x190 dispose_list+0x39/0x60 prune_icache_sb+0x4b/0x60 super_cache_scan+0x14f/0x1a0 shrink_slab.part.63.constprop.79+0x1e9/0x4e0 shrink_zone+0x15e/0x170 kswapd+0x4f1/0xa80 kthread+0xf2/0x110 ret_from_fork+0x3f/0x70 To quote Dave: "Ignoring whether reflink should be doing anything or not, that's a "xfs_refcountbt_init_cursor() gets called both outside and inside transactions" lockdep false positive case. The problem here is lockdep has seen this allocation from within a transaction, hence a GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context. Also note that we have an active reference to this inode. So, because the reclaim annotations overload the interrupt level detections and it's seen the inode ilock been taken in reclaim ("interrupt") context, this triggers a reclaim context warning where it thinks it is unsafe to do this allocation in GFP_KERNEL context holding the inode ilock..." This sounds like a fundamental problem of the reclaim lock detection. It is really impossible to annotate such a special usecase IMHO unless the reclaim lockup detection is reworked completely. Until then it is much better to provide a way to add "I know what I am doing flag" and mark problematic places. This would prevent from abusing GFP_NOFS flag which has a runtime effect even on configurations which have lockdep disabled. Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to skip the current allocation request. While we are at it also make sure that the radix tree doesn't accidentaly override tags stored in the upper part of the gfp_mask. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Suggested-by: Peter Zijlstra <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Chris Mason <[email protected]> Cc: David Sterba <[email protected]> Cc: Jan Kara <[email protected]> Cc: Brian Foster <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Nikolay Borisov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03lockdep: teach lockdep about memalloc_noio_saveNikolay Borisov1-1/+4
Patch series "scope GFP_NOFS api", v5. This patch (of 7): Commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation") added the memalloc_noio_(save|restore) functions to enable people to modify the MM behavior by disabling I/O during memory allocation. This was further extended in commit 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent allocation paths recursing back into the filesystem without explicitly changing the flags for every allocation site. However, lockdep hasn't been keeping up with the changes and it entirely misses handling the memalloc_noio adjustments. Instead, it is left to the callers of __lockdep_trace_alloc to call the function after they have shaven the respective GFP flags which can lead to false positives: ================================= [ INFO: inconsistent lock state ] 4.10.0-nbor #134 Not tainted --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230 {IN-RECLAIM_FS-W} state was registered at: __lock_acquire+0x62a/0x17c0 lock_acquire+0xc5/0x220 down_write_nested+0x4f/0x90 xfs_ilock+0x141/0x230 xfs_reclaim_inode+0x12a/0x320 xfs_reclaim_inodes_ag+0x2c8/0x4e0 xfs_reclaim_inodes_nr+0x33/0x40 xfs_fs_free_cached_objects+0x19/0x20 super_cache_scan+0x191/0x1a0 shrink_slab+0x26f/0x5f0 shrink_node+0xf9/0x2f0 kswapd+0x356/0x920 kthread+0x10c/0x140 ret_from_fork+0x31/0x40 irq event stamp: 173777 hardirqs last enabled at (173777): __local_bh_enable_ip+0x70/0xc0 hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0 softirqs last enabled at (173776): _xfs_buf_find+0x67a/0xb70 softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 4 locks held by fsstress/3365: #0: (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50 #1: (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0 #2: (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140 #3: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230 stack backtrace: CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: kmem_cache_alloc_node_trace+0x3a/0x2c0 vm_map_ram+0x2a1/0x510 _xfs_buf_map_pages+0x77/0x140 xfs_buf_get_map+0x185/0x2a0 xfs_attr_rmtval_set+0x233/0x430 xfs_attr_leaf_addname+0x2d2/0x500 xfs_attr_set+0x214/0x420 xfs_xattr_set+0x59/0xb0 __vfs_setxattr+0x76/0xa0 __vfs_setxattr_noperm+0x5e/0xf0 vfs_setxattr+0xae/0xb0 setxattr+0x15e/0x1a0 path_setxattr+0x8f/0xc0 SyS_lsetxattr+0x11/0x20 entry_SYSCALL_64_fastpath+0x23/0xc6 Let's fix this by making lockdep explicitly do the shaving of respective GFP flags. Fixes: 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Chris Mason <[email protected]> Cc: David Sterba <[email protected]> Cc: Jan Kara <[email protected]> Cc: Brian Foster <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfoDavid Rientjes1-7/+13
After "mm, vmstat: print non-populated zones in zoneinfo", /proc/zoneinfo will show unpopulated zones. The per-cpu pageset statistics are not relevant for unpopulated zones and can be potentially lengthy, so supress them when they are not interesting. Also moves lowmem reserve protection information above pcp stats since it is relevant for all zones per vm.lowmem_reserve_ratio. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm, vmstat: print non-populated zones in zoneinfoDavid Rientjes1-10/+17
Initscripts can use the information (protection levels) from /proc/zoneinfo to configure vm.lowmem_reserve_ratio at boot. vm.lowmem_reserve_ratio is an array of ratios for each configured zone on the system. If a zone is not populated on an arch, /proc/zoneinfo suppresses its output. This results in there not being a 1:1 mapping between the set of zones emitted by /proc/zoneinfo and the zones configured by vm.lowmem_reserve_ratio. This patch shows statistics for non-populated zones in /proc/zoneinfo. The zones exist and hold a spot in the vm.lowmem_reserve_ratio array. Without this patch, it is not possible to determine which index in the array controls which zone if one or more zones on the system are not populated. Remaining users of walk_zones_in_node() are unchanged. Files such as /proc/pagetypeinfo require certain zone data to be initialized properly for display, which is not done for unpopulated zones. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: use is_migrate_isolate_page() to simplify the codeXishi Qiu1-3/+3
Use is_migrate_isolate_page() to simplify the code, no functional changes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Xishi Qiu <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: use is_migrate_highatomic() to simplify the codeXishi Qiu3-9/+17
Introduce two helpers, is_migrate_highatomic() and is_migrate_highatomic_page(). Simplify the code, no functional changes. [[email protected]: use static inlines rather than macros, per mhocko] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Xishi Qiu <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm, swap: Fix a race in free_swap_and_cache()Huang Ying1-9/+16
Before using cluster lock in free_swap_and_cache(), the swap_info_struct->lock will be held during freeing the swap entry and acquiring page lock, so the page swap count will not change when testing page information later. But after using cluster lock, the cluster lock (or swap_info_struct->lock) will be held only during freeing the swap entry. So before acquiring the page lock, the page swap count may be changed in another thread. If the page swap count is not 0, we should not delete the page from the swap cache. This is fixed via checking page swap count again after acquiring the page lock. I found the race when I review the code, so I didn't trigger the race via a test program. If the race occurs for an anonymous page shared by multiple processes via fork, multiple pages will be allocated and swapped in from the swap device for the previously shared one page. That is, the user-visible runtime effect is more memory will be used and the access latency for the page will be higher, that is, the performance regression. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tim Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: memcontrol: provide shmem statisticsJohannes Weiner3-8/+26
Cgroups currently don't report how much shmem they use, which can be useful data to have, in particular since shmem is included in the cache/file item while being reclaimed like anonymous memory. Add a counter to track shmem pages during charging and uncharging. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Chris Down <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03proc: show MADV_FREE pages info in smapsShaohua Li2-1/+13
Show MADV_FREE pages info of each vma in smaps. The interface is for diganose or monitoring purpose, userspace could use it to understand what happens in the application. Since userspace could dirty MADV_FREE pages without notice from kernel, this interface is the only place we can get accurate accounting info about MADV_FREE pages. [[email protected]: update Documentation/filesystems/proc.txt] Link: http://lkml.kernel.org/r/89efde633559de1ec07444f2ef0f4963a97a2ce8.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: enable MADV_FREE for swapless systemShaohua Li1-7/+1
Now MADV_FREE pages can be easily reclaimed even for swapless system. We can safely enable MADV_FREE for all systems. Link: http://lkml.kernel.org/r/155648585589300bfae1d45078e7aebb3d988b87.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: fix lazyfree BUG_ON check in try_to_unmap_one()Minchan Kim1-2/+6
If a page is swapbacked, it means it should be in swapcache in try_to_unmap_one's path. If a page is !swapbacked, it mean it shouldn't be in swapcache in try_to_unmap_one's path. Check both two cases all at once and if it fails, warn and return SWAP_FAIL. Such bug never mean we should shut down the kernel. [[email protected]: do not use VM_WARN_ON_ONCE as if condition[ Link: http://lkml.kernel.org/r/20170309060226.GB854@bbox Link: http://lkml.kernel.org/r/20170307055551.GC29458@bbox Signed-off-by: Minchan Kim <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: reclaim MADV_FREE pagesShaohua Li5-39/+46
When memory pressure is high, we free MADV_FREE pages. If the pages are not dirty in pte, the pages could be freed immediately. Otherwise we can't reclaim them. We put the pages back to anonumous LRU list (by setting SwapBacked flag) and the pages will be reclaimed in normal swapout way. We use normal page reclaim policy. Since MADV_FREE pages are put into inactive file list, such pages and inactive file pages are reclaimed according to their age. This is expected, because we don't want to reclaim too many MADV_FREE pages before used once pages. Based on Minchan's original patch [[email protected]: clean up lazyfree page handling] Link: http://lkml.kernel.org/r/20170303025237.GB3503@bbox Link: http://lkml.kernel.org/r/14b8eb1d3f6bf6cc492833f183ac8c304e560484.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: move MADV_FREE pages into LRU_INACTIVE_FILE listShaohua Li6-28/+31
madv()'s MADV_FREE indicate pages are 'lazyfree'. They are still anonymous pages, but they can be freed without pageout. To distinguish these from normal anonymous pages, we clear their SwapBacked flag. MADV_FREE pages could be freed without pageout, so they pretty much like used once file pages. For such pages, we'd like to reclaim them once there is memory pressure. Also it might be unfair reclaiming MADV_FREE pages always before used once file pages and we definitively want to reclaim the pages before other anonymous and file pages. To speed up MADV_FREE pages reclaim, we put the pages into LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny nowadays and should be full of used once file pages. Reclaiming MADV_FREE pages will not have much interfere of anonymous and active file pages. And the inactive file pages and MADV_FREE pages will be reclaimed according to their age, so we don't reclaim too many MADV_FREE pages too. Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also means we can reclaim the pages without swap support. This idea is suggested by Johannes. This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet to avoid bisect failure, next patch will do it. The patch is based on Minchan's original patch. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: don't assume anonymous pages have SwapBacked flagShaohua Li4-8/+7
There are a few places the code assumes anonymous pages should have SwapBacked flag set. MADV_FREE pages are anonymous pages but we are going to add them to LRU_INACTIVE_FILE list and clear SwapBacked flag for them. The assumption doesn't hold any more, so fix them. Link: http://lkml.kernel.org/r/3945232c0df3dd6c4ef001976f35a95f18dcb407.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: delete unnecessary TTU_* flagsShaohua Li4-22/+15
Patch series "mm: fix some MADV_FREE issues", v5. We are trying to use MADV_FREE in jemalloc. Several issues are found. Without solving the issues, jemalloc can't use the MADV_FREE feature. - Doesn't support system without swap enabled. Because if swap is off, we can't or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed with other anonymous pages, we can't reclaim MADV_FREE pages. In current implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled. But in our environment, a lot of machines don't enable swap. This will prevent our setup using MADV_FREE. - Increases memory pressure. page reclaim bias file pages reclaim against anonymous pages. This doesn't make sense for MADV_FREE pages, because those pages could be freed easily and refilled with very slight penality. Even page reclaim doesn't bias file pages, there is still an issue, because MADV_FREE pages and other anonymous pages are mixed together. To reclaim a MADV_FREE page, we probably must scan a lot of other anonymous pages, which is inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing without it. - Accounting. There are two accounting problems. We don't have a global accounting. If the system is abnormal, we don't know if it's a problem from MADV_FREE side. The other problem is RSS accounting. MADV_FREE pages are accounted as normal anon pages and reclaimed lazily, so application's RSS becomes bigger. This confuses our workloads. We have monitoring daemon running and if it finds applications' RSS becomes abnormal, the daemon will kill the applications even kernel can reclaim the memory easily. To address the first the two issues, we can either put MADV_FREE pages into a separate LRU list (Minchan's previous patches and V1 patches), or put them into LRU_INACTIVE_FILE list (suggested by Johannes). The patchset use the second idea. The reason is LRU_INACTIVE_FILE list is tiny nowadays and should be full of used once file pages. So we can still efficiently reclaim MADV_FREE pages there without interference with other anon and active file pages. Putting the pages into inactive file list also has an advantage which allows page reclaim to prioritize MADV_FREE pages and used once file pages. MADV_FREE pages are put into the lru list and clear SwapBacked flag, so PageAnon(page) && !PageSwapBacked(page) will indicate a MADV_FREE pages. These pages will directly freed without pageout if they are clean, otherwise normal swap will reclaim them. For the third issue, the previous post adds global accounting and a separate RSS count for MADV_FREE pages. The problem is we never get accurate accounting for MADV_FREE pages. The pages are mapped to userspace, can be dirtied without notice from kernel side. To get accurate accounting, we could write protect the page, but then there is extra page fault overhead, which people don't want to pay. Jemalloc guys have concerns about the inaccurate accounting, so this post drops the accounting patches temporarily. The info exported to /proc/pid/smaps for MADV_FREE pages are kept, which is the only place we can get accurate accounting right now. This patch (of 6): Johannes pointed out TTU_LZFREE is unnecessary. It's true because we always have the flag set if we want to do an unmap. For cases we don't do an unmap, the TTU_LZFREE part of code should never run. Also the TTU_UNMAP is unnecessary. If no other flags set (for example, TTU_MIGRATION), an unmap is implied. The patch includes Johannes's cleanup and dead TTU_ACTION macro removal code Link: http://lkml.kernel.org/r/4be3ea1bc56b26fd98a54d0a6f70bec63f6d8980.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm/page-writeback.c: use setup_deferrable_timerGeliang Tang1-3/+2
Use setup_deferrable_timer() instead of init_timer_deferrable() to simplify the code. Link: http://lkml.kernel.org/r/e8e3d4280a34facbc007346f31df833cec28801e.1488070291.git.geliangtang@gmail.com Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: remove unnecessary back-off function when retrying page reclaimJohannes Weiner1-9/+6
The backoff mechanism is not needed. If we have MAX_RECLAIM_RETRIES loops without progress, we'll OOM anyway; backing off might cut one or two iterations off that in the rare OOM case. If we have intermittent success reclaiming a few pages, the backoff function gets reset also, and so is of little help in these scenarios. We might want a backoff function for when there IS progress, but not enough to be satisfactory. But this isn't that. Remove it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03Revert "mm, vmscan: account for skipped pages as a partial scan"Johannes Weiner1-18/+4
This reverts commit d7f05528eedb047efe2288cff777676b028747b6. Now that reclaimability of a node is no longer based on the ratio between pages scanned and theoretically reclaimable pages, we can remove accounting tricks for pages skipped due to zone constraints. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: delete NR_PAGES_SCANNED and pgdat_reclaimable()Johannes Weiner5-41/+3
NR_PAGES_SCANNED counts number of pages scanned since the last page free event in the allocator. This was used primarily to measure the reclaimability of zones and nodes, and determine when reclaim should give up on them. In that role, it has been replaced in the preceding patches by a different mechanism. Being implemented as an efficient vmstat counter, it was automatically exported to userspace as well. It's however unlikely that anyone outside the kernel is using this counter in any meaningful way. Remove the counter and the unused pgdat_reclaimable(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: don't avoid high-priority reclaim on memcg limit reclaimJohannes Weiner1-57/+37
Commit 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought to avoid high reclaim priorities for memcg by forcing it to scan a minimum amount of pages when lru_pages >> priority yielded nothing. This was done at a time when reclaim decisions like dirty throttling were tied to the priority level. Nowadays, the only meaningful thing still tied to priority dropping below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally allowed to write. But that is from an era where direct reclaim was still allowed to call ->writepage, and kswapd nowadays avoids writes until it's scanned every clean page in the system. Potential changes to how quick sc->may_writepage could trigger are of little concern. Remove the force_scan stuff, as well as the ugly multi-pass target calculation that it necessitated. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: don't avoid high-priority reclaim on unreclaimable nodesJohannes Weiner1-14/+5
Commit 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought to avoid high reclaim priorities for kswapd by forcing it to scan a minimum amount of pages when lru_pages >> priority yielded nothing. Commit b95a2f2d486d ("mm: vmscan: convert global reclaim to per-memcg LRU lists"), due to switching global reclaim to a round-robin scheme over all cgroups, had to restrict this forceful behavior to unreclaimable zones in order to prevent massive overreclaim with many cgroups. The latter patch effectively neutered the behavior completely for all but extreme memory pressure. But in those situations we might as well drop the reclaimers to lower priority levels. Remove the check. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: remove unnecessary reclaimability check from NUMA balancing targetJohannes Weiner1-3/+0
NUMA balancing already checks the watermarks of the target node to decide whether it's a suitable balancing target. Whether the node is reclaimable or not is irrelevant when we don't intend to reclaim. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: remove seemingly spurious reclaimability check from laptop_mode gatingJohannes Weiner1-1/+1
Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes") allowed laptop_mode=1 to start writing not just when the priority drops to DEF_PRIORITY - 2 but also when the node is unreclaimable. That appears to be a spurious change in this patch as I doubt the series was tested with laptop_mode, and neither is that particular change mentioned in the changelog. Remove it, it's still recent. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: fix check for reclaimable pages in PF_MEMALLOC reclaim throttlingJohannes Weiner1-2/+4
PF_MEMALLOC direct reclaimers get throttled on a node when the sum of all free pages in each zone fall below half the min watermark. During the summation, we want to exclude zones that don't have reclaimables. Checking the same pgdat over and over again doesn't make sense. Fixes: 599d0c954f91 ("mm, vmscan: move LRU lists to node") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jia He <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03mm: fix 100% CPU kswapd busyloop on unreclaimable nodesJohannes Weiner5-23/+43
Patch series "mm: kswapd spinning on unreclaimable nodes - fixes and cleanups". Jia reported a scenario in which the kswapd of a node indefinitely spins at 100% CPU usage. We have seen similar cases at Facebook. The kernel's current method of judging its ability to reclaim a node (or whether to back off and sleep) is based on the amount of scanned pages in proportion to the amount of reclaimable pages. In Jia's and our scenarios, there are no reclaimable pages in the node, however, and the condition for backing off is never met. Kswapd busyloops in an attempt to restore the watermarks while having nothing to work with. This series reworks the definition of an unreclaimable node based not on scanning but on whether kswapd is able to actually reclaim pages in MAX_RECLAIM_RETRIES (16) consecutive runs. This is the same criteria the page allocator uses for giving up on direct reclaim and invoking the OOM killer. If it cannot free any pages, kswapd will go to sleep and leave further attempts to direct reclaim invocations, which will either make progress and re-enable kswapd, or invoke the OOM killer. Patch #1 fixes the immediate problem Jia reported, the remainder are smaller fixlets, cleanups, and overall phasing out of the old method. Patch #6 is the odd one out. It's a nice cleanup to get_scan_count(), and directly related to #5, but in itself not relevant to the series. If the whole series is too ambitious for 4.11, I would consider the first three patches fixes, the rest cleanups. This patch (of 9): Jia He reports a problem with kswapd spinning at 100% CPU when requesting more hugepages than memory available in the system: $ echo 4000 >/proc/sys/vm/nr_hugepages top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01 Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3 At that time, there are no reclaimable pages left in the node, but as kswapd fails to restore the high watermarks it refuses to go to sleep. Kswapd needs to back away from nodes that fail to balance. Up until commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes") kswapd had such a mechanism. It considered zones whose theoretically reclaimable pages it had reclaimed six times over as unreclaimable and backed away from them. This guard was erroneously removed as the patch changed the definition of a balanced node. However, simply restoring this code wouldn't help in the case reported here: there *are* no reclaimable pages that could be scanned until the threshold is met. Kswapd would stay awake anyway. Introduce a new and much simpler way of backing off. If kswapd runs through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single page, make it back off from the node. This is the same number of shots direct reclaim takes before declaring OOM. Kswapd will go to sleep on that node until a direct reclaimer manages to reclaim some pages, thus proving the node reclaimable again. [[email protected]: check kswapd failure against the cumulative nr_reclaimed count] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix condition for throttle_direct_reclaim] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Shakeel Butt <[email protected]> Reported-by: Jia He <[email protected]> Tested-by: Jia He <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03slab: avoid IPIs when creating kmem cachesGreg Thelen1-1/+6
Each slab kmem cache has per cpu array caches. The array caches are created when the kmem_cache is created, either via kmem_cache_create() or lazily when the first object is allocated in context of a kmem enabled memcg. Array caches are replaced by writing to /proc/slabinfo. Array caches are protected by holding slab_mutex or disabling interrupts. Array cache allocation and replacement is done by __do_tune_cpucache() which holds slab_mutex and calls kick_all_cpus_sync() to interrupt all remote processors which confirms there are no references to the old array caches. IPIs are needed when replacing array caches. But when creating a new array cache, there's no need to send IPIs because there cannot be any references to the new cache. Outside of memcg kmem accounting these IPIs occur at boot time, so they're not a problem. But with memcg kmem accounting each container can create kmem caches, so the IPIs are wasteful. Avoid unnecessary IPIs when creating array caches. Test which reports the IPI count of allocating slab in 10000 memcg: import os def ipi_count(): with open("/proc/interrupts") as f: for l in f: if 'Function call interrupts' in l: return int(l.split()[1]) def echo(val, path): with open(path, "w") as f: f.write(val) n = 10000 os.chdir("/mnt/cgroup/memory") pid = str(os.getpid()) a = ipi_count() for i in range(n): os.mkdir(str(i)) echo("1G\n", "%d/memory.limit_in_bytes" % i) echo("1G\n", "%d/memory.kmem.limit_in_bytes" % i) echo(pid, "%d/cgroup.procs" % i) open("/tmp/x", "w").close() os.unlink("/tmp/x") b = ipi_count() print "%d loops: %d => %d (+%d ipis)" % (n, a, b, b-a) echo(pid, "cgroup.procs") for i in range(n): os.rmdir(str(i)) patched: 10000 loops: 1069 => 1170 (+101 ipis) unpatched: 10000 loops: 1192 => 48933 (+47741 ipis) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Thelen <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03fs/ocfs2/cluster: use offset_in_page() macroGeliang Tang1-1/+1
Use offset_in_page() macro instead of open-coding. Link: http://lkml.kernel.org/r/4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com Signed-off-by: Geliang Tang <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03ocfs2: o2hb: revert hb threshold to keep compatibleJunxiao Bi1-4/+4
Configfs is the interface for ocfs2-tools to set configure to kernel and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold is the one used to configure heartbeat dead threshold. Kernel has a default value of it but user can set O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it. Commit 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") changed heartbeat dead threshold name while ocfs2-tools did not, so ocfs2-tools won't set this configurable and the default value is always used. So revert it. Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junxiao Bi <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03fs/ocfs2/cluster: use setup_timerGeliang Tang1-3/+2
Use setup_timer() instead of init_timer() to simplify the code. Link: http://lkml.kernel.org/r/5e75bf07beb91e092d5aa36c36769949a480456a.1489060564.git.geliangtang@gmail.com Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03blackfin: bf609: let clk_disable() return immediately if clk is NULLMasahiro Yamada1-0/+3
In many of clk_disable() implementations, it is a no-op for a NULL pointer input, but this is one of the exceptions. Making it treewide consistent will allow clock consumers to call clk_disable() without NULL pointer check. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Michael Turquette <[email protected]> Cc: Steven Miao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03scripts/spelling.txt: add several more common spelling mistakesColin Ian King1-0/+25
Here are some of the more common spelling mistakes that I've found while fixing up spelling mistakes in kernel error message text. They probably should be added to this list so we don't keep on seeing them appearing again. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Joe Perches <[email protected]> Cc: Stephen Boyd <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03lib/dma-debug.c: make locking work for RTPankaj Gupta1-6/+2
Interrupt enable/disabled with spinlock is not a valid operation for RT as it can make executing tasks sleep from a non-sleepable context. So convert it to spin_lock_irq[save, restore]. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pankaj Gupta <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Niklas Söderlund <[email protected]> Cc: Vinod Koul <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ville Syrjl <[email protected]> Cc: Miles Chen <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Stanislaw Gruszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-03Merge tag 'for-linus' of ↵Linus Torvalds242-5505/+14599
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "More exchaustive description of primary updates in this release: - Lots of driver fixes and misc fixes across the board. - I had to base on a net-next tree because the IPoIB Accelorator patches needed it. Unfortunately, it was known to Mellanox that there would need to be an IPoIB accelorator patch to the net tree (which left some functions turned off by an #ifdef construct to avoid warnings about defined but unused functions), then one to the RDMA tree, then a fixup that went back and re-enabled the functions in the net tree and enabled their use in the rdma tree Also, a sparse fix was sent to the net tree after I did my pull, and the fixup patch conflicts quite directly with that sparse fix, so I'm going to submit the fixup patch towards the end of the merge window by itself and based upon your master branch at the time. - Two separate rounds of hfi1 fixes, one that got dropped from last release because it came in just a day or two before the end of the merge window and then the one from this release cycle. Of note is that I now have a third series that just landed from Intel yesterday. It is not included in this pull request, but I may submit it by the end of the week. I'll talk to Intel about improving the timing of thier submissions for my workflow. - Changes to our idr usage in the RDMA subsystem that will tie into our cgroup management and also into the upcoming changes for the RDMA kernel<->userspace API. - Addition of support for a netdev to be tied to an RDMA device at the core level - Addition of the VNIC driver from Intel. While IPoIB provides IP over InfiniBand (and *only* IP, no lower layer protocol headers are allowed or supported), the VNIC driver presents a virtual Ethernet device with support for things like varying Ethertypes, VLANs, priorities and other features of Ethernet. The virtual devices are centrally managed by the OPA fabric manager, making this (for the time being) a strictly OPA specific feature. - Improvements to the On-Demand Paging support in the RDMA subsystem. - Addition of three significant OPA changes. While we added OPA support some time ago (via the hfi1 driver), the RDMA subsystem has so far glossed over the areas where OPA and InfiniBand differ. With this release we are starting to add support for the OPA extensions into the RDMA core in the following area: Extended port information for OPA is now supported, extended Address Handle attributes for OPA are now supported, and extended SA Queries to get OPA specific subnet information is now supported. Concise summary from the tag: - idr usage and locking changes - build fix for hns - ipoib debug path record file fix - hfi1 updates - core RDMA netdev addition - Intel VNIC driver addition - Enhanced accelerators for IPoIB addition - Debug cleanups in cxgb3/4 - Trivial cleanups from SF Markus Elfring - Misc rxe fixes from Mellanox - Misc ipoib fixes from Mellanox - Lots of mlx4/mlx5 changes from Mellanox - Misc fixes across the RDMA subsystem - ODP paging fixes and improvements - qedr updates - hfi1 updates - OPA port info patches - OPA AH patches - OPA SA Query patches" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (191 commits) infiniband: avoid dereferencing uninitialized dst on error path IB/SA: Add OPA addr header IB/mlx5: Add port_xmit_wait to counter registers read IB/ocrdma: fix out of bounds access to local buffer IB/mlx4: Fix incorrect order of formal and actual parameters IB/mlx4: Change flush logic so it adheres to the variable name mlx5: Fix mlx5_ib_map_mr_sg mr length IB/rxe: Don't clamp residual length to mtu IB/SA: Add support to query OPA path records IB/SA: Add OPA path record type IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields IB/SA: Introduce path record specific types IB/SA: Rename ib_sa_path_rec to sa_path_rec IB/CM: Add braces when using sizeof IB/core: Define 'opa' rdma_ah_attr type IB/core: Define 'ib' and 'roce' rdma_ah_attr types IB/core: Use rdma_ah_attr accessor functions IB/core: Add accessor functions for rdma_ah_attr fields IB/PVRDMA: Rename ib_ah_attr related functions IB/mthca: Rename to_ib_ah_attr to to_rdma_ah_attr ...
2017-05-03Merge branch 'for-linus' of ↵Linus Torvalds129-5390/+7608
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem updates from Dmitry Torokhov: - a big update from Mauro converting input documentation to ReST format - Synaptics PS/2 is now aware of SMBus companion devices, which means that we can now use native RMI4 protocol to handle touchpads, instead of relying on legacy PS/2 mode. - we removed support from BMA180 accelerometer from input devices as it is now handled properly by IIO - update to TSC2007 to corretcly report pressure - other miscellaneous driver fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (152 commits) Input: ar1021_i2c - use BIT to check for a bit Input: twl4030-pwrbutton - use input_set_capability() helper Input: twl4030-pwrbutton - use correct device for irq request Input: ar1021_i2c - enable touch mode during open Input: add uinput documentation dt-bindings: input: add bindings document for ar1021_i2c driver dt-bindings: input: rotary-encoder: fix typo Input: xen-kbdfront - add module parameter for setting resolution ARM: pxa/raumfeld: fix compile error in rotary controller resources Input: xpad - do not suggest writing to Dominic Input: xpad - don't use literal blocks inside footnotes Input: xpad - note that usb/devices is now at /sys/kernel/debug/ Input: docs - freshen up introduction Input: docs - split input docs into kernel- and user-facing Input: docs - note that MT-A protocol is obsolete Input: docs - update joystick documentation a bit Input: docs - remove disclaimer/GPL notice Input: fix "Game console" heading level in joystick documentation Input: rotary-encoder - remove references to platform data from docs Input: move documentation for Amiga CD32 ...
2017-05-03Merge tag 'spi-v4.12' of ↵Linus Torvalds32-410/+964
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "There's quite a lot of small driver specific fixes and enhancements in this release but the main activity has been around the loopback and spidev test drivers which is good to see as it should hopefully help improve the quality of all the drivers as people start to make use of the new code: - Additional tests in the loopback test driver for vmalloc() compatibility and around delays together with fixes for existing tests. - Support for testing continuous data transfer for use in soak testing. - Device property support for board info platforms. - Support for registering empty sets of devices via board info (useful when writing code to enumerate hardware automatically)" * tag 'spi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (52 commits) spi: cadence: Allow for GPIO pins to be used as chipselects spi-imx: Implements handling of the SPI_READY mode flag. spi: tegra: fix spelling mistake: "trasfer" -> "transfer" spi: spi-ti-qspi: Use bounce buffer if read buffer is not DMA'ble spi: Add can_dma like interface for spi_flash_read spi: dw: Disable clock after unregistering the host spi: double time out tolerance spi: atmel: add deepest PM support to SAMA5D2 spi: atmel: factorize reusable code for SPI controller init spi: orion: add LSB support spi: pl022: don't use uninitialized variable spi: loopback-test: fix spelling mistake: "minimam" -> "minimum" spi: dynamycally allocated message initialization spi: spi-ti-qspi: Remove unused dma_dev variable spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer spi: spi-ti-qspi: Use dma_engine wrapper for dma memcpy call spi: spidev_test: add option to continuously transfer data spi: loopback-test: fix potential integer overflow on multiple spi: sun6i: update max transfer size reported spi: pl022: Document property values ...
2017-05-03Merge tag 'regulator-v4.12' of ↵Linus Torvalds42-175/+1626
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "Quite a lot going on with the regulator API for this release, much more in the core than in the drivers for a change: - Fixes for voltage change propagation through dumb power switches. - A notification when regulators are enabled. - A new settling time property for regulators where the time taken to move to a new voltage is not related to the size of the change. - Some reorganization of the Arizona drivers in preparation for sharing the code with the next generation devices they've been integrated with. - Support for newer Freescale chips in the Anatop regulator. - A new driver for voltage controlled regulators to cope with some exciting ChromeOS hardware designs. - Support for Rohm BD9571MWV-M and TI TPS65132" * tag 'regulator-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (51 commits) regulator: Add ROHM BD9571MWV-M PMIC regulator driver regulator: arizona-ldo1: Factor out generic initialization regulator: arizona-ldo1: Make arizona_ldo1 independent of struct arizona regulator: arizona-ldo1: Move pdata into a separate structure regulator: arizona-micsupp: Factor out generic initialization regulator: arizona-micsupp: Make arizona_micsupp independent of struct arizona regulator: arizona-micsupp: Move pdata into a separate structure regulator: arizona: Split KConfig options for LDO1 and MICSUPP regulators regulator: anatop: make regulator name property required regulator: tps65023: Fix inverted core enable logic. regulator: anatop: make sure regulator name is properly defined regulator: core: Allow dummy regulators for supplies regulator: core: Only propagate voltage changes to if it can change voltages regulator: vctrl: Fix out of bounds array access for vctrl->vtable regulator: tps65132: fix platform_no_drv_owner.cocci warnings regulator: tps65132: Fix off-by-one for .max_register setting regulator: anatop: set default voltage selector for pcie regulator: tps65132: add device-tree binding regulator: tps65132: add regulator driver for TI TPS65132 regulator: anatop: remove unneeded name field of struct anatop_regulator ...
2017-05-03Merge branch 'i2c/for-4.12' of ↵Linus Torvalds31-259/+674
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wilfram Sang: "I2C has the following updates for you: - an immutable cross-subsystem branch fixing PMIC access on Intel Baytrail - bigger driver updates to the designware, meson, exynos5 drivers - new i2c_acpi_new_device() function to create devices from ACPI - struct i2c_driver has now a flag 'disable_i2c_core_irq_mapping' to allow custom IRQ mapping in case the default does not fit - mux subsystem centralized error messages in its core - new driver for ltc4306 i2c mux - usual set of small updates" * 'i2c/for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (44 commits) i2c: thunderx: Enable HWMON class probing i2c: rcar: clarify PM handling with more comments i2c: rcar: fix resume by always initializing registers before transfer i2c: tegra: fix spelling mistake: "contoller" -> "controller" i2c: exynos5: use core helper to get driver data i2c: exynos5: de-duplicate error logs on clock setup i2c: exynos5: simplify clock frequency handling i2c: exynos5: simplify timings calculation i2c: designware-baytrail: fix potential null pointer dereference on dev i2c: designware: Get selected speed mode sda-hold-time via ACPI [media] cx231xx: stop double error reporting i2c: core: Allow drivers to disable i2c-core irq mapping i2c: core: Add new i2c_acpi_new_device helper function i2c: core: Allow getting ACPI info by index i2c: img-scb: use setup_timer i2c: i2c-scmi: add a MS HID i2c: mux: ltc4306: LTC4306 and LTC4305 I2C multiplexer/switch dt-bindings: i2c: mux: ltc4306: Add dt-bindings for I2C multiplexer/switch i2c: mux: reg: stop double error reporting i2c: mux: pinctrl: stop double error reporting ...
2017-05-03Merge tag 'mfd-next-4.12' of ↵Linus Torvalds80-2566/+5573
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "New Drivers: - Freescale MXS Low Resolution ADC - Freescale i.MX23/i.MX28 LRADC touchscreen - Motorola CPCAP Power Button - TI LMU (Lighting Management Unit) - Atmel SMC (Static Memory Controller) New Device Support: - Add support for X-Powers AXP803 to axp20x - Add support for Dialog Semi DA9061 to da9062-core - Add support for Intel Cougar Mountain to lpc_ich - Add support for Intel Gemini Lake to lpc_ich New Functionality: - Add Device Tree support; wm831x-*, axp20x, ti-lmu, da9062, sun4i-gpadc - Add IRQ sense support; motorola-cpcap - Add ACPI support; cros_ec - Add Reset support; altera-a10sr - Add ADC support; axp20x - Add AC Power support; axp20x - Add Runtime PM support; atmel-ebi, exynos-lpass - Add Battery Power Supply support; axp20x - Add Clock support; exynos-lpass, hi655x-pmic Fix-ups: - Implicitly specify required headers; motorola-cpcap, intel_soc_pmic_bxtwc - Add .remove() method; stm32-timers, exynos-lpass - Remove unused code; intel_soc_pmic_core, intel-lpss-acpi, ipaq-micro, atmel-smc, menelaus - Rename variables for clarity; axp20x - Convert pr_warning() to pr_warn(); db8500-prcmu, sta2x11-mfd, twl4030-power - Improve formatting; arizona-core, axp20x - Use raw_spinlock_*() variants; asic3, t7l66xb, tc6393xb - Simplify/refactor code; arizona-core, atmel-ebi - Improve error checking; intel_soc_pmic_core Bug Fixes: - Ensure OMAP3630/3730 boards can successfully reboot; twl4030-power - Correct max-register value; stm32-timers - Extend timeout to account for clock stretching; cros_ec_spi - Use correct IRQ trigger type; motorola-cpcap - Fix bad use of IRQ sense register; motorola-cpcap - Logic error "||" should be "&&"; mxs-lradc-ts" * tag 'mfd-next-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (79 commits) input: touchscreen: mxs-lradc: || vs && typos dt-bindings: Add AXP803's regulator info mfd: axp20x: Support AXP803 variant dt-bindings: Add device tree binding for X-Powers AXP803 PMIC dt-bindings: Make AXP20X compatible strings one per line mfd: intel_soc_pmic_core: Fix unchecked return value mfd: menelaus: Remove obsolete local_irq_disable() and local_irq_enable() mfd: omap-usb-tll: Configure ULPIAUTOIDLE mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode mfd: palmas: Fixed spelling mistake in error message mfd: lpc_ich: Add support for Intel Gemini Lake SoC mfd: hi655x: Add the clock cell to provide WiFi and Bluetooth mfd: intel_soc_pmic: Fix a mess with compilation units mfd: exynos-lpass: Add runtime PM support mfd: exynos-lpass: Add missing remove() function mfd: exynos-lpass: Add support for clocks mfd: exynos-lpass: Remove pad retention control iio: adc: add support for X-Powers AXP20X and AXP22X PMICs ADCs mfd: cpcap: Fix bad use of IRQ sense register mfd: cpcap: Use ack_invert interrupts ...
2017-05-03Merge tag 'backlight-next-4.12' of ↵Linus Torvalds4-0/+460
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight update from Lee Jones: "New Arctic Sand ARC2C0608 LED Backlight driver" * tag 'backlight-next-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: Add support for Arctic Sand LED backlight driver chips dt-bindings: backlight: arcxcnn: Supply bindings for Arctic Sand backlight
2017-05-03Merge tag 'sound-4.12-rc1' of ↵Linus Torvalds306-2691/+19287
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "It was a relatively calm development cycle, and no scaring changes are seen in both core and driver sides. Here are some highlights: ASoC: - A new API for hooking up jacks more generically and easily - Card longname is set based on DMI for a unique UCM profile - Lots of Intel driver fixes: Atom, Broxton, Skylake and newer chips - New drivers for Cirrus CS35L35, DIO DIO2125, Everest ES7132, HiSilicon hi6210, Maxim MAX98927, MT2701 systems with WM8960, Nuvoton NAU8824, Odroid systems, ST STM32 SAI controllers and x86 systems with DA7213 HD-audio: - Many new quirks to support headset for various devices (mostly ASUS ones) as usual - Support for dual codecs on some Gigabyte mobos and Lenovo laptop - Improvement on PCM position reporting for Skylake and newer FireWire: - New drivers for MOTU and RME Fireface series - Updates for Digidesign Digi00x and TASCAM series - Support for tracepoints Others: - USB-audio: improved support for quirk_alias option - Cleanups, constification allover the places" * tag 'sound-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (299 commits) ASoC: codec: wm8960: Relax bit clock computation when using PLL ASoC: codec: wm9860: avoid maybe-uninitialized warning ASoC: nau8824: leave Class D gain at chip default ASoC: nau8824: rename controls to match DAPM controls ASoC: Intel: Skylake: Return negative error code ASoC: Intel: Skylake: Fix unused variable warning ASoC: Intel: Skylake: fix uninitialized pointer use ASoC: sti: Fix error handling if of_clk_get() fails ASoC: cs4271: configure reset GPIO as output ASoC: dwc: Disallow building designware_pcm as a module ALSA: ali5451: fix spelling mistake in "ali_capture_preapre" ASoC: stm32: add SAI driver ASoC: stm32: add bindings for SAI ASoC: Intel: Skylake: Add loadable module support on KBL platform ASoC: Intel: Skylake: Modify load_lib_ipc arguments for a nowait version ASoC: Intel: Skylake: Register dsp_fw_ops for kabylake ASoC: Intel: Skylake: Modify arguments to reuse module transfer function ASoC: Intel: Skylake: Commonize library load ASoC: Intel: Skylake: Move sst common initialization to a helper function ASoC: nau8824: new driver ...
2017-05-03Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1060-25924/+455128
Pull drm u pdates from Dave Airlie: "This is the main drm pull request for v4.12. Apart from two fixes pulls, everything should have been in drm-next for at least 2 weeks. The biggest thing in here is AMD released the public headers for their upcoming VEGA GPUs. These as always are quite a sizeable chunk of header files. They've also added initial non-display support for those GPUs, though they aren't available in production yet. Otherwise it's pretty much normal. New bridge drivers: - megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++ - generic LVDS bridge support. Core: - Displayport link train failure reporting to userspace - debugfs interface cleaned up - subsystem TODO in kerneldoc now - Extended fbdev support (flipping and vblank wait) - drm_platform removed - EDP CRC support in helper - HF-VSDB SCDC support in EDID parser - Lots of code cleanups and header extraction - Thunderbolt external GPU awareness - Atomic helper improvements - Documentation improvements panel: - Sitronix and Samsung new panel support amdgpu: - Preliminary vega10 support - Multi-level page table support - GPU sensor support for userspace - PRT support for sparse buffers - SR-IOV improvements - Non-contig VRAM CPU mapping i915: - Atomic modesetting enabled by default on Gen5+ - LSPCON improvements - Atomic state handling for cdclk - GPU reset improvements - In-kernel unit tests - Geminilake improvements and color manager support - Designware i2c fixes - vblank evasion improvements - Hotplug safe connector iterators - GVT scheduler QoS support - GVT Kabylake support nouveau: - Acceleration support for Pascal (GP10x). - Rearchitecture of code handling proprietary signed firmware - Fix GTX 970 with odd MMU configuration - GP10B support - GP107 acceleration support vmwgfx: - Atomic modesetting support for vmwgfx omapdrm: - Support for render nodes - Refactor omapdss code - Fix some probe ordering issues - Fix too dark RGB565 rendering sunxi: - prelim rework for multiple pipes. mali-dp: - Color management support - Plane scaling - Power management improvements imx-drm: - Prefetch Resolve Engine/Gasket on i.MX6QP - Deferred plane disabling - Separate alpha support mediatek: - Mediatek SoC MT2701 support rcar-du: - Gen3 HDMI support msm: - 4k support for newer chips - OPP bindings for gpu - prep work for per-process pagetables vc4: - HDMI audio support - fixes qxl: - minor fixes. dw-hdmi: - PHY improvements - CSC fixes - Amlogic GX SoC support" * tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits) drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr() drm/nouveau/kms: Increase max retries in scanout position queries. drm/nouveau/bios/bitP: check that table is long enough for optional pointers drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine drm: mali-dp: use div_u64 for expensive 64-bit divisions drm/i915: Confirm the request is still active before adding it to the await drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio drm/i915/selftests: Allocate inode/file dynamically drm/i915: Fix system hang with EI UP masked on Haswell drm/i915: checking for NULL instead of IS_ERR() in mock selftests drm/i915: Perform link quality check unconditionally during long pulse drm/i915: Fix use after free in lpe_audio_platdev_destroy() drm/i915: Use the right mapping_gfp_mask for final shmem allocation drm/i915: Make legacy cursor updates more unsynced drm/i915: Apply a cond_resched() to the saturated signaler drm/i915: Park the signaler before sleeping drm: mali-dp: Check the mclk rate and allow up/down scaling drm: mali-dp: Enable image enhancement when scaling drm: mali-dp: Add plane upscaling support ...
2017-05-03Merge branch 'generic' of ↵Linus Torvalds24-180/+335
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota, reiserfs, udf and ext2 updates from Jan Kara: "The branch contains changes to quota code so that it does not modify persistent flags in inode->i_flags (it was the only place in kernel doing that) and handle it inside filesystem's quotaon/off handlers instead. The branch also contains two UDF cleanups, a couple of reiserfs fixes and one fix for ext2 quota locking" * 'generic' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext4: Improve comments in ext4_quota_{on|off}() udf: use kmap_atomic for memcpy copying udf: use octal for permissions quota: Remove dquot_quotactl_ops reiserfs: Remove i_attrs_to_sd_attrs() reiserfs: Remove useless setting of i_flags jfs: Remove jfs_get_inode_flags() ext2: Remove ext2_get_inode_flags() ext4: Remove ext4_get_inode_flags() quota: Stop setting IMMUTABLE and NOATIME flags on quota files jfs: Set flags on quota files directly ext2: Set flags on quota files directly reiserfs: Set flags on quota files directly ext4: Set flags on quota files directly reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore reiserfs: Make cancel_old_flush() reliable ext2: Call dquot_writeback_dquots() with s_umount held reiserfs: avoid a -Wmaybe-uninitialized warning
2017-05-03Merge branch 'fsnotify' of ↵Linus Torvalds24-771/+815
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "The branch contains mainly a rework of fsnotify infrastructure fixing a shortcoming that we have waited for response to fanotify permission events with SRCU read lock held and when the process consuming events was slow to respond the kernel has stalled. It also contains several cleanups of unnecessary indirections in fsnotify framework and a bugfix from Amir fixing leakage of kernel internal errno to userspace" * 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits) fanotify: don't expose EOPENSTALE to userspace fsnotify: remove a stray unlock fsnotify: Move ->free_mark callback to fsnotify_ops fsnotify: Add group pointer in fsnotify_init_mark() fsnotify: Drop inode_mark.c fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() fsnotify: Remove fsnotify_detach_group_marks() fsnotify: Rename fsnotify_clear_marks_by_group_flags() fsnotify: Inline fsnotify_clear_{inode|vfsmount}_mark_group() fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() fanotify: Release SRCU lock when waiting for userspace response fsnotify: Pass fsnotify_iter_info into handle_event handler fsnotify: Provide framework for dropping SRCU lock in ->handle_event fsnotify: Remove special handling of mark destruction on group shutdown fsnotify: Detach mark from object list when last reference is dropped fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() inotify: Do not drop mark reference under idr_lock fsnotify: Free fsnotify_mark_connector when there is no mark attached fsnotify: Lock object list with connector lock ...
2017-05-03Merge tag 'for-4.12/dm-post-merge-changes' of ↵Linus Torvalds4-48/+39
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull additional device mapper updates from Mike Snitzer: "Here are some changes from Christoph that needed to be rebased ontop of changes that were already merged into the device mapper tree. In addition, these changes depend on the 'for-4.12/block' changes that you've already merged. - Cleanups to request-based DM and DM multipath from Christoph that prepare for his block core error code type checking improvements" * tag 'for-4.12/dm-post-merge-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: introduce a new DM_MAPIO_KILL return value dm rq: change ->rq_end_io calling conventions dm mpath: merge do_end_io into multipath_end_io
2017-05-03Merge tag 'for-4.12/dm-changes' of ↵Linus Torvalds45-2995/+7610
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - A major update for DM cache that reduces the latency for deciding whether blocks should migrate to/from the cache. The bio-prison-v2 interface supports this improvement by enabling direct dispatch of work to workqueues rather than having to delay the actual work dispatch to the DM cache core. So the dm-cache policies are much more nimble by being able to drive IO as they see fit. One immediate benefit from the improved latency is a cache that should be much more adaptive to changing workloads. - Add a new DM integrity target that emulates a block device that has additional per-sector tags that can be used for storing integrity information. - Add a new authenticated encryption feature to the DM crypt target that builds on the capabilities provided by the DM integrity target. - Add MD interface for switching the raid4/5/6 journal mode and update the DM raid target to use it to enable aid4/5/6 journal write-back support. - Switch the DM verity target over to using the asynchronous hash crypto API (this helps work better with architectures that have access to off-CPU algorithm providers, which should reduce CPU utilization). - Various request-based DM and DM multipath fixes and improvements from Bart and Christoph. - A DM thinp target fix for a bio structure leak that occurs for each discard IFF discard passdown is enabled. - A fix for a possible deadlock in DM bufio and a fix to re-check the new buffer allocation watermark in the face of competing admin changes to the 'max_cache_size_bytes' tunable. - A couple DM core cleanups. * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits) dm bufio: check new buffer allocation watermark every 30 seconds dm bufio: avoid a possible ABBA deadlock dm mpath: make it easier to detect unintended I/O request flushes dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH dm: introduce enum dm_queue_mode to cleanup related code dm mpath: verify __pg_init_all_paths locking assumptions at runtime dm: verify suspend_locking assumptions at runtime dm block manager: remove an unused argument from dm_block_manager_create() dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() dm mpath: delay requeuing while path initialization is in progress dm mpath: avoid that path removal can trigger an infinite loop dm mpath: split and rename activate_path() to prepare for its expanded use dm ioctl: prevent stack leak in dm ioctl call dm integrity: use previously calculated log2 of sectors_per_block dm integrity: use hex2bin instead of open-coded variant dm crypt: replace custom implementation of hex2bin() dm crypt: remove obsolete references to per-CPU state dm verity: switch to using asynchronous hash crypto API dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues ...
2017-05-03Merge branch 'for-linus' of ↵Linus Torvalds26-1483/+3582
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD updates from Shaohua Li: - Add Partial Parity Log (ppl) feature found in Intel IMSM raid array by Artur Paszkiewicz. This feature is another way to close RAID5 writehole. The Linux implementation is also available for normal RAID5 array if specific superblock bit is set. - A number of md-cluser fixes and enabling md-cluster array resize from Guoqing Jiang - A bunch of patches from Ming Lei and Neil Brown to rewrite MD bio handling related code. Now MD doesn't directly access bio bvec, bi_phys_segments and uses modern bio API for bio split. - Improve RAID5 IO pattern to improve performance for hard disk based RAID5/6 from me. - Several patches from Song Liu to speed up raid5-cache recovery and allow raid5 cache feature disabling in runtime. - Fix a performance regression in raid1 resync from Xiao Ni. - Other cleanup and fixes from various people. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (84 commits) md/raid10: skip spare disk as 'first' disk md/raid1: Use a new variable to count flighting sync requests md: clear WantReplacement once disk is removed md/raid1/10: remove unused queue md: handle read-only member devices better. md/raid10: wait up frozen array in handle_write_completed uapi: fix linux/raid/md_p.h userspace compilation error md-cluster: Fix a memleak in an error handling path md: support disabling of create-on-open semantics. md: allow creation of mdNNN arrays via md_mod/parameters/new_array raid5-ppl: use a single mempool for ppl_io_unit and header_page md/raid0: fix up bio splitting. md/linear: improve bio splitting. md/raid5: make chunk_aligned_read() split bios more cleanly. md/raid10: simplify handle_read_error() md/raid10: simplify the splitting of requests. md/raid1: factor out flush_bio_list() md/raid1: simplify handle_read_error(). Revert "block: introduce bio_copy_data_partial" md/raid1: simplify alloc_behind_master_bio() ...
2017-05-03Merge branch 'stable-4.12' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds10-275/+232
Pull audit updates from Paul Moore: "Fourteen audit patches for v4.12 that span the full range of fixes, new features, and internal cleanups. We have a patches to move to 64-bit timestamps, convert refcounts from atomic_t to refcount_t, track PIDs using the pid struct instead of pid_t, convert our own private audit buffer cache to a standard kmem_cache, log kernel module names when they are unloaded, and normalize the NETFILTER_PKT to make the userspace folks happier. From a fixes perspective, the most important is likely the auditd connection tracking RCU fix; it was a rather brain dead bug that I'll take the blame for, but thankfully it didn't seem to affect many people (only one report). I think the patch subject lines and commit descriptions do a pretty good job of explaining the details and why the changes are important so I'll point you there instead of duplicating it here; as usual, if you have any questions you know where to find us. We also manage to take out more code than we put in this time, that always makes me happy :)" * 'stable-4.12' of git://git.infradead.org/users/pcmoore/audit: audit: fix the RCU locking for the auditd_connection structure audit: use kmem_cache to manage the audit_buffer cache audit: Use timespec64 to represent audit timestamps audit: store the auditd PID as a pid struct instead of pid_t audit: kernel generated netlink traffic should have a portid of 0 audit: combine audit_receive() and audit_receive_skb() audit: convert audit_watch.count from atomic_t to refcount_t audit: convert audit_tree.count from atomic_t to refcount_t audit: normalize NETFILTER_PKT netfilter: use consistent ipv4 network offset in xt_AUDIT audit: log module name on delete_module audit: remove unnecessary semicolon in audit_watch_handle_event() audit: remove unnecessary semicolon in audit_mark_handle_event() audit: remove unnecessary semicolon in audit_field_valid()
2017-05-03Merge branch 'next' of ↵Linus Torvalds95-1120/+3240
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: IMA: - provide ">" and "<" operators for fowner/uid/euid rules KEYS: - add a system blacklist keyring - add KEYCTL_RESTRICT_KEYRING, exposes keyring link restriction functionality to userland via keyctl() LSM: - harden LSM API with __ro_after_init - add prlmit security hook, implement for SELinux - revive security_task_alloc hook TPM: - implement contextual TPM command 'spaces'" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits) tpm: Fix reference count to main device tpm_tis: convert to using locality callbacks tpm: fix handling of the TPM 2.0 event logs tpm_crb: remove a cruft constant keys: select CONFIG_CRYPTO when selecting DH / KDF apparmor: Make path_max parameter readonly apparmor: fix parameters so that the permission test is bypassed at boot apparmor: fix invalid reference to index variable of iterator line 836 apparmor: use SHASH_DESC_ON_STACK security/apparmor/lsm.c: set debug messages apparmor: fix boolreturn.cocci warnings Smack: Use GFP_KERNEL for smk_netlbl_mls(). smack: fix double free in smack_parse_opts_str() KEYS: add SP800-56A KDF support for DH KEYS: Keyring asymmetric key restrict method with chaining KEYS: Restrict asymmetric key linkage using a specific keychain KEYS: Add a lookup_restriction function for the asymmetric key type KEYS: Add KEYCTL_RESTRICT_KEYRING KEYS: Consistent ordering for __key_link_begin and restrict check KEYS: Add an optional lookup_restriction hook to key_type ...
2017-05-03Merge branch 'ibmvnic-Updated-reset-handler-andcode-fixes'David S. Miller2-204/+388
Nathan Fontenot says: ==================== ibmvnic: Updated reset handler and code fixes This set of patches multiple code fixes and a new rest handler for the ibmvnic driver. In order to implement the new reset handler for the ibmvnic driver resource initialization needed to be moved to its own routine, a state variable is introduced to replace the various is_* flags in the driver, and a new routine to handle the assorted reasons the driver can be reset. v4 updates: Patch 3/11: Corrected trailing whitespace Patch 7/11: Corrected trailing whitespace v3 updates: Patch 10/11: Correct patch subject line to be a description of the patch. v2 updates: Patch 11/11: Use __netif_subqueue_stopped() instead of netif_subqueue_stopped() to avoid possible use of an un-initialized skb variable. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-05-03ibmvnic: Move queue restarting in ibmvnic_tx_completeNathan Fontenot1-12/+10
Restart of the subqueue should occur outside of the loop processing any tx buffers instead of doing this in the middle of the loop. Signed-off-by: Nathan Fontenot <[email protected]> Signed-off-by: David S. Miller <[email protected]>