aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-01-31mm: drop hotplug lock from lru_add_drain_all()Michal Hocko3-10/+9
Pulling cpu hotplug locks inside the mm core function like lru_add_drain_all just asks for problems and the recent lockdep splat [1] just proves this. While the usage in that particular case might be wrong we should avoid the locking as lru_add_drain_all() is used in many places. It seems that this is not all that hard to achieve actually. We have done the same thing for drain_all_pages which is analogous by commit a459eeb7b852 ("mm, page_alloc: do not depend on cpu hotplug locks inside the allocator"). All we have to care about is to handle - the work item might be executed on a different cpu in worker from unbound pool so it doesn't run on pinned on the cpu - we have to make sure that we do not race with page_alloc_cpu_dead calling lru_add_drain_cpu the first part is already handled because the worker calls lru_add_drain which disables preemption when calling lru_add_drain_cpu on the local cpu it is draining. The later is true because page_alloc_cpu_dead is called on the controlling CPU after the hotplugged CPU vanished completely. [1] http://lkml.kernel.org/r/[email protected] [add a cpu hotplug locking interaction as per tglx] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm/mempolicy: add nodes_empty check in SYSC_migrate_pagesYisheng Xie1-3/+7
As in manpage of migrate_pages, the errno should be set to EINVAL when none of the node IDs specified by new_nodes are on-line and allowed by the process's current cpuset context, or none of the specified nodes contain memory. However, when test by following case: new_nodes = 0; old_nodes = 0xf; ret = migrate_pages(pid, old_nodes, new_nodes, MAX); The ret will be 0 and no errno is set. As the new_nodes is empty, we should expect EINVAL as documented. To fix the case like above, this patch check whether target nodes AND current task_nodes is empty, and then check whether AND node_states[N_MEMORY] is empty. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Chris Salls <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tan Xiaojun <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm/mempolicy: fix the check of nodemask from userYisheng Xie1-3/+20
As Xiaojun reported the ltp of migrate_pages01 will fail on arm64 system which has 4 nodes[0...3], all have memory and CONFIG_NODES_SHIFT=2: migrate_pages01 0 TINFO : test_invalid_nodes migrate_pages01 14 TFAIL : migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1 migrate_pages01 15 TFAIL : migrate_pages_common.c:55: call succeeded unexpectedly In this case the test_invalid_nodes of migrate_pages01 will call: SYSC_migrate_pages as: migrate_pages(0, , {0x0000000000000001}, 64, , {0x0000000000000010}, 64) = 0 The new nodes specifies one or more node IDs that are greater than the maximum supported node ID, however, the errno is not set to EINVAL as expected. As man pages of set_mempolicy[1], mbind[2], and migrate_pages[3] mentioned, when nodemask specifies one or more node IDs that are greater than the maximum supported node ID, the errno should set to EINVAL. However, get_nodes only check whether the part of bits [BITS_PER_LONG*BITS_TO_LONGS(MAX_NUMNODES), maxnode) is zero or not, and remain [MAX_NUMNODES, BITS_PER_LONG*BITS_TO_LONGS(MAX_NUMNODES) unchecked. This patch is to check the bits of [MAX_NUMNODES, maxnode) in get_nodes to let migrate_pages set the errno to EINVAL when nodemask specifies one or more node IDs that are greater than the maximum supported node ID, which follows the manpage's guide. [1] http://man7.org/linux/man-pages/man2/set_mempolicy.2.html [2] http://man7.org/linux/man-pages/man2/mbind.2.html [3] http://man7.org/linux/man-pages/man2/migrate_pages.2.html Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Reported-by: Tan Xiaojun <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Chris Salls <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm/mempolicy: remove redundant check in get_nodesYisheng Xie1-2/+0
We have already checked whether maxnode is a page worth of bits, by: maxnode > PAGE_SIZE*BITS_PER_BYTE So no need to check it once more. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Chris Salls <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Tan Xiaojun <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm: relax deferred struct page requirementsPavel Tatashin4-9/+1
There is no need to have ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT, as all the page initialization code is in common code. Also, there is no need to depend on MEMORY_HOTPLUG, as initialization code does not really use hotplug memory functionality. So, we can remove this requirement as well. This patch allows to use deferred struct page initialization on all platforms with memblock allocator. Tested on x86, arm64, and sparc. Also, verified that code compiles on PPC with CONFIG_MEMORY_HOTPLUG disabled. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Acked-by: Heiko Carstens <[email protected]> [s390] Reviewed-by: Khalid Aziz <[email protected]> Acked-by: Michael Ellerman <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31zswap: same-filled pages handlingSrividya Desireddy1-5/+66
Zswap is a cache which compresses the pages that are being swapped out and stores them into a dynamically allocated RAM-based memory pool. Experiments have shown that around 10-20% of pages stored in zswap are same-filled pages (i.e. contents of the page are all same), but these pages are handled as normal pages by compressing and allocating memory in the pool. This patch adds a check in zswap_frontswap_store() to identify same-filled page before compression of the page. If the page is a same-filled page, set zswap_entry.length to zero, save the same-filled value and skip the compression of the page and alloction of memory in zpool. In zswap_frontswap_load(), check if value of zswap_entry.length is zero corresponding to the page to be loaded. If zswap_entry.length is zero, fill the page with same-filled value. This saves the decompression time during load. On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and relaunching different applications, out of ~64000 pages stored in zswap, ~11000 pages were same-value filled pages (including zero-filled pages) and ~9000 pages were zero-filled pages. An average of 17% of pages(including zero-filled pages) in zswap are same-value filled pages and 14% pages are zero-filled pages. An average of 3% of pages are same-filled non-zero pages. The below table shows the execution time profiling with the patch. Baseline With patch % Improvement ----------------------------------------------------------------- *Zswap Store Time 26.5ms 18ms 32% (of same value pages) *Zswap Load Time (of same value pages) 25.5ms 13ms 49% ----------------------------------------------------------------- On Ubuntu PC with 2GB RAM, while executing kernel build and other test scripts and running multimedia applications, out of 360000 pages stored in zswap 78000(~22%) of pages were found to be same-value filled pages (including zero-filled pages) and 64000(~17%) are zero-filled pages. So an average of %5 of pages are same-filled non-zero pages. The below table shows the execution time profiling with the patch. Baseline With patch % Improvement ----------------------------------------------------------------- *Zswap Store Time 91ms 74ms 19% (of same value pages) *Zswap Load Time 50ms 7.5ms 85% (of same value pages) ----------------------------------------------------------------- *The execution times may vary with test device used. Dan said: : I did test this patch out this week, and I added some instrumentation to : check the performance impact, and tested with a small program to try to : check the best and worst cases. : : When doing a lot of swap where all (or almost all) pages are same-value, I : found this patch does save both time and space, significantly. The exact : improvement in time and space depends on which compressor is being used, : but roughly agrees with the numbers you listed. : : In the worst case situation, where all (or almost all) pages have the : same-value *except* the final long (meaning, zswap will check each long on : the entire page but then still have to pass the page to the compressor), : the same-value check is around 10-15% of the total time spent in : zswap_frontswap_store(). That's a not-insignificant amount of time, but : it's not huge. Considering that most systems will probably be swapping : pages that aren't similar to the worst case (although I don't have any : data to know that), I'd say the improvement is worth the possible : worst-case performance impact. [[email protected]: add memset_l instead of for loop] Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1 Signed-off-by: Srividya Desireddy <[email protected]> Acked-by: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Dinakar Reddy Pathireddy <[email protected]> Cc: SHARAN ALLUR <[email protected]> Cc: RAJIB BASU <[email protected]> Cc: JUHUN KIM <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Timofey Titovets <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm: kmemleak: remove unused hardirq.hYang Shi1-1/+0
Preempt counter APIs have been split out, currently, hardirq.h just includes irq_enter/exit APIs which are not used by kmemleak at all. So, remove the unused hardirq.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31include/linux/sched/mm.h: uninline mmdrop_async(), etcAndrew Morton2-234/+238
mmdrop_async() is only used in fork.c. Move that and its support functions into fork.c, uninline it all. Quite a lot of code gets moved around to avoid forward declarations. Cc: Ingo Molnar <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31slub: remove obsolete comments of put_cpu_partial()Miles Chen1-3/+1
Commit d6e0b7fa1186 ("slub: make dead caches discard free slabs immediately") makes put_cpu_partial() run with preemption disabled and interrupts disabled when calling unfreeze_partials(). The comment: "put_cpu_partial() is done without interrupts disabled and without preemption disabled" looks obsolete, so remove it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Miles Chen <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm/slub.c: fix wrong address during slab padding restorationBalasubramani Vivekanandan1-3/+5
Start address calculated for slab padding restoration was wrong. Wrong address would point to some section before padding and could cause corruption Link: http://lkml.kernel.org/r/1516604578-4577-1-git-send-email-balasubramani_vivekanandan@mentor.com Signed-off-by: Balasubramani Vivekanandan <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm/slab.c: remove redundant assignments for slab_stateOscar Salvador1-4/+0
slab_state is being set to "UP" in create_kmalloc_caches(), and later on we set it again in kmem_cache_init_late(), but slab_state does not change in the meantime. Remove the redundant assignment from kmem_cache_init_late(). And unless I overlooked anything, the same goes for "slab_state = FULL". slab_state is set to "FULL" in kmem_cache_init_late(), but it is later being set again in cpucache_init(), which gets called from do_initcall_level(). So remove the assignment from cpucache_init() as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31mm/slab_common.c: make calculate_alignment() staticByongho Lee2-30/+29
calculate_alignment() function is only used inside slab_common.c. So make it static and let the compiler do more optimizations. After this patch there's a small improvement in text and data size. $ gcc --version gcc (GCC) 7.2.1 20171128 Before: text data bss dec hex filename 9890457 3828702 1212364 14931523 e3d643 vmlinux After: text data bss dec hex filename 9890437 3828670 1212364 14931471 e3d60f vmlinux Also I fixed a style problem reported by checkpatch. WARNING: Missing a blank line after declarations #53: FILE: mm/slab_common.c:286: + unsigned long ralign = cache_line_size(); + while (size <= ralign / 2) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Byongho Lee <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: return error when we attempt to access a dirty bh in jbd2piaojun1-11/+12
We should not reuse the dirty bh in jbd2 directly due to the following situation: 1. When removing extent rec, we will dirty the bhs of extent rec and truncate log at the same time, and hand them over to jbd2. 2. The bhs are submitted to jbd2 area successfully. 3. The write-back thread of device help flush the bhs to disk but encounter write error due to abnormal storage link. 4. After a while the storage link become normal. Truncate log flush worker triggered by the next space reclaiming found the dirty bh of truncate log and clear its 'BH_Write_EIO' and then set it uptodate in __ocfs2_journal_access(): ocfs2_truncate_log_worker ocfs2_flush_truncate_log __ocfs2_flush_truncate_log ocfs2_replay_truncate_records ocfs2_journal_access_di __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata. 5. Then jbd2 will flush the bh of truncate log to disk, but the bh of extent rec is still in error state, and unfortunately nobody will take care of it. 6. At last the space of extent rec was not reduced, but truncate log flush worker have given it back to globalalloc. That will cause duplicate cluster problem which could be identified by fsck.ocfs2. Sadly we can hardly revert this but set fs read-only in case of ruining atomicity and consistency of space reclaim. Link: http://lkml.kernel.org/r/[email protected] Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access") Signed-off-by: Jun Piao <[email protected]> Reviewed-by: Yiwen Jiang <[email protected]> Reviewed-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: unlock bh_state if bg check failsChangwei Ge1-0/+2
We should unlock bh_stat if bg->bg_free_bits_count > bg->bg_bits Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Changwei Ge <[email protected]> Suggested-by: Jan Kara <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: nowait aio supportGang He6-33/+104
Return EAGAIN if any of the following checks fail for direct I/O: - Cannot get the related locks immediately - Blocks are not allocated at the write location, it will trigger block allocation and block IO operations. [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Alex Chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: add ocfs2_overwrite_io()Gang He2-0/+48
Add ocfs2_overwrite_io function, which is used to judge if overwrite allocated blocks, otherwise, the write will bring extra block allocation overhead. [[email protected]: v3] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: alex chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock()Gang He2-0/+25
Patch series "ocfs2: add nowait aio support", v4. VFS layer has introduced the non-blocking aio flag IOCB_NOWAIT, which tells the kernel to bail out if an AIO request will block for reasons such as file allocations, or writeback triggering, or would block while allocating requests while performing direct I/O. Subsequently, pwritev2/preadv2 also can leverage this part of kernel code. So far, ext4/xfs/btrfs have supported this feature. Add the related code for the ocfs2 file system. This patch (of 3): Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which will be used in non-blocking IO scenarios. [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Jun Piao <[email protected]> Acked-by: alex chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: add trimfs lock to avoid duplicated trims in clusterGang He1-0/+44
ocfs2 supports trimming the underlying disk via the fstrim command. But there is a problem, ocfs2 is a shared disk cluster file system, if the user configures a scheduled fstrim job on each file system node, this will trigger multiple nodes trimming a shared disk simultaneously, which is very wasteful for CPU and IO consumption. This also might negatively affect the lifetime of poor-quality SSD devices. So we introduce a trimfs dlm lock to communicate with each other in this case, which will make only one fstrim command to do the trimming on a shared disk among the cluster. The fstrim commands from the other nodes should wait for the first fstrim to finish and return success directly, to avoid running the same trim on the shared disk again. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: add trimfs dlm lock resourceGang He4-0/+121
Introduce a new dlm lock resource, which will be used to communicate during fstrimming of an ocfs2 device from cluster nodes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: try to reuse extent block in dealloc without meta_allocChangwei Ge3-10/+203
A crash issue was reported by John Lightsey with a call trace as follows: ocfs2_split_extent+0x1ad3/0x1b40 [ocfs2] ocfs2_change_extent_flag+0x33a/0x470 [ocfs2] ocfs2_mark_extent_written+0x172/0x220 [ocfs2] ocfs2_dio_end_io+0x62d/0x910 [ocfs2] dio_complete+0x19a/0x1a0 do_blockdev_direct_IO+0x19dd/0x1eb0 __blockdev_direct_IO+0x43/0x50 ocfs2_direct_IO+0x8f/0xa0 [ocfs2] generic_file_direct_write+0xb2/0x170 __generic_file_write_iter+0xc3/0x1b0 ocfs2_file_write_iter+0x4bb/0xca0 [ocfs2] __vfs_write+0xae/0xf0 vfs_write+0xb8/0x1b0 SyS_write+0x4f/0xb0 system_call_fastpath+0x16/0x75 The BUG code told that extent tree wants to grow but no metadata was reserved ahead of time. From my investigation into this issue, the root cause it that although enough metadata is not reserved, there should be enough for following use. Rightmost extent is merged into its left one due to a certain times of marking extent written. Because during marking extent written, we got many physically continuous extents. At last, an empty extent showed up and the rightmost path is removed from extent tree. Add a new mechanism to reuse extent block cached in dealloc which were just unlinked from extent tree to solve this crash issue. Criteria is that during marking extents *written*, if extent rotation and merging results in unlinking extent with growing extent tree later without any metadata reserved ahead of time, try to reuse those extents in dealloc in which deleted extents are cached. Also, this patch addresses the issue John reported that ::dw_zero_count is not calculated properly. After applying this patch, the issue John reported was gone. Thanks for the reproducer provided by John. And this patch has passed ocfs2-test(29 cases) suite running by New H3C Group. [[email protected]: fix static checker warnning] Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F29196AE@H3CMLB12-EX.srv.huawei-3com.com [[email protected]: brelse(NULL) is legal] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Changwei Ge <[email protected]> Reported-by: John Lightsey <[email protected]> Tested-by: John Lightsey <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: make metadata estimation accurate and clearChangwei Ge1-1/+3
Current code assume that ::w_unwritten_list always has only one item on. This is not right and hard to get understood. So improve how to count unwritten item. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Changwei Ge <[email protected]> Reported-by: John Lightsey <[email protected]> Tested-by: John Lightsey <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joel Becker <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attributepiaojun2-0/+8
The race between *set_acl and *get_acl will cause getting incomplete xattr data as below: processA processB ocfs2_set_acl ocfs2_xattr_set __ocfs2_xattr_set_handle ocfs2_get_acl_nolock ocfs2_xattr_get_nolock: processB may get incomplete xattr data if processA hasn't set_acl done. So we should use 'ip_xattr_sem' to protect getting extended attribute in ocfs2_get_acl_nolock(), as other processes could be changing it concurrently. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jun Piao <[email protected]> Reviewed-by: Alex Chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: clean up dead code in alloc.cChangwei Ge1-9/+2
Some stack variables are no longer used but still assigned. Trim them. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Changwei Ge <[email protected]> Reviewed-by: Jun Piao <[email protected]> Reviewed-by: Alex Chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2/xattr: assign errno to 'ret' in ocfs2_calc_xattr_init()piaojun1-0/+1
We need catch the errno returned by ocfs2_xattr_get_nolock() and assign it to 'ret' for printing and noticing upper callers. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jun Piao <[email protected]> Reviewed-by: Alex Chen <[email protected]> Reviewed-by: Yiwen Jiang <[email protected]> Acked-by: Gang He <[email protected]> Acked-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGEGang He1-0/+9
If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem when the kernel is configured with CONFIG_PREEMPT is not set. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1) looks like: Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <IRQ> dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 </IRQ> RIP: 0010:unlock_page+0x17/0x30 RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004 RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00 R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518 R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] filemap_fault+0x12b/0x5c0 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:00007ffd6657db18 EFLAGS: 00010287 RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700 RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700 RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000 R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770 R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000 About performance improvement, we can see the testing time is reduced, and CPU utilization decreases, the detailed data is as follows. I ran multi_mmap test case in ocfs2-test package in a three nodes cluster. Before applying this patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2754 ocfs2te+ 20 0 170248 6980 4856 D 80.73 0.341 0:18.71 multi_mmap 1505 root rt 0 222236 123060 97224 S 2.658 6.015 0:01.44 corosync 5 root 20 0 0 0 0 S 1.329 0.000 0:00.19 kworker/u8:0 95 root 20 0 0 0 0 S 1.329 0.000 0:00.25 kworker/u8:1 2728 root 20 0 0 0 0 S 0.997 0.000 0:00.24 jbd2/sda1-33 2721 root 20 0 0 0 0 S 0.664 0.000 0:00.07 ocfs2dc-3C8CFD4 2750 ocfs2te+ 20 0 142976 4652 3532 S 0.664 0.227 0:00.28 mpirun ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 14:44:52 CST 2017 multi_mmap..................................................Passed. Runtime 783 seconds. After apply this patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2508 ocfs2te+ 20 0 170248 6804 4680 R 54.00 0.333 0:55.37 multi_mmap 155 root 20 0 0 0 0 S 2.667 0.000 0:01.20 kworker/u8:3 95 root 20 0 0 0 0 S 2.000 0.000 0:01.58 kworker/u8:1 2504 ocfs2te+ 20 0 142976 4604 3480 R 1.667 0.225 0:01.65 mpirun 5 root 20 0 0 0 0 S 1.000 0.000 0:01.36 kworker/u8:0 2482 root 20 0 0 0 0 S 1.000 0.000 0:00.86 jbd2/sda1-33 299 root 0 -20 0 0 0 S 0.333 0.000 0:00.13 kworker/2:1H 335 root 0 -20 0 0 0 S 0.333 0.000 0:00.17 kworker/1:1H 535 root 20 0 12140 7268 1456 S 0.333 0.355 0:00.34 haveged 1282 root rt 0 222284 123108 97224 S 0.333 6.017 0:01.33 corosync ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 15:04:12 CST 2017 multi_mmap..................................................Passed. Runtime 487 seconds. Link: http://lkml.kernel.org/r/[email protected] Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang He <[email protected]> Reviewed-by: Eric Ren <[email protected]> Acked-by: alex chen <[email protected]> Acked-by: piaojun <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: return -EROFS to mount.ocfs2 if inode block is invalidpiaojun1-3/+2
If metadata is corrupted such as 'invalid inode block', we will get failed by calling 'mount()' and then set filesystem readonly as below: ocfs2_mount ocfs2_initialize_super ocfs2_init_global_system_inodes ocfs2_iget ocfs2_read_locked_inode ocfs2_validate_inode_block ocfs2_error ocfs2_handle_error ocfs2_set_ro_flag(osb, 0); // set readonly In this situation we need return -EROFS to 'mount.ocfs2', so that user can fix it by fsck. And then mount again. In addition, 'mount.ocfs2' should be updated correspondingly as it only return 1 for all errno. And I will post a patch for 'mount.ocfs2' too. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jun Piao <[email protected]> Reviewed-by: Alex Chen <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Changwei Ge <[email protected]> Reviewed-by: Gang He <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: clean dead code in suballoc.cChangwei Ge1-3/+3
Stack variable fe is no longer used, so trim it to save some CPU cycles and stack space. Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F5A8DD@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: use the OCFS2_XATTR_ROOT_SIZE macro in ocfs2_reflink_xattr_header()alex chen1-1/+1
Use the OCFS2_XATTR_ROOT_SIZE macro improves the readability of the code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Chen <[email protected]> Reviewed-by: Jun Piao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2/cluster: close a race that fence can't be triggeredYang Zhang1-2/+3
When some nodes of cluster face with TCP connection fault, ocfs2 will pick up a quorum to continue to work and other nodes will be fenced by resetting host. In order to decide which node should be fenced, ocfs2 leverages o2quo_state::qs_holds. If that variable is reduced to zero, then a try to decide if fence local node is performed. However, under a specific scenario that local node is not disconnected from others at the same time, above method has a problem to reduce ::qs_holds to zero. Because, o2net 90s idle timer corresponding to different nodes is triggered one after another. node 2 node 3 90s idle timer elapses clear ::qs_conn_bm set hold 40s is passed 90 idle timer elapses clear ::qs_conn_bm set hold still up timer elapses clear hold (NOT to zero ) 90s idle timer elapses AGAIN still up timer elapses. clear hold still up timer elapses To solve this issue, a node which has already be evicted from ::qs_conn_bm can't set hold again and again invoked from idle timer. Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F3F93B@H3CMLB12-EX.srv.huawei-3com.com Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2: give an obvious tip for mismatched cluster namesGang He1-2/+6
Add an obvious error message, due to mismatched cluster names between on-disk and in the current cluster. We can meet this case during OCFS2 cluster migration. If we can give the user an obvious tip for why they can not mount the file system after migration, they can quickly fix this mismatch problem. Second, also move printing ocfs2_fill_super() errno to the front of ocfs2_dismount_volume(), since ocfs2_dismount_volume() will also print its own message. I looked through all the code of OCFS2 (include o2cb); there is not any place which returns this error. In fact, the function calling path ocfs2_fill_super -> ocfs2_mount_volume -> ocfs2_dlm_init -> dlm_new_lockspace is a very specific one. We can use this errno to give the user a more clear tip, since this case is a little common during cluster migration, but the customer can quickly get the failure cause if there is a error printed. Also, I think it is not possible to add this errno in the o2cb path during ocfs2_dlm_init(), since the o2cb code has been stable for a long time. We only print this error tip when the user uses pcmk stack, since using the o2cb stack the user will not meet this error. [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Mark Fasheh <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31ocfs2/cluster: neaten a member of o2net_msg_handlerChangwei Ge1-1/+1
It's odd that o2net_msg_handler::nh_func_data is declared as type o2net_msg_handler_func*. So neaten it. Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F554DA@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Alex Chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31fs/ocfs2/dlm/dlmmaster.c: clean up dead codeChangwei Ge1-7/+0
This code has been commented out for 12 years. Remove it. Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED7EF9E@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: alex chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31m32r: remove abort()Sudip Mukherjee1-8/+0
Commit 7c2c11b208be ("arch: define weak abort()") has introduced a weak abort() which is common for all arch. And, so we will not need arch specific abort which has the same code as the weak abort(). Remove the abort() for m32r. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sudip Mukherjee <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31scripts/tags.sh: change find_other_sources() for include directoriesArend van Spriel1-1/+1
The current find done in find_other_sources() excludes directories in the kernel tree that are named 'include', eg.: ./security/apparmor/include ./security/selinux/include ./drivers/net/wireless/broadcom/brcm80211/include ./drivers/gpu/drm/amd/acp/include ./drivers/gpu/drm/amd/display/include ./drivers/gpu/drm/amd/include ./drivers/gpu/drm/nouveau/include This changes the find command in find_other_sources() to include those using the -path option. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arend van Spriel <[email protected]> Cc: Robert Jarzmik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31scripts/decodecode: make it take multiline Code lineAndy Shevchenko1-0/+12
In case of running scripts/decodecode without any parameters in order to give a copy'n'pasted Code line from, for example, email it would parse only first line of it, while in emails it's split to few. ie, when you have a file out of oops the Code line looks like Code: hh hh ... <hh> ... hh\n When copy'n'paste from, for example, email where sender or some middle MTA split it, the line looks like: Code: hh hh ... hh\n hh ... <hh> ... hh\n hh hh ... hh\n The Code line followed by another oops line usually contains characters out of hex digit + space + < + > set. So add logic to join this split back if and only if the following lines have hex digits, or spaces, or '<', or '>' characters. It will be quite unlikely to have a broken input in well formed Oops or dmesg, thus a simple regex is being used. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Will Deacon <[email protected]> Cc: Dave Martin <[email protected]> Cc: Philippe Ombredanne <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31fs/dax.c: release PMD lock even when there is no PMD support in DAXJan H. Schönherr1-1/+1
follow_pte_pmd() can theoretically return after having acquired a PMD lock, even when DAX was not compiled with CONFIG_FS_DAX_PMD. Release the PMD lock unconditionally. Link: http://lkml.kernel.org/r/[email protected] Fixes: f729c8c9b24f ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean") Signed-off-by: Jan H. Schönherr <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1666-45602/+120519
Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
2018-01-31Merge branch 'linus' of ↵Linus Torvalds162-2683/+7472
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Enforce the setting of keys for keyed aead/hash/skcipher algorithms. - Add multibuf speed tests in tcrypt. Algorithms: - Improve performance of sha3-generic. - Add native sha512 support on arm64. - Add v8.2 Crypto Extentions version of sha3/sm3 on arm64. - Avoid hmac nesting by requiring underlying algorithm to be unkeyed. - Add cryptd_max_cpu_qlen module parameter to cryptd. Drivers: - Add support for EIP97 engine in inside-secure. - Add inline IPsec support to chelsio. - Add RevB core support to crypto4xx. - Fix AEAD ICV check in crypto4xx. - Add stm32 crypto driver. - Add support for BCM63xx platforms in bcm2835 and remove bcm63xx. - Add Derived Key Protocol (DKP) support in caam. - Add Samsung Exynos True RNG driver. - Add support for Exynos5250+ SoCs in exynos PRNG driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (166 commits) crypto: picoxcell - Fix error handling in spacc_probe() crypto: arm64/sha512 - fix/improve new v8.2 Crypto Extensions code crypto: arm64/sm3 - new v8.2 Crypto Extensions implementation crypto: arm64/sha3 - new v8.2 Crypto Extensions implementation crypto: testmgr - add new testcases for sha3 crypto: sha3-generic - export init/update/final routines crypto: sha3-generic - simplify code crypto: sha3-generic - rewrite KECCAK transform to help the compiler optimize crypto: sha3-generic - fixes for alignment and big endian operation crypto: aesni - handle zero length dst buffer crypto: artpec6 - remove select on non-existing CRYPTO_SHA384 hwrng: bcm2835 - Remove redundant dev_err call in bcm2835_rng_probe() crypto: stm32 - remove redundant dev_err call in stm32_cryp_probe() crypto: axis - remove unnecessary platform_get_resource() error check crypto: testmgr - test misuse of result in ahash crypto: inside-secure - make function safexcel_try_push_requests static crypto: aes-generic - fix aes-generic regression on powerpc crypto: chelsio - Fix indentation warning crypto: arm64/sha1-ce - get rid of literal pool crypto: arm64/sha2-ce - move the round constant table to .rodata section ...
2018-01-31Merge tag 'selinux-pr-20180130' of ↵Linus Torvalds3-14/+13
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "A small pull request this time, just three patches, and one of these is just a comment update (swap the FSF physical address for a URL). The other two patches are small bug fixes found by szybot/syzkaller; they individual patch descriptions should tell you all you ever wanted to know" * tag 'selinux-pr-20180130' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: skip bounded transition processing if the policy isn't loaded selinux: ensure the context is NUL terminated in security_context_to_sid_core() security: replace FSF address with web source in license notices
2018-01-31Merge branch 'next-seccomp' of ↵Linus Torvalds4-30/+94
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull seccomp updates from James Morris: "Add support for retrieving seccomp metadata" * 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ptrace, seccomp: add support for retrieving seccomp metadata seccomp: hoist out filter resolving logic
2018-01-31Merge branch 'next-tpm' of ↵Linus Torvalds33-443/+789
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull tpm updates from James Morris: - reduce polling delays in tpm_tis - support retrieving TPM 2.0 Event Log through EFI before ExitBootServices - replace tpm-rng.c with a hwrng device managed by the driver for each TPM device - TPM resource manager synthesizes TPM_RC_COMMAND_CODE response instead of returning -EINVAL for unknown TPM commands. This makes user space more sound. - CLKRUN fixes: * Keep #CLKRUN disable through the entier TPM command/response flow * Check whether #CLKRUN is enabled before disabling and enabling it again because enabling it breaks PS/2 devices on a system where it is disabled * 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: tpm: remove unused variables tpm: remove unused data fields from I2C and OF device ID tables tpm: only attempt to disable the LPC CLKRUN if is already enabled tpm: follow coding style for variable declaration in tpm_tis_core_init() tpm: delete the TPM_TIS_CLK_ENABLE flag tpm: Update MAINTAINERS for Jason Gunthorpe tpm: Keep CLKRUN enabled throughout the duration of transmit_cmd() tpm_tis: Move ilb_base_addr to tpm_tis_data tpm2-cmd: allow more attempts for selftest execution tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented tpm: Move Linux RNG connection to hwrng tpm: use struct tpm_chip for tpm_chip_find_get() tpm: parse TPM event logs based on EFI table efi: call get_event_log before ExitBootServices tpm: add event log format version tpm: rename event log provider files tpm: move tpm_eventlog.h outside of drivers folder tpm: use tpm_msleep() value as max delay tpm: reduce tpm polling delay in tpm_tis_core tpm: move wait_for_tpm_stat() to respective driver files
2018-01-31Merge branch 'next-smack' of ↵Linus Torvalds3-12/+39
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull smack updates from James Morris: "Two minor fixes" * 'next-smack' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: Privilege check on key operations Smack: fix dereferenced before check
2018-01-31Merge branch 'next-integrity' of ↵Linus Torvalds14-125/+332
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "This contains a mixture of bug fixes, code cleanup, and new functionality. Of note is the integrity cache locking fix, file change detection, and support for a new EVM portable and immutable signature type. The re-introduction of the integrity cache lock (iint) fixes the problem of attempting to take the i_rwsem shared a second time, when it was previously taken exclusively. Defining atomic flags resolves the original iint/i_rwsem circular locking - accessing the file data vs. modifying the file metadata. Although it fixes the O_DIRECT problem as well, a subsequent patch is needed to remove the explicit O_DIRECT prevention. For performance reasons, detecting when a file has changed and needs to be re-measured, re-appraised, and/or re-audited, was limited to after the last writer has closed, and only if the file data has changed. Detecting file change is based on i_version. For filesystems that do not support i_version, remote filesystems, or userspace filesystems, the file was measured, appraised and/or audited once and never re-evaluated. Now local filesystems, which do not support i_version or are not mounted with the i_version option, assume the file has changed and are required to re-evaluate the file. This change does not address detecting file change on remote or userspace filesystems. Unlike file data signatures, which can be included and distributed in software packages (eg. rpm, deb), the existing EVM signature, which protects the file metadata, could not be included in software packages, as it includes file system specific information (eg. i_ino, possibly the UUID). This pull request defines a new EVM portable and immutable file metadata signature format, which can be included in software packages" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima/policy: fix parsing of fsuuid ima: Use i_version only when filesystem supports it integrity: remove unneeded initializations in integrity_iint_cache entries ima: log message to module appraisal error ima: pass filename to ima_rdwr_violation_check() ima: Fix line continuation format ima: support new "hash" and "dont_hash" policy actions ima: re-introduce own integrity cache lock EVM: Add support for portable signature format EVM: Allow userland to permit modification of EVM-protected metadata ima: relax requiring a file signature for new files with zero length
2018-01-31Merge branch 'for-linus' of ↵Linus Torvalds13-187/+227
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - handle 'infinitely'-long sleeping tasks, from Miroslav Benes - remove 'immediate' feature, as it turns out it doesn't provide the originally expected semantics, and brings more issues than value * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add locking to force and signal functions livepatch: Remove immediate feature livepatch: force transition to finish livepatch: send a fake signal to all blocking tasks
2018-01-31Merge branch 'for-linus' of ↵Linus Torvalds27-1396/+1995
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - remove hid_have_special_driver[] entry hard requirement for any newly supported VID/PID by a specific non-core hid driver, and general related cleanup of HID matching core, from Benjamin Tissoires - support for new Wacom devices and a few small fixups for already supported ones in Wacom driver, from Aaron Armstrong Skomra and Jason Gerecke - sysfs interface fix for roccat driver from Dan Carpenter - support for new Asus HW (T100TAF, T100HA, T200TA) from Hans de Goede - improved support for Jabra devices, from Niels Skou Olsen - other assorted small fixes and new device IDs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits) HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working HID: roccat: prevent an out of bounds read in kovaplus_profile_activated() HID: asus: Fix special function keys on T200TA HID: asus: Add touchpad max x/y and resolution info for the T200TA HID: wacom: Add support for One by Wacom (CTL-472 / CTL-672) HID: wacom: Fix reporting of touch toggle (WACOM_HID_WD_MUTE_DEVICE) events HID: intel-ish-hid: Enable Cannon Lake and Coffee Lake laptop/desktop HID: elecom: rewrite report fixup for EX-G and future mice HID: sony: Report DS4 version info through sysfs HID: sony: Print reversed MAC address via %pMR HID: wacom: EKR: ensure devres groups at higher indexes are released HID: rmi: Support the Fujitsu R726 Pad dock using hid-rmi HID: add quirk for another PIXART OEM mouse used by HP HID: quirks: make array hid_quirks static HID: hid-multitouch: support fine-grain orientation reporting HID: asus: Add product-id for the T100TAF and T100HA keyboard docks HID: elo: clear BTN_LEFT mapping HID: multitouch: Combine all left-button events in a frame HID: multitouch: Only look at non touch fields in first packet of a frame HID: multitouch: Properly deal with Win8 PTP reports with 0 touches ...
2018-01-31Merge tag 'for-linus-4.16-1' of git://github.com/cminyard/linux-ipmiLinus Torvalds7-14/+21
Pull IPMI updates from Corey Minyard: "Small fixes for various things, been sitting in next for a while (some a long time)" * tag 'for-linus-4.16-1' of git://github.com/cminyard/linux-ipmi: ipmi_ssif: Remove duplicate NULL check ipmi/powernv: Fix error return code in ipmi_powernv_probe() ipmi: use dynamic memory for DMI driver override ipmi/ipmi_powernv: remove outdated todo in powernv IPMI driver ipmi: Clear smi_info->thread to prevent use-after-free during module unload ipmi: use correct string length ipmi_si: Fix error handling of platform device ipmi watchdog: fix typo in parameter description ipmi_si_platform: Fix typo in parameter description
2018-01-31Merge tag 'for-v4.16' of ↵Linus Torvalds22-489/+394
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: - bq27xxx: add bq27521 support - drop unused imx-snvs-poweroff driver - improve axp288 driver - misc fixes * tag 'for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (32 commits) power: supply: max17042_battery: Always fall back to default platform-data power: supply: max17042_battery: Check battery current for status when supplied MAINTAINERS: Add AXP288 PMIC entry power: supply: axp288_fuel_gauge: Do not register our psy on (some) HDMI sticks power: supply: axp288_fuel_gauge: Optimize get_current() power: supply: axp288_fuel_gauge: Rework get_status() power: reset: account for const type of of_device_id.data power: supply: account for const type of of_device_id.data bq24190: Simplify code in property_is_writeable power: supply: axp288_fuel_gauge: Get iio-channels once during boot power: supply: axp288_charger: Properly stop work on probe-error / remove power: supply: axp288_charger: Simplify extcon cable handling power: supply: axp288_charger: Use the right property for the input current limit power: supply: axp288_charger: Pick lower input current limit not higher power: supply: axp288_charger: Do not cache input current limit value power: supply: axp288_charger: Remove no longer needed locking power: supply: axp288_charger: Use regmap_update_bits to set the input limits power: supply: axp288_charger: Cleanup some double empty lines power: supply: axp288_charger: Remove charger-enabled state tracking power: supply: axp288_charger: Add missing newlines to some messages ...
2018-01-31Merge tag 'hsi-for-4.16' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI update from Sebastian Reichel: "Y2038 fix for cmt-speech" * tag 'hsi-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: HSI: cmt_speech: use timespec64 instead of timespec
2018-01-31Merge tag 'gpio-v4.16-1' of ↵Linus Torvalds60-696/+2743
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "The is the bulk of GPIO changes for the v4.16 kernel cycle. It is pretty calm this time around I think. I even got time to get to things like starting to clean up header includes. Core changes: - Disallow open drain and open source flags to be set simultaneously. This doesn't make electrical sense, and would the hardware actually respond to this setting, the result would be short circuit. - ACPI GPIO has a new core infrastructure for handling quirks. The quirks are there to deal with broken ACPI tables centrally instead of pushing the work to individual drivers. In the world of BIOS writers, the ACPI tables are perfect. Until they find a mistake in it. When such a mistake is found, we can patch it with a quirk. It should never happen, the problem is that it happens. So we accomodate for it. - Several documentation updates. - Revert the patch setting up initial direction state from reading the device. This was causing bad things for drivers that can't read status on all its pins. It is only affecting debugfs information quality. - Label descriptors with the device name if no explicit label is passed in. - Pave the ground for transitioning SPI and regulators to use GPIO descriptors by implementing some quirks in the device tree GPIO parsing code. New drivers: - New driver for the Access PCIe IDIO 24 family. Other: - Major refactorings and improvements to the GPIO mockup driver used for test and verification. - Moved the AXP209 driver over to pin control since it gained a pin control back-end. These patches will appear (with the same hashes) in the pin control pull request as well. - Convert the onewire GPIO driver w1-gpio to use descriptors. This is merged here since the W1 maintainers send very few pull requests and he ACKed it. - Start to clean up driver headers using <linux/gpio.h> to just use <linux/gpio/driver.h> as appropriate" * tag 'gpio-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (103 commits) gpio: Timestamp events in hardirq handler gpio: Fix kernel stack leak to userspace gpio: Fix a documentation spelling mistake gpio: Documentation update gpiolib: remove redundant initialization of pointer desc gpio: of: Fix NPE from OF flags gpio: stmpe: Delete an unnecessary variable initialisation in stmpe_gpio_probe() gpio: stmpe: Move an assignment in stmpe_gpio_probe() gpio: stmpe: Improve a size determination in stmpe_gpio_probe() gpio: stmpe: Use seq_putc() in stmpe_dbg_show() gpio: No NULL owner gpio: stmpe: i2c transfer are forbiden in atomic context gpio: davinci: Include proper header gpio: da905x: Include proper header gpio: cs5535: Include proper header gpio: crystalcove: Include proper header gpio: bt8xx: Include proper header gpio: bcm-kona: Include proper header gpio: arizona: Include proper header gpio: amd8111: Include proper header ...
2018-01-31Merge tag 'leds_for_4.16-rc1' of ↵Linus Torvalds14-55/+1091
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: "New LED class driver: - introduce LM3692x dual string driver New LED trigger: - introduce a NETDEV trigger leds-lp8860: - various fixes to align with LED framework - add regulator enable during init - DT support related improvements Minor fixes and cleanups to the LED class drivers: - leds-pwm - ledtrig-activity - leds-blinkm - leds-as3645a - ledtrig-transient" * tag 'leds_for_4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: leds: ledtrig-transient: Add SPDX license identifiers leds: lp8860: Various fixes to align with LED framework leds: lp8860: Add DT parsing to retrieve the trigger node dt: bindings: lp8860: Add trigger binding to the lp8860 leds: lp8860: Update the dt parsing for LED labeling dt: bindings: lp8860: Update DT label binding dt: bindings: lp8860: Update bindings for lp8860 leds: as3645a: Fix line over 80 characters leds: as3645a: Fix quoted string split warning leds: lm3692x: Introduce LM3692x dual string driver dt: bindings: lm3692x: Add bindings for lm3692x LED driver leds: trigger: Introduce a NETDEV trigger leds: blinkm: avoid uninitialized data use ledtrig-activity: Grammar s/a immediate/an immediate/ leds: pwm: Remove unneeded header file leds: lp8860: Add regulator enable during init leds: lp8860: Fix linuxdoc format for structure