Age | Commit message | Author | Files | Lines
2019-05-14fs/binfmt_elf.c: extract PROT_* calculationsAlexey Dobriyan1-14/+16
There are two places where mapping protections are calculated: one for the executable, another for the interpreter -- factor them out into a common helper. The ELF read and execute permission bits are interchanged relative to Linux's PROT_READ and PROT_EXEC values; microoptimizations are welcome! Link: http://lkml.kernel.org/r/20190417213413.GB26474@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
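A minimal sketch of the kind of helper such a change would extract -- translating ELF segment p_flags into mmap protection bits -- assuming the obvious shape (the helper name is an assumption, not taken from the diff):

  static int make_prot(u32 p_flags)
  {
          int prot = 0;

          if (p_flags & PF_R)
                  prot |= PROT_READ;
          if (p_flags & PF_W)
                  prot |= PROT_WRITE;
          if (p_flags & PF_X)
                  prot |= PROT_EXEC;
          return prot;
  }

Both the executable and the interpreter mapping paths can then share this single translation point.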
2019-05-14fs/binfmt_elf.c: move variables initialization closer to their usageAlexey Dobriyan1-8/+8
Link: http://lkml.kernel.org/r/20190416202002.GB24304@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14fs/binfmt_elf.c: save 1 indent levelAlexey Dobriyan1-57/+54
Rewrite the

        for (...) {
                if (->p_type == PT_INTERP) {
                        ...
                        break;
                }
        }

loop into

        for (...) {
                if (->p_type != PT_INTERP)
                        continue;
                ...
                break;
        }

Link: http://lkml.kernel.org/r/20190416201906.GA24304@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14fs/binfmt_elf.c: delete trailing "return;" in functions returning "void"Alexey Dobriyan1-4/+0
Link: http://lkml.kernel.org/r/20190314205042.GE18143@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14fs/binfmt_elf.c: free PT_INTERP filename ASAPAlexey Dobriyan1-9/+11
There is no reason for PT_INTERP filename to linger till the end of the whole loading process. Link: http://lkml.kernel.org/r/20190314204953.GD18143@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Nikitas Angelinas <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Mukesh Ojha <[email protected]> [[email protected]: fix GPF when dereferencing invalid interpreter] Link: http://lkml.kernel.org/r/20190330140032.GA1527@vostro Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14fs/binfmt_elf.c: make scope of "pos" variable smallerAlexey Dobriyan1-1/+2
Link: http://lkml.kernel.org/r/20190314204707.GC18143@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14fs/binfmt_elf.c: remove unneeded initialization of mm->start_stackAndrew Morton1-2/+0
As pointed out by [email protected], setup_arg_pages() already initialized current->mm->start_stack. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202881 Reported-by: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/test_vmalloc.c:test_func(): eliminate local `ret'Andrew Morton1-5/+3
Local 'ret' is unneeded and was poorly named: the variable `ret' generally means "the value which this function will return". Cc: Roman Gushchin <[email protected]> Cc: Uladzislau Rezki <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Thomas Garnier <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14include/linux/bitops.h: sanitize rotate primitivesRasmus Villemoes1-8/+8
The ror32 implementation (word >> shift) | (word << (32 - shift)) has undefined behaviour if shift is outside the [1, 31] range. Similarly for the 64 bit variants. Most callers pass a compile-time constant (naturally in that range), but there's a UBSAN report that these may actually be called with a shift count of 0. Instead of special-casing that, we can make them DTRT for all values of shift while also avoiding UB. For some reason, this was already partly done for rol32 (which was well-defined for [0, 31]). gcc 8 recognizes these patterns as rotates, so for example

        __u32 rol32(__u32 word, unsigned int shift)
        {
                return (word << (shift & 31)) | (word >> ((-shift) & 31));
        }

compiles to

        0000000000000020 <rol32>:
          20:   89 f8   mov    %edi,%eax
          22:   89 f1   mov    %esi,%ecx
          24:   d3 c0   rol    %cl,%eax
          26:   c3      retq

Older compilers unfortunately do not do as well, but this only affects the small minority of users that don't pass constants. Due to integer promotions, ro[lr]8 were already well-defined for shifts in [0, 8], and ro[lr]16 were mostly well-defined for shifts in [0, 16] (only mostly - u16 gets promoted to _signed_ int, so if bit 15 is set, word << 16 is undefined). For consistency, update those as well.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Rasmus Villemoes <[email protected]> Reported-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Reviewed-by: Will Deacon <[email protected]> Cc: Vadim Pasternak <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Jacek Anaszewski <[email protected]> Cc: Pavel Machek <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
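For completeness, the fixed ror32 presumably mirrors the rol32 pattern quoted above (a sketch, not copied from the patch); shift == 0 degenerates to returning word unchanged, and no shift by 32 ever occurs:

  static inline __u32 ror32(__u32 word, unsigned int shift)
  {
          return (word >> (shift & 31)) | (word << ((-shift) & 31));
  }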
2019-05-14lib/test_bitmap: add tests for bitmap_parselist_user()Yury Norov1-10/+36
Propagate existing bitmap_parselist() tests to bitmap_parselist_user(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yury Norov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Travis <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/test_bitmap: add testcases for bitmap_parselist()Yury Norov1-1/+17
Add tests for non-number character, empty regions, integer overflow. [[email protected]: v5] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yury Norov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Travis <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/test_bitmap: switch test_bitmap_parselist to ktime_get()Yury Norov1-5/+4
test_bitmap_parselist currently uses get_cycles which is not implemented on some platforms, so use ktime_get() instead. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yury Norov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Travis <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib: rework bitmap_parselistYury Norov1-113/+142
Remove the __bitmap_parselist helper and split the function into logical parts. [[email protected]: v5] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yury Norov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Travis <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib: make bitmap_parselist_user() a wrapper on bitmap_parselist()Yury Norov1-8/+11
Patch series "lib: rework bitmap_parselist and tests", v5. bitmap_parselist has been evolved from a pretty simple idea for long and now lacks for refactoring. It is not structured, has nested loops and a set of opaque-named variables. Things are more complicated because bitmap_parselist() is a part of user interface, and its behavior should not change. In this patchset - bitmap_parselist_user() made a wrapper on bitmap_parselist(); - bitmap_parselist() reworked (patch 2); - time measurement in test_bitmap_parselist switched to ktime_get (patch 3); - new tests introduced (patch 4), and - bitmap_parselist_user() testing enabled with the same testset as bitmap_parselist() (patch 5). This patch (of 5): Currently we parse user data byte after byte which leads to overcomplification of parsing algorithm. The only user of bitmap_parselist_user() is not performance-critical, and so we can duplicate user data to kernel buffer and simply call bitmap_parselist(). This rework lets us unify and simplify bitmap_parselist() and bitmap_parselist_user(), which is done in the following patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yury Norov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Mike Travis <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/math: move int_pow() from pwm_bl.c for wider useAndy Shevchenko4-16/+34
The integer exponentiation helper is used in a few places and might be used by other call sites in the future. Move it out of pwm_bl.c for wider use. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Daniel Thompson <[email protected]> Cc: Lee Jones <[email protected]> Cc: Ray Jui <[email protected]> Cc: Thierry Reding <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
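Integer exponentiation by squaring is the natural implementation for such a helper; a sketch along these lines (the moved function may differ in detail) needs only O(log exp) multiplications:

  u64 int_pow(u64 base, unsigned int exp)
  {
          u64 result = 1;

          while (exp) {
                  if (exp & 1)
                          result *= base;
                  exp >>= 1;
                  base *= base;
          }
          return result;
  }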
2019-05-14lib: Move mathematic helpers to separate folderAndy Shevchenko13-24/+27
For better maintenance and expansion, move the math helpers to a separate directory. No functional change intended. Note: int_sqrt() is not used as part of lib, so it is moved to a regular obj. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Thierry Reding <[email protected]> Cc: Lee Jones <[email protected]> Cc: Daniel Thompson <[email protected]> Cc: Ray Jui <[email protected]> [[email protected]: fix broken doc references for div64.c and gcd.c] Link: http://lkml.kernel.org/r/734f49bae5d4052b3c25691dfefad59bea2e5843.1555580999.git.mchehab+samsung@kernel.org Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/list_sort: optimize number of calls to comparison functionGeorge Spelvin1-22/+91
CONFIG_RETPOLINE has severely degraded indirect function call performance, so it's worth putting some effort into reducing the number of times cmp() is called.

This patch avoids badly unbalanced merges on unlucky input sizes. It slightly increases the code size, but saves an average of 0.2*n calls to cmp().

x86-64 code size 739 -> 803 bytes (+64)

Unfortunately, there's not a lot of low-hanging fruit in a merge sort; it already performs only n*log2(n) - K*n + O(1) compares. The leading coefficient is already at the theoretical limit (log2(n!) corresponds to K=1.4427), so we're fighting over the linear term, and the best mergesort can do is K=1.2645, achieved when n is a power of 2.

The differences between mergesort variants appear when n is *not* a power of 2; K is a function of the fractional part of log2(n). Top-down mergesort does best of all, achieving a minimum K=1.2408, and an average (over all sizes) K=1.248. However, that requires knowing the number of entries to be sorted ahead of time, and making a full pass over the input to count it conflicts with a second performance goal, which is cache blocking.

Obviously, we have to read the entire list into L1 cache at some point, and performance is best if it fits. But if it doesn't fit, each full pass over the input causes a cache miss per element, which is undesirable.

While textbooks explain bottom-up mergesort as a succession of merging passes, practical implementations do merging in depth-first order: as soon as two lists of the same size are available, they are merged. This allows as many merge passes as possible to fit into L1; only the final few merges force cache misses.

This cache-friendly depth-first merge order depends on us merging the beginning of the input as much as possible before we've even seen the end of the input (and thus know its size).

The simple eager merge pattern causes bad performance when n is just over a power of 2. If n=1028, the final merge is between 1024- and 4-element lists, which is wasteful of comparisons. (This is actually worse on average than n=1025, because a 1024:1 merge will, on average, end after 512 compares, while 1024:4 will walk 4/5 of the list.) Because of this, bottom-up mergesort achieves K < 0.5 for such sizes, and has an average (over all sizes) K of around 1. (My experiments show K=1.01, while theory predicts K=0.965.)

There are "worst-case optimal" variants of bottom-up mergesort which avoid this bad performance, but the algorithms given in the literature, such as queue-mergesort and boustrophedonic mergesort, depend on the breadth-first multi-pass structure that we are trying to avoid.

This implementation is as eager as possible while ensuring that all merge passes are at worst 1:2 unbalanced. This achieves the same average K=1.207 as queue-mergesort, which is 0.2*n better than bottom-up, and only 0.04*n behind top-down mergesort.

Specifically, it defers merging two lists of size 2^k until it is known that there are 2^k additional inputs following. This ensures that the final uneven merges triggered by reaching the end of the input will be at worst 2:1. This will avoid cache misses as long as 3*2^k elements fit into the cache. (An illustrative sketch of this scheduling rule follows the references below.)

(I confess to being more than a little bit proud of how clean this code turned out. It took a lot of thinking, but the resultant inner loop is very simple and efficient.)
Refs:
  Bottom-up Mergesort: A Detailed Analysis
  Wolfgang Panny, Helmut Prodinger
  Algorithmica 14(4):340--354, October 1995
  https://doi.org/10.1007/BF01294131
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

  The cost distribution of queue-mergesort, optimal mergesorts, and power-of-two rules
  Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
  Journal of Algorithms 30(2):423--448, February 1999
  https://doi.org/10.1006/jagm.1998.0986
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

  Queue-Mergesort
  Mordecai J. Golin, Robert Sedgewick
  Information Processing Letters 48(5):253--259, 10 December 1993
  https://doi.org/10.1016/0020-0190(93)90088-q
  https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

Feedback from Rasmus Villemoes <[email protected]>.
Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <[email protected]> Acked-by: Andrey Abramov <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Don Mullis <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
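As a rough illustration of the merge-scheduling rule referenced above (this is not the kernel code: lib/list_sort.c keeps the pending runs threaded through ->prev and does the equivalent bookkeeping inline), the merge decision can be phrased purely in terms of the element count:

  /*
   * Given `count` elements already collected into pending runs, return k
   * if the two pending runs of size 2^k should be merged before the next
   * element is pulled in, or -1 for "no merge yet".  k is the number of
   * trailing set bits of count; a merge is due only if count has a set
   * bit above those, which keeps every merge at worst 2:1 unbalanced.
   */
  static int pending_merge_order(unsigned long count)
  {
          int k = 0;

          while (count & 1) {
                  count >>= 1;
                  k++;
          }
          return count ? k : -1;
  }

For example, count = 5 (binary 101) requests a merge of the two size-2 runs; since this check only happens because a sixth element exists, 3*2^1 elements are known at that point, matching the 2^k-lookahead rule.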
2019-05-14lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITSGeorge Spelvin2-62/+104
Rather than a fixed-size array of pending sorted runs, use the ->prev links to keep track of things. This reduces stack usage, eliminates some ugly overflow handling, and reduces the code size. Also: * merge() no longer needs to handle NULL inputs, so simplify. * The same applies to merge_and_restore_back_links(), which is renamed to the less ponderous merge_final(). (It's a static helper function, so we don't need a super-descriptive name; comments will do.) * Document the actual return value requirements on the (*cmp)() function; some callers are already using this feature. x86-64 code size 1086 -> 739 bytes (-347) (Yes, I see checkpatch complaining about no space after comma in "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.) Feedback from Rasmus Villemoes, Andy Shevchenko and Geert Uytterhoeven. [[email protected]: remove __pure usage due to mysterious warning] Link: http://lkml.kernel.org/r/f63c410e0ff76009c9b58e01027e751ff7fdb749.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <[email protected]> Acked-by: Andrey Abramov <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Don Mullis <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
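An example of the (*cmp)() contract mentioned above: only the sign of the return value matters, with > 0 meaning "a sorts after b" and <= 0 keeping a before b (and preserving the order of equal elements). The struct and field below are hypothetical:

  struct foo {
          struct list_head node;
          int key;
  };

  /* Sort ascending by ->key; returning 0 for equal keys keeps the sort stable. */
  static int foo_cmp(void *priv, struct list_head *a, struct list_head *b)
  {
          const struct foo *fa = list_entry(a, struct foo, node);
          const struct foo *fb = list_entry(b, struct foo, node);

          return (fa->key > fb->key) - (fa->key < fb->key);
  }

Used as list_sort(NULL, &my_list, foo_cmp); where my_list is the list_head of the list being sorted.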
2019-05-14lib/sort: avoid indirect calls to built-in swapGeorge Spelvin1-15/+36
Similar to what's being done in the net code, this takes advantage of the fact that most invocations use only a few common swap functions, and replaces indirect calls to them with (highly predictable) conditional branches. (The downside, of course, is that if you *do* use a custom swap function, there are a few extra predicted branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 767 -> 703 bytes (-64) Link: http://lkml.kernel.org/r/d10c5d4b393a1847f32f5b26f4bbaa2857140e1e.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <[email protected]> Acked-by: Andrey Abramov <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Don Mullis <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
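A simplified sketch of the dispatch idea (names and shape are assumptions; the in-kernel version differs in detail): compare the supplied function pointer against the known built-in wrappers and branch, so the common case becomes a predictable direct call.

  typedef void (*swap_func_t)(void *a, void *b, int size);

  /* Built-in word-wise helpers (one is sketched two entries further down). */
  static void swap_words_64(void *a, void *b, size_t n);
  static void swap_words_32(void *a, void *b, size_t n);

  /* Public-signature wrappers installed by sort() when the caller passes
   * no custom swap function. */
  static void w64_swap(void *a, void *b, int size) { swap_words_64(a, b, size); }
  static void w32_swap(void *a, void *b, int size) { swap_words_32(a, b, size); }

  static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
  {
          if (swap_func == w64_swap)
                  swap_words_64(a, b, size);      /* direct call, inlinable */
          else if (swap_func == w32_swap)
                  swap_words_32(a, b, size);
          else
                  swap_func(a, b, (int)size);     /* custom swap: still indirect */
  }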
2019-05-14lib/sort: use more efficient bottom-up heapsort variantGeorge Spelvin1-31/+81
This uses fewer comparisons than the previous code (approaching half as many for large random inputs), but produces identical results; it actually performs the exact same series of swap operations. Specifically, it reduces the average number of compares from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). This is still 1.63*n worse than glibc qsort() which manages n*log2(n) - 1.26*n, but at least the leading coefficient is correct. Standard heapsort, when sifting down, performs two comparisons per level: one to find the greater child, and a second to see if the current node should be exchanged with that child. Bottom-up heapsort observes that it's better to postpone the second comparison and search for the leaf where -infinity would be sent to, then search back *up* for the current node's destination. Since sifting down usually proceeds to the leaf level (that's where half the nodes are), this does O(1) second comparisons rather than log2(n). That saves a lot of (expensive since Spectre) indirect function calls. The one time it's worse than the previous code is if there are large numbers of duplicate keys, when the top-down algorithm is O(n) and bottom-up is O(n log n). For distinct keys, it's provably always better, doing 1.5*n*log2(n) + O(n) in the worst case. (The code is not significantly more complex. This patch also merges the heap-building and -extracting sift-down loops, resulting in a net code size savings.) x86-64 code size 885 -> 767 bytes (-118) (I see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Link: http://lkml.kernel.org/r/2de8348635a1a421a72620677898c7fd5bd4b19d.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <[email protected]> Acked-by: Andrey Abramov <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Don Mullis <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
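An illustrative user-space rendering of the bottom-up sift-down on a plain int max-heap (this is not the kernel's generic, merged-loop implementation): one compare per level going down, then a short climb back up to place the displaced value.

  #include <stddef.h>

  /* Sift a[root] down within the max-heap a[0..n-1]: descend to a leaf
   * following the greater child (one compare per level), then climb back
   * up to where the displaced value belongs and rotate it into place. */
  static void sift_down_bottom_up(int *a, size_t root, size_t n)
  {
          size_t i = root, child;
          int val = a[root];

          while ((child = 2 * i + 1) < n) {       /* phase 1: walk to a leaf */
                  if (child + 1 < n && a[child + 1] > a[child])
                          child++;
                  i = child;
          }
          while (a[i] < val)                      /* phase 2: climb back up */
                  i = (i - 1) / 2;                /* (a[root] == val stops it) */
          while (i > root) {                      /* phase 3: rotate into place */
                  int t = a[i];

                  a[i] = val;
                  val = t;
                  i = (i - 1) / 2;
          }
          a[root] = val;
  }

  /* Classic heapsort driver: build the heap, then repeatedly move the
   * current maximum behind the shrinking heap. */
  static void heapsort_ints(int *a, size_t n)
  {
          size_t i;

          for (i = n / 2; i-- > 0; )
                  sift_down_bottom_up(a, i, n);
          while (n > 1) {
                  int t = a[0];

                  a[0] = a[n - 1];
                  a[n - 1] = t;
                  sift_down_bottom_up(a, 0, --n);
          }
  }

The payoff is in phase 2: the value being sifted during extraction came from the end of the array and is usually small, so it belongs near the leaves and the climb is O(1) on average -- most levels cost one comparison instead of two.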
2019-05-14lib/sort: make swap functions more genericGeorge Spelvin1-24/+99
Patch series "lib/sort & lib/list_sort: faster and smaller", v2. Because CONFIG_RETPOLINE has made indirect calls much more expensive, I thought I'd try to reduce the number made by the library sort functions. The first three patches apply to lib/sort.c. Patch #1 is a simple optimization. The built-in swap has special cases for aligned 4- and 8-byte objects. But those are almost never used; most calls to sort() work on larger structures, which fall back to the byte-at-a-time loop. This generalizes them to aligned *multiples* of 4 and 8 bytes. (If nothing else, it saves an awful lot of energy by not thrashing the store buffers as much.) Patch #2 grabs a juicy piece of low-hanging fruit. I agree that nice simple solid heapsort is preferable to more complex algorithms (sorry, Andrey), but it's possible to implement heapsort with far fewer comparisons (50% asymptotically, 25-40% reduction for realistic sizes) than the way it's been done up to now. And with some care, the code ends up smaller, as well. This is the "big win" patch. Patch #3 adds the same sort of indirect call bypass that has been added to the net code of late. The great majority of the callers use the builtin swap functions, so replace the indirect call to sort_func with a (highly preditable) series of if() statements. Rather surprisingly, this decreased code size, as the swap functions were inlined and their prologue & epilogue code eliminated. lib/list_sort.c is a bit trickier, as merge sort is already close to optimal, and we don't want to introduce triumphs of theory over practicality like the Ford-Johnson merge-insertion sort. Patch #4, without changing the algorithm, chops 32% off the code size and removes the part[MAX_LIST_LENGTH+1] pointer array (and the corresponding upper limit on efficiently sortable input size). Patch #5 improves the algorithm. The previous code is already optimal for power-of-two (or slightly smaller) size inputs, but when the input size is just over a power of 2, there's a very unbalanced final merge. There are, in the literature, several algorithms which solve this, but they all depend on the "breadth-first" merge order which was replaced by commit 835cc0c8477f with a more cache-friendly "depth-first" order. Some hard thinking came up with a depth-first algorithm which defers merges as little as possible while avoiding bad merges. This saves 0.2*n compares, averaged over all sizes. The code size increase is minimal (64 bytes on x86-64, reducing the net savings to 26%), but the comments expanded significantly to document the clever algorithm. TESTING NOTES: I have some ugly user-space benchmarking code which I used for testing before moving this code into the kernel. Shout if you want a copy. I'm running this code right now, with CONFIG_TEST_SORT and CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last round of minor edits to quell checkpatch. I figure there will be at least one round of comments and final testing. This patch (of 5): Rather than having special-case swap functions for 4- and 8-byte objects, special-case aligned multiples of 4 or 8 bytes. This speeds up most users of sort() by avoiding fallback to the byte copy loop. Despite what ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims, very few users of sort() sort pointers (or pointer-sized objects); most sort structures containing at least two words. (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct acpi_fan_fps.) 
The functions also got renamed to reflect the fact that they support multiple words. In the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) With feedback from Andy Shevchenko, Rasmus Villemoes and Geert Uytterhoeven. Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <[email protected]> Acked-by: Andrey Abramov <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: Don Mullis <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
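A sketch of the generalized word-wise swap described above (simplified: the kernel helper operates on void pointers and, on strict-alignment architectures, also folds the base addresses into the alignment check):

  #include <stddef.h>
  #include <stdint.h>

  /* Swap two objects whose size is a multiple of 4 bytes, one 32-bit word
   * at a time, instead of falling back to a byte-at-a-time loop. */
  static void swap_words_32(void *a, void *b, size_t n)
  {
          uint32_t *pa = a, *pb = b;

          for (n /= 4; n; n--) {
                  uint32_t t = pa[n - 1];

                  pa[n - 1] = pb[n - 1];
                  pb[n - 1] = t;
          }
  }

A 64-bit variant is analogous; sort() can then pick among the 8-byte, 4-byte and byte-wise helpers depending on whether the element size (and base pointer) is a multiple of 8 or 4.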
2019-05-14lib/plist: rename DEBUG_PI_LIST to DEBUG_PLISTDavidlohr Bueso3-5/+5
This is a lot more appropriate than PI_LIST, which in the kernel one would assume has to do with priority inheritance -- which it does not. Furthermore, futexes make use of plists, so the old name can be even more confusing, albeit this is only a debug config option. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/bitmap.c: guard exotic bitmap functions by CONFIG_NUMARasmus Villemoes1-0/+2
The bitmap_remap, _bitremap, _onto and _fold functions are only used, via their node_ wrappers, in mm/mempolicy.c, which is only built for CONFIG_NUMA. The helper bitmap_ord_to_pos used by these functions is global, but its only external caller is node_random() in lib/nodemask.c, which is also guarded by CONFIG_NUMA.

For !CONFIG_NUMA:

        add/remove: 0/6 grow/shrink: 0/0 up/down: 0/-621 (-621)
        Function             old   new   delta
        bitmap_pos_to_ord     20     -     -20
        bitmap_ord_to_pos     70     -     -70
        bitmap_bitremap       81     -     -81
        bitmap_fold          113     -    -113
        bitmap_onto          123     -    -123
        bitmap_remap         214     -    -214
        Total: Before=4776, After=4155, chg -13.00%

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14lib/bitmap.c: remove unused EXPORT_SYMBOLsRasmus Villemoes1-4/+0
AFAICT, there have never been any callers of these functions outside mm/mempolicy.c (via their nodemask.h wrappers). In particular, no modular code has ever used them, and given their somewhat exotic semantics, I highly doubt they will ever find such a use. In any case, no need to export them currently. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14kernel/user.c: clean up some leftover codeRasmus Villemoes1-6/+1
The out_unlock label is misleading; no unlocking happens after it, so just return NULL directly. Also, nothing between the kmem_cache_zalloc() that creates new and the two key_put() can initialize new->uid_keyring or new->session_keyring, so those calls are no-ops. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14kernel/latencytop.c: rename clear_all_latency_tracing to clear_tsk_latency_tracingLin Feng4-5/+5
The name clear_all_latency_tracing is misleading: in fact it only clears the per-task latency_record[], and we do have another function, clear_global_latency_tracing, which clears the global latency_record[] buffer. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lin Feng <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Fabian Frederick <[email protected]> Cc: Arjan van de Ven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14kernel/latencytop.c: remove unnecessary checks for latencytop_enabledLin Feng1-6/+0
1. In the latencytop source code, we only have the following calling chain:

        account_scheduler_latency(struct task_struct *task, int usecs, int inter)
        {
                if (unlikely(latencytop_enabled))       /* the outermost check */
                        __account_scheduler_latency(task, usecs, inter);
        }
        __account_scheduler_latency
                account_global_scheduler_latency
                        if (!latencytop_enabled)

So the inner check for latencytop_enabled is not necessary at all.

2. In clear_all_latency_tracing (now called clear_tsk_latency_tracing) the check for latencytop_enabled is redundant and buggy to some extent. There is no reason to refuse clearing /proc/$pid/latency when latencytop_enabled is set to 0: if we disable latencytop manually with echo 0 > /proc/sys/kernel/latencytop, we would still want to clear /proc/$pid/latency, but with this check that fails. Also, the sibling function clear_global_latency_tracing has no such check.

Notes: These changes are only visible to users who set CONFIG_LATENCYTOP and won't change the behavior of the user-space latencytop tool.

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lin Feng <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Fabian Frederick <[email protected]> Cc: Arjan van de Ven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14kernel/notifier.c: double register detectionVasily Averin1-0/+1
By design, notifiers can be registered only once; a second register attempt, called by mistake, silently corrupts the notifier list. A few years ago I investigated such a problem: a host was power cycled because of notifier list corruption. I prepared this patch, applied it to the OpenVZ kernel and sent it out, but nobody commented on it. Later it helped us detect a similar problem in the OpenVZ kernel. Mistakes with notifier registration can happen, for example, during subsystem initialization from different namespaces, or because of a lost unregister in the roll-back path on initialization failures. The proposed check cannot prevent the described problem, but it allows us to find its cause quickly without coredump analysis. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vasily Averin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14compiler: allow all arches to enable CONFIG_OPTIMIZE_INLININGMasahiro Yamada4-19/+15
Commit 60a3cdd06394 ("x86: add optimized inlining") introduced CONFIG_OPTIMIZE_INLINING, but it has been available only for x86. The idea is obviously arch-agnostic. This commit moves the config entry from arch/x86/Kconfig.debug to lib/Kconfig.debug so that all architectures can benefit from it. This can make a huge difference in kernel image size especially when CONFIG_OPTIMIZE_FOR_SIZE is enabled. For example, I got 3.5% smaller arm64 kernel for v5.1-rc1. dec file 18983424 arch/arm64/boot/Image.before 18321920 arch/arm64/boot/Image.after This also slightly improves the "Kernel hacking" Kconfig menu as e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen') suggested; this config option would be a good fit in the "compiler option" menu. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Borislav Petkov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14powerpc/mm/radix: mark __tlbie_pid() and friends as __always_inlineMasahiro Yamada1-4/+4
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for powerpc, the following errors are reported: arch/powerpc/mm/tlb-radix.c: In function '__tlbie_lpid': arch/powerpc/mm/tlb-radix.c:148:2: warning: asm operand 3 probably doesn't match constraints asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1) ^~~ arch/powerpc/mm/tlb-radix.c:148:2: error: impossible constraint in 'asm' arch/powerpc/mm/tlb-radix.c: In function '__tlbie_pid': arch/powerpc/mm/tlb-radix.c:118:2: warning: asm operand 3 probably doesn't match constraints asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1) ^~~ arch/powerpc/mm/tlb-radix.c: In function '__tlbiel_pid': arch/powerpc/mm/tlb-radix.c:104:2: warning: asm operand 3 probably doesn't match constraints asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) ^~~ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inlineMasahiro Yamada1-1/+1
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for powerpc, the following error is reported: arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize': arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't match constraints [-Werror] asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) ^~~ arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm' Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14powerpc/prom_init: mark prom_getprop() and prom_getproplen() as __initMasahiro Yamada1-3/+3
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for powerpc, the following modpost warnings are reported: WARNING: vmlinux.o(.text.unlikely+0x20): Section mismatch in reference from the function .prom_getprop() to the function .init.text:.call_prom() The function .prom_getprop() references the function __init .call_prom(). This is often because .prom_getprop lacks a __init annotation or the annotation of .call_prom is wrong. WARNING: vmlinux.o(.text.unlikely+0x3c): Section mismatch in reference from the function .prom_getproplen() to the function .init.text:.call_prom() The function .prom_getproplen() references the function __init .call_prom(). This is often because .prom_getproplen lacks a __init annotation or the annotation of .call_prom is wrong. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14ARM: mark setup_machine_tags() stub as __init __noreturnMasahiro Yamada1-1/+1
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for arm, Clang build results in the following modpost warning: WARNING: vmlinux.o(.text+0x1124): Section mismatch in reference from the function setup_machine_tags() to the function .init.text:early_print() The function setup_machine_tags() references the function __init early_print(). This is often because setup_machine_tags lacks a __init annotation or the annotation of early_print is wrong. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14MIPS: mark __fls() and __ffs() as __always_inlineMasahiro Yamada1-2/+2
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for mips, the following errors are reported: arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2': sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base' sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base' sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base' sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base' sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base' arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined references to `mips_gcr_base' Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14mtd: rawnand: vf610_nfc: add initializer to avoid -Wmaybe-uninitializedMasahiro Yamada1-1/+1
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. Kbuild test robot has never reported -Wmaybe-uninitialized warning for this probably because vf610_nfc_run() is inlined by the x86 compiler's inlining heuristic. If CONFIG_OPTIMIZE_INLINING is enabled for a different architecture and vf610_nfc_run() is not inlined, the following warning is reported: drivers/mtd/nand/raw/vf610_nfc.c: In function `vf610_nfc_cmd': drivers/mtd/nand/raw/vf610_nfc.c:455:3: warning: `offset' may be used uninitialized in this function [-Wmaybe-uninitialized] vf610_nfc_rd_from_sram(instr->ctx.data.buf.in + offset, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ nfc->regs + NFC_MAIN_AREA(0) + offset, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ trfr_sz, !nfc->data_access); ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14s390/cpacf: mark cpacf_query() as __always_inlineMasahiro Yamada1-1/+1
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for s390, the following error is reported: In file included from arch/s390/crypto/des_s390.c:19: arch/s390/include/asm/cpacf.h: In function 'cpacf_query': arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably doesn't match constraints asm volatile( ^~~ arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm' Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14MIPS: mark mult_sh_align_mod() as __always_inlineMasahiro Yamada1-2/+2
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for mips, the following error is reported: arch/mips/kernel/cpu-bugs64.c: In function 'mult_sh_align_mod.constprop': arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror] asm volatile( ^~~ arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror] asm volatile( ^~~ arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm' asm volatile( ^~~ arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm' asm volatile( ^~~ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14arm64: mark (__)cpus_have_const_cap as __always_inlineMasahiro Yamada1-2/+2
This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for arm64, the following errors are reported: In file included from include/linux/compiler_types.h:68, from <command-line>: arch/arm64/include/asm/jump_label.h: In function 'cpus_have_const_cap': include/linux/compiler-gcc.h:120:38: warning: asm operand 0 probably doesn't match constraints #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) ^~~ arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto' asm_volatile_goto( ^~~~~~~~~~~~~~~~~ include/linux/compiler-gcc.h:120:38: error: impossible constraint in 'asm' #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) ^~~ arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto' asm_volatile_goto( ^~~~~~~~~~~~~~~~~ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: Mark Rutland <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Norris <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14ARM: prevent tracing IPI_CPU_BACKTRACEArnd Bergmann2-1/+6
Patch series "compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING", v3. This patch (of 11): When function tracing for IPIs is enabled, we get a warning for an overflow of the ipi_types array with the IPI_CPU_BACKTRACE type as triggered by raise_nmi(): arch/arm/kernel/smp.c: In function 'raise_nmi': arch/arm/kernel/smp.c:489:2: error: array subscript is above array bounds [-Werror=array-bounds] trace_ipi_raise(target, ipi_types[ipinr]); This is a correct warning as we actually overflow the array here. This patch raise_nmi() to call __smp_cross_call() instead of smp_cross_call(), to avoid calling into ftrace. For clarification, I'm also adding a two new code comments describing how this one is special. The warning appears to have shown up after commit e7273ff49acf ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI"), which changed the number assignment from '15' to '8', but as far as I can tell has existed since the IPI tracepoints were first introduced. If we decide to backport this patch to stable kernels, we probably need to backport e7273ff49acf as well. [[email protected]: rebase on v5.1-rc1] Link: http://lkml.kernel.org/r/[email protected] Fixes: e7273ff49acf ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI") Fixes: 365ec7b17327 ("ARM: add IPI tracepoints") # v3.17 Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Stefan Agner <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Brian Norris <[email protected]> Cc: Marek Vasut <[email protected]> Cc: Russell King <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headersMasahiro Yamada3-3/+3
The "WITH Linux-syscall-note" should be added to headers exported to the user-space. Some kernel-space headers have "WITH Linux-syscall-note", which seems a mistake. [1] arch/x86/include/asm/hyperv-tlfs.h Commit 5a4858032217 ("x86/hyper-v: move hyperv.h out of uapi") moved this file out of uapi, but missed to update the SPDX License tag. [2] include/asm-generic/shmparam.h Commit 76ce2a80a28e ("Rename include/{uapi => }/asm-generic/shmparam.h really") moved this file out of uapi, but missed to update the SPDX License tag. [3] include/linux/qcom-geni-se.h Commit eddac5af0654 ("soc: qcom: Add GENI based QUP Wrapper driver") added this file, but I do not see a good reason why its license tag must include "WITH Linux-syscall-note". Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14fs/select: avoid clang stack usage warningArnd Bergmann1-0/+4
The select() implementation is carefully tuned to put a sensible amount of data on the stack for holding a copy of the user space fd_set, but not too large to risk overflowing the kernel stack. When building a 32-bit kernel with clang, we need a little more space than with gcc, which often triggers a warning: fs/select.c:619:5: error: stack frame size of 1048 bytes in function 'core_sys_select' [-Werror,-Wframe-larger-than=] int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, I experimentally found that for 32-bit ARM, reducing the maximum stack usage by 64 bytes keeps us reliably under the warning limit again. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
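The change described is presumably a one-line reduction of the on-stack budget; something along the following lines, where the constant name and its old value of 832 are my recollection of include/linux/poll.h and should be treated as assumptions:

  /* Reduce the select()/poll() on-stack scratch budget by 64 bytes so
   * clang's larger 32-bit stack frames stay under -Wframe-larger-than. */
  #define MAX_STACK_ALLOC 768     /* was 832 */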
2019-05-14mm/mincore.c: make mincore() more conservativeJiri Kosina1-1/+22
The semantics of what mincore() considers to be resident is not completely clear, but Linux has always (since 2.3.52, which is when mincore() was initially done) treated it as "page is available in page cache". That's potentially a problem, as that [in]directly exposes meta-information about pagecache / memory mapping state even about memory not strictly belonging to the process executing the syscall, opening possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache information for non-anonymous mappings that belong to files that the calling process could (if it tried to) successfully open for writing; otherwise we'd be including shared non-exclusive mappings, which
- is the sidechannel, and
- is not the use case for mincore(), as that's primarily used for data, not (shared) text.

[[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: restructure can_do_mincore() conditions] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Josh Snyder <[email protected]> Acked-by: Michal Hocko <[email protected]> Originally-by: Linus Torvalds <[email protected]> Originally-by: Dominique Martinet <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kevin Easton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Cyril Hrubis <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Daniel Gruss <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
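A sketch of the permission test this implies, in the spirit of the can_do_mincore() helper mentioned in the tags (the exact checks are assumptions):

  static bool can_do_mincore(struct vm_area_struct *vma)
  {
          if (vma_is_anonymous(vma))
                  return true;    /* own anonymous memory: nothing new revealed */
          if (!vma->vm_file)
                  return false;
          /*
           * Reveal pagecache residency only if the caller owns the backing
           * file or could open it for writing anyway.
           */
          return inode_owner_or_capable(file_inode(vma->vm_file)) ||
                 inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
  }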
2019-05-14mm: maintain randomization of page free listsDan Williams4-2/+56
When freeing a page with an order >= shuffle_page_order randomly select the front or back of the list for insertion. While the mm tries to defragment physical pages into huge pages this can tend to make the page allocator more predictable over time. Inject the front-back randomness to preserve the initial randomness established by shuffle_free_memory() when the kernel was booted. The overhead of this manipulation is constrained by only being applied for MAX_ORDER sized pages by default. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/154899812788.3165233.9066631950746578517.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Keith Busch <[email protected]> Cc: Robert Elliott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
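A sketch of the free-path decision described above (illustrative only: is_shuffle_order() and the add_to_free_area*() helper names are assumptions tied to the next entry's refactoring, and the kernel batches its random bits rather than drawing one per freed page):

  /* On free of a large-enough page, randomly pick head or tail insertion
   * to keep the free lists from settling into a predictable order. */
  static void shuffle_add_to_free_area(struct page *page, struct free_area *area,
                                       int order, int migratetype)
  {
          if (is_shuffle_order(order) && (get_random_u32() & 1))
                  add_to_free_area_tail(page, area, migratetype);
          else
                  add_to_free_area(page, area, migratetype);
  }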
2019-05-14mm: move buddy list manipulations into helpersDan Williams5-48/+73
In preparation for runtime randomization of the zone lists, take all (well, most of) the list_*() functions in the buddy allocator and put them in helper functions. Provide a common control point for injecting additional behavior when freeing pages. [[email protected]: fix buddy list helpers] Link: http://lkml.kernel.org/r/155033679702.1773410.13041474192173212653.stgit@dwillia2-desk3.amr.corp.intel.com [[email protected]: remove del_page_from_free_area() migratetype parameter] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/154899812264.3165233.5219320056406926223.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Tested-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Keith Busch <[email protected]> Cc: Robert Elliott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
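The helpers look roughly like the following sketch; the names match the ones introduced by this series, but the bodies are simplified rather than copied from the patch:

/* Common control points for buddy free-list manipulation. */
static inline void add_to_free_area(struct page *page, struct free_area *area,
				    int migratetype)
{
	list_add(&page->lru, &area->free_list[migratetype]);
	area->nr_free++;
}

static inline void add_to_free_area_tail(struct page *page, struct free_area *area,
					 int migratetype)
{
	list_add_tail(&page->lru, &area->free_list[migratetype]);
	area->nr_free++;
}

static inline void del_page_from_free_area(struct page *page, struct free_area *area)
{
	list_del(&page->lru);
	__ClearPageBuddy(page);
	set_page_private(page, 0);
	area->nr_free--;
}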
2019-05-14mm: shuffle initial free memory to improve memory-side-cache utilizationDan Williams9-2/+302
Patch series "mm: Randomize free memory", v10. This patch (of 3): Randomization of the page allocator improves the average utilization of a direct-mapped memory-side-cache. Memory side caching is a platform capability that Linux has been previously exposed to in HPC (high-performance computing) environments on specialty platforms. In that instance it was a smaller pool of high-bandwidth-memory relative to higher-capacity / lower-bandwidth DRAM. Now, this capability is going to be found on general purpose server platforms where DRAM is a cache in front of higher latency persistent memory [1]. Robert offered an explanation of the state of the art of Linux interactions with memory-side-caches [2], and I copy it here: It's been a problem in the HPC space: http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/ A kernel module called zonesort is available to try to help: https://software.intel.com/en-us/articles/xeon-phi-software and this abandoned patch series proposed that for the kernel: https://lkml.kernel.org/r/[email protected] Dan's patch series doesn't attempt to ensure buffers won't conflict, but also reduces the chance that the buffers will. This will make performance more consistent, albeit slower than "optimal" (which is near impossible to attain in a general-purpose kernel). That's better than forcing users to deploy remedies like: "To eliminate this gradual degradation, we have added a Stream measurement to the Node Health Check that follows each job; nodes are rebooted whenever their measured memory bandwidth falls below 300 GB/s." A replacement for zonesort was merged upstream in commit cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability"). With this numa_emulation capability, memory can be split into cache sized ("near-memory" sized) numa nodes. A bind operation to such a node, and disabling workloads on other nodes, enables full cache performance. However, once the workload exceeds the cache size then cache conflicts are unavoidable. While HPC environments might be able to tolerate time-scheduling of cache sized workloads, for general purpose server platforms, the oversubscribed cache case will be the common case. The worst case scenario is that a server system owner benchmarks a workload at boot with an un-contended cache only to see that performance degrade over time, even below the average cache performance due to excessive conflicts. Randomization clips the peaks and fills in the valleys of cache utilization to yield steady average performance. Here are some performance impact details of the patches: 1/ An Intel internal synthetic memory bandwidth measurement tool, saw a 3X speedup in a contrived case that tries to force cache conflicts. The contrived cased used the numa_emulation capability to force an instance of the benchmark to be run in two of the near-memory sized numa nodes. If both instances were placed on the same emulated they would fit and cause zero conflicts. While on separate emulated nodes without randomization they underutilized the cache and conflicted unnecessarily due to the in-order allocation per node. 2/ A well known Java server application benchmark was run with a heap size that exceeded cache size by 3X. The cache conflict rate was 8% for the first run and degraded to 21% after page allocator aging. With randomization enabled the rate levelled out at 11%. 3/ A MongoDB workload did not observe measurable difference in cache-conflict rates, but the overall throughput dropped by 7% with randomization in one case. 
4/ Mel Gorman ran his suite of performance workloads with randomization enabled on platforms without a memory-side-cache and saw a mix of some improvements and some losses [3]. While there is potentially significant improvement for applications that depend on low latency access across a wide working-set, the performance may be negligible to negative for other workloads. For this reason the shuffle capability defaults to off unless a direct-mapped memory-side-cache is detected. Even then, the page_alloc.shuffle=0 parameter can be specified to disable the randomization on those systems. Outside of memory-side-cache utilization concerns there is a potential security benefit from randomization. Some data exfiltration and return-oriented-programming attacks rely on the ability to infer the location of sensitive data objects. The kernel page allocator, especially early in system boot, has predictable first-in-first-out behavior for physical pages. Pages are freed in physical address order when first onlined. Quoting Kees: "While we already have a base-address randomization (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and memory layouts would certainly be using the predictability of allocation ordering (i.e. for attacks where the base address isn't important: only the relative positions between allocated memory). This is common in lots of heap-style attacks. They try to gain control over ordering by spraying allocations, etc. I'd really like to see this because it gives us something similar to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator." While SLAB_FREELIST_RANDOM reduces the predictability of some local slab caches, it leaves the vast bulk of memory to be allocated predictably, in order. However, it should be noted, the concrete security benefits are hard to quantify, and no known CVE is mitigated by this randomization. Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform a Fisher-Yates shuffle of the page allocator 'free_area' lists when they are initially populated with free memory at boot and at hotplug time (a sketch of the algorithm follows this entry). Do this based on either the presence of a page_alloc.shuffle=Y command line parameter, or autodetection of a memory-side-cache (to be added in a follow-on patch). The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER-sized free pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. order 10 (4MB); this trades off randomization granularity against time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page allocator while still showing memory-side cache behavior improvements, with the expectation that the security implications of finer-granularity randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The performance impact of the shuffling appears to be in the noise compared to other memory initialization work. This initial randomization can be undone over time, so a follow-on patch is introduced to inject entropy on page free decisions. It is reasonable to ask whether the page free entropy alone is sufficient; it is not, due to the in-order initial freeing of pages. At the start of that process, putting page1 in front of or behind page0 still keeps them close together; page2 is still near page1 and has a high chance of being adjacent. As more pages are added, ordering diversity improves, but there is still high page locality for the low-address pages, and this leads to no significant impact on the cache conflict rate.
[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/ [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM [3]: https://lkml.org/lkml/2018/10/12/309 [[email protected]: fix shuffle enable] Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com [[email protected]: fix SHUFFLE_PAGE_ALLOCATOR help texts] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Qian Cai <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Keith Busch <[email protected]> Cc: Robert Elliott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
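The shuffle referenced above is a standard Fisher-Yates pass. The sketch below shows the algorithm on a plain array of page frame numbers for clarity; the kernel's shuffle_zone() applies the same idea to high-order free pages and uses the kernel's random number helpers instead of rand():

#include <stdlib.h>

/* Fisher-Yates: after the loop, every permutation of pfn[] is equally likely. */
static void fisher_yates(unsigned long *pfn, size_t n)
{
	if (n < 2)
		return;
	for (size_t i = n - 1; i > 0; i--) {
		size_t j = (size_t)rand() % (i + 1);	/* pick a partner in [0, i] */
		unsigned long tmp = pfn[i];

		pfn[i] = pfn[j];
		pfn[j] = tmp;
	}
}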
2019-05-14mm/vmalloc.c: convert vmap_lazy_nr to atomic_long_tUladzislau Rezki (Sony)1-10/+10
The vmap_lazy_nr variable has type atomic_t, which is a 4-byte integer on both 32-bit and 64-bit systems. lazy_max_pages() deals with "unsigned long", which is 8 bytes on 64-bit systems, thus vmap_lazy_nr should be 8 bytes on 64-bit as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Thomas Garnier <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Joel Fernandes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
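The conversion itself is mechanical: the counter becomes atomic_long_t and its call sites move to the atomic_long_*() accessors. Below is a simplified sketch of the accounting side, assuming the existing lazy-purge path; it is not the full diff:

static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0);

/* Account a lazily-freed area and kick off purging when over the limit. */
static void account_lazy_free_sketch(struct vmap_area *va)
{
	unsigned long nr_lazy;

	nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >> PAGE_SHIFT,
					 &vmap_lazy_nr);
	if (nr_lazy > lazy_max_pages())
		try_purge_vmap_area_lazy();	/* existing purge path */
}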
2019-05-14mm/vmalloc.c: add priority threshold to __purge_vmap_area_lazy()Uladzislau Rezki (Sony)1-6/+12
Commit 763b218ddfaf ("mm: add preempt points into __purge_vmap_area_lazy()") introduced some preempt points, one of which makes an allocation more prioritized over lazy freeing of vmap areas. Prioritizing an allocation over freeing does not work well all the time; it should rather be a compromise. 1) The number of lazy pages directly influences the busy list length and thus the cost of operations like allocation, lookup, unmap, remove, etc. 2) Under heavy stress of the vmalloc subsystem I ran into a situation where memory usage kept increasing until hitting the out_of_memory -> panic state, because the logic that frees vmap areas in the __purge_vmap_area_lazy() function was completely blocked. Establish a threshold beyond which freeing is prioritized back over allocation, creating a balance between the two. Using the vmalloc test driver in "stress mode", i.e. when all available test cases are run simultaneously on all online CPUs, applying pressure on the vmalloc subsystem, my HiKey 960 board runs out of memory because the __purge_vmap_area_lazy() logic simply is not able to free pages in time. How I run it: 1) Build your kernel with CONFIG_TEST_VMALLOC=m 2) ./tools/testing/selftests/vm/test_vmalloc.sh stress During this test the "vmap_lazy_nr" page count goes far beyond the acceptable lazy_max_pages() threshold, which leads to an enormous busy list size and other problems, including increased allocation time and so on. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Thomas Garnier <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
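The threshold turns the unconditional yield inside the purge loop into a conditional one. A simplified sketch follows, assuming the plain list iteration shown here rather than the exact data structures the real function walks:

/* Free queued vmap areas, yielding to allocations only while not too far behind. */
static void purge_vmap_areas_sketch(struct list_head *purge_list)
{
	unsigned long resched_threshold = lazy_max_pages() << 1;
	struct vmap_area *va, *tmp;

	spin_lock(&vmap_area_lock);
	list_for_each_entry_safe(va, tmp, purge_list, list) {
		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;

		__free_vmap_area(va);
		atomic_long_sub(nr, &vmap_lazy_nr);

		/* Only give allocations priority while freeing is keeping up. */
		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
			cond_resched_lock(&vmap_area_lock);
	}
	spin_unlock(&vmap_area_lock);
}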
2019-05-14kernel/sched/psi.c: expose pressure metrics on root cgroupDan Schatzberg3-7/+14
Pressure metrics are already recorded and exposed in procfs for the entire system, but any tool which monitors cgroup pressure has to special-case the root cgroup to read from procfs. This patch exposes the already recorded pressure metrics on the root cgroup. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Schatzberg <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Li Zefan <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14psi: introduce psi monitorSuren Baghdasaryan5-20/+742
The psi monitor aims to provide a low-latency, short-term pressure detection mechanism configurable by users. It allows users to monitor psi metric growth and trigger events whenever a metric rises above a user-defined threshold within a user-defined time window. Time window and threshold are both expressed in usecs. Multiple psi resources with different thresholds and window sizes can be monitored concurrently. Psi monitors activate when the system enters a stall state for the monitored psi metric and deactivate upon exit from the stall state. While the system is in the stall state, psi signal growth is monitored at a rate of 10 times per tracking window. The minimum window size is 500ms, therefore the minimum monitoring interval is 50ms. The maximum window size is 10s, with a monitoring interval of 1s. When activated, a psi monitor stays active for at least the duration of one tracking window to avoid repeated activations/deactivations when the psi signal is bouncing. Notifications to the users are rate-limited to one per tracking window. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
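From user space the monitor is driven through the pressure files: write a trigger of the form "some <threshold us> <window us>" and poll the file descriptor for POLLPRI. The example below follows the interface as described with this series; the 150ms/1s numbers are arbitrary:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char trig[] = "some 150000 1000000";	/* 150ms stall within a 1s window */
	struct pollfd fds;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0)
		return 1;
	if (write(fds.fd, trig, strlen(trig) + 1) < 0)
		return 1;
	fds.events = POLLPRI;

	for (;;) {
		if (poll(&fds, 1, -1) < 0)
			return 1;
		if (fds.revents & POLLERR)	/* the monitor went away */
			return 1;
		if (fds.revents & POLLPRI)
			printf("memory pressure event\n");
	}
}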
2019-05-14include/: refactor headers to allow kthread.h inclusion in psi_types.hSuren Baghdasaryan4-2/+4
kthread.h can't be included in psi_types.h because that creates a circular inclusion: kthread.h eventually includes psi_types.h, and the compiler then complains about kthread structures not being defined, since they are defined further down in kthread.h. Resolve this by removing the psi_types.h inclusion from the headers included by kthread.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
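For readers unfamiliar with the failure mode, here is a hypothetical illustration (the header names are made up; the real cycle runs through the headers kthread.h pulls in). Two headers that include each other leave whichever is parsed second looking at incomplete types; the fix is to drop the unnecessary include from the lower-level header and forward-declare instead:

/*
 *   a.h:  #include "b.h"      struct a { struct b *link; };
 *   b.h:  #include "a.h"      struct b { struct a *back; };
 *
 * b.h after breaking the cycle:
 */
struct a;			/* forward declaration, no include needed */
struct b {
	struct a *back;		/* a pointer to an incomplete type is fine */
};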