aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-11-06arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extensionMartin Kepplinger1-4/+3
Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06arch/sh/kernel/traps_64.c: use sign_extend64() for sign extensionMartin Kepplinger2-2/+2
Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06bitops.h: add sign_extend64()Martin Kepplinger1-0/+11
Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289 The result was the 64-bit version being "likely fine", "valuable" and "correct". The discussion fell asleep but since there are possible users, let's add it. Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06bitops.h: improve sign_extend32()'s documentationMartin Kepplinger1-0/+2
It is often overlooked that sign_extend32(), despite its name, is safe to use for 16 and 8 bit types as well. This should help prevent sign extension being done manually some other way. Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06MAINTAINERS: add missing extcon directoryChanwoo Choi1-0/+3
Add the missing extcon directory to maintain them. When using get_maintainer.pl, the result should include the correct maintainer information. Signed-off-by: Chanwoo Choi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: add subsystem to reviewer outputJoe Perches1-15/+16
Reviewer output currently does not include the subsystem that matched. Add it. Miscellanea: o Add a get_subsystem_name routine to centralize this Signed-off-by: Joe Perches <[email protected]> Tested-by: Krzysztof Kozlowski <[email protected]> Cc: Lee Jones <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: --r (list reviewer) is on by defaultBrian Norris1-1/+1
We don't consistenly document the default value next to the option listing, but we do have a list of defaults here, so let's keep it up to date. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: add --no-foo options to --helpBrian Norris1-0/+3
Many flag options are boolean and support both a positive and a negative invocation from the command line. Some of these are even mentioned by example (e.g., --nogit is mentioned as a default option), but they aren't explicitly mentioned in the list of options. It happens that some of these are pretty important, as they are default-on, and to turn them off, you have to know about the --no-foo version. Rather than clutter the whole help text with bracketed '--[no]foo', let's just mention the general rule, a la 'man gcc'. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: it's '--pattern-depth', not '-pattern-depth'Brian Norris1-1/+1
Though it appears that Perl's GetOptions will take either, the latter is not documented in the options listing. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: add missing documentation for --git-blame-signaturesBrian Norris1-0/+1
I really haven't used this option much myself, so feel free to improve on the documentation for it. I just noticed it while inspecting this script for undocumented features. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06printk: prevent userland from spoofing kernel messagesMathias Krause1-5/+8
The following statement of ABI/testing/dev-kmsg is not quite right: It is not possible to inject messages from userspace with the facility number LOG_KERN (0), to make sure that the origin of the messages can always be reliably determined. Userland actually can inject messages with a facility of 0 by abusing the fact that the facility is stored in a u8 data type. By using a facility which is a multiple of 256 the assignment of msg->facility in log_store() implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg to spoof kernel messages as shown below: The following call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg ...leads to the following log entry (dmesg -x | tail -n 1): user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty However, this call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg ...leads to the slightly different log entry (note the kernel facility): kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty Fix that by limiting the user provided facility to 8 bit right from the beginning and catch the truncation early. Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...") Signed-off-by: Mathias Krause <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Alex Elder <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kay Sievers <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: update documentationRasmus Villemoes2-9/+10
%n is no longer just ignored; it results in early return from vsnprintf. Also add a request to add test cases for future %p extensions. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Martin Kletzander <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06selftests: run lib/test_printf moduleKees Cook3-0/+19
This runs the lib/test_printf module to make sure printf is operating sanely. Signed-off-by: Kees Cook <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06test_printf: test printf family at runtimeRasmus Villemoes3-0/+366
This adds a simple module for testing the kernel's printf facilities. Previously, some %p extensions have caused a wrong return value in case the entire output didn't fit and/or been unusable in kasprintf(). This should help catch such issues. Also, it should help ensure that changes to the formatting algorithms don't break anything. I'm not sure if we have a struct dentry or struct file lying around at boot time or if we can fake one, but most %p extensions should be testable, as should the ordinary number and string formatting. The nature of vararg functions means we can't use a more conventional table-driven approach. For now, this is mostly a skeleton; contributions are very welcome. Some tests are/will be slightly annoying to write, since the expected output depends on stuff like CONFIG_*, sizeof(long), runtime values etc. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: remove SPECIAL handling in pointer()Rasmus Villemoes1-1/+1
As a quick git grep -E '%[ +0#-]*#[ +0#-]*(\*|[0-9]+)?(\.(\*|[0-9]+)?)?p' shows, nobody uses the # flag with %p. Should one try to do so, one will be met with warning: `#' flag used with `%p' gnu_printf format [-Wformat] (POSIX and C99 both say "... For other conversion specifiers, the behavior is undefined.". Obviously, the kernel can choose to define the behaviour however it wants, but as long as gcc issues that warning, users are unlikely to show up.) Since default_width is effectively always 2*sizeof(void*), we can simplify the prologue of pointer() and save a few instructions. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: also improve sanity check in bstr_printf()Rasmus Villemoes1-1/+1
Quoting from 2aa2f9e21e4e ("lib/vsprintf.c: improve sanity check in vsnprintf()"): On 64 bit, size may very well be huge even if bit 31 happens to be 0. Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a 3 GiB one. So cap at INT_MAX as was probably the intention all along. This is also the made-up value passed by sprintf and vsprintf. I should have seen this copy-pasted instance back then, but let's just do it now. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: handle invalid format specifiers more robustlyRasmus Villemoes1-10/+21
If we meet any invalid or unsupported format specifier, 'handling' it by just printing it as a literal string is not safe: Presumably the format string and the arguments passed gcc's type checking, but that means something like sprintf(buf, "%n %pd", &intvar, dentry) would end up interpreting &intvar as a struct dentry*. When the offending specifier was %n it used to be at the end of the format string, but we can't rely on that always being the case. Also, gcc doesn't complain about some more or less exotic qualifiers (or 'length modifiers' in posix-speak) such as 'j' or 'q', but being unrecognized by the kernel's printf implementation, they'd be interpreted as unknown specifiers, and the rest of arguments would be interpreted wrongly. So let's complain about anything we don't understand, not just %n, and stop pretending that we'd be able to make sense of the rest of the format/arguments. If the offending specifier is in a printk() call we unfortunately only get a "BUG: recent printk recursion!", but at least direct users of the sprintf family will be caught. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06printk: synchronize %p formatting documentationMartin Kletzander2-32/+37
Move all pointer-formatting documentation to one place in the code and one place in the documentation instead of keeping it in three places with different level of completeness. Documentation/printk-formats.txt has detailed information about each modifier, docstring above pointer() has short descriptions of them (as that is the function dealing with %p) and docstring above vsprintf() is removed as redundant. Both docstrings in the code that were modified are updated with a reminder of updating the documentation upon any further change. [[email protected]: fix comment] Signed-off-by: Martin Kletzander <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/dynamic_debug.c: use kstrdup_constRasmus Villemoes1-4/+4
Using kstrdup_const, thus reusing .rodata when possible, saves around 2 kB of runtime memory on my laptop/.config combination. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Jason Baron <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06fs/jffs2/wbuf.c: remove stray semicolonAndrew Morton1-1/+1
Reported-by: Wu Fengguang <[email protected]> Cc: Sasha Levin <[email protected]> Cc: David Woodhouse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06include/linux/compiler-gcc.h: improve __visible documentationAndrew Morton1-1/+4
Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06proc: actually make proc_fd_permission() thread-friendlyOleg Nesterov1-3/+11
The commit 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly") fixed the access to /proc/self/fd from sub-threads, but introduced another problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd if generic_permission() fails. Change proc_fd_permission() to check same_thread_group(pid_task(), current). Fixes: 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly") Reported-by: "Jin, Yihua" <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06fs/proc/array.c: set overflow flag in case of errorAndy Shevchenko1-5/+5
For now in task_name() we ignore the return code of string_escape_str() call. This is not good if buffer suddenly becomes not big enough. Do the proper error handling there. Signed-off-by: Andy Shevchenko <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: use 'unsigned int' for compound_dtor/compound_order on 64BITKirill A. Shutemov1-0/+11
On 64 bit system we have enough space in struct page to encode compound_dtor and compound_order with unsigned int. On x86-64 it leads to slightly smaller code size due usesage of plain MOV instead of MOVZX (zero-extended move) or similar effect. allyesconfig: text data bss dec hex filename 159520446 48146736 72196096 279863278 10ae5fee vmlinux.pre 159520382 48146736 72196096 279863214 10ae5fae vmlinux.post On other architectures without native support of 16-bit data types the Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: use 'unsigned int' for page orderKirill A. Shutemov5-27/+32
Let's try to be consistent about data type of page order. [[email protected]: fix build (type of pageblock_order)] [[email protected]: some configs end up with MAX_ORDER and pageblock_order having different types] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: make compound_head() robustKirill A. Shutemov15-175/+82
Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0 CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail->first_page = NULL head = tail->first_page alloc_pages(__GFP_COMP) prep_compound_page() tail->first_page = head __SetPageTail(p); !!PageTail() == true <head == NULL dereferencing> The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page->compound_head into third double word block in front of compound_dtor and compound_order. Bit 0 encodes PageTail() and the rest bits are pointer to head page if bit zero is set. The patch moves page->pmd_huge_pte out of word, just in case if an architecture defines pgtable_t into something what can have the bit 0 set. hugetlb_cgroup uses page->lru.next in the second tail page to store pointer struct hugetlb_cgroup. The patch switch it to use page->private in the second tail page instead. The space is free since ->first_page is removed from the union. The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER limitation, since there's now space in first tail page to store struct hugetlb_cgroup pointer. But that's out of scope of the patch. That means page->compound_head shares storage space with: - page->lru.next; - page->next; - page->rcu_head.next; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. page->rcu_head.next guaranteed[1] to have bit 0 clean as long as we use call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future call_rcu_lazy() is not allowed as it makes use of the bit and we can get false positive PageTail(). [1] http://lkml.kernel.org/g/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Vlastimil Babka <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: pack compound_dtor and compound_order into one word in struct pageKirill A. Shutemov4-14/+35
The patch halves space occupied by compound_dtor and compound_order in struct page. For compound_order, it's trivial long -> short conversion. For get_compound_page_dtor(), we now use hardcoded table for destructor lookup and store its index in the struct page instead of direct pointer to destructor. It shouldn't be a big trouble to maintain the table: we have only two destructor and NULL currently. This patch free up one word in tail pages for reuse. This is preparation for the next patch. Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zsmalloc: use page->private instead of page->first_pageKirill A. Shutemov1-6/+5
We are going to rework how compound_head() work. It will not use page->first_page as we have it now. The only other user of page->first_page beyond compound pages is zsmalloc. Let's use page->private instead of page->first_page here. It occupies the same storage space. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06slab, slub: use page->rcu_head instead of page->lru plus castKirill A. Shutemov2-18/+4
We have properly typed page->rcu_head, no need to cast page->lru. Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: drop page->slab_pageKirill A. Shutemov1-1/+0
Since 8456a648cf44 ("slab: use struct page for slab management") nobody uses slab_page field in struct page. Let's drop it. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zsmalloc: reduce size_class memory usageSergey Senozhatsky1-4/+13
Each `struct size_class' contains `struct zs_size_stat': an array of NR_ZS_STAT_TYPE `unsigned long'. For zsmalloc built with no CONFIG_ZSMALLOC_STAT this results in a waste of `2 * sizeof(unsigned long)' per-class. The patch removes unneeded `struct zs_size_stat' members by redefining NR_ZS_STAT_TYPE (max stat idx in array). Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants, GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at the moment. ./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39) function old new delta fix_fullness_group 97 94 -3 insert_zspage 100 86 -14 remove_zspage 141 119 -22 To summarize: a) each class now uses less memory b) we avoid a number of dec/inc stats (a minor optimization, but still). The gain will increase once we introduce additional stats. A simple IO test. iozone -t 4 -R -r 32K -s 60M -I +Z patched base " Initial write " 4145599.06 4127509.75 " Rewrite " 4146225.94 4223618.50 " Read " 17157606.00 17211329.50 " Re-read " 17380428.00 17267650.50 " Reverse Read " 16742768.00 16162732.75 " Stride read " 16586245.75 16073934.25 " Random read " 16349587.50 15799401.75 " Mixed workload " 10344230.62 9775551.50 " Random write " 4277700.62 4260019.69 " Pwrite " 4302049.12 4313703.88 " Pread " 6164463.16 6126536.72 " Fwrite " 7131195.00 6952586.00 " Fread " 12682602.25 12619207.50 Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm/zsmalloc.c: remove useless line in obj_free()Hui Zhu1-3/+0
Signed-off-by: Hui Zhu <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zsmalloc: don't test shrinker_enabled in zs_shrinker_count()Sergey Senozhatsky1-3/+0
We don't let user to disable shrinker in zsmalloc (once it's been enabled), so no need to check ->shrinker_enabled in zs_shrinker_count(), at the moment at least. Signed-off-by: Sergey Senozhatsky <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zsmalloc: use preempt.h for in_interrupt()Sergey Senozhatsky1-1/+1
A cosmetic change. Commit c60369f01125 ("staging: zsmalloc: prevent mappping in interrupt context") added in_interrupt() check to zs_map_object() and 'hardirq.h' include; but in_interrupt() macro is defined in 'preempt.h' not in 'hardirq.h', so include it instead. Signed-off-by: Sergey Senozhatsky <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zsmalloc: fix obj_to_head use page_private(page) as value but not pointerHui Zhu1-1/+1
In obj_malloc(): if (!class->huge) /* record handle in the header of allocated chunk */ link->handle = handle; else /* record handle in first_page->private */ set_page_private(first_page, handle); In the hugepage we save handle to private directly. But in obj_to_head(): if (class->huge) { VM_BUG_ON(!is_first_page(page)); return *(unsigned long *)page_private(page); } else return *(unsigned long *)obj; It is used as a pointer. The reason why there is no problem until now is huge-class page is born with ZS_FULL so it can't be migrated. However, we need this patch for future work: "VM-aware zsmalloced page migration" to reduce external fragmentation. Signed-off-by: Hui Zhu <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zsmalloc: add comments for ->inuse to zspageHui Zhu1-0/+1
[[email protected]: fix grammar] Signed-off-by: Hui Zhu <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: zsmalloc: constify struct zs_pool nameSergey SENOZHATSKY5-11/+13
Constify `struct zs_pool' ->name. [[email protected]: constify zpool_create_pool()'s `type' arg also] Signed-off-by: Sergey Senozhatsky <[email protected]> Acked-by: Dan Streetman <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zpool: remove redundant zpool->type string, const-ify zpool_get_typeDan Streetman2-7/+9
Make the return type of zpool_get_type const; the string belongs to the zpool driver and should not be modified. Remove the redundant type field in the struct zpool; it is private to zpool.c and isn't needed since ->driver->type can be used directly. Add comments indicating strings must be null-terminated. Signed-off-by: Dan Streetman <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zswap: use charp for zswap param stringsDan Streetman1-40/+40
Instead of using a fixed-length string for the zswap params, use charp. This simplifies the code and uses less memory, as most zswap param strings will be less than the current maximum length. Signed-off-by: Dan Streetman <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Seth Jennings <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06module: export param_free_charp()Dan Streetman2-1/+3
Change the param_free_charp() function from static to exported. It is used by zswap in the next patch ("zswap: use charp for zswap param strings"). Signed-off-by: Dan Streetman <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Seth Jennings <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm/zswap.c: remove unneeded initialization to NULL in zswap_entry_find_get()Alexey Klimov1-1/+1
On the next line entry variable will be re-initialized so no need to init it with NULL. Signed-off-by: Alexey Klimov <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zram: make is_partial_io/valid_io_request/page_zero_filled return booleanGeliang Tang1-9/+9
Make is_partial_io()/valid_io_request()/page_zero_filled() return boolean, since each function only uses either one or zero as its return value. Signed-off-by: Geliang Tang <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zram: keep the exact overcommited value in mem_used_maxSergey SENOZHATSKY1-2/+2
`mem_used_max' is designed to store the max amount of memory zram consumed to store the data. However, it does not represent the actual 'overcommited' (max) value. The existing code goes to -ENOMEM overcommited case before it updates `->stats.max_used_pages', which hides the reason we went to -ENOMEM in the first place -- we actually used more memory than `->limit_pages': alloced_pages = zs_get_total_pages(meta->mem_pool); if (zram->limit_pages && alloced_pages > zram->limit_pages) { zs_free(meta->mem_pool, handle); ret = -ENOMEM; goto out; } update_used_max(zram, alloced_pages); Which is misleading. User will see -ENOMEM, check `->limit_pages', check `->stats.max_used_pages', which will keep the value BEFORE zram passed `->limit_pages', and see: `->stats.max_used_pages' < `->limit_pages' Move update_used_max() before we do `->limit_pages' check, so that user will see: `->stats.max_used_pages' > `->limit_pages' should the overcommit and -ENOMEM happen. Signed-off-by: Sergey Senozhatsky <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06zram: introduce comp algorithm fallback functionalityLuis Henriques1-3/+3
When the user supplies an unsupported compression algorithm, keep the previously selected one (knowingly supported) or the default one (if the compression algorithm hasn't been changed yet). Note that previously this operation (i.e. setting an invalid algorithm) would result in no algorithm being selected, which means that this represents a small change in the default behaviour. Minchan said: For initializing zram, we need to set up 3 optional parameters in advance. 1. the number of compression streams 2. memory limitation 3. compression algorithm Although user pass completely wrong value to set up for 1 and 2 parameters, it's okay because they have default value so zram will be initialized with the default value (of course, when user passes a wrong value via *echo*, sysfs returns -EINVAL so the user can notice it). But 3 is not consistent with other optional parameters. IOW, if the user passes a wrong value to set up 3 parameter, zram's initialization would fail unlike other optional parameters. So this patch makes them consistent. Signed-off-by: Luis Henriques <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm/memcontrol.c: uninline mem_cgroup_usageAndrew Morton1-1/+1
gcc version 5.2.1 20151010 (Debian 5.2.1-22) $ size mm/memcontrol.o mm/memcontrol.o.before text data bss dec hex filename 35535 7908 64 43507 a9f3 mm/memcontrol.o 35762 7908 64 43734 aad6 mm/memcontrol.o.before Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writebackJan Kara1-1/+2
sync_file_range(2) is documented to issue writeback only for pages that are not currently being written. After all the system call has been created for userspace to be able to issue background writeout and so waiting for in-flight IO is undesirable there. However commit ee53a891f474 ("mm: do_sync_mapping_range integrity fix") switched do_sync_mapping_range() and thus sync_file_range() to issue writeback in WB_SYNC_ALL mode since do_sync_mapping_range() was used by other code relying on WB_SYNC_ALL semantics. These days do_sync_mapping_range() went away and we can switch sync_file_range(2) back to issuing WB_SYNC_NONE writeback. That should help PostgreSQL avoid large latency spikes when flushing data in the background. Andres measured a 20% increase in transactions per second on an SSD disk. Signed-off-by: Jan Kara <[email protected]> Reported-by: Andres Freund <[email protected]> Tested-By: Andres Freund <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06thp: remove unused vma parameter from khugepaged_alloc_pageAaron Tomlin1-5/+3
The "vma" parameter to khugepaged_alloc_page() is unused. It has to remain unused or the drop read lock 'map_sem' optimisation introduce by commit 8b1645685acf ("mm, THP: don't hold mmap_sem in khugepaged when allocating THP") wouldn't be safe. So let's remove it. Signed-off-by: Aaron Tomlin <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm, fs: introduce mapping_gfp_constraint()Michal Hocko19-30/+36
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [[email protected]: coding-style fixes] Signed-off-by: Michal Hocko <[email protected]> Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06include/linux/mmzone.h: reflow commentAndrew Morton1-6/+7
Someone has an 86 column display. Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06mm: page_alloc: hide some GFP internals and document the bits and flag ↵Mel Gorman4-80/+194
combinations Andrew stated the following We have quite a history of remote parts of the kernel using weird/wrong/inexplicable combinations of __GFP_ flags. I tend to think that this is because we didn't adequately explain the interface. And I don't think that gfp.h really improved much in this area as a result of this patchset. Could you go through it some time and decide if we've adequately documented all this stuff? This patches first moves some GFP flag combinations that are part of the MM internals to mm/internal.h. The rest of the patch documents the __GFP_FOO bits under various headings and then documents the flag combinations. It will not help callers that are brain damaged but the clarity might motivate some fixes and avoid future mistakes. Signed-off-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>