Age | Commit message (Collapse) | Author | Files | Lines |
|
In the bitmap based allocator implementation, nilfs_mdt_bgl_lock() helper
is frequently used to get a spinlock protecting a target block group.
This reduces its usage and simplifies arguments of some related functions
by directly passing a pointer to the spinlock.
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses nilfs_warning() to replace "printk(KERN_WARNING ...);" in the
bitmap based allocator implementation of nilfs2. The warning messages are
modified to include the device name and the inode number in each message.
This makes it clear which metadata file of which device has output
warnings such as "entry number xxxx already freed".
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove unneeded NULL test.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@ expression x; @@
-if (x != NULL)
\(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Global and static variables don't need to be initialized to 0.
There is already a test for this but the output message doesn't
mention booleans initialized to false.
Improve the output message and the test by adding various forms
with possible specific integer types and possible multiple zeros.
Miscellanea:
o Use a variable to hold the possible 0 test
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Shailendra Verma <[email protected]>
Tested-by: Shailendra Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Including BUG and stack dumps in commit logs makes checkpatch produce some
false positive warning messages.
checkpatch has multiple types of false positives:
o Commit message lines > 75 chars
o Stack dump address are mistaken for git commit IDs
o Link: and Fixes: lines are allowed to be > 75 chars.
o Fixes: style doesn't require ("<commit_description>")
parentheses and double quotes like other uses of
git commit ID and description.
Fix these.
Miscellanea:
o Move the test for checking $commit_log_possible_stack_dump
above the test for a long line commit message
o Add test for hex address surrounded by square or angle brackets
Signed-off-by: Joe Perches <[email protected]>
Reported-by: Stephen Smalley <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a classical off-by-one error in case when we try to place, for
example, 1+1 bytes as hex in the buffer of size 6. The expected result is
to get an output truncated, but in the reality we get 6 bytes filed
followed by terminating NUL.
Change the logic how we fill the output in case of byte dumping into
limited space. This will follow the snprintf() behaviour by truncating
output even on half bytes.
Fixes: 114fc1afb2de (hexdump: make it return number of bytes placed in buffer)
Signed-off-by: Andy Shevchenko <[email protected]>
Reported-by: Aaro Koskinen <[email protected]>
Tested-by: Aaro Koskinen <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I noticed that commit a20135ffbc44 ("writeback: don't drain
bdi_writeback_congested on bdi destruction") added a usage of
rbtree_postorder_for_each_entry_safe() in mm/backing-dev.c which appears
to try to rb_erase() elements from an rbtree while iterating over it using
rbtree_postorder_for_each_entry_safe().
Doing this will cause random nodes to be missed by the iteration because
rb_erase() may rebalance the tree, changing the ordering that we're trying
to iterate over.
The previous documentation for rbtree_postorder_for_each_entry_safe()
wasn't clear that this wasn't allowed, it was taken from the docs for
list_for_each_entry_safe(), where erasing isn't a problem due to
list_del() not reordering.
Explicitly warn developers about this potential pit-fall.
Note that I haven't fixed the actual issue that (it appears) the commit
referenced above introduced (not familiar enough with that code).
In general (and in this case), the patterns to follow are:
- switch to rb_first() + rb_erase(), don't use
rbtree_postorder_for_each_entry_safe().
- keep the postorder iteration and don't rb_erase() at all. Instead
just clear the fields of rb_node & cgwb_congested_tree as required by
other users of those structures.
[[email protected]: tweak comments]
Signed-off-by: Cody P Schafer <[email protected]>
Cc: John de la Garza <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
for_each_thread()
Change current_is_single_threaded() to use for_each_thread() rather than
deprecated while_each_thread().
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sometimes kobject_set_name_vargs is called with a format string conaining
no %, or a format string of precisely "%s", where the single vararg
happens to point to .rodata. kvasprintf_const detects these cases for us
and returns a copy of that pointer instead of duplicating the string, thus
saving some run-time memory. Otherwise, it falls back to kvasprintf. We
just need to always deallocate ->name using kfree_const.
Unfortunately, the dance we need to do to perform the '/' -> '!'
sanitization makes the resulting code rather ugly.
I instrumented kstrdup_const to provide some statistics on the memory
saved, and for me this gave an additional ~14KB after boot (306KB was
already saved; this patch bumped that to 320KB). I have
KMALLOC_SHIFT_LOW==3, and since 80% of the kvasprintf_const hits were
satisfied by an 8-byte allocation, the 14K would roughly be quadrupled
when KMALLOC_SHIFT_LOW==5. Whether these numbers are sufficient to
justify the ugliness I'll leave to others to decide.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This adds kvasprintf_const which tries to use kstrdup_const if possible:
If the format string contains no % characters, or if the format string is
exactly "%s", we delegate to kstrdup_const. Otherwise, we fall back to
kvasprintf.
Just as for kstrdup_const, the main motivation is to save memory by
reusing .rodata when possible.
The return value should be freed by kfree_const, just like for
kstrdup_const.
There is deliberately no kasprintf_const: In the vast majority of cases,
the format string argument is a literal, so one can determine statically
whether one could instead use kstrdup_const directly (which would also
require one to change all corresponding kfree calls to kfree_const).
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
llist_del_first reads entry->next, but it did not acquire visibility over
the entry node. As the result it can get a stale value of entry->next
(e.g. NULL or whatever garbage was there before the appending thread
wrote correct value). And then commit that value as llist head with
cmpxchg. That will corrupt llist.
Note there is a control-dependency between read of head->first and read of
entry->next, but it does not make the code correct. Kernel memory model
unambiguously says: "A load-load control dependency requires a full read
memory barrier".
Use smp_load_acquire to acquire visibility over the entry node.
The data race was found with KernelThreadSanitizer (KTSAN).
Here is an example of KTSAN report:
ThreadSanitizer: data-race in llist_del_first
Read of size 1 by thread T389 (K2630, CPU0):
[<ffffffff8156b8a9>] llist_del_first+0x39/0x70 lib/llist.c:74
[< inlined >] tty_buffer_alloc drivers/tty/tty_buffer.c:181
[<ffffffff81664af4>] __tty_buffer_request_room+0xb4/0x250 drivers/tty/tty_buffer.c:292
[<ffffffff81664e6c>] tty_insert_flip_string_fixed_flag+0x6c/0x150 drivers/tty/tty_buffer.c:337
[< inlined >] tty_insert_flip_string include/linux/tty_flip.h:35
[<ffffffff81667422>] pty_write+0x72/0xc0 drivers/tty/pty.c:110
[< inlined >] process_output_block drivers/tty/n_tty.c:611
[<ffffffff8165c016>] n_tty_write+0x346/0x7f0 drivers/tty/n_tty.c:2401
[< inlined >] do_tty_write drivers/tty/tty_io.c:1159
[<ffffffff816568df>] tty_write+0x21f/0x3f0 drivers/tty/tty_io.c:1245
[<ffffffff8125f00f>] __vfs_write+0x5f/0x1f0 fs/read_write.c:489
[<ffffffff8125ff8f>] vfs_write+0xef/0x280 fs/read_write.c:538
[< inlined >] SYSC_write fs/read_write.c:585
[<ffffffff81261390>] SyS_write+0x70/0xe0 fs/read_write.c:577
[<ffffffff81ee862e>] entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186
Previous write of size 8 by thread T226 (K761, CPU0):
[<ffffffff8156b832>] llist_add_batch+0x32/0x70 lib/llist.c:44 (discriminator 16)
[< inlined >] llist_add include/linux/llist.h:180
[<ffffffff816649fc>] tty_buffer_free+0x6c/0xb0 drivers/tty/tty_buffer.c:221
[<ffffffff816651e7>] flush_to_ldisc+0x107/0x300 drivers/tty/tty_buffer.c:514
[<ffffffff810b20ee>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
[<ffffffff810b2650>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
[<ffffffff810bbe20>] kthread+0x150/0x170 kernel/kthread.c:209
[<ffffffff81ee8a1f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
Signed-off-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Konstantin Serebryany <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a couple of simple tests for string_get_size(). The last one will
hang the kernel without the 'lib/string_helpers.c: fix infinite loop in
string_get_size()' fix.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
<linux/bitops.h> provides rol32() inline function, let's use already
predefined function instead of direct expression.
Signed-off-by: Alexander Kuleshov <[email protected]>
Cc: Herbert Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Martin Kepplinger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: George Spelvin <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Martin Kepplinger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: George Spelvin <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289
The result was the 64-bit version being "likely fine", "valuable" and
"correct". The discussion fell asleep but since there are possible users,
let's add it.
Signed-off-by: Martin Kepplinger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: George Spelvin <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is often overlooked that sign_extend32(), despite its name, is safe to
use for 16 and 8 bit types as well. This should help prevent sign
extension being done manually some other way.
Signed-off-by: Martin Kepplinger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: George Spelvin <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the missing extcon directory to maintain them. When using
get_maintainer.pl, the result should include the correct maintainer
information.
Signed-off-by: Chanwoo Choi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Reviewer output currently does not include the subsystem
that matched. Add it.
Miscellanea:
o Add a get_subsystem_name routine to centralize this
Signed-off-by: Joe Perches <[email protected]>
Tested-by: Krzysztof Kozlowski <[email protected]>
Cc: Lee Jones <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We don't consistenly document the default value next to the option
listing, but we do have a list of defaults here, so let's keep it up to
date.
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Many flag options are boolean and support both a positive and a negative
invocation from the command line. Some of these are even mentioned by
example (e.g., --nogit is mentioned as a default option), but they aren't
explicitly mentioned in the list of options. It happens that some of
these are pretty important, as they are default-on, and to turn them off,
you have to know about the --no-foo version.
Rather than clutter the whole help text with bracketed '--[no]foo', let's
just mention the general rule, a la 'man gcc'.
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Though it appears that Perl's GetOptions will take either, the latter is
not documented in the options listing.
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I really haven't used this option much myself, so feel free to improve on
the documentation for it. I just noticed it while inspecting this script
for undocumented features.
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The following statement of ABI/testing/dev-kmsg is not quite right:
It is not possible to inject messages from userspace with the
facility number LOG_KERN (0), to make sure that the origin of the
messages can always be reliably determined.
Userland actually can inject messages with a facility of 0 by abusing the
fact that the facility is stored in a u8 data type. By using a facility
which is a multiple of 256 the assignment of msg->facility in log_store()
implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg
to spoof kernel messages as shown below:
The following call...
# printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
...leads to the following log entry (dmesg -x | tail -n 1):
user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty
However, this call...
# printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
...leads to the slightly different log entry (note the kernel facility):
kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty
Fix that by limiting the user provided facility to 8 bit right from the
beginning and catch the truncation early.
Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...")
Signed-off-by: Mathias Krause <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kay Sievers <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
%n is no longer just ignored; it results in early return from vsnprintf.
Also add a request to add test cases for future %p extensions.
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Martin Kletzander <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This runs the lib/test_printf module to make sure printf is operating
sanely.
Signed-off-by: Kees Cook <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This adds a simple module for testing the kernel's printf facilities.
Previously, some %p extensions have caused a wrong return value in case
the entire output didn't fit and/or been unusable in kasprintf(). This
should help catch such issues. Also, it should help ensure that changes
to the formatting algorithms don't break anything.
I'm not sure if we have a struct dentry or struct file lying around at
boot time or if we can fake one, but most %p extensions should be
testable, as should the ordinary number and string formatting.
The nature of vararg functions means we can't use a more conventional
table-driven approach.
For now, this is mostly a skeleton; contributions are very
welcome. Some tests are/will be slightly annoying to write, since the
expected output depends on stuff like CONFIG_*, sizeof(long), runtime
values etc.
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Martin Kletzander <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As a quick
git grep -E '%[ +0#-]*#[ +0#-]*(\*|[0-9]+)?(\.(\*|[0-9]+)?)?p'
shows, nobody uses the # flag with %p. Should one try to do so, one
will be met with
warning: `#' flag used with `%p' gnu_printf format [-Wformat]
(POSIX and C99 both say "... For other conversion specifiers, the
behavior is undefined.". Obviously, the kernel can choose to define
the behaviour however it wants, but as long as gcc issues that
warning, users are unlikely to show up.)
Since default_width is effectively always 2*sizeof(void*), we can
simplify the prologue of pointer() and save a few instructions.
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Martin Kletzander <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Quoting from 2aa2f9e21e4e ("lib/vsprintf.c: improve sanity check in
vsnprintf()"):
On 64 bit, size may very well be huge even if bit 31 happens to be 0.
Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
3 GiB one. So cap at INT_MAX as was probably the intention all along.
This is also the made-up value passed by sprintf and vsprintf.
I should have seen this copy-pasted instance back then, but let's just
do it now.
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Martin Kletzander <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If we meet any invalid or unsupported format specifier, 'handling' it by
just printing it as a literal string is not safe: Presumably the format
string and the arguments passed gcc's type checking, but that means
something like sprintf(buf, "%n %pd", &intvar, dentry) would end up
interpreting &intvar as a struct dentry*.
When the offending specifier was %n it used to be at the end of the format
string, but we can't rely on that always being the case. Also, gcc
doesn't complain about some more or less exotic qualifiers (or 'length
modifiers' in posix-speak) such as 'j' or 'q', but being unrecognized by
the kernel's printf implementation, they'd be interpreted as unknown
specifiers, and the rest of arguments would be interpreted wrongly.
So let's complain about anything we don't understand, not just %n, and
stop pretending that we'd be able to make sense of the rest of the
format/arguments. If the offending specifier is in a printk() call we
unfortunately only get a "BUG: recent printk recursion!", but at least
direct users of the sprintf family will be caught.
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Martin Kletzander <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move all pointer-formatting documentation to one place in the code and one
place in the documentation instead of keeping it in three places with
different level of completeness. Documentation/printk-formats.txt has
detailed information about each modifier, docstring above pointer() has
short descriptions of them (as that is the function dealing with %p) and
docstring above vsprintf() is removed as redundant. Both docstrings in
the code that were modified are updated with a reminder of updating the
documentation upon any further change.
[[email protected]: fix comment]
Signed-off-by: Martin Kletzander <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Using kstrdup_const, thus reusing .rodata when possible, saves around 2 kB
of runtime memory on my laptop/.config combination.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Jason Baron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Reported-by: Wu Fengguang <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: David Woodhouse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The commit 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly")
fixed the access to /proc/self/fd from sub-threads, but introduced another
problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
if generic_permission() fails.
Change proc_fd_permission() to check same_thread_group(pid_task(), current).
Fixes: 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly")
Reported-by: "Jin, Yihua" <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For now in task_name() we ignore the return code of string_escape_str()
call. This is not good if buffer suddenly becomes not big enough. Do the
proper error handling there.
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On 64 bit system we have enough space in struct page to encode
compound_dtor and compound_order with unsigned int.
On x86-64 it leads to slightly smaller code size due usesage of plain
MOV instead of MOVZX (zero-extended move) or similar effect.
allyesconfig:
text data bss dec hex filename
159520446 48146736 72196096 279863278 10ae5fee vmlinux.pre
159520382 48146736 72196096 279863214 10ae5fae vmlinux.post
On other architectures without native support of 16-bit data types the
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's try to be consistent about data type of page order.
[[email protected]: fix build (type of pageblock_order)]
[[email protected]: some configs end up with MAX_ORDER and pageblock_order having different types]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hugh has pointed that compound_head() call can be unsafe in some
context. There's one example:
CPU0 CPU1
isolate_migratepages_block()
page_count()
compound_head()
!!PageTail() == true
put_page()
tail->first_page = NULL
head = tail->first_page
alloc_pages(__GFP_COMP)
prep_compound_page()
tail->first_page = head
__SetPageTail(p);
!!PageTail() == true
<head == NULL dereferencing>
The race is pure theoretical. I don't it's possible to trigger it in
practice. But who knows.
We can fix the race by changing how encode PageTail() and compound_head()
within struct page to be able to update them in one shot.
The patch introduces page->compound_head into third double word block in
front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
the rest bits are pointer to head page if bit zero is set.
The patch moves page->pmd_huge_pte out of word, just in case if an
architecture defines pgtable_t into something what can have the bit 0
set.
hugetlb_cgroup uses page->lru.next in the second tail page to store
pointer struct hugetlb_cgroup. The patch switch it to use page->private
in the second tail page instead. The space is free since ->first_page is
removed from the union.
The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER
limitation, since there's now space in first tail page to store struct
hugetlb_cgroup pointer. But that's out of scope of the patch.
That means page->compound_head shares storage space with:
- page->lru.next;
- page->next;
- page->rcu_head.next;
That's too long list to be absolutely sure, but looks like nobody uses
bit 0 of the word.
page->rcu_head.next guaranteed[1] to have bit 0 clean as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future
call_rcu_lazy() is not allowed as it makes use of the bit and we can
get false positive PageTail().
[1] http://lkml.kernel.org/g/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The patch halves space occupied by compound_dtor and compound_order in
struct page.
For compound_order, it's trivial long -> short conversion.
For get_compound_page_dtor(), we now use hardcoded table for destructor
lookup and store its index in the struct page instead of direct pointer
to destructor. It shouldn't be a big trouble to maintain the table: we
have only two destructor and NULL currently.
This patch free up one word in tail pages for reuse. This is preparation
for the next patch.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We are going to rework how compound_head() work. It will not use
page->first_page as we have it now.
The only other user of page->first_page beyond compound pages is
zsmalloc.
Let's use page->private instead of page->first_page here. It occupies
the same storage space.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have properly typed page->rcu_head, no need to cast page->lru.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since 8456a648cf44 ("slab: use struct page for slab management") nobody
uses slab_page field in struct page.
Let's drop it.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Each `struct size_class' contains `struct zs_size_stat': an array of
NR_ZS_STAT_TYPE `unsigned long'. For zsmalloc built with no
CONFIG_ZSMALLOC_STAT this results in a waste of `2 * sizeof(unsigned
long)' per-class.
The patch removes unneeded `struct zs_size_stat' members by redefining
NR_ZS_STAT_TYPE (max stat idx in array).
Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants,
GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type
larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at
the moment.
./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39)
function old new delta
fix_fullness_group 97 94 -3
insert_zspage 100 86 -14
remove_zspage 141 119 -22
To summarize:
a) each class now uses less memory
b) we avoid a number of dec/inc stats (a minor optimization,
but still).
The gain will increase once we introduce additional stats.
A simple IO test.
iozone -t 4 -R -r 32K -s 60M -I +Z
patched base
" Initial write " 4145599.06 4127509.75
" Rewrite " 4146225.94 4223618.50
" Read " 17157606.00 17211329.50
" Re-read " 17380428.00 17267650.50
" Reverse Read " 16742768.00 16162732.75
" Stride read " 16586245.75 16073934.25
" Random read " 16349587.50 15799401.75
" Mixed workload " 10344230.62 9775551.50
" Random write " 4277700.62 4260019.69
" Pwrite " 4302049.12 4313703.88
" Pread " 6164463.16 6126536.72
" Fwrite " 7131195.00 6952586.00
" Fread " 12682602.25 12619207.50
Signed-off-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Hui Zhu <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We don't let user to disable shrinker in zsmalloc (once it's been
enabled), so no need to check ->shrinker_enabled in zs_shrinker_count(),
at the moment at least.
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A cosmetic change.
Commit c60369f01125 ("staging: zsmalloc: prevent mappping in interrupt
context") added in_interrupt() check to zs_map_object() and 'hardirq.h'
include; but in_interrupt() macro is defined in 'preempt.h' not in
'hardirq.h', so include it instead.
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In obj_malloc():
if (!class->huge)
/* record handle in the header of allocated chunk */
link->handle = handle;
else
/* record handle in first_page->private */
set_page_private(first_page, handle);
In the hugepage we save handle to private directly.
But in obj_to_head():
if (class->huge) {
VM_BUG_ON(!is_first_page(page));
return *(unsigned long *)page_private(page);
} else
return *(unsigned long *)obj;
It is used as a pointer.
The reason why there is no problem until now is huge-class page is born
with ZS_FULL so it can't be migrated. However, we need this patch for
future work: "VM-aware zsmalloced page migration" to reduce external
fragmentation.
Signed-off-by: Hui Zhu <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
[[email protected]: fix grammar]
Signed-off-by: Hui Zhu <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Constify `struct zs_pool' ->name.
[[email protected]: constify zpool_create_pool()'s `type' arg also]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|