aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-11-06coredump: change zap_threads() and zap_process() to use for_each_thread()Oleg Nesterov1-14/+13
Change zap_threads() paths to use for_each_thread() rather than while_each_thread(). While at it, change zap_threads() to avoid the nested if's to make the code more readable and lessen the indentation. Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Kyle Walker <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Stanislav Kozina <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMPOleg Nesterov2-7/+7
task_will_free_mem() is wrong in many ways, and in particular the SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the coredumping without SIGNAL_GROUP_COREDUMP bit set. change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if other CLONE_VM processes can't react to SIGKILL. Fortunately, at least oom-kill case if fine; it kills all tasks sharing the same mm, so it should also kill the process which actually dumps the core. The change in prepare_signal() is not strictly necessary, it just ensures that the patch does not bring another subtle behavioural change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes. Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Kyle Walker <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Stanislav Kozina <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)Oleg Nesterov1-1/+0
jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason, SIGCONT will wake a stopped task up even if it is ignored. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Markus Pargmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()Oleg Nesterov2-2/+11
jffs2_garbage_collect_thread() can race with SIGCONT and sleep in TASK_STOPPED state after it was already sent. Add the new helper, kernel_signal_stop(), which does this correctly. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Markus Pargmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06signal: turn dequeue_signal_lock() into kernel_dequeue_signal()Oleg Nesterov4-21/+12
1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This matches another "for kthreads only" kernel_sigaction() helper. 2. Remove the "tsk" and "mask" arguments, they are always current and current->blocked. And it is simply wrong if tsk != current. 3. We could also remove the 3rd "siginfo_t *info" arg but it looks potentially useful. However we can simplify the callers if we change kernel_dequeue_signal() to accept info => NULL. 4. Remove _irqsave, it is never called from atomic context. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Markus Pargmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06signals: kill block_all_signals() and unblock_all_signals()Oleg Nesterov4-98/+2
It is hardly possible to enumerate all problems with block_all_signals() and unblock_all_signals(). Just for example, 1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is multithreaded. Another thread can dequeue the signal and force the group stop. 2. Even is the caller is single-threaded, it will "stop" anyway. It will not sleep, but it will spin in kernel space until SIGCONT or SIGKILL. And a lot more. In short, this interface doesn't work at all, at least the last 10+ years. Daniel said: Yeah the only times I played around with the DRM_LOCK stuff was when old drivers accidentally deadlocked - my impression is that the entire DRM_LOCK thing was never really tested properly ;-) Hence I'm all for purging where this leaks out of the drm subsystem. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Daniel Vetter <[email protected]> Acked-by: Dave Airlie <[email protected]> Cc: Richard Weinberger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: fix gcc uninitialized-variable warnings in powerpc buildRyusuke Konishi3-4/+7
Some false positive warnings are reported for powerpc build. The following warnings are reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/ CC fs/nilfs2/super.o fs/nilfs2/super.c: In function 'nilfs_resize_fs': fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here CC fs/nilfs2/recovery.o fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs': fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here fs/nilfs2/recovery.c: In function 'nilfs_search_super_root': fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] Another similar warning is reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/ CC fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert': include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here This cleans out these warnings by forcing the variables to be initialized. Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: fix gcc unused-but-set-variable warningsRyusuke Konishi5-13/+3
Fix the following build warnings: $ make W=1 [...] CC [M] fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_split': fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable] __u64 newptr; ^ fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable] __u64 newkey; ^ CC [M] fs/nilfs2/dat.o fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end': fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable] __u64 start; ^ CC [M] fs/nilfs2/segment.o fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush': fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable] int err; ^ CC [M] fs/nilfs2/sufile.o fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc': fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable] unsigned long nsegments, ncleansegs, nsus, cnt; ^ CC [M] fs/nilfs2/alloc.o fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry': fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable] unsigned long n, entries_per_group, groups_per_desc_block; ^ Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06MAINTAINERS: nilfs2: add header file for tracingRyusuke Konishi1-0/+1
This adds header file "include/trace/events/nilfs2.h" to maintainer-ship of nilfs2 so that updates to the nilfs2 header file go to the mailing list of nilfs2. Signed-off-by: Ryusuke Konishi <[email protected]> Cc: Hitoshi Mitake <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: add tracepoints for analyzing reading and writing metadata filesHitoshi Mitake2-0/+60
This patch adds tracepoints for analyzing requests of reading and writing metadata files. The tracepoints cover every in-place mdt files (cpfile, sufile, and datfile). Example of tracing mdt_insert_new_block(): cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155 cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5 cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253 Signed-off-by: Hitoshi Mitake <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: TK Kato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: add tracepoints for analyzing sufile manipulationHitoshi Mitake2-0/+75
This patch adds tracepoints which would be useful for analyzing segment usage from a perspective of high level sufile manipulation (check, alloc, free). sufile is an important in-place updated metadata file, so analyzing the behavior would be useful for performance turning. example of usage (a case of allocation): $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end. segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2 segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3 Signed-off-by: Hitoshi Mitake <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Benixon Dhas <[email protected]> Cc: TK Kato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: add a tracepoint for transaction eventsHitoshi Mitake2-1/+85
This patch adds a tracepoint for transaction events of nilfs. With the tracepoint, these events can be tracked: begin, abort, commit, trylock, lock, and unlock. Basically, these events have corresponding functions e.g. begin event corresponds nilfs_transaction_begin(). The unlock event is an exception. It corresponds to the iteration in nilfs_transaction_lock(). Only one tracepoint is introcued: nilfs2_transaction_transition. The above events are distinguished with newly introduced enum. With this tracepoint, we can analyse a critical section of segment constructoin. Sample output by tpoint of perf-tools: cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK This patch also does trivial cleaning of comma usage in collection stage transition event for consistent coding style. Signed-off-by: Hitoshi Mitake <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: add a tracepoint for tracking stage transition of segment constructionHitoshi Mitake3-21/+103
This patch adds a tracepoint for tracking stage transition of block collection in segment construction. With the tracepoint, we can analysis the behavior of segment construction in depth. It would be useful for bottleneck detection and debugging, etc. The tracepoint is created with the standard trace API of linux (like ext3, ext4, f2fs and btrfs). So we can analysis with existing tools easily. Of course, more detailed analysis will be possible if we can create nilfs specific analysis tools. Below is an example of event dump with Brendan Gregg's perf-tools (https://github.com/brendangregg/perf-tools). Time consumption between each stage can be obtained. $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end. segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE For capturing transition correctly, this patch adds wrappers for the member scnt of nilfs_cstage. With this change, every transition of the stage can produce trace event in a correct manner. Signed-off-by: Hitoshi Mitake <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: free unused dat file blocks during garbage collectionRyusuke Konishi2-17/+75
As a nilfs2 volume ages, the amount of available disk space decreases little by little due to bloat of DAT (disk address translation) metadata file. Even if we delete all files in a file system and free their block addresses from the DAT file through a garbage collection, empty DAT blocks are not freed. This fixes the issue by extending the deallocator of block addresses so that empty data blocks and empty bitmap blocks of DAT are deleted. The following comparison shows the effect of this patch. Each shows disk amount information of a nilfs2 volume that we cleaned out by deleting all files and running gc after having filled 90% of its capacity. Before: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 3022844 472072192 1% /test After: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 16380 475078656 1% /test Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: add helper functions to delete blocks from dat fileRyusuke Konishi1-0/+50
This adds delete functions for data blocks of metadata files using bitmap based allocator. nilfs_palloc_delete_entry_block() deletes an entry block (e.g. block storing dat entries), and nilfs_palloc_delete_bitmap_block() deletes a bitmap block, respectively. These helpers are intended to be used in the successive change on deallocator of block addresses ("nilfs2: free unused dat file blocks during garbage collection"). Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: get rid of nilfs_palloc_group_is_in()Ryusuke Konishi1-19/+9
This unfolds nilfs_palloc_group_is_in() helper function into nilfs_palloc_freev() function to simplify a range check and an index calculation repeatedy performed in a loop of the function. Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: refactor nilfs_palloc_find_available_slot()Ryusuke Konishi1-27/+21
The current implementation of nilfs_palloc_find_available_slot() function is overkill. The underlying bit search routine is well optimized, so this uses it more simply in nilfs_palloc_find_available_slot(). Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: do not call nilfs_mdt_bgl_lock() needlesslyRyusuke Konishi1-44/+40
In the bitmap based allocator implementation, nilfs_mdt_bgl_lock() helper is frequently used to get a spinlock protecting a target block group. This reduces its usage and simplifies arguments of some related functions by directly passing a pointer to the spinlock. Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: use nilfs_warning() in allocator implementationRyusuke Konishi1-8/+12
This uses nilfs_warning() to replace "printk(KERN_WARNING ...);" in the bitmap based allocator implementation of nilfs2. The warning messages are modified to include the device name and the inode number in each message. This makes it clear which metadata file of which device has output warnings such as "entry number xxxx already freed". Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06nilfs2: drop null test before destroy functionsJulia Lawall1-8/+4
Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06checkpatch: improve the unnecessary initialisers testsJoe Perches1-7/+8
Global and static variables don't need to be initialized to 0. There is already a test for this but the output message doesn't mention booleans initialized to false. Improve the output message and the test by adding various forms with possible specific integer types and possible multiple zeros. Miscellanea: o Use a variable to hold the possible 0 test Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Shailendra Verma <[email protected]> Tested-by: Shailendra Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06checkpatch: improve tests for fixes:, long lines and stack dumps in commit logJoe Perches1-25/+26
Including BUG and stack dumps in commit logs makes checkpatch produce some false positive warning messages. checkpatch has multiple types of false positives: o Commit message lines > 75 chars o Stack dump address are mistaken for git commit IDs o Link: and Fixes: lines are allowed to be > 75 chars. o Fixes: style doesn't require ("<commit_description>") parentheses and double quotes like other uses of git commit ID and description. Fix these. Miscellanea: o Move the test for checking $commit_log_possible_stack_dump above the test for a long line commit message o Add test for hex address surrounded by square or angle brackets Signed-off-by: Joe Perches <[email protected]> Reported-by: Stephen Smalley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/hexdump.c: truncate output in case of overflowAndy Shevchenko1-1/+5
There is a classical off-by-one error in case when we try to place, for example, 1+1 bytes as hex in the buffer of size 6. The expected result is to get an output truncated, but in the reality we get 6 bytes filed followed by terminating NUL. Change the logic how we fill the output in case of byte dumping into limited space. This will follow the snprintf() behaviour by truncating output even on half bytes. Fixes: 114fc1afb2de (hexdump: make it return number of bytes placed in buffer) Signed-off-by: Andy Shevchenko <[email protected]> Reported-by: Aaro Koskinen <[email protected]> Tested-by: Aaro Koskinen <[email protected]> Cc: Al Viro <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06rbtree: clarify documentation of rbtree_postorder_for_each_entry_safe()Cody P Schafer1-2/+10
I noticed that commit a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") added a usage of rbtree_postorder_for_each_entry_safe() in mm/backing-dev.c which appears to try to rb_erase() elements from an rbtree while iterating over it using rbtree_postorder_for_each_entry_safe(). Doing this will cause random nodes to be missed by the iteration because rb_erase() may rebalance the tree, changing the ordering that we're trying to iterate over. The previous documentation for rbtree_postorder_for_each_entry_safe() wasn't clear that this wasn't allowed, it was taken from the docs for list_for_each_entry_safe(), where erasing isn't a problem due to list_del() not reordering. Explicitly warn developers about this potential pit-fall. Note that I haven't fixed the actual issue that (it appears) the commit referenced above introduced (not familiar enough with that code). In general (and in this case), the patterns to follow are: - switch to rb_first() + rb_erase(), don't use rbtree_postorder_for_each_entry_safe(). - keep the postorder iteration and don't rb_erase() at all. Instead just clear the fields of rb_node & cgwb_congested_tree as required by other users of those structures. [[email protected]: tweak comments] Signed-off-by: Cody P Schafer <[email protected]> Cc: John de la Garza <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/is_single_threaded.c: change current_is_single_threaded() to use ↵Oleg Nesterov1-3/+2
for_each_thread() Change current_is_single_threaded() to use for_each_thread() rather than deprecated while_each_thread(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/kobject.c: use kvasprintf_const for formatting ->nameRasmus Villemoes1-8/+22
Sometimes kobject_set_name_vargs is called with a format string conaining no %, or a format string of precisely "%s", where the single vararg happens to point to .rodata. kvasprintf_const detects these cases for us and returns a copy of that pointer instead of duplicating the string, thus saving some run-time memory. Otherwise, it falls back to kvasprintf. We just need to always deallocate ->name using kfree_const. Unfortunately, the dance we need to do to perform the '/' -> '!' sanitization makes the resulting code rather ugly. I instrumented kstrdup_const to provide some statistics on the memory saved, and for me this gave an additional ~14KB after boot (306KB was already saved; this patch bumped that to 320KB). I have KMALLOC_SHIFT_LOW==3, and since 80% of the kvasprintf_const hits were satisfied by an 8-byte allocation, the 14K would roughly be quadrupled when KMALLOC_SHIFT_LOW==5. Whether these numbers are sufficient to justify the ugliness I'll leave to others to decide. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/kasprintf.c: introduce kvasprintf_constRasmus Villemoes2-0/+18
This adds kvasprintf_const which tries to use kstrdup_const if possible: If the format string contains no % characters, or if the format string is exactly "%s", we delegate to kstrdup_const. Otherwise, we fall back to kvasprintf. Just as for kstrdup_const, the main motivation is to save memory by reusing .rodata when possible. The return value should be freed by kfree_const, just like for kstrdup_const. There is deliberately no kasprintf_const: In the vast majority of cases, the format string argument is a literal, so one can determine statically whether one could instead use kstrdup_const directly (which would also require one to change all corresponding kfree calls to kfree_const). Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/llist.c: fix data race in llist_del_firstDmitry Vyukov1-2/+2
llist_del_first reads entry->next, but it did not acquire visibility over the entry node. As the result it can get a stale value of entry->next (e.g. NULL or whatever garbage was there before the appending thread wrote correct value). And then commit that value as llist head with cmpxchg. That will corrupt llist. Note there is a control-dependency between read of head->first and read of entry->next, but it does not make the code correct. Kernel memory model unambiguously says: "A load-load control dependency requires a full read memory barrier". Use smp_load_acquire to acquire visibility over the entry node. The data race was found with KernelThreadSanitizer (KTSAN). Here is an example of KTSAN report: ThreadSanitizer: data-race in llist_del_first Read of size 1 by thread T389 (K2630, CPU0): [<ffffffff8156b8a9>] llist_del_first+0x39/0x70 lib/llist.c:74 [< inlined >] tty_buffer_alloc drivers/tty/tty_buffer.c:181 [<ffffffff81664af4>] __tty_buffer_request_room+0xb4/0x250 drivers/tty/tty_buffer.c:292 [<ffffffff81664e6c>] tty_insert_flip_string_fixed_flag+0x6c/0x150 drivers/tty/tty_buffer.c:337 [< inlined >] tty_insert_flip_string include/linux/tty_flip.h:35 [<ffffffff81667422>] pty_write+0x72/0xc0 drivers/tty/pty.c:110 [< inlined >] process_output_block drivers/tty/n_tty.c:611 [<ffffffff8165c016>] n_tty_write+0x346/0x7f0 drivers/tty/n_tty.c:2401 [< inlined >] do_tty_write drivers/tty/tty_io.c:1159 [<ffffffff816568df>] tty_write+0x21f/0x3f0 drivers/tty/tty_io.c:1245 [<ffffffff8125f00f>] __vfs_write+0x5f/0x1f0 fs/read_write.c:489 [<ffffffff8125ff8f>] vfs_write+0xef/0x280 fs/read_write.c:538 [< inlined >] SYSC_write fs/read_write.c:585 [<ffffffff81261390>] SyS_write+0x70/0xe0 fs/read_write.c:577 [<ffffffff81ee862e>] entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186 Previous write of size 8 by thread T226 (K761, CPU0): [<ffffffff8156b832>] llist_add_batch+0x32/0x70 lib/llist.c:44 (discriminator 16) [< inlined >] llist_add include/linux/llist.h:180 [<ffffffff816649fc>] tty_buffer_free+0x6c/0xb0 drivers/tty/tty_buffer.c:221 [<ffffffff816651e7>] flush_to_ldisc+0x107/0x300 drivers/tty/tty_buffer.c:514 [<ffffffff810b20ee>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036 [<ffffffff810b2650>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170 [<ffffffff810bbe20>] kthread+0x150/0x170 kernel/kthread.c:209 [<ffffffff81ee8a1f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526 Signed-off-by: Dmitry Vyukov <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Huang Ying <[email protected]> Cc: Konstantin Serebryany <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/test-string_helpers.c: add string_get_size() testsVitaly Kuznetsov1-0/+36
Add a couple of simple tests for string_get_size(). The last one will hang the kernel without the 'lib/string_helpers.c: fix infinite loop in string_get_size()' fix. Signed-off-by: Vitaly Kuznetsov <[email protected]> Cc: James Bottomley <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: "K. Y. Srinivasan" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/halfmd4.c: use rol32 inline function in the ROUND macroAlexander Kuleshov1-1/+2
<linux/bitops.h> provides rol32() inline function, let's use already predefined function instead of direct expression. Signed-off-by: Alexander Kuleshov <[email protected]> Cc: Herbert Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extensionMartin Kepplinger1-4/+3
Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06arch/sh/kernel/traps_64.c: use sign_extend64() for sign extensionMartin Kepplinger2-2/+2
Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06bitops.h: add sign_extend64()Martin Kepplinger1-0/+11
Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289 The result was the 64-bit version being "likely fine", "valuable" and "correct". The discussion fell asleep but since there are possible users, let's add it. Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06bitops.h: improve sign_extend32()'s documentationMartin Kepplinger1-0/+2
It is often overlooked that sign_extend32(), despite its name, is safe to use for 16 and 8 bit types as well. This should help prevent sign extension being done manually some other way. Signed-off-by: Martin Kepplinger <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: George Spelvin <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06MAINTAINERS: add missing extcon directoryChanwoo Choi1-0/+3
Add the missing extcon directory to maintain them. When using get_maintainer.pl, the result should include the correct maintainer information. Signed-off-by: Chanwoo Choi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: add subsystem to reviewer outputJoe Perches1-15/+16
Reviewer output currently does not include the subsystem that matched. Add it. Miscellanea: o Add a get_subsystem_name routine to centralize this Signed-off-by: Joe Perches <[email protected]> Tested-by: Krzysztof Kozlowski <[email protected]> Cc: Lee Jones <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: --r (list reviewer) is on by defaultBrian Norris1-1/+1
We don't consistenly document the default value next to the option listing, but we do have a list of defaults here, so let's keep it up to date. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: add --no-foo options to --helpBrian Norris1-0/+3
Many flag options are boolean and support both a positive and a negative invocation from the command line. Some of these are even mentioned by example (e.g., --nogit is mentioned as a default option), but they aren't explicitly mentioned in the list of options. It happens that some of these are pretty important, as they are default-on, and to turn them off, you have to know about the --no-foo version. Rather than clutter the whole help text with bracketed '--[no]foo', let's just mention the general rule, a la 'man gcc'. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: it's '--pattern-depth', not '-pattern-depth'Brian Norris1-1/+1
Though it appears that Perl's GetOptions will take either, the latter is not documented in the options listing. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06get_maintainer: add missing documentation for --git-blame-signaturesBrian Norris1-0/+1
I really haven't used this option much myself, so feel free to improve on the documentation for it. I just noticed it while inspecting this script for undocumented features. Signed-off-by: Brian Norris <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06printk: prevent userland from spoofing kernel messagesMathias Krause1-5/+8
The following statement of ABI/testing/dev-kmsg is not quite right: It is not possible to inject messages from userspace with the facility number LOG_KERN (0), to make sure that the origin of the messages can always be reliably determined. Userland actually can inject messages with a facility of 0 by abusing the fact that the facility is stored in a u8 data type. By using a facility which is a multiple of 256 the assignment of msg->facility in log_store() implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg to spoof kernel messages as shown below: The following call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg ...leads to the following log entry (dmesg -x | tail -n 1): user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty However, this call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg ...leads to the slightly different log entry (note the kernel facility): kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty Fix that by limiting the user provided facility to 8 bit right from the beginning and catch the truncation early. Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...") Signed-off-by: Mathias Krause <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Alex Elder <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kay Sievers <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: update documentationRasmus Villemoes2-9/+10
%n is no longer just ignored; it results in early return from vsnprintf. Also add a request to add test cases for future %p extensions. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Martin Kletzander <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06selftests: run lib/test_printf moduleKees Cook3-0/+19
This runs the lib/test_printf module to make sure printf is operating sanely. Signed-off-by: Kees Cook <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06test_printf: test printf family at runtimeRasmus Villemoes3-0/+366
This adds a simple module for testing the kernel's printf facilities. Previously, some %p extensions have caused a wrong return value in case the entire output didn't fit and/or been unusable in kasprintf(). This should help catch such issues. Also, it should help ensure that changes to the formatting algorithms don't break anything. I'm not sure if we have a struct dentry or struct file lying around at boot time or if we can fake one, but most %p extensions should be testable, as should the ordinary number and string formatting. The nature of vararg functions means we can't use a more conventional table-driven approach. For now, this is mostly a skeleton; contributions are very welcome. Some tests are/will be slightly annoying to write, since the expected output depends on stuff like CONFIG_*, sizeof(long), runtime values etc. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: remove SPECIAL handling in pointer()Rasmus Villemoes1-1/+1
As a quick git grep -E '%[ +0#-]*#[ +0#-]*(\*|[0-9]+)?(\.(\*|[0-9]+)?)?p' shows, nobody uses the # flag with %p. Should one try to do so, one will be met with warning: `#' flag used with `%p' gnu_printf format [-Wformat] (POSIX and C99 both say "... For other conversion specifiers, the behavior is undefined.". Obviously, the kernel can choose to define the behaviour however it wants, but as long as gcc issues that warning, users are unlikely to show up.) Since default_width is effectively always 2*sizeof(void*), we can simplify the prologue of pointer() and save a few instructions. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: also improve sanity check in bstr_printf()Rasmus Villemoes1-1/+1
Quoting from 2aa2f9e21e4e ("lib/vsprintf.c: improve sanity check in vsnprintf()"): On 64 bit, size may very well be huge even if bit 31 happens to be 0. Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a 3 GiB one. So cap at INT_MAX as was probably the intention all along. This is also the made-up value passed by sprintf and vsprintf. I should have seen this copy-pasted instance back then, but let's just do it now. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/vsprintf.c: handle invalid format specifiers more robustlyRasmus Villemoes1-10/+21
If we meet any invalid or unsupported format specifier, 'handling' it by just printing it as a literal string is not safe: Presumably the format string and the arguments passed gcc's type checking, but that means something like sprintf(buf, "%n %pd", &intvar, dentry) would end up interpreting &intvar as a struct dentry*. When the offending specifier was %n it used to be at the end of the format string, but we can't rely on that always being the case. Also, gcc doesn't complain about some more or less exotic qualifiers (or 'length modifiers' in posix-speak) such as 'j' or 'q', but being unrecognized by the kernel's printf implementation, they'd be interpreted as unknown specifiers, and the rest of arguments would be interpreted wrongly. So let's complain about anything we don't understand, not just %n, and stop pretending that we'd be able to make sense of the rest of the format/arguments. If the offending specifier is in a printk() call we unfortunately only get a "BUG: recent printk recursion!", but at least direct users of the sprintf family will be caught. Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Martin Kletzander <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06printk: synchronize %p formatting documentationMartin Kletzander2-32/+37
Move all pointer-formatting documentation to one place in the code and one place in the documentation instead of keeping it in three places with different level of completeness. Documentation/printk-formats.txt has detailed information about each modifier, docstring above pointer() has short descriptions of them (as that is the function dealing with %p) and docstring above vsprintf() is removed as redundant. Both docstrings in the code that were modified are updated with a reminder of updating the documentation upon any further change. [[email protected]: fix comment] Signed-off-by: Martin Kletzander <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06lib/dynamic_debug.c: use kstrdup_constRasmus Villemoes1-4/+4
Using kstrdup_const, thus reusing .rodata when possible, saves around 2 kB of runtime memory on my laptop/.config combination. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Jason Baron <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-06fs/jffs2/wbuf.c: remove stray semicolonAndrew Morton1-1/+1
Reported-by: Wu Fengguang <[email protected]> Cc: Sasha Levin <[email protected]> Cc: David Woodhouse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>