From 5c9a8750a6409c63a0f01d51a9024861022f6593 Mon Sep 17 00:00:00 2001
From: Dmitry Vyukov
Date: Tue, 22 Mar 2016 14:27:30 -0700
Subject: kernel: add kcov code coverage

kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is a function of syscall
inputs. To achieve this goal it does not collect coverage in soft/hard
interrupts, and instrumentation of some inherently non-deterministic
or non-interesting parts of the kernel is disabled (e.g. scheduler,
locking).

Currently there is a single coverage collection mode (tracing), but
the API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.

This patch adds the necessary support on the kernel side. The
complementary compiler support was added in gcc revision 231296.

We've used this support to build the syzkaller system call fuzzer,
which has found 90 kernel bugs in just 2 months:
https://github.com/google/syzkaller/wiki/Found-Bugs
We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation", for example mounting a
random blob as a filesystem, or receiving a random blob over the wire.

Why not gcov? A typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.
The typical coverage can be just a dozen basic blocks (e.g. an invalid
input). In such a context gcov becomes prohibitively expensive, as the
reset/collect steps depend on the total number of basic blocks/edges
in the program (about 2M for the kernel), while the cost of kcov
depends only on the number of executed basic blocks/edges.

On top of that, the kernel requires per-thread coverage because there
are always background threads and unrelated processes that also
produce coverage. With inlined gcov instrumentation, per-thread
coverage is not possible.

kcov exposes kernel PCs and control flow to user space, which is
insecure. But debugfs should not be mapped as user-accessible anyway.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
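For illustration, collecting coverage from user space then looks
roughly like the minimal sketch below. The diff shown here is limited
to kernel/exit.c, so the KCOV_* ioctl values are quoted from the kcov
uAPI header as an assumption, and COVER_SIZE and the traced syscall
are arbitrary:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define KCOV_INIT_TRACE	_IOR('c', 1, unsigned long)
#define KCOV_ENABLE	_IO('c', 100)
#define KCOV_DISABLE	_IO('c', 101)
#define COVER_SIZE	(64 << 10)	/* trace size, in unsigned longs */

int main(void)
{
	unsigned long *cover, n, i;
	int fd;

	/* One fd, and therefore one trace buffer, per thread. */
	fd = open("/sys/kernel/debug/kcov", O_RDWR);
	if (fd == -1)
		return 1;
	/* Set up trace mode and trace size. */
	if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
		return 1;
	/* The buffer is shared between kernel and user space. */
	cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
		     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (cover == MAP_FAILED)
		return 1;
	/* Reset coverage; cover[0] holds the number of collected PCs. */
	__atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
	/* Enable coverage collection on the current thread. */
	if (ioctl(fd, KCOV_ENABLE, 0))
		return 1;
	/* The syscall under test; its kernel control flow is recorded. */
	read(-1, NULL, 0);
	/* Read back the kernel PCs this thread executed. */
	n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
	for (i = 0; i < n; i++)
		printf("0x%lx\n", cover[i + 1]);
	if (ioctl(fd, KCOV_DISABLE, 0))
		return 1;
	close(fd);
	return 0;
}

Note how the reset step is a single store to cover[0]: this is the
cheap per-iteration cost the gcov comparison above is about.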
Peter Anvin" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 10e088237fed..953d1a1c0387 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -655,6 +656,7 @@ void do_exit(long code) TASKS_RCU(int tasks_rcu_i); profile_task_exit(tsk); + kcov_task_exit(tsk); WARN_ON(blk_needs_flush_plug(tsk)); -- cgit From 36324a990cf578b57828c04cd85ac62cd25cf5a4 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Fri, 25 Mar 2016 14:20:27 -0700 Subject: oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space When oom_reaper manages to unmap all the eligible vmas there shouldn't be much of the freable memory held by the oom victim left anymore so it makes sense to clear the TIF_MEMDIE flag for the victim and allow the OOM killer to select another task. The lack of TIF_MEMDIE also means that the victim cannot access memory reserves anymore but that shouldn't be a problem because it would get the access again if it needs to allocate and hits the OOM killer again due to the fatal_signal_pending resp. PF_EXITING check. We can safely hide the task from the OOM killer because it is clearly not a good candidate anymore as everyhing reclaimable has been torn down already. This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE and thus hold off further global OOM killer actions granted the oom reaper is able to take mmap_sem for the associated mm struct. This is not guaranteed now but further steps should make sure that mmap_sem for write should be blocked killable which will help to reduce such a lock contention. This is not done by this patch. Note that exit_oom_victim might be called on a remote task from __oom_reap_task now so we have to check and clear the flag atomically otherwise we might race and underflow oom_victims or wake up waiters too early. 
Signed-off-by: Michal Hocko
Suggested-by: Johannes Weiner
Suggested-by: Tetsuo Handa
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: Oleg Nesterov
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/oom.h |  2 +-
 kernel/exit.c       |  2 +-
 mm/oom_kill.c       | 73 +++++++++++++++++++++++++++++++++++------------------
 3 files changed, 50 insertions(+), 27 deletions(-)

(limited to 'kernel/exit.c')

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 03e6257321f0..45993b840ed6 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -91,7 +91,7 @@ extern enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 
 extern bool out_of_memory(struct oom_control *oc);
 
-extern void exit_oom_victim(void);
+extern void exit_oom_victim(struct task_struct *tsk);
 
 extern int register_oom_notifier(struct notifier_block *nb);
 extern int unregister_oom_notifier(struct notifier_block *nb);

diff --git a/kernel/exit.c b/kernel/exit.c
index 953d1a1c0387..fd90195667e1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -435,7 +435,7 @@ static void exit_mm(struct task_struct *tsk)
 	mm_update_next_owner(mm);
 	mmput(mm);
 	if (test_thread_flag(TIF_MEMDIE))
-		exit_oom_victim();
+		exit_oom_victim(tsk);
 }
 
 static struct task_struct *find_alive_thread(struct task_struct *p)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f7ed6ece0719..2830b1c6483e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -416,20 +416,36 @@ bool oom_killer_disabled __read_mostly;
  * victim (if that is possible) to help the OOM killer to move on.
  */
 static struct task_struct *oom_reaper_th;
-static struct mm_struct *mm_to_reap;
+static struct task_struct *task_to_reap;
 static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
 
-static bool __oom_reap_vmas(struct mm_struct *mm)
+static bool __oom_reap_task(struct task_struct *tsk)
 {
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct task_struct *p;
 	struct zap_details details = {.check_swap_entries = true,
 				      .ignore_dirty = true};
 	bool ret = true;
 
-	/* We might have raced with exit path */
-	if (!atomic_inc_not_zero(&mm->mm_users))
+	/*
+	 * Make sure we find the associated mm_struct even when the particular
+	 * thread has already terminated and cleared its mm.
+	 * We might have race with exit path so consider our work done if there
+	 * is no mm.
+	 */
+	p = find_lock_task_mm(tsk);
+	if (!p)
+		return true;
+
+	mm = p->mm;
+	if (!atomic_inc_not_zero(&mm->mm_users)) {
+		task_unlock(p);
 		return true;
+	}
+
+	task_unlock(p);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		ret = false;
@@ -464,60 +480,66 @@ static bool __oom_reap_vmas(struct mm_struct *mm)
 	}
 	tlb_finish_mmu(&tlb, 0, -1);
 	up_read(&mm->mmap_sem);
+
+	/*
+	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
+	 * reasonably reclaimable memory anymore. OOM killer can continue
+	 * by selecting other victim if unmapping hasn't led to any
+	 * improvements. This also means that selecting this task doesn't
+	 * make any sense.
+	 */
+	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
+	exit_oom_victim(tsk);
 out:
 	mmput(mm);
 	return ret;
 }
 
-static void oom_reap_vmas(struct mm_struct *mm)
+static void oom_reap_task(struct task_struct *tsk)
 {
 	int attempts = 0;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < 10 && !__oom_reap_vmas(mm))
+	while (attempts++ < 10 && !__oom_reap_task(tsk))
 		schedule_timeout_idle(HZ/10);
 
 	/* Drop a reference taken by wake_oom_reaper */
-	mmdrop(mm);
+	put_task_struct(tsk);
 }
 
 static int oom_reaper(void *unused)
 {
 	while (true) {
-		struct mm_struct *mm;
+		struct task_struct *tsk;
 
 		wait_event_freezable(oom_reaper_wait,
-				     (mm = READ_ONCE(mm_to_reap)));
-		oom_reap_vmas(mm);
-		WRITE_ONCE(mm_to_reap, NULL);
+				     (tsk = READ_ONCE(task_to_reap)));
+		oom_reap_task(tsk);
+		WRITE_ONCE(task_to_reap, NULL);
 	}
 
 	return 0;
 }
 
-static void wake_oom_reaper(struct mm_struct *mm)
+static void wake_oom_reaper(struct task_struct *tsk)
 {
-	struct mm_struct *old_mm;
+	struct task_struct *old_tsk;
 
 	if (!oom_reaper_th)
 		return;
 
-	/*
-	 * Pin the given mm. Use mm_count instead of mm_users because
-	 * we do not want to delay the address space tear down.
-	 */
-	atomic_inc(&mm->mm_count);
+	get_task_struct(tsk);
 
 	/*
 	 * Make sure that only a single mm is ever queued for the reaper
 	 * because multiple are not necessary and the operation might be
 	 * disruptive so better reduce it to the bare minimum.
 	 */
-	old_mm = cmpxchg(&mm_to_reap, NULL, mm);
-	if (!old_mm)
+	old_tsk = cmpxchg(&task_to_reap, NULL, tsk);
+	if (!old_tsk)
 		wake_up(&oom_reaper_wait);
 	else
-		mmdrop(mm);
+		put_task_struct(tsk);
 }
 
 static int __init oom_init(void)
@@ -532,7 +554,7 @@ static int __init oom_init(void)
 }
 subsys_initcall(oom_init)
 #else
-static void wake_oom_reaper(struct mm_struct *mm)
+static void wake_oom_reaper(struct task_struct *tsk)
 {
 }
 #endif
@@ -563,9 +585,10 @@ void mark_oom_victim(struct task_struct *tsk)
 /**
  * exit_oom_victim - note the exit of an OOM victim
  */
-void exit_oom_victim(void)
+void exit_oom_victim(struct task_struct *tsk)
 {
-	clear_thread_flag(TIF_MEMDIE);
+	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
+		return;
 
 	if (!atomic_dec_return(&oom_victims))
 		wake_up_all(&oom_victims_wait);
@@ -748,7 +771,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	rcu_read_unlock();
 
 	if (can_oom_reap)
-		wake_oom_reaper(mm);
+		wake_oom_reaper(victim);
 
 	mmdrop(mm);
 	put_task_struct(victim);
-- 
cgit
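The queueing in wake_oom_reaper() above is a single-slot handoff: at
most one victim is ever queued, and a second producer simply drops its
reference. A rough user-space analogue of the pattern (an illustrative
C11 sketch with invented names, not kernel API):

#include <stdatomic.h>
#include <stddef.h>

/* The slot holds at most one queued item, like task_to_reap. */
static _Atomic(void *) slot;

/* wake_oom_reaper() analogue: publish only if the slot is empty, so a
 * single item is ever queued. On failure the caller drops its
 * reference, as wake_oom_reaper() does via put_task_struct(). */
static int try_publish(void *item)
{
	void *expected = NULL;
	return atomic_compare_exchange_strong(&slot, &expected, item);
}

/* oom_reaper() analogue: consume the published item, emptying the
 * slot. (The kernel variant differs slightly: it reads the slot and
 * resets it with WRITE_ONCE only after reaping finishes, so no new
 * victim can be queued while one is being processed.) */
static void *take(void)
{
	return atomic_exchange(&slot, NULL);
}

int main(void)
{
	int a = 1, b = 2;
	try_publish(&a);	/* succeeds: slot was empty */
	try_publish(&b);	/* fails: an item is already queued */
	(void)take();		/* consumer receives &a, slot empties */
	return 0;
}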