aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/failed-syscalls-by-pid.py
diff options
context:
space:
mode:
authorPeter Zijlstra <[email protected]>2022-10-06 15:00:39 +0200
committerPeter Zijlstra <[email protected]>2022-10-17 16:32:05 +0200
commitca6c21327c6af02b7eec31ce4b9a740a18c6c13f (patch)
tree9d1f4e125bef71966eef12256f3495ba21940281 /tools/perf/scripts/python/failed-syscalls-by-pid.py
parent9abf2313adc1ca1b6180c508c25f22f9395cc780 (diff)
perf: Fix missing SIGTRAPs
Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <[email protected]> Debugged-by: Dmitry Vyukov <[email protected]> Reported-by: Marco Elver <[email protected]> Debugged-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Marco Elver <[email protected]> Tested-by: Marco Elver <[email protected]>
Diffstat (limited to 'tools/perf/scripts/python/failed-syscalls-by-pid.py')
0 files changed, 0 insertions, 0 deletions