path: root/kernel/events/ring_buffer.c
2015-02-04  perf: Use POLLIN instead of POLL_IN for perf poll data in flag  (Jiri Olsa, 1 file, -1/+2)
Currently we flag available data (via the poll syscall) on a perf fd with the POLL_IN macro, which is normally used for the SIGIO interface. We've been lucky, because POLLIN (0x1) is a subset of POLL_IN (0x20001) and sys_poll (the do_pollfd function) cuts the extra bit out (0x20000).

Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
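A minimal userspace sketch of why the old flag appeared to work (values taken from the message above; the demo itself is illustrative, not kernel code):

    #include <assert.h>
    #include <stdio.h>

    #define POLLIN  0x0001   /* poll(2) "data readable" event bit       */
    #define POLL_IN 0x20001  /* SIGIO si_code: carries an extra 0x20000 */

    int main(void)
    {
        /* do_pollfd() reports only the bits the caller asked for,
         * so the stray 0x20000 bit was silently masked away. */
        unsigned int revents = POLL_IN;

        assert((revents & POLLIN) == POLLIN);                    /* lucky match */
        printf("stray bits dropped: %#x\n", revents & ~POLLIN);  /* 0x20000 */
        return 0;
    }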
2013-12-11  perf: Optimize ring-buffer write by depending on control dependencies  (Peter Zijlstra, 1 file, -16/+26)
Remove a full barrier from the ring-buffer write path by relying on a control dependency to order a LOAD -> STORE scenario.

Cc: "Paul E. McKenney" <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
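A minimal single-producer sketch of the LOAD -> STORE control-dependency pattern (illustrative names, not the kernel diff; the ordering argument is the branch on the loaded value, as described in Documentation/memory-barriers.txt):

    #include <stdatomic.h>

    #define BUF_SIZE 4096
    static _Atomic unsigned long data_tail;   /* advanced by the consumer */
    static unsigned long head;                /* producer-private         */
    static unsigned char buf[BUF_SIZE];

    int try_write(const unsigned char *data, unsigned long len)
    {
        unsigned long tail = atomic_load_explicit(&data_tail,
                                                  memory_order_relaxed);

        /* Control dependency: the stores below execute only if the
         * space check on the loaded tail passes, so the CPU cannot
         * hoist them above the LOAD of data_tail -- no full barrier. */
        if (BUF_SIZE - (head - tail) < len)
            return -1;                        /* no room, bail */

        for (unsigned long i = 0; i < len; i++)
            buf[(head + i) % BUF_SIZE] = data[i];   /* dependent STOREs */
        head += len;
        return 0;
    }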
2013-11-06  perf: Update a stale comment  (Peter Zijlstra, 1 file, -2/+2)
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Anton Blanchard <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-11-06  perf: Optimize perf_output_begin() -- address calculation  (Peter Zijlstra, 1 file, -7/+7)
Rewrite the handle address calculation code to be clearer.

Saves 8 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Anton Blanchard <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-11-06  perf: Optimize perf_output_begin() -- lost_event case  (Peter Zijlstra, 1 file, -5/+8)
Avoid touching the lost_event and sample_data cachelines twice. It's not like we end up doing less work, but it might help to keep all accesses to these cachelines in one place.

Due to code shuffle, this loses 4 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Anton Blanchard <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-11-06  perf: Optimize perf_output_begin()  (Peter Zijlstra, 1 file, -8/+9)
There's no point in re-doing the memory barrier when we fail the cmpxchg(). Also, placing it after the space reservation loop makes it clearer that it only separates the userpage->tail read from the data stores.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Anton Blanchard <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-11-06  perf: Add unlikely() to the ring-buffer code  (Peter Zijlstra, 1 file, -8/+8)
Add unlikely() annotations to 'slow' paths (a sketch of the shape follows below):

 - When you have a sampling event but no output buffer, you have bigger
   issues -- and the bail is still faster than actually doing the work.

 - When you have a sampling event but a control-page-only buffer, you
   have bigger issues -- again, the bail is still faster than actually
   doing the work.

 - Optimize for the case where you're not losing events -- again, not
   doing the work is still faster, but make sure that when you do have
   to do work, it's as fast as possible.

The typical watermark is 1/2 the buffer size, so most events will not take this path.

Shrinks perf_output_begin() by 16 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Anton Blanchard <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
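The shape of the annotations, in outline (simplified; the struct layout and error codes here are placeholders, not the exact kernel diff):

    #include <errno.h>

    #define unlikely(x) __builtin_expect(!!(x), 0)

    struct ring_buffer_sketch { int nr_pages; };

    int output_begin_sketch(struct ring_buffer_sketch *rb)
    {
        if (unlikely(!rb))
            return -ENOENT;   /* sampling event with no output buffer */
        if (unlikely(!rb->nr_pages))
            return -ENOSPC;   /* control-page-only buffer, no data pages */
        /* common case falls straight through on the hot path */
        return 0;
    }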
2013-11-06  perf: Simplify the ring-buffer code  (Peter Zijlstra, 1 file, -33/+4)
By using CIRC_SPACE() we can obviate the need for perf_output_space().

Shrinks the size of perf_output_begin() by 17 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Anton Blanchard <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
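For reference, the helpers come from include/linux/circ_buf.h; for a power-of-two buffer size they compute occupancy and free space without branches (the worked example below is ours):

    #define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
    #define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

    /* e.g. size = 8, head = 5, tail = 2:
     *   CIRC_CNT(5, 2, 8)   == 3 bytes queued
     *   CIRC_SPACE(5, 2, 8) == 4 bytes free -- one slot is kept empty
     *   so a full buffer is distinguishable from an empty one */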
2013-10-29  perf: Fix perf ring buffer memory ordering  (Peter Zijlstra, 1 file, -4/+27)
The PPC64 people noticed a missing memory barrier and crufty old comments in the perf ring buffer code. So update all the comments and add the missing barrier.

When the architecture implements local_t using atomic_long_t there will be double barriers issued; but short of introducing more conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <[email protected]>
Tested-by: Victor Kaplansky <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: [email protected]
Cc: Paul McKenney <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
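The pairing the updated comments document, roughly (paraphrased from the comment added to the ring-buffer code; see the file itself for the authoritative version):

    /*
     *   kernel (producer)             user (consumer)
     *
     *   READ ->data_tail              READ ->data_head
     *   smp_rmb()   (A)               smp_rmb()   (C)
     *   WRITE $data                   READ $data
     *   smp_wmb()   (B)               smp_mb()    (D)
     *   STORE ->data_head             WRITE ->data_tail
     *
     * Where A pairs with D, and B pairs with C.
     */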
2013-05-01  perf: Fix vmalloc ring buffer pages handling  (Jiri Olsa, 1 file, -4/+10)
If we allocate a perf ring buffer with the size of a single (user) page, we get memory corruption when releasing it in the rb_free_work function (for the CONFIG_PERF_USE_VMALLOC option).

For a single-page-sized ring buffer the page_order is -1 (because nr_pages is 0). This needs to be recognized in the rb_free_work function so it releases the proper number of pages.

Add a data_page_nr function that returns the number of allocated data pages, and adjust the rest of the code to use it.

Reported-by: Jan Stancek <[email protected]>
Original-patch-by: Peter Zijlstra <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
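The shape of the helper, as described above (paraphrased sketch; page_order() stands in for the kernel's accessor and the exact body may differ from mainline):

    /* One place for the "how many data pages" math, so the free path
     * and the alloc path agree even for the single-page buffer where
     * nr_pages is 0 and page_order is -1. */
    static int data_page_nr(struct ring_buffer *rb)
    {
        return rb->nr_pages << page_order(rb);
    }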
2013-03-21  perf: Fix ring_buffer perf_output_space() boundary calculation  (Stephane Eranian, 1 file, -4/+18)
This patch fixes a flaw in perf_output_space(). In case the size of the space needed is bigger than the actual buffer size, there may be situations where the function would return true (i.e., there is space) when it should not: head ends up beyond offset due to rounding in the masking logic.

The problem can be tested by activating BTS on Intel processors. A BTS record can be as big as 16 pages. The following command fails:

    $ perf record -m 4 -c 1 -e branches:u my_test_program

You will get buffer corruption with this, and perf report won't be able to parse the perf.data.

The fix is to first check that the requested space is smaller than the buffer size. If so, then the masking logic will work fine. If not, then there is no chance the record can be saved, and it will be gracefully handled by upper code layers.

[ In v2, we also make the logic for the writable flag more explicit by renaming it to rb->overwrite, because it tells whether or not the buffer can overwrite its tail (suggested by PeterZ). ]

Signed-off-by: Stephane Eranian <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/20130318133327.GA3056@quad
Signed-off-by: Ingo Molnar <[email protected]>
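A simplified model of the corrected check (illustrative, not the verbatim kernel function):

    #include <stdbool.h>

    /* buf_size must be a power of two, as for the real ring buffer */
    static bool output_space_sketch(unsigned long head, unsigned long tail,
                                    unsigned long buf_size, unsigned long len)
    {
        /* new guard: a request larger than the whole buffer (e.g. a
         * 16-page BTS record into a 4-page buffer) can never fit */
        if (len > buf_size)
            return false;

        /* with len bounded, the mask-based occupancy math is sound */
        return buf_size - ((head - tail) & (buf_size - 1)) >= len;
    }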
2012-08-10  perf: Add perf_output_skip function to skip bytes in sample  (Jiri Olsa, 1 file, -0/+6)
Introduce a perf_output_skip function to be able to skip data within the perf ring buffer.

When writing data into the perf ring buffer we first reserve the needed space in the ring buffer and then copy the actual data. There's a possibility we won't be able to fill all the reserved size with data, so we need a way to skip the remaining bytes.

This is going to be useful when storing the user stack dump, where we might end up with less data than we originally requested.

Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: "Frank Ch. Eigler" <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Benjamin Redelings <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
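A hypothetical caller, to show where the skip fits (fragment only; 'reserved' and 'actual' are illustrative names, not kernel variables):

    /* reserve the worst case up front ... */
    if (perf_output_begin(&handle, event, reserved))
        return;
    /* ... copy what we actually obtained ... */
    perf_output_copy(&handle, data, actual);
    /* ... and skip the shortfall so the reservation stays consistent */
    perf_output_skip(&handle, reserved - actual);
    perf_output_end(&handle);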
2012-08-10  perf: Factor __output_copy to be usable with specific copy function  (Frederic Weisbecker, 1 file, -2/+2)
Add a generic way to use the __output_copy function with a specific copy function via the DEFINE_PERF_OUTPUT_COPY macro.

Use this to add a new __output_copy_user function that copies output from user pointers. On x86 the copy_from_user_nmi function is used; the rest of the architectures use __copy_from_user_inatomic.

This new function will be used in the user stack dump on sample, coming in the next patches.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: "Frank Ch. Eigler" <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Benjamin Redelings <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
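A minimal illustration of the macro pattern being described (generic demo code; not the kernel's actual macro body or signatures):

    #include <string.h>

    /* one template, specialized by the low-level copy primitive */
    #define DEFINE_COPY_FN(name, copy_primitive)                      \
        static unsigned long name(void *dst, const void *src,         \
                                  unsigned long n)                    \
        {                                                             \
            return copy_primitive(dst, src, n);                       \
        }

    static unsigned long plain_copy(void *dst, const void *src,
                                    unsigned long n)
    {
        memcpy(dst, src, n);   /* trusted kernel-side source */
        return 0;              /* 0 bytes failed to copy */
    }

    DEFINE_COPY_FN(output_copy_kernel, plain_copy)
    /* a second expansion would bind a user-space-safe primitive, e.g.
     * DEFINE_COPY_FN(output_copy_user, user_safe_copy) */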
2012-01-08  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial  (Linus Torvalds, 1 file, -1/+1)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-02  misc latin1 to utf8 conversions  (Al Viro, 1 file, -1/+1)
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
2011-12-05  perf: Fix loss of notification with multi-event  (Peter Zijlstra, 1 file, -0/+3)
When you do:

    $ perf record -e cycles,cycles,cycles noploop 10

you expect about 10,000 samples for each event, i.e., 10s at 1000 samples/sec. However, this is not what's happening. You get far fewer samples, maybe 3700 samples/event:

    $ perf report -D | tail -15
    Aggregated stats:
           TOTAL events:      10998
            MMAP events:         66
            COMM events:          2
          SAMPLE events:      10930
    cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644
    cycles stats:
           TOTAL events:       3642
          SAMPLE events:       3642
    cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644

On an Intel Nehalem or even an AMD64, there are 4 counters capable of measuring cycles, so there is plenty of space to measure those events without multiplexing (even with the NMI watchdog active). And even with multiplexing, we'd expect roughly the same number of samples per event.

The root of the problem was that when the event that caused the buffer to become full was not the first event passed on the cmdline, the user notification would get lost. The notification was sent to the file descriptor of the overflowed event, but the perf tool was not polling on it. The perf tool aggregates all samples into a single buffer, i.e., the buffer of the first event. Consequently, it assumes notifications for any event will come via that descriptor.

The seemingly straightforward solution of moving the waitq into the ringbuffer object doesn't work because of lifetime issues. One could perf_event_set_output() on an fd that you're also blocking on and cause the old rb object to be freed while its waitq would still be referenced by the blocked thread -> FAIL.

Therefore link all events to the ringbuffer and broadcast the wakeup from the ringbuffer object to all possible events that could be waited upon. This is rather ugly, and we're open to better solutions, but it works for now.

Reported-by: Stephane Eranian <[email protected]>
Finished-by: Stephane Eranian <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
Signed-off-by: Ingo Molnar <[email protected]>
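Roughly the shape of the broadcast described above (paraphrased sketch; details may differ from the mainline function):

    static void ring_buffer_wakeup(struct perf_event *event)
    {
        struct ring_buffer *rb;

        rcu_read_lock();
        rb = rcu_dereference(event->rb);
        if (rb) {
            /* every event attached to this buffer gets the wakeup,
             * not just the one that overflowed */
            list_for_each_entry_rcu(event, &rb->event_list, rb_entry)
                wake_up_all(&event->waitq);
        }
        rcu_read_unlock();
    }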
2011-07-01  perf: Remove the perf_output_begin(.sample) argument  (Peter Zijlstra, 1 file, -18/+1)
Since only samples call perf_output_sample(), it's much saner (and more correct) to put the sample logic in there than in the perf_output_begin()/perf_output_end() pair.

Saves a useless argument, reduces conditionals and shrinks struct perf_output_handle. Win!

Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2011-07-01  perf: Remove the nmi parameter from the swevent and overflow interface  (Peter Zijlstra, 1 file, -7/+3)
The nmi parameter indicated whether we could do wakeups from the current context; if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup.

For the various event classes:

 - hardware: nmi=0; the PMI is in fact an NMI, or we run irq_work_run
   from the PMI tail (ARM etc.)
 - tracepoint: nmi=0; since a tracepoint could be from NMI context.
 - software: nmi=[0,1]; some, like the schedule thing, cannot perform
   wakeups and hence need 0.

As one can see, there is very little nmi=1 usage, and the downside of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The upside, however, is that we can remove the nmi parameter and save a bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Michael Cree <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Deng-Cheng Zhu <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Wessel <[email protected]>
Cc: Don Zickus <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
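The interface change in outline, for the overflow path (other affected callers elided):

    /* before: every caller had to say whether it ran in NMI context */
    int perf_event_overflow(struct perf_event *event, int nmi,
                            struct perf_sample_data *data,
                            struct pt_regs *regs);

    /* after: the flag is gone; wakeup deferral is handled uniformly */
    int perf_event_overflow(struct perf_event *event,
                            struct perf_sample_data *data,
                            struct pt_regs *regs);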
2011-07-01  perf_events: Fix perf buffer watermark setting  (Vince Weaver, 1 file, -7/+9)
Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the ring-buffer logic: make perf_buffer_alloc() do everything needed")), the perf_buffer_init code has been mis-setting the buffer watermark if perf_event_attr.wakeup_events has a non-zero value. This is because perf_event_attr.wakeup_events is a union with perf_event_attr.wakeup_watermark.

This commit re-enables the check for perf_event_attr.watermark being set before continuing with setting a non-default watermark.

This bug is most noticeable when you are trying to use PERF_IOC_REFRESH with a value larger than one and perf_event_attr.wakeup_events is set to one. In this case the buffer watermark will be set to 1 and you will get extraneous POLL_IN overflows rather than POLL_HUP as expected.

[ avoid using attr.wakeup_events when attr.watermark is set ]

Signed-off-by: Vince Weaver <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
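The aliasing at the root of the bug (the two attr fields share a union in the perf_event_attr uAPI), plus a paraphrase of the fix's logic (not the verbatim diff):

    union {
        __u32 wakeup_events;     /* wake up every n events */
        __u32 wakeup_watermark;  /* bytes before wakeup    */
    };

    /* only treat the value as a byte watermark when attr->watermark
     * says that's what the union field actually means */
    if (attr->watermark)
        watermark = attr->wakeup_watermark;
    else
        watermark = 0;   /* fall back to the default (half the buffer) */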
2011-06-09  perf: Split up buffer handling from core code  (Frederic Weisbecker, 1 file, -0/+399)
And create the internal perf events header.

v2: Keep an internal inlined perf_output_copy()

Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ v3: use clearer 'ring_buffer' and 'rb' naming ]
Signed-off-by: Ingo Molnar <[email protected]>