perf/ring_buffer: Use high order allocations for AUX buffers optimistically

Currently, the AUX buffer allocator will use high-order allocations for PMUs that don't support hardware scatter-gather chaining to ensure large contiguous blocks of pages, and always use an array of single pages otherwise. There is, however, a tangible performance benefit in using larger chunks of contiguous memory even in the latter case, that comes from not having to fetch the next page's address at every page boundary. In particular, a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime penalty with a single multi-page output region in snapshot mode (no PMI) than with multiple single-page output regions, from ~6% down to ~4%. For the snapshot mode it does make a difference as it is intended to run over long periods of time. For this reason, change the allocation policy to always optimistically start with the highest possible order when allocating pages for the AUX buffer, desceding until the allocation succeeds or order zero allocation fails. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
author: Alexander Shishkin <[email protected]> 2019-02-15 13:47:27 +0200
committer: Ingo Molnar <[email protected]> 2019-03-09 14:10:30 +0100
commit: 5768402fd9c6e872252b5268ad85e3fbae4fe26b (patch)
tree: 62874f0d5f2b186b16cf50b2d66f24c1e48a3436
parent: 6ea98b4baa1c9089d7a035ebccb993e03d1ac57f (diff)
1 files changed, 15 insertions, 17 deletions
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 678ccec60d8f..a4047321d7d8 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -598,29 +598,27 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 {
 	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
 	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
-	int ret = -ENOMEM, max_order = 0;
+	int ret = -ENOMEM, max_order;
 
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
-	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
-		/*
-		 * We need to start with the max_order that fits in nr_pages,
-		 * not the other way around, hence ilog2() and not get_order.
-		 */
-		max_order = ilog2(nr_pages);
+	/*
+	 * We need to start with the max_order that fits in nr_pages,
+	 * not the other way around, hence ilog2() and not get_order.
+	 */
+	max_order = ilog2(nr_pages);
 
-		/*
-		 * PMU requests more than one contiguous chunks of memory
-		 * for SW double buffering
-		 */
-		if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
-		    !overwrite) {
-			if (!max_order)
-				return -EINVAL;
+	/*
+	 * PMU requests more than one contiguous chunks of memory
+	 * for SW double buffering
+	 */
+	if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
+	    !overwrite) {
+		if (!max_order)
+			return -EINVAL;
 
-			max_order--;
-		}
+		max_order--;
 	}
 
 	rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
author	Alexander Shishkin <[email protected]>	2019-02-15 13:47:27 +0200
committer	Ingo Molnar <[email protected]>	2019-03-09 14:10:30 +0100
commit	5768402fd9c6e872252b5268ad85e3fbae4fe26b (patch)
tree	62874f0d5f2b186b16cf50b2d66f24c1e48a3436
parent	6ea98b4baa1c9089d7a035ebccb993e03d1ac57f (diff)