aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/perf
AgeCommit message (Collapse)AuthorFilesLines
2015-04-11powerpc/perf/hv-24x7: Modify definition of request and result buffersSukadev Bhattiprolu2-44/+41
The parameters to the 24x7 HCALL have variable number of elements in them. Set the minimum number of such elements to 1 rather than 0 and eliminate the temporary structures. This would enable us to submit multiple counter requests and process multiple results from a single HCALL (in a follow on patch). Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-03-27powerpc/perf: add missing put_cpu_var in power_pmu_event_initJan Stancek1-1/+3
One path in power_pmu_event_init() calls get_cpu_var(), but is missing matching call to put_cpu_var(), which causes preemption imbalance and crash in user-space: Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280 NIP = 3fff9bf2cae0 MSR = 900000014280f032 Oops: Weird page fault, sig: 11 [#23] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: <snip> CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000 NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0 REGS: c000001fe835fea0 TRAP: 0401 Tainted: G D (4.0.0-rc5+) MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 22000028 XER: 00000000 CFAR: 00003fff9bee4894 SOFTE: 1 GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068 GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001 GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70 GPR12: 0000000052000022 00003fff9c10b700 NIP [00003fff9bf2cae0] 0x3fff9bf2cae0 LR [00003fff9bee4898] 0x3fff9bee4898 Call Trace: ---[ end trace 5d3d952b5d4185d4 ]--- BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out INFO: lockdep is turned off. CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 Call Trace: [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable) [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0 [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110 [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160 [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70 [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450 [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900 [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30 note: a.out[10285] exited with preempt_count 1 Reproducer: #include <stdio.h> #include <unistd.h> #include <syscall.h> #include <sys/types.h> #include <sys/stat.h> #include <linux/perf_event.h> #include <linux/hw_breakpoint.h> static struct perf_event_attr event = { .type = PERF_TYPE_RAW, .size = sizeof(struct perf_event_attr), .sample_type = PERF_SAMPLE_BRANCH_STACK, .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN, }; int main() { syscall(__NR_perf_event_open, &event, 0, -1, -1, 0); } Signed-off-by: Jan Stancek <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-03-06treewide: Fix typo in printk messagesMasanari Iida1-1/+1
This patch fix spelling typo in printk messages. Signed-off-by: Masanari Iida <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2015-02-26Merge tag 'v4.0-rc1' into perf/core, to refresh the treeIngo Molnar16-49/+1360
Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18perf, powerpc: Fix up flush_branch_stack() usersPeter Zijlstra1-4/+9
The recent LBR rework for x86 left a stray flush_branch_stack() user in the PowerPC code, fix that up. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Joel Stanley <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michael Neuling <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Tejun Heo <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-04Merge branch 'next' of ↵Michael Ellerman1-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include 8xx optimizations, some more work on datapath device tree content, e300 machine check support, t1040 corenet error reporting, and various cleanups and fixes."
2015-02-02powerpc/perf/hv-gpci: add the remaining gpci requestsCody P Schafer1-1/+186
Add the remaining gpci requests that contain counters suitable for use by perf. Omit those that don't contain any counters (but note their ommision). Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-02-02powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotatedCody P Schafer10-30/+316
This adds (in req-gen/) a framework for defining gpci counter requests. It uses macro magic similar to ftrace. Also convert the existing hv-gpci request structures and enum values to use the new framework (and adjust old users of the structs and enum values to cope with changes in naming). In exchange for this macro disaster, we get autogenerated event listing for GPCI in sysfs, build time field offset checking, and zero duplication of information about GPCI requests. Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-02-02powerpc/perf/hv-24x7: parse catalog and populate sysfs with eventsCody P Schafer4-17/+841
Retrieves and parses the 24x7 catalog on POWER systems that supply it (right now, only POWER 8). Events are exposed via sysfs in the standard fashion, and are all parameterized. $ cd /sys/bus/event_source/devices/hv_24x7/events $ cat HPM_CS_FROM_L4_LDATA__PHYS_CORE domain=0x2,offset=0xd58,core=?,lpar=0x0 $ cat HPM_TLBIE__VCPU_HOME_CHIP domain=0x4,offset=0x358,vcpu=?,lpar=? where user is required to specify values for the fields with '?' (like core, vcpu, lpar above), when specifying the event with the perf tool. Catalog is (at the moment) only parsed on boot. It needs re-parsing when a some hypervisor events occur. At that point we'll also need to prevent old events from continuing to function (counter that is passed in via spare space in the config values?). Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-02-02perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper[email protected]1-0/+10
Define a lite version of the EVENT_DEFINE_RANGE_FORMAT() that avoids defining helper functions for the bit-field ranges. Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-01-29perf/powerpc: reset event hw state when adding it to the PMUAlexandru-Cezar Sardan1-1/+3
When adding an event to the PMU with PERF_EF_START the STOPPED and UPTODATE flags need to be cleared in the hw.event status variable because they are preventing the update of the event count on overflow interrupt. Signed-off-by: Alexandru-Cezar Sardan <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-01-29powerpc/perf: fix fsl_emb_pmu_start to write correct pmc valueTom Huynh1-1/+5
PMCs on PowerPC increases towards 0x80000000 and triggers an overflow interrupt when the msb is set to collect a sample. Therefore, to setup for the next sample collection, pmu_start should set the pmc value to 0x80000000 - left instead of left which incorrectly delays the next overflow interrupt. Same as commit 9a45a9407c69 ("powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events") for book3s. Signed-off-by: Tom Huynh <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2014-12-12power/perf/hv-24x7: Use kmem_cache_free() instead of kfreeSukadev Bhattiprolu1-1/+1
Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc(). Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-12-12powerpc/perf/hv-24x7: Use per-cpu page buffer[email protected]1-12/+9
The 24x7 counters are continuously running and not updated on an interrupt. So we record the event counts when stopping the event or deleting it. But to "read" a single counter in 24x7, we allocate a page and pass it into the hypervisor (The HV returns the page full of counters from which we extract the specific counter for this event). We allocate a page using GFP_USER and when deleting the event, we end up with the following warning because we are blocking in interrupt context. [ 698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000 We could use GFP_ATOMIC but that could result in failures. Pre-allocate a buffer so we don't have to allocate in interrupt context. Further as Michael Ellerman suggested, use Per-CPU buffer so we only need to allocate once per CPU. Cc: [email protected] Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-11-03powerpc: Replace __get_cpu_var usesChristoph Lameter2-14/+14
This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <[email protected]> CC: Paul Mackerras <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <[email protected]>
2014-10-28perf: Fix and clean up initialization of pmu::event_idxPeter Zijlstra2-12/+0
Andy reported that the current state of event_idx is rather confused. So remove all but the x86_pmu implementation and change the default to return 0 (the safe option). Reported-by: Andy Lutomirski <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Cody P Schafer <[email protected]> Cc: Cody P Schafer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hendrik Brueckner <[email protected]> Cc: Himangi Saraogi <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: [email protected] <[email protected]> Cc: Thomas Huth <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-10-07powerpc/perf/hv-24x7: Simplify catalog_read()[email protected]1-87/+14
catalog_read() implements the read interface for the sysfs file /sys/bus/event_source/devices/hv_24x7/interface/catalog It essentially takes a buffer, an offset and count as parameters to the read() call. It makes a hypervisor call to read a specific page from the catalog and copy the required bytes into the given buffer. Each call to catalog_read() returns at most one 4K page. Given these requirements, we should be able to simplify the catalog_read(). Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-07powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocationsCody P Schafer1-18/+37
Ian pointed out the use of __aligned(4096) caused rather large stack consumption in single_24x7_request(), so use the kmem_cache hv_page_cache (which we've already got set up for other allocations) insead of allocating locally. CC: Haren Myneni <[email protected]> Reported-by: Ian Munsie <[email protected]> Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-09-25powerpc: Make a bunch of things staticAnton Blanchard1-9/+9
Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-09-09powerpc/perf: Fix ABIv2 kernel backtracesAnton Blanchard1-1/+1
ABIv2 kernels are failing to backtrace through the kernel. An example: 39.30% readseek2_proce [kernel.kallsyms] [k] find_get_entry | --- find_get_entry __GI___libc_read The problem is in valid_next_sp() where we check that the new stack pointer is at least STACK_FRAME_OVERHEAD below the previous one. ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes with no paramter save area. STACK_FRAME_OVERHEAD is in theory the minimum stack frame size, but we over 240 uses of it, some of which assume that it includes space for the parameter area. We need to work through all our stack defines and rationalise them but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using in valid_next_sp(). This fixes the issue: 30.64% readseek2_proce [kernel.kallsyms] [k] find_get_entry | --- find_get_entry pagecache_get_page generic_file_read_iter new_sync_read vfs_read sys_read syscall_exit __GI___libc_read Cc: [email protected] # 3.16+ Reported-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Anton Blanchard <[email protected]>
2014-08-13powerpc/perf/hv-24x7: Use kmem_cache_freeHimangi Saraogi1-1/+1
Free memory allocated using kmem_cache_zalloc using kmem_cache_free rather than kfree. The Coccinelle semantic patch that makes this change is as follows: // <smpl> @@ expression x,E,c; @@ x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...) ... when != x = E when != &x ?-kfree(x) +kmem_cache_free(c,x) // </smpl> Signed-off-by: Himangi Saraogi <[email protected]> Acked-by: Julia Lawall <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-07Merge branch 'next' of ↵Linus Torvalds9-35/+82
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc new goodies for 3.17. The short story: The biggest bit is Michael removing all of pre-POWER4 processor support from the 64-bit kernel. POWER3 and rs64. This gets rid of a ton of old cruft that has been bitrotting in a long while. It was broken for quite a few versions already and nobody noticed. Nobody uses those machines anymore. While at it, he cleaned up a bunch of old dusty cabinets, getting rid of a skeletton or two. Then, we have some base VFIO support for KVM, which allows assigning of PCI devices to KVM guests, support for large 64-bit BARs on "powernv" platforms, support for HMI (Hardware Management Interrupts) on those same platforms, some sparse-vmemmap improvements (for memory hotplug), There is the usual batch of Freescale embedded updates (summary in the merge commit) and fixes here or there, I think that's it for the highlights" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits) powerpc/eeh: Export eeh_iommu_group_to_pe() powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API powerpc: Reduce scariness of interrupt frames in stack traces powerpc: start loop at section start of start in vmemmap_populated() powerpc: implement vmemmap_free() powerpc: implement vmemmap_remove_mapping() for BOOK3S powerpc: implement vmemmap_list_free() powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE powerpc/book3s: Fix endianess issue for HMI handling on napping cpus. powerpc/book3s: handle HMIs for cpus in nap mode. powerpc/powernv: Invoke opal call to handle hmi. powerpc/book3s: Add basic infrastructure to handle HMI in Linux. powerpc/iommu: Fix comments with it_page_shift powerpc/powernv: Handle compound PE in config accessors powerpc/powernv: Handle compound PE for EEH powerpc/powernv: Handle compound PE powerpc/powernv: Split ioda_eeh_get_state() powerpc/powernv: Allow to freeze PE powerpc/powernv: Enable M64 aperatus for PHB3 powerpc/eeh: Aux PE data for error log ...
2014-08-04Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Kernel side changes: - Consolidate the PMU interrupt-disabled code amongst architectures (Vince Weaver) - misc fixes Tooling changes (new features, user visible changes): - Add support for pagefault tracing in 'trace', please see multiple examples in the changeset messages (Stanislav Fomichev). - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add header for columns in 'top' and 'report' TUI browsers (Jiri Olsa) - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add IO mode into timechart command (Stanislav Fomichev) - Fallback to syscalls:* when raw_syscalls:* is not available in the perl and python perf scripts. (Daniel Bristot de Oliveira) - Add --repeat global option to 'perf bench' to be used in benchmarks such as the existing 'futex' one, that was modified to use it instead of a local option. (Davidlohr Bueso) - Fix fd -> pathname resolution in 'trace', be it using /proc or a vfs_getname probe point. (Arnaldo Carvalho de Melo) - Add suggestion of how to set perf_event_paranoid sysctl, to help non-root users trying tools like 'trace' to get a working environment. (Arnaldo Carvalho de Melo) - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup (Steven Rostedt, Jan Kiszka) - Support S/390 in 'perf kvm stat' (Alexander Yarygin) Tooling infrastructure changes: - Allow reserving a row for header purposes in the hists browser (Arnaldo Carvalho de Melo) - Various fixes and prep work related to supporting Intel PT (Adrian Hunter) - Introduce multiple debug variables control (Jiri Olsa) - Add callchain and additional sample information for python scripts (Joseph Schuchart) - More prep work to support Intel PT: (Adrian Hunter) - Polishing 'script' BTS output - 'inject' can specify --kallsym - VDSO is per machine, not a global var - Expose data addr lookup functions previously private to 'script' - Large mmap fixes in events processing - Include standard stringify macros in power pc code (Sukadev Bhattiprolu) Tooling cleanups: - Convert open coded equivalents to asprintf() (Andy Shevchenko) - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo) - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo) - No need to reimplement err() in 'perf bench sched-messaging', drop barf(). (Davidlohr Bueso). - Remove ev_name argument from perf_evsel__hists_browse, can be obtained from the other parameters. (Jiri Olsa) Tooling fixes: - Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso) - The -o and -n 'perf bench mem' options are mutually exclusive, emit error when both are specified. (Davidlohr Bueso) - Fix scrollbar refresh row index in the ui browser, problem exposed now that headers will be added and will be allowed to be switched on/off. (Jiri Olsa) - Handle the num array type in python properly (Sebastian Andrzej Siewior) - Fix wrong condition for allocation failure (Jiri Olsa) - Adjust callchain based on DWARF debug info on powerpc (Sukadev Bhattiprolu) - Fix a risk for doing free on uninitialized pointer in traceevent lib (Rickard Strandqvist) - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa) - Enable close-on-exec flag on perf file descriptor (Yann Droneaud) - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo) - Event ordering fixes (Jiri Olsa)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits) Revert "perf tools: Fix jump label always changing during tracing" perf tools: Fix perf usage string leftover perf: Check permission only for parent tracepoint event perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds perf record: Always force PERF_RECORD_FINISHED_ROUND event perf inject: Add --kallsyms parameter perf tools: Expose 'addr' functions so they can be reused perf session: Fix accounting of ordered samples queue perf powerpc: Include util/util.h and remove stringify macros perf tools: Fix build on gcc 4.4.7 perf tools: Add thread parameter to vdso__dso_findnew() perf tools: Add dso__type() perf tools: Separate the VDSO map name from the VDSO dso name perf tools: Add vdso__new() perf machine: Fix the lifetime of the VDSO temporary file perf tools: Group VDSO global variables into a structure perf session: Add ability to skip 4GiB or more perf session: Add ability to 'skip' a non-piped event stream perf tools: Pass machine to vdso__dso_findnew() perf tools: Add dso__data_size() ...
2014-07-28Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before ↵Ingo Molnar2-5/+23
applying new changes Signed-off-by: Ingo Molnar <[email protected]>
2014-07-28powerpc/perf: Add per-event excludes on Power8Michael Ellerman2-24/+67
Power8 has a new register (MMCR2), which contains individual freeze bits for each counter. This is an improvement on previous chips as it means we can have multiple events on the PMU at the same time with different exclude_{user,kernel,hv} settings. Previously we had to ensure all events on the PMU had the same exclude settings. The core of the patch is fairly simple. We use the 207S feature flag to indicate that the PMU backend supports per-event excludes, if it's set we skip the generic logic that enforces the equality of excludes between events. We also use that flag to skip setting the freeze bits in MMCR0, the PMU backend is expected to have handled setting them in MMCR2. The complication arises with EBB. The FCxP bits in MMCR2 are accessible R/W to a task using EBB. Which means a task using EBB will be able to see that we are using MMCR2 for freezing, whereas the old logic which used MMCR0 is not user visible. The task can not see or affect exclude_kernel & exclude_hv, so we only need to consider exclude_user. The table below summarises the behaviour both before and after this commit is applied: exclude_user true false ------------------------------------ | User visible | N N Before | Can freeze | Y Y | Can unfreeze | N Y ------------------------------------ | User visible | Y Y After | Can freeze | Y Y | Can unfreeze | Y/N Y ------------------------------------ So firstly I assert that the simple visibility of the exclude_user setting in MMCR2 is a non-issue. The event belongs to the task, and was most likely created by the task. So the exclude_user setting is not privileged information in any way. Secondly, the behaviour in the exclude_user = false case is unchanged. This is important as it is the case that is actually useful, ie. the event is created with no exclude setting and the task uses MMCR2 to implement exclusion manually. For exclude_user = true there is no meaningful change to freezing the event. Previously the task could use MMCR2 to freeze the event, though it was already frozen with MMCR0. With the new code the task can use MMCR2 to freeze the event, though it was already frozen with MMCR2. The only real change is when exclude_user = true and the task tries to use MMCR2 to unfreeze the event. Previously this had no effect, because the event was already frozen in MMCR0. With the new code the task can unfreeze the event in MMCR2, but at some indeterminate time in the future the kernel will overwrite its setting and refreeze the event. Therefore my final assertion is that any task using exclude_user = true and also fiddling with MMCR2 was deeply confused before this change, and remains so after it. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-28powerpc/perf: Pass the struct perf_events down to compute_mmcr()Michael Ellerman9-10/+12
To support per-event exclude settings on Power8 we need access to the struct perf_events in compute_mmcr(). Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-28powerpc/perf: Clear all MMCR settings before calling compute_mmcr()Michael Ellerman1-1/+3
Because we reuse cpuhw->mmcr on each call to compute_mmcr() there's a risk that we could forget to set one of the values and use whatever value was in there previously. Currently all the implementations are careful to set all the values, but it's safer to clear them all before we call compute_mmcr(). Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-23powerpc/perf: Fix MMCR2 handling for EBBMichael Ellerman1-3/+3
In the recent commit b50a6c584bb4 "Clear MMCR2 when enabling PMU", I screwed up the handling of MMCR2 for tasks using EBB. We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we overwrite the value of MMCR2 that userspace may have written. That potentially breaks a task that uses EBB and manually uses MMCR2 for event freezing. Fixes: b50a6c584bb4 ("powerpc/perf: Clear MMCR2 when enabling PMU") Cc: [email protected] Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-11powerpc/perf: Never program book3s PMCs with values >= 0x80000000Anton Blanchard1-1/+16
We are seeing a lot of PMU warnings on POWER8: Can't find PMC that caused IRQ Looking closer, the active PMC is 0 at this point and we took a PMU exception on the transition from negative to 0. Some versions of POWER8 have an issue where they edge detect and not level detect PMC overflows. A number of places program the PMC with (0x80000000 - period_left), where period_left can be negative. We can either fix all of these or just ensure that period_left is always >= 1. This patch takes the second option. Cc: <[email protected]> Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-11powerpc/perf: Clear MMCR2 when enabling PMUJoel Stanley1-0/+3
On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze the PMU counters. Aside from on boot they are then never reset, resulting in stuck perf counters for any user in the guest or host. We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane state for perf to use the PMU counters under either the guest or the host. This was manifesting as a bug with ppc64_cpu --frequency: $ sudo ppc64_cpu --frequency WARNING: couldn't run on cpu 0 WARNING: couldn't run on cpu 8 ... WARNING: couldn't run on cpu 144 WARNING: couldn't run on cpu 152 min: 18446744073.710 GHz (cpu -1) max: 0.000 GHz (cpu -1) avg: 0.000 GHz The command uses a perf counter to measure CPU cycles over a fixed amount of time, in order to approximate the frequency of the machine. The counters were returning zero once a guest was started, regardless of weather it was still running or had been shut down. By dumping the value of MMCR2, it was observed that once a guest is running MMCR2 is set to 1s - which stops counters from running: $ sudo sh -c 'echo p > /proc/sysrq-trigger' CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6 PMC1: 5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000 PMC5: 1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000 MMCR2: fffffffffffffc00 EBBHR: 0000000000000000 EBBRR: 0000000000000000 BESCR: 0000000000000000 SIAR: 00000000000a51cc SDAR: c00000000fc40000 SIER: 0000000001000000 This is done unconditionally in book3s_hv_interrupts.S upon entering the guest, and the original value is only save/restored if the host has indicated it was using the PMU. This is okay, however the user of the PMU needs to ensure that it is in a defined state when it starts using it. Fixes: e05b9b9e5c10 ("powerpc/perf: Power8 PMU support") Cc: [email protected] Signed-off-by: Joel Stanley <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-11powerpc/perf: Add PPMU_ARCH_207S defineJoel Stanley2-4/+4
Instead of separate bits for every POWER8 PMU feature, have a single one for v2.07 of the architecture. This saves us adding a MMCR2 define for a future patch. Cc: [email protected] Signed-off-by: Joel Stanley <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-07-05powerpc, perf: Use common PMU interrupt disabled codeVince Weaver2-4/+8
Transition to using the new generic PERF_PMU_CAP_NO_INTERRUPT method for failing a sampling event when no PMU interrupt is available. Signed-off-by: Vince Weaver <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406191435440.27913@vincent-weaver-1.umelst.maine.edu Cc: Benjamin Herrenschmidt <[email protected]> Cc: Cody P Schafer <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-28powerpc/perf/hv-24x7: Catalog version number is be64, not be32Cody P Schafer1-6/+7
The catalog version number was changed from a be32 (with proceeding 32bits of padding) to a be64, update the code to treat it as a be64 Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-04-28powerpc/perf/hv-24x7: Remove [static 4096], sparse chokes on itCody P Schafer1-1/+1
Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-04-28powerpc/perf/hv-24x7: Use (unsigned long) not (u32) values when calling ↵Cody P Schafer1-4/+16
plpar_hcall_norets() Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-04-28powerpc/perf/hv-gpci: Make device attr staticCody P Schafer1-1/+1
Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-04-28powerpc/perf/hv_gpci: Probe failures use pr_debug(), and padding reducedCody P Schafer1-2/+2
fixup for "powerpc/perf: Add support for the hv gpci (get performance counter info) interface". Makes the "not enabled" message less awful (and hidden unless debugging). Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-04-28powerpc/perf/hv_24x7: Probe errors changed to pr_debug(), padding fixedCody P Schafer1-2/+2
fixup for "powerpc/perf: Add support for the hv 24x7 interface" Makes the "not enabled" message less awful (and hides it in most cases). Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Fix handling of L3 events with bank == 1Michael Ellerman1-2/+3
Currently we reject events which have the L3 bank == 1, such as 0x000084918F, because the cache field is non-zero. However that is incorrect, because although the bank is non-zero, the value we would write into MMCRC is zero, and so we can count the event. So fix the check to ignore the bank selector when checking whether the cache selector is non-zero. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add kconfig option for hypervisor provided countersCody P Schafer1-0/+2
The commit adds a Kconfig option which allows the hv_gpci and hv_24x7 PMUs, added in the preceeding commits, to be built. Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add support for the hv 24x7 interfaceCody P Schafer1-0/+510
This provides a basic interface between hv_24x7 and perf. Similar to the one provided for gpci, it lacks transaction support and does not list any events. Example usage via perf tool: perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1 Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add support for the hv gpci (get performance counter info) ↵Cody P Schafer1-0/+294
interface This provides a basic link between perf and hv_gpci. Notably, it does not yet support transactions and does not list any events (they can still be manually composed). Example usage via perf tool: perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1 Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add macros for defining event fields & formatsCody P Schafer1-0/+19
Add two macros which generate functions to extract the relevent bits from event->attr.config{,1,2}. EVENT_DEFINE_RANGE() defines an accessor for a range of bits in the event, as well as a "max" function that gives the maximum value of the field based on the bit width. EVENT_DEFINE_RANGE_FORMAT() defines the accessor & max routine and also a format attribute for use in the PMU's attr_groups. Signed-off-by: Cody P Schafer <[email protected]> [mpe: move to powerpc, ugly but descriptive macro names] Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add a shared interface to get gpci version and capabilitiesCody P Schafer2-0/+56
This exposes a simple way to grab the firmware provided collect_priveliged, ga, expanded, and lab capability bits. All of these bits come in from the same gpci request, so we've exposed all of them. Only the collect_priveliged bit is really used by the hv-gpci/hv-24x7 code, the other bits are simply exposed in sysfs to inform the user. Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add 24x7 interface headersCody P Schafer2-0/+142
24x7 (also called hv_24x7 or H_24X7) is an interface to obtain performance counters from the hypervisor. These counters do not have a fixed format/possition and are instead documented in a "24x7 Catalog", which is provided by the hypervisor (that interface is also documented paritialy in the included hv-24x7-catalog.h and fully in at https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ). The 24x7 data access is simply a copy operation into a 4 dimentional array of 64bit counters (from hypervisor to kernel memory). There is no interupt triggered on overflow, these are completely disjoint from the typical power pmu. This method of obtaining performance counters from the hypervisor is intended to paritialy replace the gpci interface. Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add hv_gpci interface headerCody P Schafer1-0/+73
"H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from here on) is an interface to retrieve specific performance counters and other data from the hypervisor. All outputs have a fixed format. This header only describes the portions of the interface that we plan on using in linux at this time. Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Enable BHRB access for EBB eventsMichael Ellerman1-4/+6
The previous commit added constraint and register handling to allow processes using EBB (Event Based Branches) to request access to the BHRB (Branch History Rolling Buffer). With that in place we can allow processes using EBB to access the BHRB. This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We must also clear BHRBA when we are disabling. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Add BHRB constraint and IFM MMCRA handling for EBBMichael Ellerman1-9/+44
We want a way for users of EBB (Event Based Branches) to also access the BHRB (Branch History Rolling Buffer). EBB does not interoperate with our existing BHRB support, which is wired into the generic Linux branch stack sampling support. To support EBB & BHRB we add three new bits to the event code. The first bit indicates that the event wants access to the BHRB, and the other two bits indicate the desired IFM (Instruction Filtering Mode). We allow multiple events to request access to the BHRB, but they must agree on the IFM value. Events which are not interested in the BHRB can also interoperate with events which do. Finally we program the desired IFM value into MMCRA. Although we do this for every event, we know that the value will be identical for all events that request BHRB access. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Avoid mutating event in power8_get_constraint()Michael Ellerman1-6/+8
We only need to mask the EBB bit out of the event for the check of the special PMC 5 & 6 events. So use a local to do it just for that code, rather than changing the event value for the life of the function. While we're there move the set of mask and value after all the checks. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-03-24powerpc/perf: Clean up the EBB hash defines a littleMichael Ellerman1-3/+4
Rather than using PERF_EVENT_CONFIG_EBB_SHIFT everywhere, add an EVENT_EBB_SHIFT like every other event and use that. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>