aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2009-08-09SUNRPC: Eliminate PROC macro from rpcb_clntChuck Lever1-22/+91
Clean up: Replace PROC macro with open coded C99 structure initializers to improve readability. The rpcbind v4 GETVERSADDR procedure is never sent by the current implementation, so it is not copied to the new structures. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Clean up: Remove unused XDR decoder functions from rpcb_clnt.cChuck Lever1-81/+0
Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Introduce new xdr_stream-based decoders to rpcb_clnt.cChuck Lever1-3/+77
Replace the open-coded decode logic for PMAP_GETPORT/RPCB_GETADDR with an xdr_stream-based implementation, similar to what NFSv4 uses, to protect against buffer overflows. The new implementation also checks that the incoming port number is reasonable. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Introduce xdr_stream-based decoders for RPCB_UNSETChuck Lever1-0/+22
Replace the open-coded decode logic for rpcbind UNSET results with an xdr_stream-based implementation, similar to what NFSv4 uses, to protect against buffer overflows. The new function is unused for the moment. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Clean up: Remove unused XDR encoder functions from rpcb_clnt.cChuck Lever1-34/+0
Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Introduce new xdr_stream-based encoders to rpcb_clnt.cChuck Lever1-1/+75
Replace the open-coded encode logic for rpcbind arguments with an xdr_stream-based implementation, similar to what NFSv4 uses, to better protect against buffer overflows. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFSD: Support IPv6 addresses in write_failover_ip()Chuck Lever1-14/+7
In write_failover_ip(), replace the sscanf() with a call to the common sunrpc.ko presentation address parser. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09lockd: Replace nsm_display_address() with rpc_ntop()Chuck Lever1-39/+5
Clean up. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09lockd: Replace nlm_clear_port()Chuck Lever1-13/+1
Clean up: Use shared rpc_set_port() function instead of nlm_clear_port(). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFS: Replace nfs_set_port() with rpc_set_port()Chuck Lever3-23/+4
Clean up. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFS: Replace nfs_parse_ip_address() with rpc_pton()Chuck Lever3-133/+22
Clean up: Use the common routine now provided in sunrpc.ko for parsing mount addresses. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Use rpc_ntop() for constructing transport address stringsChuck Lever2-102/+52
Clean up: In addition to using the new generic rpc_ntop() and rpc_get_port() functions, have the RPC client compute the presentation address buffer sizes dynamically using kstrdup(). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Remove duplicate universal address generationChuck Lever4-46/+29
RPC universal address generation is currently done in several places: rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the redundant cases that convert a socket address to a universal address. The nfs4proc.c case takes a pre-formatted presentation address string, not a socket address, so we'll leave that one. Because the new uaddr constructor uses the recently introduced rpc_ntop(), it now supports proper "::" shorthanding for IPv6 addresses. This allows the kernel to register properly formed universal addresses with the local rpcbind service, in _all_ cases. The kernel can now also send properly formed universal addresses in RPCB_GETADDR requests, and support link-local properly when encoding and decoding IPv6 addresses. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Provide functions for managing universal addressesChuck Lever4-3/+403
Introduce a set of functions in the kernel's RPC implementation for converting between a socket address and either a standard presentation address string or an RPC universal address. The universal address functions will be used to encode and decode RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are part of a previous promise to deliver shared functions that can be used by upper-layer protocols to display and manipulate IP addresses. The kernel's current address printf formatters were designed specifically for kernel to user-space APIs that require a particular string format for socket addresses, thus are somewhat limited for the purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does not support short-handing or scope IDs. Also, these printf formatters are unique per address family, so a separate formatter string is required for printing AF_INET and AF_INET6 addresses. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Move XDR data type size macrosChuck Lever1-25/+31
Clean up: To make subsequent patches cleaner, move the XDR data type size macros to the top of the file (similar to nfs4xdr.c) first. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: Clean up RPCBIND_MAXUADDRLEN definitionsChuck Lever1-1/+16
Clean up: Replace the single-integer definition of RPCBIND_MAXUADDRLEN with a definition that is based on previously defined address string sizes, and document the way this maximum is calculated. Also provide a separate macro for the size of the port number extension. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFS: Use the authentication flavor list returned by mountdChuck Lever1-7/+51
Commit a14017db added support in the kernel's NFS mount client to decode the authentication flavor list returned by mountd. The NFS client can now use this list to determine whether the authentication flavor requested by the user is actually supported by the server. Note we don't actually negotiate the security flavor if none was specified by the user. Instead, we try to use AUTH_SYS, and fail if the server does not support it. This prevents us from negotiating an inappropriate security flavor (some servers list AUTH_NULL first). If the server does not support AUTH_SYS, the user must provide an appropriate security flavor by specifying the "sec=" mount option. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFS: Fix auth flavor len accountingChuck Lever1-14/+3
Previous logic in the NFS mount parsing code path assumed auth_flavor_len was set to zero for simple authentication flavors (like AUTH_UNIX), and 1 for compound flavors (like AUTH_GSS). At some earlier point (maybe even before the option parsers were merged?) specific checks for auth_flavor_len being zero were removed from the functions that validate the mount option that sets the mount point's authentication flavor. Since we are populating an array for authentication flavors, the auth_flavor_len should always be set to the number of flavors. Let's eliminate some cleverness here, and prepare for new logic that needs to know the number of flavors in the auth_flavors[] array. (auth_flavors[] is an array because at some point we want to allow a list of acceptable authentication flavors to be specified via the sec= mount option. For now it remains a single element array). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFS: Add ability to send MOUNTPROC_UMNT to the kernel's mountd clientChuck Lever2-0/+80
After certain failure modes of an NFS mount, an NFS client should send a MOUNTPROC_UMNT request to remove the just-added mount entry from the server's mount table. While no-one should rely on the accuracy of the server's mount table, sending a UMNT is simply being a good internet neighbor. Since NFS mount processing is handled in the kernel now, we will need a function in the kernel's mountd client that can post a MOUNTRPC_UMNT request, in order to handle these failure modes. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFS: Fix up new minorversion= optionChuck Lever1-7/+11
The new minorversion= mount option (commit 3fd5be9e) was merged at the same time as the recent sloppy parser fixes (commit a5a16bae), so minorversion= still uses the old value parsing logic. If the minorversion= option specifies a bogus value, it should fail with "bad value" not "bad option." Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFSv4: Clean up the nfs.callback_tcpport optionTrond Myklebust1-9/+17
Tighten up the validity checking in param_set_port: check for NULL pointers. Ensure that the option shows up on 'modinfo' output. Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09SUNRPC: convert some sysctls into module parametersTrond Myklebust2-0/+73
Parameters like the minimum reserved port, and the number of slot entries should really be module parameters rather than sysctls. Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFSv4: Don't do idmapper upcalls for asynchronous RPC callsTrond Myklebust1-28/+58
We don't want to cause rpciod to hang... Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFSv4: Add 'server capability' flags for NFSv4 recommended attributesTrond Myklebust4-16/+114
If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code needs to know that, so that it don't keep trying to poll for it. However, by the same count, if the NFSv4 server does support that attribute, then we should ensure that the inode metadata is appropriately labelled as being untrusted. For instance, if we don't know the correct value of the file's uid, we should certainly not be caching ACLs or ACCESS results. Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09NFSv4: Don't loop forever on state recovery failure...Trond Myklebust1-6/+12
If the server is broken, then retrying forever won't fix it. We should just give up after a while, and return an error to the user. We set the number of retries to 10 for now... Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09nfs: Keep index within mnt_errtbl[]Roel Kluin1-2/+2
Ensure that index i remains within array mnt_errtbl[] and mnt3_errtbl[]. Signed-off-by: Roel Kluin <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-08-09perf tools: callchain: Fix bad rounding of minimum rateFrederic Weisbecker1-2/+3
Sometimes we get callchain branches that have a rate under the limit given by the user. Say you launched: perf record -f -g -a ./hackbench 10 perf report -g fractal,10.0 And you got: 2.33% hackbench [kernel] [k] _spin_lock_irqsave | |--78.57%-- remove_wait_queue | poll_freewait | do_sys_poll | sys_poll | sysenter_dispatch | 0xf7ffa430 | 0x1ffadea3c | |--7.14%-- __up_read | up_read | do_page_fault | page_fault | 0xf7ffa430 | 0xa0df710000000a ... It is abnormal to get a 7.14% branch whereas we passed a 10% filter. The problem is that we round down the minimum threshold. This happens mostly when we have very low number of events. If the total amount of your branch is 4 and you have a subranch of 3 events, filtering to 90% will be computed like follows: limit = 4 * 0.9; The result is about 3.6, but the cast to integer will round down to 3. It means that our filter is actually of 75% We must then explicitly round up the minimum threshold. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <20090809024235.GA10146@nowhere> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter tools: Fix libbfd detection for systems with libz dependencyMike Galbraith1-0/+4
Due to a libz dependency in some distro's binutils package, C++ demangle support isn't compiled in despite the necessary libraries being available. Fix this by adding a -lz link test to the dependency detection rules. Signed-off-by: Mike Galbraith <[email protected]> Acked-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf: "Longum est iter per praecepta, breve et efficax per exempla"Carlos R. Mafra1-0/+225
A few examples of how 'perf' can be used, from an e-mail by Ingo Molnar http://lkml.org/lkml/2009/8/4/346. Signed-off-by: Carlos R. Mafra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter: Fix a race on perf_counter_ctxPeter Zijlstra1-15/+15
While extending perfcounters with BTS hw-tracing, Markus Metzger managed to trigger this warning: [ 995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b() triggers because commit 9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full task tracing) removed clearing of tsk->perf_counter_ctxp out from under ctx->lock which introduced a race (against perf_lock_task_context). Move it back and deal with the exit notification by explicitly passing along the former task context. Reported-by: Markus T Metzger <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> LKML-Reference: <1249667341.17467.5.camel@twins> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter: Fix tracepoint sampling to be part of generic samplingFrederic Weisbecker2-15/+15
Based on Peter's comments, make tracepoint sampling generic just like all the other sampling bits are. This is a rename with no code changes: - PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW - struct perf_tracepoint_record to perf_raw_record We want the system in place that transport tracepoints raw samples events into the perf ring buffer to be generalized and usable by any type of counter. Reported-by; Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter: Work around gcc warning by initializing tracepoint record ↵Frederic Weisbecker1-3/+4
unconditionally Despite that the tracepoint record is always present when the PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning, thinking it might not be initialized: kernel/perf_counter.c: In function ‘perf_counter_output’: kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function Then, initialize it to NULL and always check if it's not NULL before dereference it. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf tools: callchain: Fix sum of percentages to be 100% by displaying ↵Frederic Weisbecker1-3/+44
amount of ignored chains in fractal mode When we filter the callchains below a given percentage, we ignore them and the end result only shows entries that have an upper percentage than the filter threshold. It seems to users then that we have an imbalance in the percentage, as if the sum inside a profiled branch doesn't reach 100%. Since in the past there have been real perf report bugs that showed the same sypmtom, it would be nice to assure the user that the data is perfect and trustable and it all sums up to 100.00%. So fix this by displaying the remaining hits that have been filtered but without more detail than their amount in each branches. Example while filtering below 50%: 7.73% [k] delay_tsc | |--98.22%-- __const_udelay | | | |--86.37%-- ath5k_hw_register_timeout | | ath5k_hw_noise_floor_calibration | | ath5k_hw_reset | | ath5k_reset | | ath5k_config | | ieee80211_hw_config | | | | | |--88.53%-- ieee80211_scan_work | | | worker_thread | | | kthread | | | child_rip | | --11.47%-- [...] | --13.63%-- [...] --1.78%-- [...] Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf tools: callchain: Fix 'perf report' display to be callchain by defaultFrederic Weisbecker2-1/+16
If we recorded with -g option to record the callchain, right now we require a -g option to perf report as well - and people reported this as unnecessary complication: the user already specified -g once, no need to require it a second time. So if the recording includes call-chains, display the callchain by default from perf report. ( The user can override this default using "-g none" option from perf report. ) Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty ↵Frederic Weisbecker1-0/+2
callchains When the callchain tree comes to insert an empty backtrace, it raises a spurious warning about the fact we are inserting an empty. This is spurious because the radix tree assumes it did something wrong to reach this state. But it didn't, we just met an empty callchain that has to be ignored. This happens occasionally with certain types of call-chain recordings. If it happens it's a big nuisance as perf report output starts with thousands of warning lines. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf record: Fix the -A UI for empty or non-existent perf.dataPierre Habouzit1-4/+8
1. Ignore the -A argument if there is no perf.data file 2. Treat an empty file like a non existent file. Else, perf will try to read the perf.data header, and fail with an error. Treating an empty file like a non-existent file makes sense, since an interupted (as in SIGKILLed) perf could leave such files around, and you don't want to annoy the user with errors for files with no data in it. Signed-off-by: Pierre Habouzit <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf util: Fix do_read() to fail on EOF instead of busy-loopingPierre Habouzit1-0/+2
While toying with perf, I've noticed that perf record can easily enter a busy loop when doing something as silly as: $ perf record -A ls Yeah, do_read here really wants to read a known size, not being able to should die(), not busy-loop ;) That was the cause for the bug. Signed-off-by: Pierre Habouzit <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf list: Fix the output to not include tracepoints without an idPeter Zijlstra1-1/+17
Stop perf list from displaying tracepoints without an id file, those are special tracepoints that are not interfaced to perfcounters so listing them is erroneous and passing them as events will produce no output. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Jason Baron <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Chris Mason <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter/powerpc: Fix oops on cpus without perf_counter hardware supportPaul Mackerras1-0/+8
If we have the powerpc perf_counter backend compiled in, but the cpu we are running on is one where we don't support the PMU, we currently oops in hw_perf_group_sched_in if we try to use any counters, because ppmu is NULL in that case, and we unconditionally dereference ppmu. This fixes the problem by adding a check if ppmu is NULL at the beginning of hw_perf_group_sched_in, and also at the beginning of the other functions that get called from the perf_counter core, i.e. hw_perf_disable, hw_perf_enable, and hw_perf_counter_setup. Signed-off-by: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf stat: Fix tool option consistency: rename -S/--scale to -c/--scaleBrice Goglin2-2/+2
We want to use a coherent flag for -S/--stat across all tools, so free up -S in perf stat. Signed-off-by: Brice Goglin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf report: Add debug help for the finding of symbol bugs - show the symtab ↵Arnaldo Carvalho de Melo3-12/+50
origin (DSO, build-id, kernel, etc) Used with perf report --verbose: [acme@doppio linux-2.6-tip]$ perf report -v | head -16 5.17% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&) 2.56% firefox /lib64/libpthread-2.10.1.so 0x0000000000008e02 d [.] __pthread_mutex_lock_internal 1.94% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x0000000000d0af8f f [.] SearchTable 1.75% firefox [kernel] 0xffffffffff60013b k [.] vread_hpet 1.63% firefox /lib64/libpthread-2.10.1.so 0x000000000000a404 d [.] __pthread_mutex_unlock 1.47% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret 1.42% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer 1.24% firefox [kernel] 0xffffffff8102ca4a k [k] read_hpet 1.16% firefox [kernel] 0xffffffff810f3dd4 k [k] fget_light 1.11% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject 0.98% firefox /usr/lib64/firefox-3.5.2/firefox 0x000000000000dd23 b [.] arena_ralloc [acme@doppio linux-2.6-tip]$ The new field is just after the symbol address. To help in figuring out symbol resolution bugs. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf report: Fix per task mult-counter stat reportingPeter Zijlstra3-5/+35
Brice Goglin reported: > I can easily sort them by thread id, but I don't know how to match > my 4 events with each group of 4 lines. Also report the counter id and the time running/enabled stats (in case the counter got time-shared). Reported-by: Brice Goglin <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Tested-by: Brice Goglin <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf tools: Fix multi-counter stat bug caused by incorrect reading of ↵Peter Zijlstra1-1/+2
perf.data file header Brice Goglin reported that only the first result from a multi-counter perf record --stat run is accurate, the rest looks bogus. A silly mistake made us re-read the first attribute for every recorded attribute. Reported-by: Brice Goglin <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Tested-by: Brice Goglin <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf tools: Fix call-chain cumul hit based sub-total (fractal mode)Frederic Weisbecker3-14/+24
The callchain fractal mode builds each new total hits in a new branch of profiling by using the parent's hits of the current branch plus the hits of the children. This is wrong, the total hits of a branch should be made of the sum of every children hits, we must ignore the parent hits in this scope. This patch also fixes another mistake with the hit counting. Now the rates are correct. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf top: Update man pageMike Galbraith1-13/+99
perf_counter tools: update perf top manual page to reflect current implementation. Signed-off-by: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf top: Improve interactive key handlingMike Galbraith1-35/+73
Pressing any key which is not currently mapped to functionality, based on startup command line options, displays currently mapped keys, and prompts for input. Pressing any unmapped key at the prompt returns the user to display mode with variables unchanged. eg, pressing ? <SPACE> <ESC> etc displays currently available keys, the value of the variable associated with that key, and prompts. Pressing same again aborts input. Signed-off-by: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter: Fix software counters for fast moving event sourcesPeter Zijlstra1-70/+94
Reimplement the software counters to deal with fast moving event sources (such as tracepoints). This means being able to generate multiple overflows from a single 'event' as well as support throttling. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter tools: Allow perf top top users to switch between weighted and ↵Mike Galbraith1-7/+17
individual counter display Add [w]eighted hotkey. Pressing [w] toggles between displaying weighted total of all counters, and the counter selected via [E]vent select key. ------------------------------------------------------------------------------ PerfTop: 90395 irqs/sec kernel:16.1% [cache-misses/cache-references/instructions], (all, 4 CPUs) ------------------------------------------------------------------------------ weight samples pcnt RIP kernel function ______ _______ _____ ________________ _______________ 1275408.6 10881 - 5.3% - ffffffff81146f70 : copy_page_c 553683.4 43569 - 21.3% - ffffffff81146f20 : clear_page_c 74075.0 6768 - 3.3% - ffffffff81147190 : copy_user_generic_string 40602.9 7538 - 3.7% - ffffffff81284ba2 : _spin_lock 26882.1 965 - 0.5% - ffffffff8109d280 : file_ra_state_init [w] ------------------------------------------------------------------------------ PerfTop: 91221 irqs/sec kernel:14.5% [10000Hz cache-misses], (all, 4 CPUs) ------------------------------------------------------------------------------ weight samples pcnt RIP kernel function ______ _______ _____ ________________ _______________ 47320.00 - 22.3% - ffffffff81146f20 : clear_page_c 14261.00 - 6.7% - ffffffff810992f5 : __rmqueue 11046.00 - 5.2% - ffffffff81146f70 : copy_page_c 7842.00 - 3.7% - ffffffff81284ba2 : _spin_lock 7234.00 - 3.4% - ffffffff810aa1d6 : unmap_vmas Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter tools: Fix/resurrect perf top annotation in a simple ↵Mike Galbraith1-40/+456
interactive form perf top used to have annotation support, but it has bitrotted and removed. This patch restores that: it allows the user to select any symbol in kernel space for source level annotation on the fly, switch between event counters and alter display variables. When symbol details are being displayed, stopping annotation reverts to normal. known keys: [d] select display delay. [e] select display entries (lines). [E] select annotation event counter. [f] select normal display count filter. [F] select annotation display count filter (percentage). [qQ] quit. [s] select annotation symbol and start annotation. [S] stop annotation, revert to normal display. [z] toggle event count zeroing. Sample: ------------------------------------------------------------------------------ PerfTop: 16719 irqs/sec kernel:78.7% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs) ------------------------------------------------------------------------------ Showing cache-misses for e1000_clean_rx_irq Events Pcnt (>=3%) 0 0.0% /* adjust length to remove Ethernet CRC */ 0 0.0% if (!(adapter->flags2 & FLAG2_CRC_STRIPPING)) 0 0.0% length -= 4; 436 5.0% f039: 41 f6 84 24 5c 29 00 testb $0x1,0x295c(%r12) 0 0.0% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx 0 0.0% f08c: 48 83 ef 02 sub $0x2,%rdi 0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi 811 9.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 0 0.0% 0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) { 0 0.0% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15) 7226 82.6% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84> Available events: 0 cache-misses 1 cache-references 2 instructions 3 cycles Enter details event counter: 2 ------------------------------------------------------------------------------ PerfTop: 15035 irqs/sec kernel:79.0% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs) ------------------------------------------------------------------------------ Showing instructions for e1000_clean_rx_irq Events Pcnt (>=3%) 0 0.0% int *work_done, int work_to_do) 0 0.0% { 175 0.9% eebf: 55 push %rbp 1898 9.8% eec0: 48 89 e5 mov %rsp,%rbp 0 0.0% 0 0.0% i = rx_ring->next_to_clean; 140 0.7% ef0a: 0f b7 41 1a movzwl 0x1a(%rcx),%eax 670 3.4% ef0e: 89 45 ac mov %eax,-0x54(%rbp) 0 0.0% { 0 0.0% memcpy(skb->data + offset, from, len); 91 0.5% f07b: 49 8b b6 e8 00 00 00 mov 0xe8(%r14),%rsi 1153 5.9% f082: 48 8b b8 e8 00 00 00 mov 0xe8(%rax),%rdi 42 0.2% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx 14 0.1% f08c: 48 83 ef 02 sub $0x2,%rdi 0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi 1618 8.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 0 0.0% 0 0.0% /* return some buffers to hardware, one at a time is too slow */ 0 0.0% if (cleaned_count >= E1000_RX_BUFFER_WRITE) { 867 4.5% f0e7: 83 7d b0 0f cmpl $0xf,-0x50(%rbp) 0 0.0% 0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) { 37 0.2% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15) 4047 20.8% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84> Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-08-09perf_counter: Fix/complete ftrace event records samplingFrederic Weisbecker7-41/+126
This patch implements the kernel side support for ftrace event record sampling. A new counter sampling attribute is added: PERF_SAMPLE_TP_RECORD which requests ftrace events record sampling. In this case if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint fires, we emit the tracepoint binary record to the perfcounter event buffer, as a sample. Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf record: perf record -f -F 1 -a -e workqueue:workqueue_execution perf report -D 0x21e18 [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........ . 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!...... . 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve . 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1........... . 0040: e0 b1 31 81 ff ff ff ff ....... . 0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33 The raw ftrace binary record starts at offset 0020. Translation: struct trace_entry { type = 0x2b = 43; flags = 1; preempt_count = 2; pid = 0xa = 10; tgid = 0xa = 10; } thread_comm = "events/1" thread_pid = 0xa = 10; func = 0xffffffff8131b1e0 = flush_to_ldisc() What will come next? - Userspace support ('perf trace'), 'flight data recorder' mode for perf trace, etc. - The unconditional copy from the profiling callback brings some costs however if someone wants no such sampling to occur, and needs to be fixed in the future. For that we need to have an instant access to the perf counter attribute. This is a matter of a flag to add in the struct ftrace_event. - Take care of the events recursivity! Don't ever try to record a lock event for example, it seems some locking is used in the profiling fast path and lead to a tracing recursivity. That will be fixed using raw spinlock or recursivity protection. - [...] - Profit! :-) Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Li Zefan <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Gabriel Munteanu <[email protected]> Cc: Lai Jiangshan <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>