Age | Commit message (Collapse) | Author | Files | Lines |
|
Clean up: Replace PROC macro with open coded C99 structure
initializers to improve readability.
The rpcbind v4 GETVERSADDR procedure is never sent by the current
implementation, so it is not copied to the new structures.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Replace the open-coded decode logic for PMAP_GETPORT/RPCB_GETADDR with
an xdr_stream-based implementation, similar to what NFSv4 uses, to
protect against buffer overflows. The new implementation also checks
that the incoming port number is reasonable.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Replace the open-coded decode logic for rpcbind UNSET results with an
xdr_stream-based implementation, similar to what NFSv4 uses, to
protect against buffer overflows.
The new function is unused for the moment.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Replace the open-coded encode logic for rpcbind arguments with an
xdr_stream-based implementation, similar to what NFSv4 uses, to
better protect against buffer overflows.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
In write_failover_ip(), replace the sscanf() with a call to the common
sunrpc.ko presentation address parser.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up: Use shared rpc_set_port() function instead of nlm_clear_port().
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up: Use the common routine now provided in sunrpc.ko for parsing mount
addresses.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up: In addition to using the new generic rpc_ntop() and
rpc_get_port() functions, have the RPC client compute the presentation
address buffer sizes dynamically using kstrdup().
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
RPC universal address generation is currently done in several places:
rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the
redundant cases that convert a socket address to a universal
address. The nfs4proc.c case takes a pre-formatted presentation
address string, not a socket address, so we'll leave that one.
Because the new uaddr constructor uses the recently introduced
rpc_ntop(), it now supports proper "::" shorthanding for IPv6
addresses. This allows the kernel to register properly formed
universal addresses with the local rpcbind service, in _all_ cases.
The kernel can now also send properly formed universal addresses in
RPCB_GETADDR requests, and support link-local properly when
encoding and decoding IPv6 addresses.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Introduce a set of functions in the kernel's RPC implementation for
converting between a socket address and either a standard
presentation address string or an RPC universal address.
The universal address functions will be used to encode and decode
RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are
part of a previous promise to deliver shared functions that can be
used by upper-layer protocols to display and manipulate IP
addresses.
The kernel's current address printf formatters were designed
specifically for kernel to user-space APIs that require a particular
string format for socket addresses, thus are somewhat limited for the
purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does
not support short-handing or scope IDs. Also, these printf formatters
are unique per address family, so a separate formatter string is
required for printing AF_INET and AF_INET6 addresses.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up: To make subsequent patches cleaner, move the XDR data type
size macros to the top of the file (similar to nfs4xdr.c) first.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Clean up: Replace the single-integer definition of RPCBIND_MAXUADDRLEN
with a definition that is based on previously defined address string
sizes, and document the way this maximum is calculated. Also provide
a separate macro for the size of the port number extension.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Commit a14017db added support in the kernel's NFS mount client to
decode the authentication flavor list returned by mountd.
The NFS client can now use this list to determine whether the
authentication flavor requested by the user is actually supported
by the server.
Note we don't actually negotiate the security flavor if none was
specified by the user. Instead, we try to use AUTH_SYS, and fail if
the server does not support it. This prevents us from negotiating
an inappropriate security flavor (some servers list AUTH_NULL first).
If the server does not support AUTH_SYS, the user must provide an
appropriate security flavor by specifying the "sec=" mount option.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Previous logic in the NFS mount parsing code path assumed
auth_flavor_len was set to zero for simple authentication flavors
(like AUTH_UNIX), and 1 for compound flavors (like AUTH_GSS).
At some earlier point (maybe even before the option parsers were
merged?) specific checks for auth_flavor_len being zero were removed
from the functions that validate the mount option that sets the mount
point's authentication flavor.
Since we are populating an array for authentication flavors, the
auth_flavor_len should always be set to the number of flavors. Let's
eliminate some cleverness here, and prepare for new logic that needs
to know the number of flavors in the auth_flavors[] array.
(auth_flavors[] is an array because at some point we want to allow a
list of acceptable authentication flavors to be specified via the sec=
mount option. For now it remains a single element array).
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
After certain failure modes of an NFS mount, an NFS client should send
a MOUNTPROC_UMNT request to remove the just-added mount entry from the
server's mount table. While no-one should rely on the accuracy of the
server's mount table, sending a UMNT is simply being a good internet
neighbor.
Since NFS mount processing is handled in the kernel now, we will need
a function in the kernel's mountd client that can post a MOUNTRPC_UMNT
request, in order to handle these failure modes.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The new minorversion= mount option (commit 3fd5be9e) was merged at
the same time as the recent sloppy parser fixes (commit a5a16bae),
so minorversion= still uses the old value parsing logic.
If the minorversion= option specifies a bogus value, it should fail
with "bad value" not "bad option."
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Tighten up the validity checking in param_set_port: check for NULL pointers.
Ensure that the option shows up on 'modinfo' output.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Parameters like the minimum reserved port, and the number of slot entries
should really be module parameters rather than sysctls.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
We don't want to cause rpciod to hang...
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code
needs to know that, so that it don't keep trying to poll for it.
However, by the same count, if the NFSv4 server does support that
attribute, then we should ensure that the inode metadata is appropriately
labelled as being untrusted. For instance, if we don't know the correct
value of the file's uid, we should certainly not be caching ACLs or ACCESS
results.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If the server is broken, then retrying forever won't fix it. We
should just give up after a while, and return an error to the user.
We set the number of retries to 10 for now...
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Ensure that index i remains within array mnt_errtbl[] and mnt3_errtbl[].
Signed-off-by: Roel Kluin <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Sometimes we get callchain branches that have a rate under the
limit given by the user.
Say you launched:
perf record -f -g -a ./hackbench 10
perf report -g fractal,10.0
And you got:
2.33% hackbench [kernel] [k] _spin_lock_irqsave
|
|--78.57%-- remove_wait_queue
| poll_freewait
| do_sys_poll
| sys_poll
| sysenter_dispatch
| 0xf7ffa430
| 0x1ffadea3c
|
|--7.14%-- __up_read
| up_read
| do_page_fault
| page_fault
| 0xf7ffa430
| 0xa0df710000000a
...
It is abnormal to get a 7.14% branch whereas we passed a 10%
filter.
The problem is that we round down the minimum threshold. This
happens mostly when we have very low number of events. If the
total amount of your branch is 4 and you have a subranch of 3
events, filtering to 90% will be computed like follows:
limit = 4 * 0.9;
The result is about 3.6, but the cast to integer will round
down to 3. It means that our filter is actually of 75%
We must then explicitly round up the minimum threshold.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <20090809024235.GA10146@nowhere>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Due to a libz dependency in some distro's binutils package,
C++ demangle support isn't compiled in despite the necessary
libraries being available.
Fix this by adding a -lz link test to the dependency detection
rules.
Signed-off-by: Mike Galbraith <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
A few examples of how 'perf' can be used, from an e-mail by
Ingo Molnar http://lkml.org/lkml/2009/8/4/346.
Signed-off-by: Carlos R. Mafra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
While extending perfcounters with BTS hw-tracing, Markus
Metzger managed to trigger this warning:
[ 995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
triggers because commit
9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full
task tracing) removed clearing of tsk->perf_counter_ctxp out
from under ctx->lock which introduced a race (against
perf_lock_task_context).
Move it back and deal with the exit notification by explicitly
passing along the former task context.
Reported-by: Markus T Metzger <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <1249667341.17467.5.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:
- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record
We want the system in place that transport tracepoints raw
samples events into the perf ring buffer to be generalized and
usable by any type of counter.
Reported-by; Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
unconditionally
Despite that the tracepoint record is always present when the
PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
thinking it might not be initialized:
kernel/perf_counter.c: In function ‘perf_counter_output’:
kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
Then, initialize it to NULL and always check if it's not NULL
before dereference it.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
amount of ignored chains in fractal mode
When we filter the callchains below a given percentage, we
ignore them and the end result only shows entries that have an
upper percentage than the filter threshold.
It seems to users then that we have an imbalance in the
percentage, as if the sum inside a profiled branch doesn't
reach 100%.
Since in the past there have been real perf report bugs that
showed the same sypmtom, it would be nice to assure the user
that the data is perfect and trustable and it all sums up to
100.00%.
So fix this by displaying the remaining hits that have been
filtered but without more detail than their amount in each
branches. Example while filtering below 50%:
7.73% [k] delay_tsc
|
|--98.22%-- __const_udelay
| |
| |--86.37%-- ath5k_hw_register_timeout
| | ath5k_hw_noise_floor_calibration
| | ath5k_hw_reset
| | ath5k_reset
| | ath5k_config
| | ieee80211_hw_config
| | |
| | |--88.53%-- ieee80211_scan_work
| | | worker_thread
| | | kthread
| | | child_rip
| | --11.47%-- [...]
| --13.63%-- [...]
--1.78%-- [...]
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If we recorded with -g option to record the callchain, right now
we require a -g option to perf report as well - and people reported
this as unnecessary complication: the user already specified -g
once, no need to require it a second time.
So if the recording includes call-chains, display the callchain by
default from perf report.
( The user can override this default using "-g none" option from
perf report. )
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
callchains
When the callchain tree comes to insert an empty backtrace, it
raises a spurious warning about the fact we are inserting an
empty. This is spurious because the radix tree assumes it did
something wrong to reach this state. But it didn't, we just met
an empty callchain that has to be ignored.
This happens occasionally with certain types of call-chain
recordings. If it happens it's a big nuisance as perf report
output starts with thousands of warning lines.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
1. Ignore the -A argument if there is no perf.data file
2. Treat an empty file like a non existent file.
Else, perf will try to read the perf.data header, and fail with
an error.
Treating an empty file like a non-existent file makes sense,
since an interupted (as in SIGKILLed) perf could leave such
files around, and you don't want to annoy the user with errors
for files with no data in it.
Signed-off-by: Pierre Habouzit <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
While toying with perf, I've noticed that perf record can
easily enter a busy loop when doing something as silly as:
$ perf record -A ls
Yeah, do_read here really wants to read a known size, not being
able to should die(), not busy-loop ;)
That was the cause for the bug.
Signed-off-by: Pierre Habouzit <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Stop perf list from displaying tracepoints without an id file,
those are special tracepoints that are not interfaced to
perfcounters so listing them is erroneous and passing them as
events will produce no output.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Jason Baron <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Chris Mason <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.
This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.
Signed-off-by: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.
Signed-off-by: Brice Goglin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
origin (DSO, build-id, kernel, etc)
Used with perf report --verbose:
[acme@doppio linux-2.6-tip]$ perf report -v | head -16
5.17% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
2.56% firefox /lib64/libpthread-2.10.1.so 0x0000000000008e02 d [.] __pthread_mutex_lock_internal
1.94% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x0000000000d0af8f f [.] SearchTable
1.75% firefox [kernel] 0xffffffffff60013b k [.] vread_hpet
1.63% firefox /lib64/libpthread-2.10.1.so 0x000000000000a404 d [.] __pthread_mutex_unlock
1.47% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret
1.42% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer
1.24% firefox [kernel] 0xffffffff8102ca4a k [k] read_hpet
1.16% firefox [kernel] 0xffffffff810f3dd4 k [k] fget_light
1.11% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject
0.98% firefox /usr/lib64/firefox-3.5.2/firefox 0x000000000000dd23 b [.] arena_ralloc
[acme@doppio linux-2.6-tip]$
The new field is just after the symbol address. To help in
figuring out symbol resolution bugs.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Brice Goglin reported:
> I can easily sort them by thread id, but I don't know how to match
> my 4 events with each group of 4 lines.
Also report the counter id and the time running/enabled
stats (in case the counter got time-shared).
Reported-by: Brice Goglin <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Brice Goglin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
perf.data file header
Brice Goglin reported that only the first result from a
multi-counter perf record --stat run is accurate, the
rest looks bogus.
A silly mistake made us re-read the first attribute for
every recorded attribute.
Reported-by: Brice Goglin <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Brice Goglin <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The callchain fractal mode builds each new total hits in a new
branch of profiling by using the parent's hits of the current
branch plus the hits of the children.
This is wrong, the total hits of a branch should be made of the
sum of every children hits, we must ignore the parent hits in
this scope.
This patch also fixes another mistake with the hit counting.
Now the rates are correct.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
perf_counter tools: update perf top manual page to reflect
current implementation.
Signed-off-by: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Pressing any key which is not currently mapped to
functionality, based on startup command line options, displays
currently mapped keys, and prompts for input.
Pressing any unmapped key at the prompt returns the user to
display mode with variables unchanged. eg, pressing ? <SPACE>
<ESC> etc displays currently available keys, the value of the
variable associated with that key, and prompts.
Pressing same again aborts input.
Signed-off-by: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Reimplement the software counters to deal with fast moving
event sources (such as tracepoints). This means being able
to generate multiple overflows from a single 'event' as well
as support throttling.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
individual counter display
Add [w]eighted hotkey. Pressing [w] toggles between displaying
weighted total of all counters, and the counter selected via
[E]vent select key.
------------------------------------------------------------------------------
PerfTop: 90395 irqs/sec kernel:16.1% [cache-misses/cache-references/instructions], (all, 4 CPUs)
------------------------------------------------------------------------------
weight samples pcnt RIP kernel function
______ _______ _____ ________________ _______________
1275408.6 10881 - 5.3% - ffffffff81146f70 : copy_page_c
553683.4 43569 - 21.3% - ffffffff81146f20 : clear_page_c
74075.0 6768 - 3.3% - ffffffff81147190 : copy_user_generic_string
40602.9 7538 - 3.7% - ffffffff81284ba2 : _spin_lock
26882.1 965 - 0.5% - ffffffff8109d280 : file_ra_state_init
[w]
------------------------------------------------------------------------------
PerfTop: 91221 irqs/sec kernel:14.5% [10000Hz cache-misses], (all, 4 CPUs)
------------------------------------------------------------------------------
weight samples pcnt RIP kernel function
______ _______ _____ ________________ _______________
47320.00 - 22.3% - ffffffff81146f20 : clear_page_c
14261.00 - 6.7% - ffffffff810992f5 : __rmqueue
11046.00 - 5.2% - ffffffff81146f70 : copy_page_c
7842.00 - 3.7% - ffffffff81284ba2 : _spin_lock
7234.00 - 3.4% - ffffffff810aa1d6 : unmap_vmas
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
interactive form
perf top used to have annotation support, but it has bitrotted and
removed.
This patch restores that: it allows the user to select any symbol
in kernel space for source level annotation on the fly, switch
between event counters and alter display variables. When symbol
details are being displayed, stopping annotation reverts to normal.
known keys:
[d] select display delay.
[e] select display entries (lines).
[E] select annotation event counter.
[f] select normal display count filter.
[F] select annotation display count filter (percentage).
[qQ] quit.
[s] select annotation symbol and start annotation.
[S] stop annotation, revert to normal display.
[z] toggle event count zeroing.
Sample:
------------------------------------------------------------------------------
PerfTop: 16719 irqs/sec kernel:78.7% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs)
------------------------------------------------------------------------------
Showing cache-misses for e1000_clean_rx_irq
Events Pcnt (>=3%)
0 0.0% /* adjust length to remove Ethernet CRC */
0 0.0% if (!(adapter->flags2 & FLAG2_CRC_STRIPPING))
0 0.0% length -= 4;
436 5.0% f039: 41 f6 84 24 5c 29 00 testb $0x1,0x295c(%r12)
0 0.0% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx
0 0.0% f08c: 48 83 ef 02 sub $0x2,%rdi
0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi
811 9.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
0 0.0%
0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) {
0 0.0% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15)
7226 82.6% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84>
Available events:
0 cache-misses
1 cache-references
2 instructions
3 cycles
Enter details event counter: 2
------------------------------------------------------------------------------
PerfTop: 15035 irqs/sec kernel:79.0% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs)
------------------------------------------------------------------------------
Showing instructions for e1000_clean_rx_irq
Events Pcnt (>=3%)
0 0.0% int *work_done, int work_to_do)
0 0.0% {
175 0.9% eebf: 55 push %rbp
1898 9.8% eec0: 48 89 e5 mov %rsp,%rbp
0 0.0%
0 0.0% i = rx_ring->next_to_clean;
140 0.7% ef0a: 0f b7 41 1a movzwl 0x1a(%rcx),%eax
670 3.4% ef0e: 89 45 ac mov %eax,-0x54(%rbp)
0 0.0% {
0 0.0% memcpy(skb->data + offset, from, len);
91 0.5% f07b: 49 8b b6 e8 00 00 00 mov 0xe8(%r14),%rsi
1153 5.9% f082: 48 8b b8 e8 00 00 00 mov 0xe8(%rax),%rdi
42 0.2% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx
14 0.1% f08c: 48 83 ef 02 sub $0x2,%rdi
0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi
1618 8.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
0 0.0%
0 0.0% /* return some buffers to hardware, one at a time is too slow */
0 0.0% if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
867 4.5% f0e7: 83 7d b0 0f cmpl $0xf,-0x50(%rbp)
0 0.0%
0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) {
37 0.2% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15)
4047 20.8% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch implements the kernel side support for ftrace event
record sampling.
A new counter sampling attribute is added:
PERF_SAMPLE_TP_RECORD
which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.
Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
The raw ftrace binary record starts at offset 0020.
Translation:
struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}
thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()
What will come next?
- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.
- The unconditional copy from the profiling callback brings
some costs however if someone wants no such sampling to
occur, and needs to be fixed in the future. For that we need
to have an instant access to the perf counter attribute.
This is a matter of a flag to add in the struct ftrace_event.
- Take care of the events recursivity! Don't ever try to record
a lock event for example, it seems some locking is used in
the profiling fast path and lead to a tracing recursivity.
That will be fixed using raw spinlock or recursivity
protection.
- [...]
- Profit! :-)
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Gabriel Munteanu <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|