summaryrefslogtreecommitdiff
path: root/usr/src/man/man1/prof.1
blob: 6c165958377c84c0e1b50da5877bc262c2244209 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
'\" te
.\" Copyright (c) 2009, Sun Microsystems, Inc.  All Rights Reserved
.\" Copyright 1989 AT&T
.\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License").  You may not use this file except in compliance with the License. You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"  See the License for the specific language governing permissions and limitations under the License. When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE.  If applicable, add the following below this CDDL HEADER, with
.\" the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
.TH PROF 1 "Aug 25, 2009"
.SH NAME
prof \- display profile data
.SH SYNOPSIS
.LP
.nf
\fBprof\fR [\fB-ChsVz\fR] [\fB-a\fR | c | n | t] [\fB-o\fR | x] [\fB-g\fR | l] [\fB-m\fR \fImdata\fR]
     [\fIprog\fR]
.fi

.SH DESCRIPTION
.sp
.LP
The \fBprof\fR command interprets a profile file produced by the \fBmonitor\fR
function.  The symbol table in the object file \fIprog\fR (\fBa.out\fR by
default) is read and correlated with a profile file (\fBmon.out\fR by default).
For each external text symbol the percentage of time spent executing between
the address of that symbol and the address of the next is printed, together
with the number of times that function was called and the average number of
milliseconds per call.
.SH OPTIONS
.sp
.LP
The mutually exclusive options \fB-a\fR, \fB-c\fR, \fB-n\fR, and \fB-t\fR
determine the type of sorting of the output lines:
.sp
.ne 2
.na
\fB\fB-a\fR\fR
.ad
.RS 6n
Sort by increasing symbol address.
.RE

.sp
.ne 2
.na
\fB\fB-c\fR\fR
.ad
.RS 6n
Sort by decreasing number of calls.
.RE

.sp
.ne 2
.na
\fB\fB-n\fR\fR
.ad
.RS 6n
Sort lexically by symbol name.
.RE

.sp
.ne 2
.na
\fB\fB-t\fR\fR
.ad
.RS 6n
Sort by decreasing percentage of total time (default).
.RE

.sp
.LP
The mutually exclusive options \fB-o\fR and  \fB-x\fR specify the printing of
the address of each symbol monitored:
.sp
.ne 2
.na
\fB\fB-o\fR\fR
.ad
.RS 6n
Print each symbol address (in octal) along with the symbol name.
.RE

.sp
.ne 2
.na
\fB\fB-x\fR\fR
.ad
.RS 6n
Print each symbol address (in hexadecimal) along with the symbol name.
.RE

.sp
.LP
The mutually exclusive options \fB-g\fR and \fB-l\fR control the type of
symbols to be reported. The  \fB-l\fR option must be used with care; it applies
the time spent in a static function to the preceding (in memory) global
function, instead of giving the static function a separate entry in the report.
If all static functions are properly located, this feature can be very useful.
If not, the resulting report may be misleading.
.sp
.LP
Assume that  \fBA\fR and \fBB\fR are global functions and only  \fBA\fR calls
static function  \fBS\fR. If  \fBS\fR is located immediately after  A in the
source code (that is, if  \fBS\fR is properly located), then, with the
\fB-l\fR option, the amount of time spent in  \fBA\fR can easily be determined,
including the time spent in  \fBS\fR. If, however, both  \fBA\fR and \fBB\fR
call  \fBS\fR, then, if the  \fB-l\fR option is used, the report will be
misleading; the time spent during  \fBB\fR's call to  \fBS\fR will be
attributed to  \fBA\fR, making it appear as if more time had been spent in
\fBA\fR than really had.  In this case, function  \fBS\fR cannot be properly
located.
.sp
.ne 2
.na
\fB\fB-g\fR\fR
.ad
.RS 6n
List the time spent in static (non-global) functions separately. The \fB-g\fR
option function is the opposite of the  \fB-l\fR function.
.RE

.sp
.ne 2
.na
\fB\fB-l\fR\fR
.ad
.RS 6n
Suppress printing statically declared functions.  If this option is given, time
spent executing in a static function is allocated to the closest global
function loaded before the static function in the executable.  This option is
the default.  It is the opposite of  the  \fB-g\fR function and should be used
with care.
.RE

.sp
.LP
The following options may be used in any combination:
.sp
.ne 2
.na
\fB\fB-C\fR\fR
.ad
.RS 12n
Demangle C++ symbol names before printing them out.
.RE

.sp
.ne 2
.na
\fB\fB-h\fR\fR
.ad
.RS 12n
Suppress the heading normally printed on the report. This is useful if the
report is to be processed further.
.RE

.sp
.ne 2
.na
\fB\fB-m\fR \fImdata\fR\fR
.ad
.RS 12n
Use file \fImdata\fR instead of \fBmon.out\fR as the input profile file.
.RE

.sp
.ne 2
.na
\fB\fB-s\fR\fR
.ad
.RS 12n
Print a summary of several of the monitoring parameters and statistics on the
standard error output.
.RE

.sp
.ne 2
.na
\fB\fB-V\fR\fR
.ad
.RS 12n
Print  \fBprof\fR version information on the standard error output.
.RE

.sp
.ne 2
.na
\fB\fB-z\fR\fR
.ad
.RS 12n
Include all symbols in the profile range, even if associated with zero number
of calls and zero time.
.RE

.sp
.LP
A single function may be split into subfunctions for profiling by means of the
\fBMARK\fR macro. See  \fBprof\fR(5).
.SH ENVIRONMENT VARIABLES
.sp
.ne 2
.na
\fB\fBPROFDIR\fR\fR
.ad
.RS 11n
The name of the file created by a profiled program is controlled by the
environment variable \fBPROFDIR\fR. If \fBPROFDIR\fR is not set,  \fBmon.out\fR
is produced in the directory current when the program terminates. If
\fBPROFDIR\fR=\fIstring\fR, \fIstring\fR\fB/\fR\fIpid\fR\fB\&.\fR\fIprogname\fR
is produced, where  \fIprogname\fR consists of  \fBargv[0]\fR with any path
prefix removed, and \fIpid\fR is the process ID of the program.  If
\fBPROFDIR\fR is set, but null, no profiling output is produced.
.RE

.SH FILES
.sp
.ne 2
.na
\fB\fBmon.out\fR\fR
.ad
.RS 11n
default profile file
.RE

.sp
.ne 2
.na
\fB\fBa.out\fR\fR
.ad
.RS 11n
default namelist (object) file
.RE

.SH SEE ALSO
.sp
.LP
\fBgprof\fR(1), \fBexit\fR(2), \fBpcsample\fR(2), \fBprofil\fR(2),
\fBmalloc\fR(3C), \fBmalloc\fR(3MALLOC), \fBmonitor\fR(3C),
\fBattributes\fR(5), \fBprof\fR(5)
.SH NOTES
.sp
.LP
If the executable image has been stripped and does not have the \fB\&.symtab\fR
symbol table, \fBgprof\fR reads the global dynamic symbol tables
\fB\&.dynsym\fR and \fB\&.SUNW_ldynsym\fR, if present.  The symbols in the
dynamic symbol tables are a subset of the symbols that are found in
\fB\&.symtab\fR. The \fB\&.dynsym\fR symbol table contains the global symbols
used by the runtime linker. \fB\&.SUNW_ldynsym\fR augments the information in
\fB\&.dynsym\fR with local function symbols. In the case where \fB\&.dynsym\fR
is found and \fB\&.SUNW_ldynsym\fR is not, only the  information for the global
symbols is available. Without local symbols, the behavior is as described for
the  \fB-a\fR option.
.sp
.LP
The times reported in successive identical runs may show variances because of
varying cache-hit ratios that result from sharing the cache with other
processes. Even if a program seems to be the only one using the machine, hidden
background or asynchronous processes may blur the data. In rare cases, the
clock ticks initiating recording of the program counter may \fBbeat\fR with
loops in a program, grossly distorting measurements. Call counts are always
recorded precisely, however.
.sp
.LP
Only programs that call  \fBexit\fR or return from  \fBmain\fR are guaranteed
to produce a profile file, unless a final call to  \fBmonitor\fR is explicitly
coded.
.sp
.LP
The times for static functions are attributed to the preceding external text
symbol if the \fB-g\fR option is not used. However, the call counts for the
preceding function are still correct; that is, the static function call counts
are not added to the call counts of the external function.
.sp
.LP
If more than one of the options  \fB-t\fR, \fB-c\fR, \fB-a\fR,  and  \fB-n\fR
is specified, the last option specified is used and the user is warned.
.sp
.LP
\fBLD_LIBRARY_PATH\fR must not contain \fB/usr/lib\fR as a component when
compiling a program for profiling. If   \fBLD_LIBRARY_PATH\fR contains
\fB/usr/lib\fR, the program will not be linked correctly with the profiling
versions of the system libraries in \fB/usr/lib/libp\fR. See \fBgprof\fR(1).
.sp
.LP
Functions such as  \fBmcount()\fR, \fB_mcount()\fR, \fBmoncontrol()\fR,
\fB_moncontrol()\fR, \fBmonitor()\fR, and \fB_monitor()\fR may appear in the
\fBprof\fR report.  These functions are part of the profiling implementation
and thus account for some amount of the runtime overhead.  Since these
functions are not present in an unprofiled application, time accumulated and
call counts for these functions may be ignored when evaluating the performance
of an application.
.SS "64-bit profiling"
.sp
.LP
64-bit profiling may be used freely with dynamically linked executables, and
profiling information is collected for the shared objects if the objects are
compiled for profiling. Care must be applied to interpret the profile output,
since it is possible for symbols from different shared objects to have the same
name. If duplicate names are seen in the profile output, it is better to use
the \fB-s\fR (summary) option, which prefixes a module id before each symbol
that is duplicated. The symbols can then be mapped to appropriate modules by
looking at the modules information in the summary.
.sp
.LP
If the \fB-a\fR option is used with a dynamically linked executable, the
sorting occurs on a per-shared-object basis. Since there is a high likelihood
of symbols from differed shared objects to have the same value, this results in
an output that is more understandable. A blank line separates the symbols from
different shared objects, if the \fB-s\fR option is given.
.SS "32-bit profiling"
.sp
.LP
32-bit profiling may be used with dynamically linked executables, but care must
be applied. In 32-bit profiling, shared objects cannot be profiled with
\fBprof\fR. Thus, when a profiled, dynamically linked program is executed, only
the \fBmain\fR portion of the image is sampled. This means that all time spent
outside of the \fBmain\fR object, that is, time spent in a shared object, will
not be included in the profile summary; the total time reported for the program
may be less than the total time used by the program.
.sp
.LP
Because the time spent in a shared object cannot be accounted for, the use of
shared objects should be minimized whenever a program is profiled with
\fBprof\fR. If desired, the program should be linked to the profiled version of
a library (or to the standard archive version if no profiling version is
available), instead of the shared object to get profile information on the
functions of a library. Versions of profiled libraries may be supplied with the
system in the \fB/usr/lib/libp\fR directory. Refer to compiler driver
documentation on profiling.
.sp
.LP
Consider an extreme case. A profiled program dynamically linked with the shared
C library spends 100 units of time in some  \fBlibc\fR routine, say,
\fBmalloc()\fR. Suppose  \fBmalloc()\fR is called only from routine  \fBB\fR
and \fBB\fR consumes only 1 unit of time. Suppose further that routine  \fBA\fR
consumes 10 units of time, more than any other routine in the \fBmain\fR
(profiled) portion of the image. In this case,  \fBprof\fR will conclude that
most of the time is being spent in  \fBA\fR and almost no time is being spent
in  \fBB\fR. From this it will be almost impossible to tell that the greatest
improvement can be made by looking at routine  \fBB\fR and not routine
\fBA\fR. The value of the profiler in this case is severely degraded; the
solution is to use archives as much as possible for profiling.