<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!--
(c) Copyright 2000-2004 Silicon Graphics Inc. All rights reserved.
Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-Share Alike, Version
3.0 or any later version published by the Creative Commons Corp. A copy
of the license is available at
http://creativecommons.org/licenses/by-sa/3.0/us/ .
-->
<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="content-style-type" content="text/css">
<link href="pcpdoc.css" rel="stylesheet" type="text/css">
<link href="images/pcp.ico" rel="icon" type="image/ico">
<TITLE>Understanding system-level processor performance</TITLE>
</HEAD>
<BODY LANG="en-AU" TEXT="#000060" DIR="LTR">
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 STYLE="page-break-before: always">
<TR> <TD WIDTH=64 HEIGHT=64><FONT COLOR="#000080"><A HREF="http://pcp.io/"><IMG SRC="images/pcpicon.png" NAME="pmcharticon" ALIGN=TOP WIDTH=64 HEIGHT=64 BORDER=0></A></FONT></TD>
<TD WIDTH=1><P> </P></TD>
<TD WIDTH=500><P VALIGN=MIDDLE ALIGN=LEFT><A HREF="index.html"><FONT COLOR="#cc0000">Home</FONT></A> · <A HREF="lab.pmchart.html"><FONT COLOR="#cc0000">Charts</FONT></A> · <A HREF="timecontrol.html"><FONT COLOR="#cc0000">Time Control</FONT></A></P></TD>
</TR>
</TABLE>
<H1 ALIGN=CENTER STYLE="margin-top: 0.48cm; margin-bottom: 0.32cm"><FONT SIZE=7>Understanding measures of system-level processor performance</FONT></H1>
<TABLE WIDTH=15% BORDER=0 CELLPADDING=5 CELLSPACING=10 ALIGN=RIGHT>
<TR><TD BGCOLOR="#e2e2e2"><IMG SRC="images/system-search.png" WIDTH=16 HEIGHT=16 BORDER=0> <I>Tools</I><BR><PRE>
pmchart
mpvis
sar
</PRE></TD></TR>
</TABLE>
<P>This chapter of the Performance Co-Pilot tutorial provides some hints
on how to interpret and understand the various measures of system-level
processor (CPU) performance.</P>
<P>All modern operating systems collect processor resource utilization at both
the <B>process</B>-level and the <B>system</B>-level. This tutorial relates specifically to the <B>system</B>-level metrics.</P>
<P>For an explanation of Performance Co-Pilot terms and acronyms, consult
the <A HREF="glossary.html">PCP glossary</A>.</P>
<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
<TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>How the system-level CPU time is computed</B></FONT></P></TD></TR>
</TABLE>
<P>
Both <I>sar</I> and Performance Co-Pilot (PCP) use a common collection
of system-level CPU performance instrumentation from the kernel.
This instrumentation is based upon statistical sampling of the state of <B>each</B>
CPU in the kernel's software clock interrupt routine, which is typically
invoked HZ times (commonly 100) per second on every CPU.</P>
<P>
At each observation a CPU is attributed a quantum of 10 milliseconds of
elapsed time to one of several counters based on the current state of
the code executing on that CPU.</P>
<P>
This sort of statistical sampling is subject to some anomalies,
particularly when activity is strongly correlated with the clock
interrupts, but the distribution of observations over several
seconds or minutes is often an accurate reflection of the true
distribution of CPU time. The kernel profiling mechanisms offer higher
resolution should that be required, but they are beyond the scope
of this document.</P>
<P>
The CPU state is determined by considering what the CPU was doing just
before the clock interrupt, as follows:</P>
<OL>
<LI>
If executing a <B>user</B> thread (i.e. above the kernel system
call interface for some process) then the state is CPU_USER.
<LI>
If executing a kernel interrupt thread, then the state is CPU_INTR.
<LI>
If executing a kernel thread waiting for a graphics event, then the
state is CPU_WAIT.
<LI>
If otherwise executing a kernel thread, then the state is CPU_KERNEL.
<LI>
If not executing a kernel thread and some I/O is pending, then the
state is CPU_WAIT.
<LI>
If not executing a kernel thread and no I/O is pending and some user
thread is paused waiting for memory to become available, then the state
is CPU_SXBRK.
<LI>
Otherwise the state is CPU_IDLE.
</OL>
<P>
These states are mutually exclusive and complete, so exactly one state
is assigned for each CPU at each clock interrupt.</P>
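<P>A minimal sketch of this classification (hypothetical Python, not
kernel code; the flag names are invented) makes the priority ordering,
and hence the mutual exclusivity, explicit:</P>

```python
# Hypothetical classifier for the per-tick CPU state sampling described
# above: the tests are tried in priority order, so exactly one of the
# mutually exclusive states is returned.  Flag names are invented.
def cpu_state(running_user, running_intr, wait_gfx, running_kernel,
              io_pending, mem_wait):
    if running_user:            # user thread above the syscall interface
        return "CPU_USER"
    if running_intr:            # kernel interrupt thread
        return "CPU_INTR"
    if wait_gfx:                # kernel thread waiting on a graphics event
        return "CPU_WAIT"
    if running_kernel:          # any other kernel thread
        return "CPU_KERNEL"
    if io_pending:              # idle, but some I/O is outstanding
        return "CPU_WAIT"
    if mem_wait:                # idle, user thread waiting for memory
        return "CPU_SXBRK"
    return "CPU_IDLE"
```

<P>Note, for example, that an otherwise idle CPU with an outstanding
I/O is classified CPU_WAIT, never CPU_IDLE.</P>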
<P>
The kernel agent for PCP exports the following metrics:</P>
<TABLE BORDER="1">
<CAPTION ALIGN="BOTTOM"><B>Table 1: Raw PCP CPU metrics</B></CAPTION>
<TR VALIGN="TOP">
<TH>PCP Metric</TH>
<TH>Semantics</TH>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.user</TT></I></TD>
<TD>Time counted when in CPU_USER state.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.sys</TT></I></TD>
<TD>Time counted when in CPU_KERNEL state.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.intr</TT></I></TD>
<TD>Time counted when in CPU_INTR state.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.sxbrk</TT></I></TD>
<TD>Time counted when in CPU_SXBRK state (IRIX only).</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.wait.total</TT></I></TD>
<TD>Time counted when in CPU_WAIT state (UNIX only).</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.idle</TT></I></TD>
<TD>Time counted when in CPU_IDLE state.</TD>
</TR>
</TABLE>
<P>
These metrics are all "counters" in units of milliseconds
(cumulative since system boot time) so when displayed with most PCP
tools they are "rate converted" (sampled periodically and the
differences between consecutive values converted to time utilization in
units of milliseconds per second over the sample interval). Since the
raw values are aggregated across all CPUs, the time utilization for any
of the metrics above is in the range 0 to N*1000 for an N CPU system;
for some PCP tools this is reported as a percentage in the range 0 to
N*100 percent.</P>
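<P>As an illustration of this rate conversion (a hypothetical Python
sketch with invented sample values, not a PCP tool):</P>

```python
# Hypothetical sketch of the rate conversion PCP tools apply to these
# cumulative millisecond counters; the sample values are invented.
def utilization(prev_ms, curr_ms, interval_s, ncpu):
    """Convert two samples of a cumulative ms counter into
    (milliseconds per second, percentage of total CPU capacity)."""
    ms_per_s = (curr_ms - prev_ms) / interval_s   # range 0 .. ncpu*1000
    percent = 100.0 * ms_per_s / (ncpu * 1000)    # range 0 .. 100
    return ms_per_s, percent

# kernel.all.cpu.user sampled twice, 2 seconds apart, on a 4-CPU system:
# 6000 ms of user time accrued -> 3000 ms/s, i.e. 75% of total capacity
rate, pct = utilization(prev_ms=500000, curr_ms=506000, interval_s=2, ncpu=4)
```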
<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
<TR><TD BGCOLOR="#e2e2e2" WIDTH=70%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Using <I>pmchart</I> to display CPU activity (aggregated over all CPUs).<BR>
<PRE><B>
$ source /etc/pcp.conf
$ tar xzf $PCP_DEMOS_DIR/tutorials/cpuperf.tgz
$ pmchart -c CPU -t 2sec -O -0sec -a cpuperf/moomba.pmkstat
</B></PRE>
<BR>
This command will provide the interactive charts described here.
</TD></TR>
</TABLE>
<P>On IRIX, the CPU_WAIT state is further subdivided into components describing
different types of "waiting":</P>
<OL>
<LI>
If executing a kernel thread waiting for a graphics context switch,
then the waiting classification W_GFXC is true.
<LI>
If executing a kernel thread waiting for a graphics FIFO operation to
complete, then the waiting classification W_GFXF is true.
<LI>
If not executing any thread and an I/O involving a block device is
pending (most likely associated with a file system but independent of
the CPU from which the I/O was initiated), then the waiting
classification W_IO is true.
<LI>
If not executing any thread and an I/O involving a swap operation is
pending (independent of the CPU from which the I/O was initiated), then
the waiting classification W_SWAP is true.
<LI>
If not executing any thread and an I/O involving a raw device is
pending (independent of the CPU from which the I/O was initiated), then
the waiting classification W_PIO is true.
</OL>
<P>
More than one of the group { W_IO, W_SWAP, W_PIO } can be true at any
one time; however, this group, W_GFXC and W_GFXF are all mutually
exclusive. If the state is CPU_WAIT, then at least one of the
classifications must be true.</P>
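<P>The rules above can be sketched as follows (hypothetical Python with
invented flag names, not IRIX kernel code):</P>

```python
# Hypothetical sketch of the wait classifications for a CPU in the
# CPU_WAIT state.  A graphics wait implies a kernel thread is executing,
# so it excludes the I/O group; the I/O group members may coexist.
def wait_classes(gfxc, gfxf, io, swap, pio):
    """Return the set of true wait classifications (at least one)."""
    if gfxc:
        return {"W_GFXC"}
    if gfxf:
        return {"W_GFXF"}
    classes = set()
    if io:                      # block device I/O pending
        classes.add("W_IO")
    if swap:                    # swap I/O pending
        classes.add("W_SWAP")
    if pio:                     # raw device I/O pending
        classes.add("W_PIO")
    return classes
```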
<P>
The IRIX agent for PCP exports the following CPU "wait"
metrics, the sum of which approximately equals <I><TT>kernel.all.cpu.wait.total</TT></I>:</P>
<TABLE BORDER="1">
<CAPTION ALIGN="BOTTOM"><B>Table 2: Raw PCP CPU wait metrics</B></CAPTION>
<TR VALIGN="TOP">
<TH>PCP Metric</TH>
<TH>Semantics</TH>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.wait.gfxc</TT></I></TD>
<TD>Time counted when W_GFXC is true.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.wait.gfxf</TT></I></TD>
<TD>Time counted when W_GFXF is true.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.wait.io</TT></I></TD>
<TD>Time counted when W_IO is true.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.wait.pio</TT></I></TD>
<TD>Time counted when W_PIO is true.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.cpu.wait.swap</TT></I></TD>
<TD>Time counted when W_SWAP is true.</TD>
</TR>
</TABLE>
<P>
As with the metrics in Table 1, these are cumulative counters in units
of milliseconds since boot time, and most PCP tools rate convert them
in the same way, yielding values in the range 0 to N*1000 milliseconds
per second (or 0 to N*100 percent) for an N CPU system.</P>
<P>
Note that for a multiprocessor system with one I/O pending, <B>all</B>
otherwise idle CPUs will be assigned the CPU_WAIT state. This may lead
to an over-estimate of the I/O wait time, as discussed in the
companion <A HREF="howto.diskperf.html">How to understand measures of
disk performance</A> document.</P>
<P>
In IRIX 6.5.2 additional instrumentation was added to improve the
attribution of wait time by counting the <B>number</B> of waiting
processes in various states, rather than inferring waiting from the
state of each CPU. The wait I/O queue length is defined as the number
of processes waiting on events corresponding to the classifications
W_IO, W_SWAP or W_PIO. The metrics shown in the table below are
computed on only <B>one</B> of the CPUs (the "clock-master") at each
clock interrupt.</P>
<TABLE BORDER="1">
<CAPTION ALIGN="BOTTOM"><B>Table 3: Raw PCP wait I/O queue length
metrics</B></CAPTION>
<TR VALIGN="TOP">
<TH>PCP Metric</TH>
<TH>Semantics</TH>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.waitio.queue</TT></I></TD>
<TD>Cumulative total of the wait I/O queue lengths, as observed
on each clock interrupt.</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>kernel.all.waitio.occ</TT></I></TD>
<TD>Cumulative total of the number of times the wait I/O queue
length is greater than zero, as observed on each clock interrupt.</TD>
</TR>
</TABLE>
<P>
These metrics may be used with PCP tools as follows:</P>
<UL>
<LI>
Displaying <I><TT>kernel.all.waitio.queue</TT></I> with <I>pmval</I>, <I>pmdumptext</I>, <I>pmchart</I>,
etc. will display the time average of the wait I/O queue length
multiplied by the frequency of clock interrupts, i.e. by 100.
<LI>
Displaying <I><TT>kernel.all.waitio.occ</TT></I> with <I>pmval</I>, <I>pmdumptext</I>, <I>pmchart</I>,
etc. will display the probability that the wait I/O queue is not empty
multiplied by the frequency of clock interrupts, i.e. by 100. This
value (converted to a percentage) is reported as <I><TT>%wioocc</TT></I>
by the <B>-q</B> option of <I>sar.</I>
<LI>
Using <I>pmie</I> with the expression<BR>
<I><TT>kernel.all.waitio.queue / kernel.all.waitio.occ</TT></I><BR>
will report the stochastic average of the wait I/O queue length,
conditional upon the queue not being empty. This value is reported as <I><TT>wioq-sz</TT></I>
by the <B>-q</B> option of <I>sar.</I>
</UL>
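<P>The derivations above amount to simple arithmetic on counter deltas,
as this hypothetical sketch shows (sample values invented; HZ assumed
to be 100):</P>

```python
# Hypothetical arithmetic on deltas of the kernel.all.waitio.* counters
# over one sampling interval (sample values invented).
def waitio_stats(queue_delta, occ_delta, interval_s, hz=100):
    """Return (%wioocc, wioq-sz) in the sense of sar -q."""
    ticks = interval_s * hz                  # clock interrupts in the interval
    pct_occ = 100.0 * occ_delta / ticks      # % of ticks with a non-empty queue
    # stochastic average of the queue length, given the queue is non-empty
    wioq_sz = queue_delta / occ_delta if occ_delta else 0.0
    return pct_occ, wioq_sz

# queue total grew by 900, non-empty at 300 of 1000 ticks over 10 seconds:
# the queue was occupied 30% of the time, averaging 3 waiters when occupied
pct, qsz = waitio_stats(queue_delta=900, occ_delta=300, interval_s=10)
```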
<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
<TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>The per-CPU variants</B></FONT></P></TD></TR>
</TABLE>
<P>
Inside the kernel, most of the metrics described above are
accumulated per-CPU for reasons of efficiency (to reduce the locking
overheads and minimize dirty cache-line traffic).</P>
<P>
PCP exports the per-CPU versions of the system-wide metrics with metric
names formed by replacing <B><I><TT>all</TT></I></B> by <B><I><TT>percpu</TT></I></B>,
e.g. <I><TT>kernel.percpu.cpu.user</TT></I>.</P>
<P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
<TR><TD BGCOLOR="#e2e2e2" WIDTH=70%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> The <I>mpvis</I> tool provides 3-D visualization of these per-CPU metrics.<PRE><B>
$ mpvis -a cpuperf/babylon.percpu
</B></PRE>
<BR>
When the window is shown, use the <A HREF="timecontrol.html">PCP Archive Time Control</A> dialog to scroll through the archive (Fast Forward).
</TD></TR>
</TABLE>
<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
<TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>Reconciling sar -u and PCP CPU performance metrics</B></FONT></P></TD></TR>
</TABLE>
<P>
The <I>sar</I> metrics are scaled based on the number of CPUs and
expressed as percentages, whereas PCP metrics are in units of milliseconds per
second after rate conversion; this explains the use of the PCP metric <I><TT>hinv.ncpu</TT></I>
and the constants 100 and 1000 in the expressions below.</P>
<P>
When run with a <B>-u</B> option, <I>sar</I> reports the following:</P>
<TABLE BORDER="1">
<CAPTION ALIGN="BOTTOM"><B>Table 4: PCP and sar metric equivalents</B></CAPTION>
<TR>
<TH><I>sar</I><BR>
metric</TH>
<TH>PCP equivalent (assuming rate conversion)</TH>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%usr</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.user</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
1000)</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%sys</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.sys</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
1000)</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%intr</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.intr</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
1000)</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%wio</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.wait.total</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
1000)</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%idle</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.idle</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
1000)</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%sbrk</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.sxbrk </TT></I>/ (<I><TT>hinv.ncpu </TT></I>*
1000)</TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%wfs</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.wait.io</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
</TR>
<TR VALIGN="TOP">
<TD><I><TT>%wswp</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.wait.swap</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
</TR>
<TR>
<TD><I><TT>%wphy</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.wait.pio</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
</TR>
<TR>
<TD><I><TT>%wgsw</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.wait.gfxc</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
</TR>
<TR>
<TD><I><TT>%wfif</TT></I></TD>
<TD>100 * <I><TT>kernel.all.cpu.wait.gfxf</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
</TR>
</TABLE>
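<P>The expressions in the table reduce to two small formulae, sketched
here in hypothetical Python (sample values invented; inputs are assumed
already rate converted to milliseconds per second):</P>

```python
# Hypothetical reduction of the sar/PCP equivalences above.
def pct_of_capacity(rate_ms_per_s, ncpu):
    """%usr, %sys, %intr, %wio, %idle, %sbrk style percentages."""
    return 100.0 * rate_ms_per_s / (ncpu * 1000)

def pct_of_wait(rate_ms_per_s, wait_total_ms_per_s):
    """%wfs, %wswp, %wphy, %wgsw, %wfif: fractions of total wait time."""
    return 100.0 * rate_ms_per_s / wait_total_ms_per_s

# 1500 ms/s of user time on a 2-CPU system is 75% in sar terms
usr = pct_of_capacity(1500, 2)
# 250 ms/s of file system wait out of 1000 ms/s total wait is 25% (%wfs)
wfs = pct_of_wait(250, 1000)
```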
<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
<TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>The load average</B></FONT></P></TD></TR>
</TABLE>
<P>
The "load average" is reported by <I>uptime</I>, <I>top</I>,
etc. and the PCP metric <I><TT>kernel.all.load</TT></I>.</P>
<P>
The load average is an indirect measure of the demand for CPU
resources. It is calculated using the previous load average (<I>load</I>)
and the number of currently runnable processes (<I>nrun</I>) and an
exponential dampening expression, e.g. for the "1 minute"
average, the expression is:</P>
<PRE>
load = exp(-5/60) * load + (1 - exp(-5/60)) * nrun
</PRE>
<P>
The three load averages use different exponential constants and are all
re-computed every 5 seconds.</P>
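<P>One step of this update is easily sketched (hypothetical Python; the
kernel works in fixed point). A minute of constant demand illustrates
the dampening:</P>

```python
import math

# One step of the exponentially damped load-average update described
# above, for the "1 minute" average recomputed every 5 seconds.
def update_load(load, nrun, interval=5, period=60):
    decay = math.exp(-interval / period)
    return decay * load + (1 - decay) * nrun

# A minute of constant demand (4 runnable processes, 12 updates of 5 s):
# the "1 minute" average only reaches 4*(1 - 1/e), about 2.53, not 4.
load = 0.0
for _ in range(12):
    load = update_load(load, nrun=4)
```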
<P>
<I>nrun</I> is computed as follows:</P>
<OL>
<LI>
Inspect every process.
<LI>
If the process is not likely to be runnable in the near future (state
not SRUN), ignore it.
<LI>
Inspect every thread of the process.
<LI>
If the thread is sleeping and not currently expanding its address space
(state not SXBRK) and not in a long-term sleep, increment <I>nrun.</I>
<LI>
If the thread is stopped, ignore it.
<LI>
Otherwise if the thread is not "weightless" (being ignored by
the scheduler), increment <I>nrun.</I>
</OL>
<P>
Note that the "run queue length" (a variant of which is
reported by the <B>-q</B> option of <I>sar</I>) counts processes using
a similar, but not identical algorithm:</P>
<OL>
<LI>
Inspect every process.
<LI>
If the process is not likely to be runnable in the near future (state
not SRUN), ignore it.
<LI>
Inspect every thread of the process.
<LI>
If the thread is sleeping and not currently expanding its address space
(state not SXBRK), ignore it.
<LI>
If the thread is stopped, ignore it.
<LI>
Otherwise increment the "run queue length".
</OL>
<P><BR></P>
<HR>
<CENTER>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0>
<TR> <TD WIDTH=50%><P>Copyright &copy; 2007-2010 <A HREF="http://www.aconex.com/"><FONT COLOR="#000060">Aconex</FONT></A><BR>Copyright &copy; 2000-2004 <A HREF="http://www.sgi.com/"><FONT COLOR="#000060">Silicon Graphics Inc</FONT></A></P></TD>
<TD WIDTH=50%><P ALIGN=RIGHT><A HREF="http://pcp.io/"><FONT COLOR="#000060">PCP Site</FONT></A><BR>Copyright &copy; 2012-2014 <A HREF="http://www.redhat.com/"><FONT COLOR="#000060">Red Hat</FONT></A></P></TD> </TR>
</TABLE>
</CENTER>
</BODY>
</HTML>