<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!--
 (c) Copyright 2010 Aconex. All rights reserved.
 (c) Copyright 2000-2004 Silicon Graphics Inc. All rights reserved.
 Permission is granted to copy, distribute, and/or modify this document
 under the terms of the Creative Commons Attribution-Share Alike, Version
 3.0 or any later version published by the Creative Commons Corp. A copy
 of the license is available at
 http://creativecommons.org/licenses/by-sa/3.0/us/ .
-->
<HTML>
<HEAD>
	<meta http-equiv="content-type" content="text/html; charset=utf-8">
	<meta http-equiv="content-style-type" content="text/css">
	<link href="pcpdoc.css" rel="stylesheet" type="text/css">
	<link href="images/pcp.ico" rel="icon" type="image/ico">
	<TITLE>Automated reasoning with pmie</TITLE>
</HEAD>
<BODY LANG="en-AU" TEXT="#000060" DIR="LTR">
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 STYLE="page-break-before: always">
	<TR> <TD WIDTH=64 HEIGHT=64><FONT COLOR="#000080"><A HREF="http://pcp.io/"><IMG SRC="images/pcpicon.png" NAME="pmcharticon" ALIGN=TOP WIDTH=64 HEIGHT=64 BORDER=0></A></FONT></TD>
	<TD WIDTH=1><P>&nbsp;&nbsp;&nbsp;&nbsp;</P></TD>
	<TD WIDTH=500><P VALIGN=MIDDLE ALIGN=LEFT><A HREF="index.html"><FONT COLOR="#cc0000">Home</FONT></A>&nbsp;&nbsp;&middot;&nbsp;<A HREF="lab.pmchart.html"><FONT COLOR="#cc0000">Charts</FONT></A>&nbsp;&nbsp;&middot;&nbsp;<A HREF="timecontrol.html"><FONT COLOR="#cc0000">Time Control</FONT></A></P></TD>
	</TR>
</TABLE>
<H1 ALIGN=CENTER STYLE="margin-top: 0.48cm; margin-bottom: 0.32cm"><FONT SIZE=7>
Automated reasoning with <I>pmie</I></FONT></H1>
<TABLE WIDTH=15% BORDER=0 CELLPADDING=5 CELLSPACING=10 ALIGN=RIGHT>
	<TR><TD BGCOLOR="#e2e2e2"><IMG SRC="images/system-search.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;<I>Tools</I><BR><PRE>
pmie
dkvis
pminfo
pmchart
pmdumplog
</PRE></TD></TR>
</TABLE>

<P>For many systems, performance data is produced in volumes and at
rates that require some form of automated, intelligent filtering to
separate the mundane data from the interesting information.</P>
<P>Once interesting information has been found, there are a variety of
actions that may be appropriate.</P>
<P>The Performance Metrics Inference Engine (<I>pmie</I>) is the tool
within PCP that is designed for automated filtering and reasoning
about performance.</P>
<P>For an explanation of Performance Co-Pilot terms and acronyms, consult
the <A HREF="glossary.html">PCP glossary</A>.</P>
<BR>
<UL>
<LI><A HREF="#basics"><I>pmie</I> basics</A> 
<LI><A HREF="#repeat">Action repetition and launching arbitrary actions</A> 
<LI><A HREF="#value-sets">Rule evaluation over sets of values</A> 
<LI><A HREF="#predicates">Forms of <I>pmie</I> predicate</A> 
  <UL>
    <LI><A HREF="#exist">Existential quantification</A> 
    <LI><A HREF="#universal">Universal quantification</A> 
    <LI><A HREF="#percentile">Percentile quantification</A> 
    <LI><A HREF="#others">Other predicates</A> 
  </UL>
<LI><A HREF="#expr"><I>pmie</I> expressions</A> 
<LI><A HREF="#actions">Actions and parameter substitution of predicate context</A> 
<LI><A HREF="#audit">Performance audit using archives</A> 
<LI><A HREF="#interval">Influence of the update interval</A> 
</UL>
<BR>
<P><I>pmie</I> evaluates a set of assertions against a time-series of
performance metric values collected in real-time from PMCD on one or
more hosts or from one or more PCP archives.</P>
<P>For those assertions that are found to be true, <I>pmie</I> is able
to print messages, activate alarms, write syslog entries and launch
arbitrary programs.</P>
<P>Typical <I>pmie</I> usage might include:
<UL>
    <LI> monitoring for exceptional performance conditions 
    <LI> raising alarms 
    <LI> automated filtering of acceptable performance 
    <LI> early warning of pending performance problems 
    <LI> &quot;call home&quot; to the support center 
    <LI> retrospective performance audits 
    <LI> evaluating assertions about &quot;before and after&quot;
    performance in the context of upgrades or system reconfiguration 
    <LI> hypothesis evaluation for capacity planning 
    <LI> as part of the <I>post mortem</I> analysis following a system failure 
</UL>
</P>

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="basics"><I>pmie</I> basics</A></B></FONT></P></TD></TR>
</TABLE>
<P>The simplest rules test thresholds and are formed from expressions
involving performance metrics and constants.&nbsp;&nbsp;For example, the following
statement:</P>
<PRE>
<I>    If the context switch rate exceeds 2000 switches per second</I>
<I>    then activate an alarm notifier</I>
</PRE>
<P>may be translated into the following <I>pmie</I> rule:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule1.png" WIDTH="495" HEIGHT="200"></P>
</CENTER>
<P>where the &quot;alarm&quot; action launches an information dialog with
the specified message.</P>
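<P>In text form, a rule of this shape might look like the following sketch.
It assumes the <B><TT>kernel.all.pswitch</TT></B> metric and a one second
evaluation interval, and the message text is only illustrative; the demo file
<I>pswitch.pmie</I> used in the exercise remains the authoritative version.</P>
<PRE>
    // evaluate the rule once per second (assumed interval)
    delta = 1 sec;

    // the counter is converted to a rate, so compare against count/sec
    kernel.all.pswitch &gt; 2000 count/sec
        -&gt; alarm &quot;high context switch rate %v&quot;;
</PRE>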
<P>Other <I>pmie</I> actions are discussed later in the
<A HREF="#actions">Actions and parameter substitution of predicate context</A> section.</P>

<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0>
<TR><TD>
  <TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
	<TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;Generate a lot of context switches, and watch them:<BR>
<PRE><B>
$ . /etc/pcp.env
$ pmie -v -c $PCP_DEMOS_DIR/pmie/pswitch.pmie &amp;
$ pmchart -t 1sec -c $PCP_DEMOS_DIR/pmie/pswitch.view &amp;
$ while true; do sleep 0; done &amp;
$ jobs
[1]  Running     pmie -v -c $PCP_DEMOS_DIR/pmie/pswitch.pmie &amp;
[2]- Running     pmchart -t 1sec -c $PCP_DEMOS_DIR/pmie/pswitch.view &amp;
[3]+ Running     while true; do sleep 0; done &amp;
</B></PRE>
<BR>
<B>Important:</B> the above test case can be quite intrusive on low processor
count machines, so remember to terminate it when you've finished this tutorial:
<PRE><B>
$ jobs
...
[3]+ Running     while true; do sleep 0; done &amp;
$ kill %3
</B></PRE>
</TD></TR>
</TABLE>
</TD>
<TD><IMG ALIGN=RIGHT SRC="images/cpu_pswitch.png" BORDER=0></TD>
</TR>
</TABLE>

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="repeat">Action repetition and launching arbitrary actions</A></B></FONT></P></TD></TR>
</TABLE>
<P>Sometimes it is useful for an action not to be repeated for a time.
For example, the English statement:</P>
<PRE><I>
    If the context switch rate exceeds 2000 switches per second
    then launch <B>top</B> in an <B>xterm</B> window
    but hold off repetition of the action for 5 minutes
</I></PRE>
<P>may be translated into the following <I>pmie</I> rule:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule2.png" WIDTH="495" HEIGHT="200"></P>
</CENTER>
<P>
Note the <B><TT><FONT COLOR="#ff0000">shell</FONT></TT></B> keyword
introduces an arbitrary action in which any program can be launched.</P>
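<P>A sketch of such a rule is shown below. It assumes the holdoff time is
written between the action keyword and its arguments, and that
<B><TT>kernel.all.pswitch</TT></B> is the metric being tested; check the
<I>pmie</I> manual page for the exact form before relying on it.</P>
<PRE>
    // launch top in an xterm, then suppress the action for 5 minutes
    kernel.all.pswitch &gt; 2000 count/sec
        -&gt; shell 5 min &quot;xterm -e top&quot;;
</PRE>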

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="value-sets">Rule evaluation over sets of values</A></B></FONT></P></TD></TR>
</TABLE>
<P>Each <I>pmie</I> rule may be evaluated over a set of performance metric values.</P>
<P>Conceptually these sets of values are constructed for a single performance
metric by taking the cross product of observed values over the three
dimensions of: <B>hosts</B>, <B>instances</B> and <B>times</B>.</P>
<P>The default host is:</P>
<UL>
    <LI>
    the host named in the <B><TT>-h</TT></B> command line option to
    <I>pmie</I>, or
    <LI>
    the host associated with the archive named in the first
    <B><TT>-a</TT></B> command line option to <I>pmie</I>, or
    <LI>
    the local host if neither <B><TT>-h</TT></B> nor <B><TT>-a</TT></B>
    appears on the command line.
</UL>
<P>
By default, a metric name represents the set of values formed by the
cross product of the default host for <I>pmie</I>, <B>all</B> instances
and the current time.&nbsp;&nbsp;If there is only one instance, then the set
contains a singular value.</P>
<P>
For example <B><TT>filesys.free</TT></B> is the most recent set of
values for the amount of free space on every mounted file system on the
default host, and may be represented by the shaded rectangle in the
following figure:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis1.png" WIDTH="396" HEIGHT="226"></P>
</CENTER>
<P>
One or more suffixes of the form <B><TT>#<I>instance</I></TT></B>
(where <TT><I>instance</I></TT> is the external instance identifier)
after a metric name restricts the set of values on the default host
for <I>pmie</I>, to the <B>nominated</B> instances and the current time.
If <I><TT>instance</TT></I> includes any special characters then it
should be enclosed in single quotes.</P>
<P>
For example <B><TT>filesys.free #'/usr'</TT></B> is the most recent
value for the amount of free space on the <I>/usr</I> file system on the
default host, and may be represented as follows:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis2.png" WIDTH="396" HEIGHT="226"></P>
</CENTER>
<P>
One or more suffixes of the form <B><TT>:<I>hostname</I></TT></B> after
a metric name changes the set of values to include <B>all</B> instances
on the <B>nominated</B> hosts, at the current time.</P>
<P>
For example <B><TT>filesys.free :otherhost</TT></B> is the most
recent set of values for the amount of free space on every mounted file
system on the host <B>otherhost</B>, and may be represented as follows:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis3.png" WIDTH="396" HEIGHT="226"></P>
</CENTER>
<P>
A suffix of the form <B><TT>@<I>N..M</I></TT></B> after a metric name
changes the set of values to be that formed by <B>all</B> instances on
the default host for <I>pmie</I>, at the sample times
<B><I><TT>N</TT></I></B>, <B><I><TT>N+1</TT></I></B>, ...
<B><I><TT>M</TT></I></B><TT> </TT><B>back</B> from the current time.</P>
<P>
Finally, more than one type of suffix may be combined to control enumeration
along each of the three axes.</P>
<P>
For example <B><TT>filesys.free #'/usr' @0..3</TT></B> refers to
the default host, restricts the instances and enumerates the time.
This may be represented as follows:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis4.png" WIDTH="396" HEIGHT="226"></P>
</CENTER>

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="predicates">Forms of <I>pmie</I> predicate</A></B></FONT></P></TD></TR>
</TABLE>
<H3><A NAME="exist">Existential quantification</A></H3>
<P>
The predicate <B><TT><FONT COLOR="#ff0000">some_inst</FONT></TT></B><TT>
<I>expr</I></TT> is true if there is some instance of a metric that
makes <I><TT>expr</TT></I> true.</P>
<P>
Existential quantification over hosts and consecutive samples is also
supported by <B><TT><FONT COLOR="#ff0000">some_host</FONT></TT></B><TT>
<I>expr</I></TT> and
<B><TT><FONT COLOR="#ff0000">some_sample</FONT></TT></B><TT>
<I>expr</I></TT>.</P>
<P>
For example, the English statement:</P>
<PRE>
<I>    if some disk is doing a lot of I/O</I>
<I>    then launch a visible alarm</I>
</PRE>
<P>may be translated into the following <I>pmie</I> rule:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule3.png" WIDTH="495" HEIGHT="120"></P>
</CENTER>
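<P>Written out as text, a rule of this kind might resemble the sketch below.
The threshold of 60 transfers per second is a hypothetical definition of
&quot;a lot of I/O&quot;, and the message is illustrative.</P>
<PRE>
    // true if at least one disk instance exceeds the assumed threshold
    some_inst ( disk.dev.total &gt; 60 count/sec )
        -&gt; alarm &quot;busy disk: %i&quot;;
</PRE>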
<BR>
<H3><A NAME="universal">Universal quantification</A></H3>
<P>
The predicate <B><TT><FONT COLOR="#ff0000">all_inst</FONT></TT></B>
<TT><I>expr</I></TT> is true if <I><TT>expr</TT></I> is true for every
instance of a metric.</P>
<P>
Universal quantification over hosts and consecutive samples is also
supported by
<B><TT><FONT COLOR="#ff0000">all_host</FONT></TT></B> <TT><I>expr</I></TT>
and <B><TT><FONT COLOR="#ff0000">all_sample</FONT></TT></B>
<TT><I>expr</I></TT>.</P>
<P>
Quantification predicates may be nested.</P>
<P>
For example, the English statement:</P>
<PRE>
<I>    if for every one of the last 5 samples,</I>
<I>    some disk (but not necessarily the same disk) is doing a lot of I/O</I>
<I>    then launch a visible alarm</I>
</PRE>
<P>may be translated into the following <I>pmie</I> rule:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule4.png" WIDTH="540" HEIGHT="140"></P>
</CENTER>
<P>Note that reversing the nesting of the universal and existential predicates produces
a rule which has slightly different English semantics, namely:</P>
<PRE>
<I>    if the same disk has been doing a lot of I/O</I>
<I>    for every one of the last 5 samples,</I>
<I>    then launch a visible alarm</I>
</PRE>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule5.png" WIDTH="540" HEIGHT="140"></P>
</CENTER>
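<P>The two nestings can be sketched in text form as follows. The threshold
of 60 transfers per second is again a hypothetical choice, and the sample
range <B><TT>@0..4</TT></B> is assumed to select the five most recent samples.</P>
<PRE>
    // for every one of the last 5 samples, SOME disk (any disk) is busy
    all_sample ( some_inst ( disk.dev.total @0..4 &gt; 60 count/sec ) )
        -&gt; alarm &quot;sustained disk activity&quot;;

    // the SAME disk is busy for every one of the last 5 samples
    some_inst ( all_sample ( disk.dev.total @0..4 &gt; 60 count/sec ) )
        -&gt; alarm &quot;disk %i busy for 5 samples&quot;;
</PRE>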
<BR>
<H3><A NAME="percentile">Percentile quantification</A></H3>
<P>
The predicate
<TT><I><FONT COLOR="#000000">N</FONT></I><B><FONT COLOR="#ff0000">%_inst</FONT></B> <I>expr</I></TT>
is true if <I><TT>expr</TT></I> is true for <I><TT>N</TT></I> percent
of the instances of a metric.</P>
<P>
Percentile quantification over hosts and consecutive samples is also
supported by
<TT><I><FONT COLOR="#000000">N</FONT></I><B><FONT COLOR="#ff0000">%_host</FONT></B> <I>expr</I></TT> and
<TT><I><FONT COLOR="#000000">N</FONT></I><B><FONT COLOR="#ff0000">%_sample</FONT></B> <I>expr</I></TT>.</P>
<P>
For example, the English phrase:</P>
<PRE>
<I>    if at least 30% of the disks are doing a lot of I/O</I>
<I>    then launch a visible alarm</I>
</PRE>
<P>
may be translated into the following <I>pmie</I> expression:</P>
<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule3.png" WIDTH="495" HEIGHT="120"></P>
</CENTER>
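<P>As a text sketch, and again assuming a hypothetical threshold of 60
transfers per second for &quot;a lot of I/O&quot;, the percentile form might
be written as:</P>
<PRE>
    // true if the condition holds for at least 30% of the disk instances
    30%_inst ( disk.dev.total &gt; 60 count/sec )
        -&gt; alarm &quot;many disks are busy&quot;;
</PRE>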
<BR>
<H3><A NAME="others">Other predicates</A></H3>
<P>
Instance quantification:
<B><TT><FONT COLOR="#ff0000">match_inst</FONT></TT>,
<TT><FONT COLOR="#ff0000">nomatch_inst</FONT></TT></B></P>
<P>
Value aggregation:
<B><TT><FONT COLOR="#ff0000">avg_inst</FONT></TT>,
<TT><FONT COLOR="#ff0000">sum_inst</FONT></TT>,
<TT><FONT COLOR="#ff0000">avg_host</FONT></TT>,
<TT><FONT COLOR="#ff0000">sum_host</FONT></TT>,
<TT><FONT COLOR="#ff0000">avg_sample</FONT></TT>,
<TT><FONT COLOR="#ff0000">sum_sample</FONT></TT></B></P>
<P>
Value extrema:
<B><TT><FONT COLOR="#ff0000">min_inst</FONT></TT>,
<TT><FONT COLOR="#ff0000">max_inst</FONT></TT>,
<TT><FONT COLOR="#ff0000">min_host</FONT></TT>,
<TT><FONT COLOR="#ff0000">max_host</FONT></TT>,
<TT><FONT COLOR="#ff0000">min_sample</FONT></TT>,
<TT><FONT COLOR="#ff0000">max_sample</FONT></TT></B></P>
<P>
Value set cardinality:
<B><TT><FONT COLOR="#ff0000">count_inst</FONT></TT>,
<TT><FONT COLOR="#ff0000">count_host</FONT></TT>,
<TT><FONT COLOR="#ff0000">count_sample</FONT></TT></B></P>
<P>
Trends:
<B><TT><FONT COLOR="#ff0000">rising</FONT></TT>,
<TT><FONT COLOR="#ff0000">falling</FONT></TT>,
<TT><FONT COLOR="#ff0000">rate</FONT></TT></B></P>
<P>These predicates are discussed in depth in the <I>pmie</I> manual page.</P>


<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="expr"><I>pmie</I> expressions</A></B></FONT></P></TD></TR>
</TABLE>
<P><I>pmie</I> expressions are very similar to those of the C programming language,
especially with regard to arithmetic, relational and Boolean operators,
and the use of parentheses for grouping.</P>
<P>The <I>pmie</I> language allows macro definition and textual substitution
for common expressions and metric names.</P>
<PRE>
    // Macro for later use ...
    bc = &quot;buffer_cache&quot;;

    // Using the above macro
    // If the buffer cache is in use (more than 50 read requests)
    // with hit ratio less than 90%, then popup an alarm
    $bc.getblks &gt; 50 &amp;&amp; $bc.getfound / $bc.getblks &lt; 0.9
        -&gt; alarm &quot;poor buffer cache hit rate&quot;;

</PRE>
<P>All calculations are done in double precision, where default units are
<B>bytes</B>, <B>seconds</B> and <B>counts</B>.
Note that this can sometimes cause surprises:</P>
<PRE>
    mem.freemem &gt; 10;
</PRE>
<P>
will almost always be true, because the unqualified constant is
interpreted as 10 <B>bytes</B>, unlike</P>
<PRE>
    mem.freemem &gt; 10 Mbyte;
</PRE>
<P>
Metrics with &quot;counter&quot; semantics have their units, semantics
and values converted to rates.&nbsp;&nbsp;For example, the metric
<B><TT>network.interface.total.bytes</TT></B> measures the number of bytes passed
across all of the configured network interfaces.&nbsp;&nbsp;The metric is a counter and the
units are <B><TT>bytes</TT></B>.&nbsp;&nbsp;If <I>pmie</I> finds the value of
<B><TT>network.interface.total.bytes</TT></B> to be 10000 and 15000 on consecutive
fetches 5 seconds apart, then the <I>pmie</I> expression</P>
<PRE>
    <B><TT>network.interface.total.bytes;</TT></B>
</PRE>
<P>
would have the value <B>1000</B> and the units of
<B><TT>bytes/second</TT></B>.</P>
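<P>The conversion is simply the difference between consecutive samples
divided by the time between them:</P>
<PRE>
    (15000 bytes - 10000 bytes) / 5 seconds = 1000 bytes/second
</PRE>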

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="actions">Actions and parameter substitution of predicate context</A></B></FONT></P></TD></TR>
</TABLE>
<P> The available <I>pmie</I> actions are:</P>
<UL>
<LI><B><TT><FONT COLOR="#ff0000">alarm</FONT></TT></B> - popup alarm notifier 
<LI><B><TT><FONT COLOR="#ff0000">shell</FONT></TT></B> - launch any program 
<LI><B><TT><FONT COLOR="#ff0000">syslog</FONT></TT></B> - write an entry in the system log 
<LI><B><TT><FONT COLOR="#ff0000">print</FONT></TT></B> - print message to standard output 
</UL>
<P>Within the arguments that follow the action keyword, parameter substitution
may be used to incorporate some context from the predicate in the arguments
to the actions.&nbsp;&nbsp;For example, when using <B><TT><FONT COLOR="#ff0000">some_host</FONT></TT></B> or <B><TT><FONT COLOR="#ff0000">some_inst</FONT></TT></B> in a predicate, it is most helpful to know &quot;which hosts&quot; or &quot;which instances&quot;
made the condition true.</P>
<P>The following substitutions are available:</P>
<UL>
<LI><B><TT><FONT COLOR="#ff0000">%h</FONT></TT></B> appearing in an action is replaced by the qualifying hosts 
<LI><B><TT><FONT COLOR="#ff0000">%i</FONT></TT></B> appearing in an action is replaced by the qualifying instances 
<LI><B><TT><FONT COLOR="#ff0000">%v</FONT></TT></B> appearing in an action is replaced by the value of the left-most top-level
expression in the expression tree that represents the parsed condition 
</UL>
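<P>For instance, a rule along the following lines would name the host and
disk that satisfied the predicate and report the observed rate. This is a
sketch only: the threshold and the message wording are hypothetical.</P>
<PRE>
    // %h and %i name the qualifying host and instance, %v the value
    some_inst ( disk.dev.total &gt; 60 count/sec )
        -&gt; print &quot;busy disk %h:%i at %v transfers/sec&quot;;
</PRE>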

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="audit">Performance audit using archives</A></B></FONT></P></TD></TR>
</TABLE>
<P> In this exercise, we shall use <I>pmie</I> to investigate performance from 
a PCP archive.</P>

<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0>
<TR><TD>
    <TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=10>
	<TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;Use <I>pmdumplog</I> to report when the archive was created and on which host:<BR>
<PRE><B>
$ . /etc/pcp.env
$ tar xzf $PCP_DEMOS_DIR/tutorials/pmie.tar.gz
$ pmdumplog -L pmie/babylon.perdisk
Log Label (Log Format Version 2)
Performance metrics from host babylon
  commencing Wed Jan 25 08:17:48.460 1995
  ending     Wed Jan 25 14:12:48.457 1995
Archive timezone: PST8PDT
PID for pmlogger: 18496
</B></PRE>
<I>Yes, PCP archives from <B>that</B> long ago still work today!</I><BR>
<BR>From running the command:<BR>
<PRE><B>
$ dkvis -a pmie/babylon.perdisk &amp;
</B></PRE>
we can visually determine which disks and which controllers are active.
</TD>
<TD><IMG ALIGN=RIGHT SRC="images/dkvis.png" BORDER=0></TD>
</TR>
</TABLE>
<P>This is easy, which is good.
However, consider the situation where we have a large number of 
separate archives, possibly collected from different machines and with 
different disk configurations.&nbsp;&nbsp;We'd like to be able to quickly process 
these archives, and filter out the extraneous information, to focus on 
those times at which the disks were busy, how busy they were, etc.</P>

<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
	<TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;Using the <I>pmie</I> configuration file in <A HREF="pmie/disk.pmie">pmie/disk.pmie</A> as a starting point, run this against the archive:<BR>
<PRE><B>
$ pmie -t 5min -a pmie/babylon.perdisk &lt; pmie/disk.pmie
</B></PRE>
<BR>
Copy the configuration file and extend it by adding new rules to report 
different messages for each of the following: 
<OL>
  <LI> some disk is doing more than 30 reads per second (make use of the <B><TT>disk.dev.read</TT></B> metric) 
  <LI> some disk is doing more than 30 writes per second (make use of the <B><TT>disk.dev.write</TT></B> metric) 
  <LI> some disk has a high I/O rate (consider a high I/O rate to be when the 
transfers are greater than 40 per second), and where reads contribute 
greater than ninety-five percent of the total transfers 
  <LI> some disk has a high I/O rate (as defined above) and the system's 1
minute load average is greater than 5 (make use of the &quot;1
minute&quot; instance for the <B><TT>kernel.all.load</TT></B> metric).
</OL>
<P>Use the <I>pmie/babylon.perdisk</I> archive extracted earlier to cross check your rules as you add each one.</P>
</TD></TR>
</TABLE>

<H4> <I>Hints:</I> </H4>
<UL>
    <LI> Make sure you sample the archive every 5 minutes (<B>-t 5min</B> on the command line).
    <LI> You'll need to use existential quantification (the <B>some_inst</B> keyword) in all of the rules.
    <LI> When producing the final rule, start with the load average metric using the command:
<BR><BR>
<TT><B>$ pminfo -f kernel.all.load</B></TT>
<BR><BR>
Notice there are three values corresponding to the 1, 5 and 15 minute load average.
<BR>
For <I>pmie</I> the metric <TT><B>kernel.all.load</B></TT>
is a set of <B>three</B> values, one for each instance at each point
of time.&nbsp;&nbsp;To choose <B>one</B> instance append the <B><TT>#</TT></B>
qualifier to the name of the metric and the name of a particular instance,
e.g. <TT><B>kernel.all.load #'1 minute'</B></TT>.
    <LI> The <I>pmie(1)</I> man page describes the <I>pmie</I> language in detail.
    <LI> You may find it helpful to use <I>dkvis</I> to visually predict 
when the rules should be triggered.&nbsp;&nbsp;Using the <B>PCP Archive Time Control</B>
dialog, you can position the <I>dkvis</I> display at the time where <I>pmie</I>
is reporting interesting activity.
</UL>
<P>
When all else fails, the solution is at <A HREF="pmie/answer.pmie">pmie/answer.pmie</A>.</P>

<P><BR></P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
        <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="interval">Influence of the update interval </A></B></FONT></P></TD></TR>
</TABLE>
<P>As a final exercise, investigate the effects of using different update 
intervals on the <I>pmie</I> command line (the <B>-t</B> option) with 
the initial configuration file and archive from the previous exercise.</P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
	<TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;Try each of the following:<BR>
<PRE><B>
$ pmie -t 5min -a pmie/babylon.perdisk &lt; pmie/disk.pmie
$ pmie -t 6min -a pmie/babylon.perdisk &lt; pmie/disk.pmie
$ pmie -t 10min -a pmie/babylon.perdisk &lt; pmie/disk.pmie
</B></PRE>
</TD></TR>
</TABLE>
<P>Why does the number of reported incidents decline as the rule evaluation interval increases?</P>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
	<TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;Repeat the exercise but use <I>pmchart</I>:<BR>
<PRE><B>
$ pmchart -t 5min -a pmie/babylon.perdisk &amp;
</B></PRE>
<BR>
Use the <B>New Chart...</B> command of the <B>File</B> menu to plot
the <B><TT>disk.dev.total</TT></B> metric for the disk <B>jag3d5</B>:<P>
    <UL>
	<LI> Enter the name <TT>disk.dev.total</TT> into the Metric Selection dialog.
	<LI> There should be 54 instances of the metric listed.
	Find the instance <B>jag3d5</B>, select it, and press OK.
    </UL>
</TD></TR>
</TABLE>

<P>Use the PCP Archive Time Control dialog to change the <B><I>Interval</I></B>.</P>
<P>By using <B>smaller</B> values of the update interval, can you 
deduce the sampling rate of the data in the PCP archive?</P>

<H4> <I>Hint:</I> </H4>
<UL>
    <LI> From a PCP archive you can get a dump of the raw data and timestamps
when the data for a particular metric was collected using the command:
<BR> <BR>
<TT><B>$ pmdumplog pmie/babylon.perdisk disk.dev.total | more</B></TT>
<BR> <BR>
</UL>

<P><BR></P>
<HR>
<CENTER>
<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0>
	<TR> <TD WIDTH=50%><P>Copyright &copy; 2007-2010 <A HREF="http://www.aconex.com/"><FONT COLOR="#000060">Aconex</FONT></A><BR>Copyright &copy; 2000-2004 <A HREF="http://www.sgi.com/"><FONT COLOR="#000060">Silicon Graphics Inc</FONT></A></P></TD>
	<TD WIDTH=50%><P ALIGN=RIGHT><A HREF="http://pcp.io/"><FONT COLOR="#000060">PCP Site</FONT></A><BR>Copyright &copy; 2012-2014 <A HREF="http://www.redhat.com/"><FONT COLOR="#000060">Red Hat</FONT></A></P></TD> </TR>
</TABLE>
</CENTER>
</BODY>
</HTML>