diff options
Diffstat (limited to 'man/html/lab.pmie.html')
-rw-r--r-- | man/html/lab.pmie.html | 550 |
1 files changed, 550 insertions, 0 deletions
diff --git a/man/html/lab.pmie.html b/man/html/lab.pmie.html new file mode 100644 index 0000000..3967bf9 --- /dev/null +++ b/man/html/lab.pmie.html @@ -0,0 +1,550 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> +<!-- + (c) Copyright 2010 Aconex. All rights reserved. + (c) Copyright 2000-2004 Silicon Graphics Inc. All rights reserved. + Permission is granted to copy, distribute, and/or modify this document + under the terms of the Creative Commons Attribution-Share Alike, Version + 3.0 or any later version published by the Creative Commons Corp. A copy + of the license is available at + http://creativecommons.org/licenses/by-sa/3.0/us/ . +--> +<HTML> +<HEAD> + <meta http-equiv="content-type" content="text/html; charset=utf-8"> + <meta http-equiv="content-style-type" content="text/css"> + <link href="pcpdoc.css" rel="stylesheet" type="text/css"> + <link href="images/pcp.ico" rel="icon" type="image/ico"> + <TITLE>Automated reasoning with pmie</TITLE> +</HEAD> +<BODY LANG="en-AU" TEXT="#000060" DIR="LTR"> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 STYLE="page-break-before: always"> + <TR> <TD WIDTH=64 HEIGHT=64><FONT COLOR="#000080"><A HREF="http://pcp.io/"><IMG SRC="images/pcpicon.png" NAME="pmcharticon" ALIGN=TOP WIDTH=64 HEIGHT=64 BORDER=0></A></FONT></TD> + <TD WIDTH=1><P> </P></TD> + <TD WIDTH=500><P VALIGN=MIDDLE ALIGN=LEFT><A HREF="index.html"><FONT COLOR="#cc0000">Home</FONT></A> · <A HREF="lab.pmchart.html"><FONT COLOR="#cc0000">Charts</FONT></A> · <A HREF="timecontrol.html"><FONT COLOR="#cc0000">Time Control</FONT></A></P></TD> + </TR> +</TABLE> +<H1 ALIGN=CENTER STYLE="margin-top: 0.48cm; margin-bottom: 0.32cm"><FONT SIZE=7> +Automated reasoning with <I>pmie</I></FONT></H1> +<TABLE WIDTH=15% BORDER=0 CELLPADDING=5 CELLSPACING=10 ALIGN=RIGHT> + <TR><TD BGCOLOR="#e2e2e2"><IMG SRC="images/system-search.png" WIDTH=16 HEIGHT=16 BORDER=0> <I>Tools</I><BR><PRE> +pmie +dkvis +pminfo +pmchart +pmdumplog +</PRE></TD></TR> +</TABLE> + +<P>For many systems, the performance data is produced in volumes and at +rates that require some sort of automated and intelligent filtering by +which the mundane data can be removed from the interesting information.</P> +<P>Once interesting information has been found, there are a variety of +actions that may be appropriate.</P> +<P>The Performance Metrics Inference Engine (<I>pmie</I>) is the tool +within PCP that is designed for automated filtering and reasoning +about performance.</P> +<P>For an explanation of Performance Co-Pilot terms and acronyms, consult +the <A HREF="glossary.html">PCP glossary</A>.</P> +<BR> +<UL> +<LI><A HREF="#basics"><I>pmie</I> basics</A> +<LI><A HREF="#repeat">Action repetition and launching arbitrary actions</A> +<LI><A HREF="#value-sets">Rule evaluation over sets of values</A> +<LI><A HREF="#predicates">Forms of <I>pmie</I> predicate</A> + <UL> + <LI><A HREF="#exist">Existential quantification</A> + <LI><A HREF="#universal">Universal quantification</A> + <LI><A HREF="#percentile">Percentile quantification</A> + <LI><A HREF="#others">Other predicates</A> + </UL> +<LI><A HREF="#expr"><I>pmie</I> expressions</A> +<LI><A HREF="#actions">Actions and parameter substitution of predicate context</A> +<LI><A HREF="#audit">Performance audit using archives</A> +<LI><A HREF="#interval">Influence of the update interval</A> +</UL> +<BR> +<P><I>pmie</I> evaluates a set of assertions against a time-series of +performance metric values collected in real-time from PMCD on one or +more hosts or from one or more PCP archives.</P> +<P>For those assertions that are found to be true, <I>pmie</I> is able +to print messages, activate alarms, write syslog entries and launch +arbitrary programs.</P> +<P>Typical <I>pmie</I> usage might include: +<UL> + <LI> monitoring for exceptional performance conditions + <LI> raising alarms + <LI> automated filtering of acceptable performance + <LI> early warning of pending performance problems + <LI> "call home" to the support center + <LI> retrospective performance audits + <LI> evaluating assertions about "before and after" + performance in the context of upgrades or system reconfiguration + <LI> hypothesis evaluation for capacity planning + <LI> as part of the <I>post mortem</I> analysis following a system failure +</UL> +</P> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="basics"><I>pmie</I> basics</A></B></FONT></P></TD></TR> +</TABLE> +<P>The simplest rules test thresholds and are formed from expressions +involving performance metrics and constants. For example, the following +statement:</P> +<PRE> +<I> If the context switch rate exceeds 2000 switches per second</I> +<I> then activate an alarm notifier</I> +</PRE> +<P>may be translated into the following <I>pmie</I> rule:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule1.png" WIDTH="495" HEIGHT="200"></P> +</CENTER> +<P>where the "alarm" action launches an information dialog with +the specified message.</P> +<P>Other <I>pmie</I> actions are discussed later in the +<A HREF="#actions">Actions and parameter substitution of predicate context</A> section.</P> + +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0> +<TR><TD> + <TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20> + <TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Generate alot of context switches, and watch 'em:<BR> +<PRE><B> +$ . /etc/pcp.env +$ pmie -v -c $PCP_DEMOS_DIR/pmie/pswitch.pmie & +$ pmchart -t 1sec -c $PCP_DEMOS_DIR/pmie/pswitch.view & +$ while true; do sleep 0; done & +$ jobs +[1]- Running pmie -v -c $PCP_DEMOS_DIR/pmie/pswitch.pmie & +[2]- Running pmchart -t 0.5sec -c $PCP_DEMOS_DIR/pmie/pswitch.view & +[3]+ Running while true; do sleep 0; done & +</B></PRE> +<BR> +<B>Important:</B> the above test case can be quite intrusive on low processor +count machines, so remember to terminate it when you've finished this tutorial: +<PRE><B> +$ jobs +... +[3]+ Running while true; do sleep 0; done & +$ fg %3 +</B></PRE> +</TD></TR> +</TABLE> +</TD> +<TD><IMG ALIGN=RIGHT SRC="images/cpu_pswitch.png" BORDER=0></TD> +</TR> +</TABLE> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="repeat">Action repetition and launching arbitrary actions</A></B></FONT></P></TD></TR> +</TABLE> +<P>Sometimes it is useful for an action not to be repeated for a time. +For example, the English statement:</P> +<PRE></I> + If the context switch rate exceeds 2000 switches per second + then launch <B>top</B> in an <B>xterm</B> window + but hold off repetition of the action for 5 minutes +</I></PRE> +<P>may be translated into the following <I>pmie</I> rule:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule2.png" WIDTH="495" HEIGHT="200"></P> +</CENTER> +<P> +Note the <B><TT><FONT COLOR="#ff0000">shell</FONT></TT></B> keyword +introduces an arbitrary action in which any program can be launched.</P> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="value-sets">Rule evaluation over sets of values</A></B></FONT></P></TD></TR> +</TABLE> +<P>Each <I>pmie</I> rule may be evaluated over a set of performance metric values.</P> +<P>Conceptually these sets of values are constructed for a single performance +metric by taking the cross product of observed values over the three +dimensions of: <B>hosts</B>, <B>instances</B> and <B>times</B>.</P> +<P>The default host is:</P> +<UL> + <LI> + the host named in the <B><TT>-h</TT></B> command line option to + <I>pmie</I>, or + <LI> + the host associated with the archive named in the first + <B><TT>-a</TT></B> command line option to <I>pmie</I>, or + <LI> + the local host if neither <B><TT>-h</TT></B> nor <B><TT>-a</TT></B> + appears on the command line. +</UL> +<P> +By default, a metric name represents the set of values formed by the +cross product of the default host for <I>pmie</I>, <B>all</B> instances +and the current time. If there is only one instance, then the set +contains a singular value.</P> +<P> +For example <B><TT>filesys.free</TT></B> is the most recent set of +values for the amount of free space on every mounted file system on the +default host, and may be represented by the shaded rectangle in the +following figure:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis1.png" WIDTH="396" HEIGHT="226"></P> +</CENTER> +<P> +One or more suffix of the form <B><TT>#<I>instance</I></TT></B> +(where <TT><I>instance</I></TT> is the external instance identifier) +after a metric name restricts the set of values on the default host +for <I>pmie</I>, to the <B>nominated</B> instances and the current time. +If <I><TT>instance</TT></I> includes any special characters then it +should be enclosed in single quotes.</P> +<P> +For example <B><TT>filesys.free #'/usr'</TT></B> is the most recent +value for the amount of free space on the <I>/usr</I> file system on the +default host, and may be represented as follows:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis2.png" WIDTH="396" HEIGHT="226"></P> +</CENTER> +<P> +One or more suffix of the form <B><TT>:<I>hostname</I></TT></B> after +a metric name changes the set of values to include <B>all</B> instances +on the <B>nominated</B> hosts, at the current time.</P> +<P> +For example <B><TT>filesys.free :otherhost</TT></B> is the most +recent set of values for the amount of free space on every mounted file +system on the host <B>otherhost</B>, and may be represented as follows:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis3.png" WIDTH="396" HEIGHT="226"></P> +</CENTER> +<P> +A suffix of the form <B><TT>@<I>N..M</I></TT></B> after a metric name +changes the set of values to be that formed by <B>all</B> instances on +the default host for <I>pmie</I>, at the sample times +<B><I><TT>N</TT></I></B>, <B><I><TT>N+1</TT></I></B>, ... +<B><I><TT>M</TT></I></B><TT> </TT><B>back</B> from the current time.</P> +<P> +And finally more than one type of suffix may be used to control enumeration +in each of the three axis directions.</P> +<P> +For example <B><TT>filesys.free #'/usr' @0..3</TT></B> refers to +the default host, restricts the instances and enumerates the time. +This may be represented as follows:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_axis4.png" WIDTH="396" HEIGHT="226"></P> +</CENTER> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="predicates">Forms of <I>pmie</I> predicate</A></B></FONT></P></TD></TR> +</TABLE> +<H3><A NAME="exist">Existential quantification</A></H3> +<P> +The predicate <B><TT><FONT COLOR="#ff0000">some_inst</FONT></TT></B><TT> +<I>expr</I></TT> is true if there is some instance of a metric that +makes <I><TT>expr</TT></I> true.</P> +<P> +Existential quantification over hosts and consecutive samples is also +supported by <B><TT><FONT COLOR="#ff0000">some_host</FONT></TT></B><TT> +<I>expr</I></TT> and +<B><TT><FONT COLOR="#ff0000">some_sample</FONT></TT></B><TT> +<I>expr</I></TT>.</P> +<P> +For example, the English statement:</P> +<PRE> +<I> if some disk is doing a lot of I/O</I> +<I> then launch a visible alarm</I> +</PRE> +<P>may be translated into the following <I>pmie</I> rule:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule3.png" WIDTH="495" HEIGHT="120"></P> +</CENTER> +<BR> +<H3><A NAME="universal">Universal quantification</A></H3> +<P> +The predicate <B><TT><FONT COLOR="#ff0000">all_inst</FONT></TT></B> +<TT><I>expr</I></TT> is true if <I><TT>expr</TT></I> is true for every +instance of a metric.</P> +<P> +Universal quantification over hosts and consecutive samples is also +supported by +<B><TT><FONT COLOR="#ff0000">all_host</FONT></TT></B> <TT><I>expr</I></TT> +and <B><TT><FONT COLOR="#ff0000">all_sample</FONT></TT></B> +<TT><I>expr</I></TT>.</P> +<P> +Quantification predicates may be nested.</P> +<P> +For example, the English statement:</P> +<PRE> +<I> if for every one of the last 5 samples,</I> +<I> some disk (but not necessarily the same disk) is doing a lot of I/O</I> +<I> then launch a visible alarm</I> +</PRE> +<P>may be translated into the following <I>pmie</I> rule:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule4.png" WIDTH="540" HEIGHT="140"></P> +</CENTER> +<P>Note that reversing the nesting of the universal and existential predicates produces +a rule which has slightly different English semantics, namely:</P> +<PRE> +<I> if the same disk has been doing a lot of I/O</I> +<I> for every one of the last 5 samples,</I> +<I> then launch a visible alarm</I> +</PRE> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule5.png" WIDTH="540" HEIGHT="140"></P> +</CENTER> +<BR> +<H3><A NAME="percentile">Percentile quantification</A></H3> +<P> +The predicate +<TT><I><FONT COLOR="#000000">N</FONT></I><B><FONT COLOR="#ff0000">%_inst</FONT></B> <I>expr</I></TT> +is true if <I><TT>expr</TT></I> is true for <I><TT>N</TT></I> percent +of the instances of a metric.</P> +<P> +Percentile quantification over hosts and consecutive samples is also +supported by +<TT><I><FONT COLOR="#000000">N</FONT></I><B><FONT COLOR="#ff0000">%_host</FONT></B> <I>expr</I></TT> and +<TT><I><FONT COLOR="#000000">N</FONT></I><B><FONT COLOR="#ff0000">%_sample</FONT></B> <I>expr</I></TT>.</P> +<P> +For example, the English phrase:</P> +<PRE> +<I> if at least 30% of the disks are doing a lot of I/O</I> +<I> then launch a visible alarm</I> +</PRE> +<P> +may be translated into the following <I>pmie</I> expression:</P> +<CENTER><P ALIGN="CENTER"><IMG SRC="images/pmie_rule3.png" WIDTH="495" HEIGHT="120"></P> +</CENTER> +<BR> +<H3><A NAME="others">Other predicates</A></H3> +<P> +Instance quantification: +<B><TT><FONT COLOR="#ff0000">match_inst</FONT></TT>, +<TT><FONT COLOR="#ff0000">nomatch_inst</FONT></TT></B></P> +<P> +Value aggregation: +<B><TT><FONT COLOR="#ff0000">avg_inst</FONT></TT>, +<TT><FONT COLOR="#ff0000">sum_inst</FONT></TT>, +<TT><FONT COLOR="#ff0000">avg_host</FONT></TT>, +<TT><FONT COLOR="#ff0000">sum_host</FONT></TT>, +<TT><FONT COLOR="#ff0000">avg_sample</FONT></TT>, +<TT><FONT COLOR="#ff0000">sum_sample</FONT></TT></B></P> +<P> +Value extrema: +<B><TT><FONT COLOR="#ff0000">min_inst</FONT></TT>, +<TT><FONT COLOR="#ff0000">max_inst</FONT></TT>, +<TT><FONT COLOR="#ff0000">min_host</FONT></TT>, +<TT><FONT COLOR="#ff0000">max_host</FONT></TT>, +<TT><FONT COLOR="#ff0000">min_sample</FONT></TT>, +<TT><FONT COLOR="#ff0000">max_sample</FONT></TT></B></P> +<P> +Value set cardinality: +<B><TT><FONT COLOR="#ff0000">count_inst</FONT></TT>, +<TT><FONT COLOR="#ff0000">count_host</FONT></TT>, +<TT><FONT COLOR="#ff0000">count_sample</FONT></TT></B></P> +<P> +Trends: +<B><TT><FONT COLOR="#ff0000">rising</FONT></TT>, +<TT><FONT COLOR="#ff0000">falling</FONT></TT>, +<TT><FONT COLOR="#ff0000">rate</FONT></TT></B></P> +<P>These predicates are discussed in depth in the <I>pmie</I> manual page.</P> + + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><I><A NAME="expr">pmie</I> expressions</A></B></FONT></P></TD></TR> +</TABLE> +<P><I>pmie</I> expressions are very similar to the C programming language; +especially with regard to arithmetic, relational and Boolean operators, +and the use of parenthesis for grouping.</P> +<P>The <I>pmie</I> language allows macro definition and textual substitution +for common expressions and metric names.</P> +<PRE> + // Macro for later use ... + bc = "buffer_cache"; + + // Using the above macro + // If the buffer cache is in use (more than 50 read requests) + // with hit ratio less than 90%, then popup an alarm + $bc.getblks > 50 && $bc.getfound / $bc.getblks < 0.9 + -> alarm "poor buffer cache hit rate"; + +</PRE> +<P>All calculations are done in double precision, where default units are +<B>bytes</B>, <B>seconds</B> and <B>counts</B>. +Note that this can sometimes cause surprises:</P> +<PRE> + mem.freemem > 10; +</PRE> +<P> +will always be true, unlike</P> +<PRE> + mem.freemem > 10 Mbyte; +</PRE> +<P> +Metrics with "counter" semantics have their units, semantics +and values converted to rates. For example, the metric +<B><TT>network.interface.total.bytes</TT></B> measures the number of bytes passed +across all of the configured network interfaces. The metric is a counter and the +units are <B><TT>bytes</TT></B>. If <I>pmie</I> finds the value of +<B><TT>network.interface.total.bytes</TT></B> to be 10000 and 15000 on consecutive +fetches 5 seconds apart, then the <I>pmie</I> expression</P> +<PRE> + <B><TT>kernel.interface.total.bytes;</TT></B> +</PRE> +<P> +would have the value <B>1000</B> and the units of +<B><TT>bytes/second</TT></B>.</P> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="actions">Actions and parameter substitution of predicate context</A></B></FONT></P></TD></TR> +</TABLE> +<P> The available <I>pmie</I> actions are:</P> +<UL> +<LI><B><TT><FONT COLOR="#ff0000">alarm</FONT></TT></B> - popup alarm notifier +<LI><B><TT><FONT COLOR="#ff0000">shell</FONT></TT></B> - launch any program +<LI><B><TT><FONT COLOR="#ff0000">syslog</FONT></TT></B> - write an entry in the system log +<LI><B><TT><FONT COLOR="#ff0000">print</FONT></TT></B> - print message to standard output +</UL> +<P>Within the arguments that follow the action keyword, parameter substitution +may be used to incorporate some context from the predicate in the arguments +to the actions. For example, when using <B><TT><FONT COLOR="#ff0000">some_host</FONT></TT></B> or <B><TT><FONT COLOR="#ff0000">some_inst</FONT></TT></B> in a predicate, it is most helpful to know "which hosts" or "which instances" +made the condition true.</P> +<P>The following substitutions are available:</P> +<UL> +<LI><B><TT><FONT COLOR="#ff0000">%h</FONT></TT></B> appearing in an action is replaced by the qualifying hosts +<LI><B><TT><FONT COLOR="#ff0000">%i</FONT></TT></B> appearing in an action is replaced by the qualifying instances +<LI><B><TT><FONT COLOR="#ff0000">%v</FONT></TT></B> appearing in an action is replaced by value of the left-most top-level +expression in the expression tree that represents the parsed condition +</UL> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="audit">Performance audit using archives</A></B></FONT></P></TD></TR> +</TABLE> +<P> In this exercise, we shall use <I>pmie</I> to investigate performance from +a PCP archive.</P> + +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0> +<TR><TD> + <TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=10> + <TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Use <I>pmdumplog</I> to report the details of when the archive was created and from which host the archive was created:<BR> +<PRE><B> +$ . /etc/pcp.env +$ tar xzf $PCP_DEMOS_DIR/tutorials/pmie.tar.gz +$ pmdumplog -L pmie/babylon.perdisk +Log Label (Log Format Version 2) +Performance metrics from host babylon + commencing Wed Jan 25 08:17:48.460 1995 + ending Wed Jan 25 14:12:48.457 1995 +Archive timezone: PST8PDT +PID for pmlogger: 18496 +</B></PRE> +<I>Yes, PCP archives from <B>that</B> long ago still work today!</I><BR> +<BR>From running the command:<BR> +<PRE><B> +$ dkvis -a pmie/babylon.perdisk & +</B></PRE> +we can visually determine which disks and which controllers are active. +</TD> +<TD><IMG ALIGN=RIGHT SRC="images/dkvis.png" BORDER=0></TD> +</TR> +</TABLE> +<P>This is easy, which is good. +However, consider the situation where we have a large number of +separate archives, possibly collected from different machines and with +different disk configurations. We'd like to be able to quickly process +these archives, and filter out the extraneous information, to focus on +those times at which the disks were busy, how busy they were, etc.</P> + +<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20> + <TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Using the <I>pmie</I> configuration file in <A HREF="pmie/disk.pmie">pmie/disk.pmie</A> as a starting point, run this against the archive:<BR> +<PRE><B> +$ pmie -t 5min -a pmie/babylon.perdisk < pmie/disk.pmie +</B></PRE> +<BR> +Copy the configuration file and extend it by adding new rules to report +different messages for each of the following: +<OL> + <LI> some disk is doing more than 30 reads per second (make use of the <B><TT>disk.dev.read</TT></B> metric) + <LI> some disk is doing more than 30 writes per second (make use of the <B><TT>disk.dev.write</TT></B> metric) + <LI> some disk has a high I/O rate (consider a high I/O rate to be when the +transfers are greater than 40 per second), and where reads contribute +greater than ninety-five percent of the total transfers + <LI> some disk has a high I/O rate (as defined above) and the system's 1 +minute load average is greater than 5 (make use of the "1 +minute" instance for the <B><TT>kernel.all.load</TT></B> metric). +</OL> +<P>Use the <I>pmie/babylon.perdisk</I> archive extracted earlier to cross check your rules as you add each one.</P> +</OL> +</TD></TR> +</TABLE> + +<H4> <I>Hints:</I> </H4> +<UL> + <LI> Make sure you sample the archive every 5 minutes (<B>-t 5min</B> on the command line). + <LI> You'll need to use existential quantification (the <B>some_inst</B> keyword) in all of the rules. + <LI> When producing the final rule, start with the load average metric using the command: +<BR><BR> +<TT><B>$ pminfo -f kernel.all.load</B></TT> +<BR><BR> +Notice there are three values corresponding to the 1, 5 and 15 minute load average. +<BR> +For <I>pmie</I> the metric <TT><B>kernel.all.load</B></TT> +is a set of <B>three</B> values, one for each instance at each point +of time. To choose <B>one</B> instance append the <B><TT>#</B></TT> +qualifier to the name of the metric and the name of a particular instance, +e.g. <TT><B>kernel.all.load #'1 minute'</B></TT>. + <LI> The <I>pmie(1)</I> man page describes the <I>pmie</I> language in detail. + <LI> You may find it helpful to use <I>dkvis</I> to visually predict +when the rules should be triggered. Using the <B>PCP Archive Time Control</B> +dialog, you can position the <I>dkvis</I> display at the time where <I>pmie</I> +is reporting interesting activity. +</UL> +<P> +When all else fails, the solution is at <A HREF="pmie/answer.pmie">pmie/answer.pmie</A>.</P> + +<P><BR></P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2"> + <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B><A NAME="interval">Influence of the update interval </A></B></FONT></P></TD></TR> +</TABLE> +<P>As a final exercise, investigate the effects of using different update +intervals on the <I>pmie</I> command line (the <B>-t</B> option) with +the initial configuration file and archive from the previous exercise.</P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20> + <TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Try each of the following:<BR> +<PRE><B> +$ pmie -t 5min -a pmie/babylon.perdisk < pmie/disk.pmie +$ pmie -t 6min -a pmie/babylon.perdisk < pmie/disk.pmie +$ pmie -t 10min -a pmie/babylon.perdisk < pmie/disk.pmie +</B></PRE> +</TD></TR> +</TABLE> +<P>Why does the number of reported incidents decline as the rule evaluation interval increases?</P> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20> + <TR><TD BGCOLOR="#e2e2e2" WIDTH=50%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Repeat the exercise but use <I>pmchart</I>:<BR> +<PRE><B> +$ pmchart -t 5min -a pmie/babylon.perdisk & +</B></PRE> +<BR> +Use the <B>New Chart...</B> command of the <B>File</B> menu to plot +the <B><TT>disk.dev.total</TT></B> metric for the disk <B>jag3d5</B>:<P> + <UL> + <LI> Enter the name <TT>disk.dev.total</TT> into the Metric Selection dialog. + <LI> There should be 54 instances of the metric listed. + Find the instance <B>jag3d5</B>, select it, and press OK. + </UL> +</B></PRE> +</TD></TR> +</TABLE> + +<P>Use the PCP Archive Time Control dialog to change the <B><I>Interval</I></B>.</P> +<P>By using <B>smaller</B> values of the update interval, can you +deduce the sampling rate of the data in the PCP archive?</P> + +<H4> <I>Hint:</I> </H4> +<UL> + <LI> From a PCP archive you can get a dump of the raw data and timestamps +when the data for a particular metric was collected using the command: +<BR> <BR> +<TT><B>$ pmdumplog pmie/babylon.perdisk disk.dev.total | more</B></TT> +<BR> <BR> +</UL> + +<P><BR></P> +<HR> +<CENTER> +<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0> + <TR> <TD WIDTH=50%><P>Copyright © 2007-2010 <A HREF="http://www.aconex.com/"><FONT COLOR="#000060">Aconex</FONT></A><BR>Copyright © 2000-2004 <A HREF="http://www.sgi.com/"><FONT COLOR="#000060">Silicon Graphics Inc</FONT></P></TD> + <TD WIDTH=50%><P ALIGN=RIGHT><A HREF="http://pcp.io/"><FONT COLOR="#000060">PCP Site</FONT></A><BR>Copyright © 2012-2014 <A HREF="http://www.redhat.com/"><FONT COLOR="#000060">Red Hat</FONT></P></TD> </TR> +</TABLE> +</CENTER> +</BODY> +</HTML> |