Diffstat (limited to 'man/html/howto.cpuperf.html')
-rw-r--r-- man/html/howto.cpuperf.html 430
1 files changed, 430 insertions, 0 deletions
diff --git a/man/html/howto.cpuperf.html b/man/html/howto.cpuperf.html
new file mode 100644
index 0000000..31fc5d3
--- /dev/null
+++ b/man/html/howto.cpuperf.html
@@ -0,0 +1,430 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<!--
+ (c) Copyright 2000-2004 Silicon Graphics Inc. All rights reserved.
+ Permission is granted to copy, distribute, and/or modify this document
+ under the terms of the Creative Commons Attribution-Share Alike, Version
+ 3.0 or any later version published by the Creative Commons Corp. A copy
+ of the license is available at
+ http://creativecommons.org/licenses/by-sa/3.0/us/ .
+-->
+<HTML>
+<HEAD>
+ <meta http-equiv="content-type" content="text/html; charset=utf-8">
+ <meta http-equiv="content-style-type" content="text/css">
+ <link href="pcpdoc.css" rel="stylesheet" type="text/css">
+ <link href="images/pcp.ico" rel="icon" type="image/ico">
+ <TITLE>Understanding system-level processor performance</TITLE>
+</HEAD>
+<BODY LANG="en-AU" TEXT="#000060" DIR="LTR">
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 STYLE="page-break-before: always">
+ <TR> <TD WIDTH=64 HEIGHT=64><FONT COLOR="#000080"><A HREF="http://pcp.io/"><IMG SRC="images/pcpicon.png" NAME="pmcharticon" ALIGN=TOP WIDTH=64 HEIGHT=64 BORDER=0></A></FONT></TD>
+ <TD WIDTH=1><P>&nbsp;&nbsp;&nbsp;&nbsp;</P></TD>
+ <TD WIDTH=500><P VALIGN=MIDDLE ALIGN=LEFT><A HREF="index.html"><FONT COLOR="#cc0000">Home</FONT></A>&nbsp;&nbsp;&middot;&nbsp;<A HREF="lab.pmchart.html"><FONT COLOR="#cc0000">Charts</FONT></A>&nbsp;&nbsp;&middot;&nbsp;<A HREF="timecontrol.html"><FONT COLOR="#cc0000">Time Control</FONT></A></P></TD>
+ </TR>
+</TABLE>
+<H1 ALIGN=CENTER STYLE="margin-top: 0.48cm; margin-bottom: 0.32cm"><FONT SIZE=7>Understanding measures of system-level processor performance</FONT></H1>
+<TABLE WIDTH=15% BORDER=0 CELLPADDING=5 CELLSPACING=10 ALIGN=RIGHT>
+ <TR><TD BGCOLOR="#e2e2e2"><IMG SRC="images/system-search.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;<I>Tools</I><BR><PRE>
+pmchart
+mpvis
+sar
+</PRE></TD></TR>
+</TABLE>
+<P>This chapter of the Performance Co-Pilot tutorial provides some hints
+on how to interpret and understand the various measures of system-level
+processor (CPU) performance.</P>
+<P>All modern operating systems collect processor resource utilization at both
+the <B>process</B>-level and the <B>system</B>-level.&nbsp;&nbsp;This tutorial relates specifically to the <B>system</B>-level metrics.</P>
+<P>For an explanation of Performance Co-Pilot terms and acronyms, consult
+the <A HREF="glossary.html">PCP glossary</A>.</P>
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+ <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>How the system-level CPU time is computed</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+Both <I>sar</I> and Performance Co-Pilot (PCP) use a common collection
+of system-level CPU performance instrumentation from the kernel.
+This instrumentation is based upon statistical sampling of the state of <B>each</B>
+ CPU in the kernel's software clock interrupt routine which is commonly
+called 100 times (HZ) per second on every CPU.</P>
+<P>
+At each observation, a quantum of 10 milliseconds of elapsed time is
+attributed to one of several counters for that CPU, based on the current
+state of the code executing on that CPU.</P>
+<P>
+This sort of statistical sampling is subject to some anomalies,
+particularly when activity is strongly correlated with the clock
+interrupts. However, the distribution of observations over several
+seconds or minutes is often an accurate reflection of the true
+distribution of CPU time. The kernel profiling mechanisms offer higher
+resolution should that be required, but that is beyond the scope
+of this document.</P>
+<P>
+The CPU state is determined by considering what the CPU was doing just
+before the clock interrupt, as follows:</P>
+<OL>
+ <LI>
+ If executing a <B>user</B> thread (i.e. above the kernel system
+ call interface for some process) then the state is CPU_USER.
+ <LI>
+ If executing a kernel interrupt thread, then the state is CPU_INTR.
+ <LI>
+ If executing a kernel thread waiting for a graphics event, then the
+ state is CPU_WAIT.
+ <LI>
+ If otherwise executing a kernel thread, then the state is CPU_KERNEL.
+ <LI>
+ If not executing a kernel thread and some I/O is pending, then the
+ state is CPU_WAIT.
+ <LI>
+ If not executing a kernel thread and no I/O is pending and some user
+ thread is paused waiting for memory to become available, then the state
+ is CPU_SXBRK.
+ <LI>
+ Otherwise the state is CPU_IDLE.
+</OL>
+<P>
+These states are mutually exclusive and complete, so exactly one state
+is assigned for each CPU at each clock interrupt.</P>
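+<P>
+The decision order above can be sketched as a small classifier. This is an
+illustrative model only, not the kernel's code: the <TT>CpuSnapshot</TT>
+fields, the <TT>classify</TT> and <TT>account</TT> names and the dictionary
+of counters are all hypothetical, and the 10 millisecond tick assumes
+HZ is 100 as described earlier.</P>

```python
from dataclasses import dataclass


@dataclass
class CpuSnapshot:
    """Hypothetical view of one CPU just before the clock interrupt."""
    user_thread: bool = False
    interrupt_thread: bool = False
    kernel_thread: bool = False
    waiting_for_graphics: bool = False
    io_pending: bool = False
    user_thread_waiting_for_memory: bool = False


def classify(s):
    """Return the single state charged for this tick, testing the rules in order."""
    if s.user_thread:
        return "CPU_USER"
    if s.interrupt_thread:
        return "CPU_INTR"
    if s.kernel_thread and s.waiting_for_graphics:
        return "CPU_WAIT"
    if s.kernel_thread:
        return "CPU_KERNEL"
    if s.io_pending:
        return "CPU_WAIT"
    if s.user_thread_waiting_for_memory:
        return "CPU_SXBRK"
    return "CPU_IDLE"


TICK_MSEC = 10  # one quantum, assuming HZ = 100


def account(counters, snapshot):
    """Charge one tick of elapsed time to the counter for the observed state."""
    state = classify(snapshot)
    counters[state] = counters.get(state, 0) + TICK_MSEC
    return state
```

+<P>
+Because the rules are tested in order and the last one is a catch-all,
+every snapshot maps to exactly one state, which mirrors the
+&quot;mutually exclusive and complete&quot; property.</P>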
+<P>
+The kernel agent for PCP exports the following metrics:</P>
+<TABLE BORDER="1">
+ <CAPTION ALIGN="BOTTOM"><B>Table 1: Raw PCP CPU metrics</B></CAPTION>
+ <TR VALIGN="TOP">
+ <TH>PCP Metric</TH>
+ <TH>Semantics</TH>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.user</TT></I></TD>
+ <TD>Time counted when in CPU_USER state.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.sys</TT></I></TD>
+ <TD>Time counted when in CPU_KERNEL state.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.intr</TT></I></TD>
+ <TD>Time counted when in CPU_INTR state.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.sxbrk</TT></I></TD>
+ <TD>Time counted when in CPU_SXBRK state (IRIX only).</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+ <TD>Time counted when in CPU_WAIT state (UNIX only).</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.idle</TT></I></TD>
+ <TD>Time counted when in CPU_IDLE state.</TD>
+ </TR>
+</TABLE>
+<P>
+These metrics are all &quot;counters&quot; in units of milliseconds
+(cumulative since system boot time) so when displayed with most PCP
+tools they are &quot;rate converted&quot; (sampled periodically and the
+differences between consecutive values converted to time utilization in
+units of milliseconds per second over the sample interval). Since the
+raw values are aggregated across all CPUs, the time utilization for any
+of the metrics above is in the range 0 to N*1000 for an N CPU system;
+for some PCP tools this is reported as a percentage in the range 0 to
+N*100 percent.</P>
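+<P>
+As a worked illustration of rate conversion, the sketch below takes two
+consecutive samples of a millisecond counter and converts the difference
+to milliseconds per second and then to the percentage scale. The function
+names and the sample values are invented for the example; they are not
+part of any PCP API.</P>

```python
def rate_convert(prev_msec, curr_msec, interval_sec):
    """Time utilization in milliseconds per second over one sample interval."""
    return (curr_msec - prev_msec) / interval_sec


def as_percent(msec_per_sec):
    """Percentage scale used by some tools: 0 to N*100 for an N CPU system."""
    return msec_per_sec / 10.0


# Hypothetical samples taken 2 seconds apart on a 4 CPU system.
util = rate_convert(1_000_000, 1_003_000, 2.0)
print(util, as_percent(util))
```

+<P>
+Here the counter advanced by 3000 milliseconds in 2 seconds, giving 1500
+milliseconds per second, or 150 percent out of a possible 400 percent on
+4 CPUs.</P>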
+
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
+ <TR><TD BGCOLOR="#e2e2e2" WIDTH=70%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;&nbsp;Using <I>pmchart</I> to display CPU activity (aggregated over all CPUs).<BR>
+<PRE><B>
+$ source /etc/pcp.conf
+$ tar xzf $PCP_DEMOS_DIR/tutorials/cpuperf.tgz
+$ pmchart -c CPU -t 2sec -O -0sec -a cpuperf/moomba.pmkstat
+</B></PRE>
+<BR>
+This command will provide the interactive charts described here.
+</TD></TR>
+</TABLE>
+
+<P>On IRIX, the CPU_WAIT state is further subdivided into components describing
+different types of &quot;waiting&quot;:</P>
+<OL>
+ <LI>
+ If executing a kernel thread waiting for a graphics context switch,
+ then the waiting classification W_GFXC is true.
+ <LI>
+ If executing a kernel thread waiting for a graphics FIFO operation to
+ complete, then the waiting classification W_GFXF is true.
+ <LI>
+ If not executing any thread and an I/O involving a block device is
+ pending (most likely associated with a file system but independent of
+ the CPU from which the I/O was initiated), then the waiting
+ classification W_IO is true.
+ <LI>
+ If not executing any thread and an I/O involving a swap operation is
+ pending (independent of the CPU from which the I/O was initiated), then
+ the waiting classification W_SWAP is true.
+ <LI>
+ If not executing any thread and an I/O involving a raw device is
+ pending (independent of the CPU from which the I/O was initiated), then
+ the waiting classification W_PIO is true.
+</OL>
+<P>
+More than one of the group { W_IO, W_SWAP, W_PIO } can be true at the
+same observation; however, this group, W_GFXC and W_GFXF are mutually
+exclusive of one another. If the state is CPU_WAIT, then at least one of the
+classifications must be true.</P>
+<P>
+The IRIX agent for PCP exports the following CPU &quot;wait&quot;
+metrics, the sum of which approximately equals <I><TT>kernel.all.cpu.wait.total</TT></I>:</P>
+<TABLE BORDER="1">
+ <CAPTION ALIGN="BOTTOM"><B>Table 2: Raw PCP CPU wait metrics</B></CAPTION>
+ <TR VALIGN="TOP">
+ <TH>PCP Metric</TH>
+ <TH>Semantics</TH>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.wait.gfxc</TT></I></TD>
+ <TD>Time counted when W_GFXC is true.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.wait.gfxf</TT></I></TD>
+ <TD>Time counted when W_GFXF is true.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.wait.io</TT></I></TD>
+ <TD>Time counted when W_IO is true.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.wait.pio</TT></I></TD>
+ <TD>Time counted when W_PIO is true.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.cpu.wait.swap</TT></I></TD>
+ <TD>Time counted when W_SWAP is true.</TD>
+ </TR>
+</TABLE>
+<P>
+These metrics are all &quot;counters&quot; in units of milliseconds
+(cumulative since system boot time) so when displayed with most PCP
+tools they are &quot;rate converted&quot; (sampled periodically and the
+differences between consecutive values converted to time utilization in
+units of milliseconds per second over the sample interval). Since the
+raw values are aggregated across all CPUs, the time utilization for any
+of the metrics above is in the range 0 to N*1000 for an N CPU system;
+for some PCP tools this is reported as a percentage in the range 0 to
+N*100 percent.</P>
+<P>
+Note that for a multiprocessor system with one I/O pending, <B>all</B>
+ otherwise idle CPUs will be assigned the CPU_WAIT state. This may lead
+to an over-estimate of the I/O wait time, as discussed in the
+companion <A HREF="howto.diskperf.html">How to understand measures of
+disk performance</A> document.</P>
+<P>
+In IRIX 6.5.2 additional instrumentation was added to help address the
+wait time attribution by looking at the <B>number</B> of waiting
+processes in various states, rather than the state of a CPU with
+reference to the cardinality of the various sets of waiting processes.
+The wait I/O queue length is defined as the number of processes
+waiting on events corresponding to the classifications W_IO, W_SWAP or
+W_PIO. The metrics shown in the table below are computed on only <B>one</B>
+ of the CPUs (the &quot;clock-master&quot;) each clock interrupt.</P>
+<TABLE BORDER="1">
+ <CAPTION ALIGN="BOTTOM"><B>Table 3: Raw PCP wait I/O queue length
+ metrics</B></CAPTION>
+ <TR VALIGN="TOP">
+ <TH>PCP Metric</TH>
+ <TH>Semantics</TH>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.waitio.queue</TT></I></TD>
+ <TD>Cumulative total of the wait I/O queue lengths, as observed
+ on each clock interrupt.</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>kernel.all.waitio.occ</TT></I></TD>
+ <TD>Cumulative total of the number of times the wait I/O queue
+ length is greater than zero, as observed on each clock interrupt.</TD>
+ </TR>
+</TABLE>
+<P>
+These metrics may be used with PCP tools as follows:</P>
+<UL>
+ <LI>
+ Displaying <I><TT>kernel.all.waitio.queue</TT></I> with <I>pmval</I>, <I>pmdumptext</I>, <I>pmchart</I>,
+ etc. will display the time average of the wait I/O queue length
+ multiplied by the frequency of clock interrupts, i.e. by 100.
+ <LI>
+ Displaying <I><TT>kernel.all.waitio.occ</TT></I> with <I>pmval</I>, <I>pmdumptext</I>, <I>pmchart</I>,
+ etc. will display the probability that the wait I/O queue is not empty
+ multiplied by the frequency of clock interrupts, i.e. by 100. This
+ value (converted to a percentage) is reported as <I><TT>%wioocc</TT></I>
+ by the <B>-q</B> option of <I>sar.</I>
+ <LI>
+ Using <I>pmie</I> with the expression<BR>
+ <I><TT>kernel.all.waitio.queue / kernel.all.waitio.occ</TT></I><BR>
+ will report the stochastic average of the wait I/O queue length,
+ conditional upon the queue not being empty. This value is reported as <I><TT>wioq-sz</TT></I>
+ by the <B>-q</B> option of <I>sar.</I>
+</UL>
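+<P>
+The three uses above reduce to simple arithmetic on the counter
+differences between two samples. The sketch below is illustrative only:
+<TT>waitio_summary</TT> and its arguments are hypothetical names, HZ is
+assumed to be 100 as in the text, and returning zero when the queue was
+never occupied is a choice made for the example.</P>

```python
HZ = 100  # clock interrupts per second, as assumed in the text


def waitio_summary(d_queue, d_occ, interval_sec):
    """Derive queue statistics from counter deltas over one sample interval.

    d_queue: change in kernel.all.waitio.queue
    d_occ:   change in kernel.all.waitio.occ
    """
    queue_rate = d_queue / interval_sec   # what a rate-converting tool displays
    occ_rate = d_occ / interval_sec       # likewise for the occupancy counter
    mean_len = queue_rate / HZ            # time average of the queue length
    wioocc = 100.0 * occ_rate / HZ        # percentage of ticks with a non-empty queue
    # Average length conditional on the queue not being empty.
    wioq_sz = d_queue / d_occ if d_occ else 0.0
    return mean_len, wioocc, wioq_sz


# Hypothetical 10 second interval (1000 ticks): the queue was non-empty
# on 250 ticks and the observed lengths summed to 750.
print(waitio_summary(750, 250, 10.0))
```

+<P>
+For these invented numbers the time-average queue length is 0.75, the
+queue is occupied 25 percent of the time, and the average length while
+occupied is 3.</P>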
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+ <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>The per-CPU variants</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+Inside the kernel, most of the metrics described above are
+accumulated per-CPU for reasons of efficiency (to reduce the locking
+overheads and minimize dirty cache-line traffic).</P>
+<P>
+PCP exports the per-CPU versions of the system-wide metrics with metric
+names formed by replacing <B><I><TT>all</TT></I></B> by <B><I><TT>percpu</TT></I></B>,
+e.g. <I><TT>kernel.percpu.cpu.user</TT></I>.</P>
+<P>
+
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
+ <TR><TD BGCOLOR="#e2e2e2" WIDTH=70%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0>&nbsp;&nbsp;&nbsp;The <I>mpvis</I> tool provides 3-D visualization of these per-CPU metrics.<PRE><B>
+$ mpvis -a cpuperf/babylon.percpu
+</B></PRE>
+<BR>
+When the window is shown, use the <A HREF="timecontrol.html">PCP Archive Time Control</A> dialog to scroll through the archive (Fast Forward).
+</TD></TR>
+</TABLE>
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+ <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>Reconciling sar -u and PCP CPU performance metrics</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+The <I>sar</I> metrics are scaled based on the number of CPUs and
+expressed as percentages, whereas PCP metrics are in units of milliseconds per
+second after rate conversion; this explains the PCP metric <I><TT>hinv.ncpu</TT></I>
+ and the constants 100 and 1000 in the expressions below.</P>
+<P>
+When run with a <B>-u</B> option, <I>sar</I> reports the following:</P>
+<TABLE BORDER="1">
+ <CAPTION ALIGN="BOTTOM"><B>Table 4: PCP and sar metric equivalents</B></CAPTION>
+ <TR>
+ <TH><I>sar</I><BR>
+ metric</TH>
+ <TH>PCP equivalent (assuming rate conversion)</TH>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%usr</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.user</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
+ 1000)</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%sys</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.sys</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
+ 1000)</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%intr</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.intr</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
+ 1000)</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%wio</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.wait.total</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
+ 1000)</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%idle</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.idle</TT></I> / (<I><TT>hinv.ncpu </TT></I>*
+ 1000)</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%sbrk</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.sxbrk </TT></I>/ (<I><TT>hinv.ncpu </TT></I>*
+ 1000)</TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%wfs</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.wait.io</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+ </TR>
+ <TR VALIGN="TOP">
+ <TD><I><TT>%wswp</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.wait.swap</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+ </TR>
+ <TR>
+ <TD><I><TT>%wphy</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.wait.pio</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+ </TR>
+ <TR>
+ <TD><I><TT>%wgsw</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.wait.gfxc</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+ </TR>
+ <TR>
+ <TD><I><TT>%wfif</TT></I></TD>
+ <TD>100 * <I><TT>kernel.all.cpu.wait.gfxf</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+ </TR>
+</TABLE>
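+<P>
+The expressions in the table can be evaluated together as in the sketch
+below. It assumes the inputs have already been rate converted to
+milliseconds per second; the <TT>sar_u</TT> function, the shortened
+dictionary keys and the sample values are invented for illustration, and
+reporting zero for the wait breakdown when there is no wait time is an
+assumption made here.</P>

```python
def sar_u(rates, ncpu):
    """Map rate-converted PCP values (msec/sec) to sar -u style percentages.

    rates is keyed by the metric name with the kernel.all.cpu. prefix removed.
    """
    def of_capacity(name):
        # Share of the total CPU time available across all CPUs.
        return 100.0 * rates[name] / (ncpu * 1000.0)

    def of_wait(name):
        # Share of the total wait time; zero when nothing was waiting.
        total = rates["wait.total"]
        return 100.0 * rates[name] / total if total else 0.0

    return {
        "%usr": of_capacity("user"),
        "%sys": of_capacity("sys"),
        "%intr": of_capacity("intr"),
        "%wio": of_capacity("wait.total"),
        "%idle": of_capacity("idle"),
        "%sbrk": of_capacity("sxbrk"),
        "%wfs": of_wait("wait.io"),
        "%wswp": of_wait("wait.swap"),
        "%wphy": of_wait("wait.pio"),
        "%wgsw": of_wait("wait.gfxc"),
        "%wfif": of_wait("wait.gfxf"),
    }


# Hypothetical rates for a 2 CPU system (the six states sum to 2000 msec/sec).
sample = {
    "user": 800.0, "sys": 300.0, "intr": 100.0,
    "wait.total": 400.0, "idle": 400.0, "sxbrk": 0.0,
    "wait.io": 300.0, "wait.swap": 100.0, "wait.pio": 0.0,
    "wait.gfxc": 0.0, "wait.gfxf": 0.0,
}
report = sar_u(sample, ncpu=2)
```

+<P>
+Note the two different denominators: the first six values are fractions
+of total CPU capacity, while the remaining five are fractions of the
+wait time only.</P>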
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+ <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>The load average</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+The &quot;load average&quot; is reported by <I>uptime</I>, <I>top</I>,
+etc. and the PCP metric <I><TT>kernel.all.load</TT></I>.</P>
+<P>
+The load average is an indirect measure of the demand for CPU
+resources. It is calculated using the previous load average (<I>load</I>)
+and the number of currently runnable processes (<I>nrun</I>) and an
+exponential damping expression, e.g. for the &quot;1 minute&quot;
+average, the expression is:</P>
+<PRE>
+load = exp(-5/60) * load + (1 - exp(-5/60)) * nrun
+</PRE>
+<P>
+The three load averages use different exponential constants and are all
+re-computed every 5 seconds.</P>
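+<P>
+The update rule can be exercised with a short simulation. In the sketch
+below <TT>update_load</TT> is a hypothetical helper that follows the
+&quot;1 minute&quot; expression above; generalizing it through a
+<TT>period_sec</TT> parameter (for example 300 or 900 seconds for the
+other two averages) is an assumption, since the text only says that the
+constants differ.</P>

```python
import math


def update_load(load, nrun, period_sec=60.0, step_sec=5.0):
    """Apply one 5 second update of the exponentially damped load average."""
    decay = math.exp(-step_sec / period_sec)
    return decay * load + (1.0 - decay) * nrun


# Start from an idle system and hold nrun at 2 for one minute (12 updates).
load = 0.0
for _ in range(12):
    load = update_load(load, nrun=2)
print(round(load, 3))
```

+<P>
+After one minute of constant demand the &quot;1 minute&quot; average has
+covered only a fraction 1 - exp(-1), roughly 63 percent, of the distance
+to <I>nrun</I>, which is why the load average lags behind sudden changes
+in demand.</P>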
+<P>
+<I>nrun</I> is computed as follows:</P>
+<OL>
+ <LI>
+ Inspect every process.
+ <LI>
+ If the process is not likely to be runnable in the near future (state
+ not SRUN), ignore it.
+ <LI>
+ Inspect every thread of the process.
+ <LI>
+ If the thread is sleeping and not currently expanding its address space
+ (state not SXBRK) and not in a long-term sleep, increment <I>nrun.</I>
+
+ <LI>
+ If the thread is stopped, ignore it.
+ <LI>
+ Otherwise if the thread is not &quot;weightless&quot; (being ignored by
+ the scheduler), increment <I>nrun.</I>
+</OL>
+<P>
+Note that the &quot;run queue length&quot; (a variant of which is
+reported by the <B>-q</B> option of <I>sar</I>) counts processes using
+a similar, but not identical algorithm:</P>
+<OL>
+ <LI>
+ Inspect every process.
+ <LI>
+ If the process is not likely to be runnable in the near future (state
+ not SRUN), ignore it.
+ <LI>
+ Inspect every thread of the process.
+ <LI>
+ If the thread is sleeping and not currently expanding its address space
+ (state not SXBRK), ignore it.
+ <LI>
+ If the thread is stopped, ignore it.
+ <LI>
+ Otherwise increment the &quot;run queue length&quot;.
+</OL>
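+<P>
+The difference between the two counting procedures is easier to see side
+by side. The sketch below is one possible reading of the steps, not the
+kernel's code: the <TT>Thread</TT> and <TT>Process</TT> classes and their
+fields are hypothetical, and the steps are interpreted as an if / else-if
+chain, so that a sleeping thread handled by the sleep test is never
+reconsidered by the later steps.</P>

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    sleeping: bool = False
    sxbrk: bool = False            # currently expanding its address space
    long_term_sleep: bool = False
    stopped: bool = False
    weightless: bool = False       # being ignored by the scheduler


@dataclass
class Process:
    srun: bool = True              # likely to be runnable in the near future
    threads: list = field(default_factory=list)


def count_nrun(processes):
    """Count threads as for the load average."""
    nrun = 0
    for p in processes:
        if not p.srun:
            continue
        for t in p.threads:
            if t.sleeping and not t.sxbrk:
                # Short-term sleepers still count towards the load.
                if not t.long_term_sleep:
                    nrun += 1
            elif t.stopped:
                pass
            elif not t.weightless:
                nrun += 1
    return nrun


def count_runq(processes):
    """Count threads as for the run queue length."""
    runq = 0
    for p in processes:
        if not p.srun:
            continue
        for t in p.threads:
            if t.sleeping and not t.sxbrk:
                continue
            if t.stopped:
                continue
            runq += 1
    return runq
```

+<P>
+Under this reading the load average includes threads in a short-term
+sleep and excludes &quot;weightless&quot; threads, whereas the run queue
+length does the opposite in both cases.</P>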
+
+<P><BR></P>
+<HR>
+<CENTER>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0>
+ <TR> <TD WIDTH=50%><P>Copyright &copy; 2007-2010 <A HREF="http://www.aconex.com/"><FONT COLOR="#000060">Aconex</FONT></A><BR>Copyright &copy; 2000-2004 <A HREF="http://www.sgi.com/"><FONT COLOR="#000060">Silicon Graphics Inc</FONT></A></P></TD>
+ <TD WIDTH=50%><P ALIGN=RIGHT><A HREF="http://pcp.io/"><FONT COLOR="#000060">PCP Site</FONT></A><BR>Copyright &copy; 2012-2014 <A HREF="http://www.redhat.com/"><FONT COLOR="#000060">Red Hat</FONT></A></P></TD> </TR>
+</TABLE>
+</CENTER>
+</BODY>
+</HTML>