Diffstat (limited to 'man/html/howto.cpuperf.html')
-rw-r--r-- | man/html/howto.cpuperf.html | 430
1 file changed, 430 insertions, 0 deletions
diff --git a/man/html/howto.cpuperf.html b/man/html/howto.cpuperf.html
new file mode 100644
index 0000000..31fc5d3
--- /dev/null
+++ b/man/html/howto.cpuperf.html
@@ -0,0 +1,430 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<!--
+ (c) Copyright 2000-2004 Silicon Graphics Inc. All rights reserved.
+ Permission is granted to copy, distribute, and/or modify this document
+ under the terms of the Creative Commons Attribution-Share Alike, Version
+ 3.0 or any later version published by the Creative Commons Corp. A copy
+ of the license is available at
+ http://creativecommons.org/licenses/by-sa/3.0/us/ .
+-->
+<HTML>
+<HEAD>
+  <meta http-equiv="content-type" content="text/html; charset=utf-8">
+  <meta http-equiv="content-style-type" content="text/css">
+  <link href="pcpdoc.css" rel="stylesheet" type="text/css">
+  <link href="images/pcp.ico" rel="icon" type="image/ico">
+  <TITLE>Understanding system-level processor performance</TITLE>
+</HEAD>
+<BODY LANG="en-AU" TEXT="#000060" DIR="LTR">
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 STYLE="page-break-before: always">
+  <TR> <TD WIDTH=64 HEIGHT=64><FONT COLOR="#000080"><A HREF="http://pcp.io/"><IMG SRC="images/pcpicon.png" NAME="pmcharticon" ALIGN=TOP WIDTH=64 HEIGHT=64 BORDER=0></A></FONT></TD>
+  <TD WIDTH=1><P> </P></TD>
+  <TD WIDTH=500><P VALIGN=MIDDLE ALIGN=LEFT><A HREF="index.html"><FONT COLOR="#cc0000">Home</FONT></A> · <A HREF="lab.pmchart.html"><FONT COLOR="#cc0000">Charts</FONT></A> · <A HREF="timecontrol.html"><FONT COLOR="#cc0000">Time Control</FONT></A></P></TD>
+  </TR>
+</TABLE>
+<H1 ALIGN=CENTER STYLE="margin-top: 0.48cm; margin-bottom: 0.32cm"><FONT SIZE=7>Understanding measures of system-level processor performance</FONT></H1>
+<TABLE WIDTH=15% BORDER=0 CELLPADDING=5 CELLSPACING=10 ALIGN=RIGHT>
+  <TR><TD BGCOLOR="#e2e2e2"><IMG SRC="images/system-search.png" WIDTH=16 HEIGHT=16 BORDER=0> <I>Tools</I><BR><PRE>
+pmchart
+mpvis
+sar
+</PRE></TD></TR>
+</TABLE>
+<P>This chapter of the Performance Co-Pilot tutorial provides some hints
+on how to interpret and understand the various measures of system-level
+processor (CPU) performance.</P>
+<P>All modern operating systems collect processor resource utilization at both
+the <B>process</B> level and the <B>system</B> level. This tutorial relates
+specifically to the <B>system</B>-level metrics.</P>
+<P>For an explanation of Performance Co-Pilot terms and acronyms, consult
+the <A HREF="glossary.html">PCP glossary</A>.</P>
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+  <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>How the system-level CPU time is computed</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+Both <I>sar</I> and Performance Co-Pilot (PCP) use a common collection
+of system-level CPU performance instrumentation from the kernel.
+This instrumentation is based upon statistical sampling of the state of <B>each</B>
+CPU, performed in the kernel's software clock interrupt routine, which is
+typically invoked 100 times (HZ) per second on every CPU.</P>
+<P>
+At each observation, a CPU attributes a quantum of 10 milliseconds of
+elapsed time to one of several counters, based on the current state of
+the code executing on that CPU.</P>
+<P>
+This sort of statistical sampling is subject to some anomalies,
+particularly when activity is strongly correlated with the clock
+interrupts; however, the distribution of observations over several
+seconds or minutes is often an accurate reflection of the true
+distribution of CPU time. The kernel profiling mechanisms offer higher
+resolution should that be required, but they are beyond the scope
+of this document.</P>
+<P>
+The CPU state is determined by considering what the CPU was doing just
+before the clock interrupt, as follows:</P>
+<OL>
+  <LI>
+    If executing a <B>user</B> thread (i.e. above the kernel system
+    call interface for some process), then the state is CPU_USER.
+  <LI>
+    If executing a kernel interrupt thread, then the state is CPU_INTR.
+  <LI>
+    If executing a kernel thread waiting for a graphics event, then the
+    state is CPU_WAIT.
+  <LI>
+    If otherwise executing a kernel thread, then the state is CPU_KERNEL.
+  <LI>
+    If not executing a kernel thread and some I/O is pending, then the
+    state is CPU_WAIT.
+  <LI>
+    If not executing a kernel thread and no I/O is pending and some user
+    thread is paused waiting for memory to become available, then the state
+    is CPU_SXBRK.
+  <LI>
+    Otherwise the state is CPU_IDLE.
+</OL>
+<P>
+These states are mutually exclusive and complete, so exactly one state
+is assigned for each CPU at each clock interrupt.</P>
+<P>
+The kernel agent for PCP exports the following metrics:</P>
+<TABLE BORDER="1">
+  <CAPTION ALIGN="BOTTOM"><B>Table 1: Raw PCP CPU metrics</B></CAPTION>
+  <TR VALIGN="TOP">
+    <TH>PCP Metric</TH>
+    <TH>Semantics</TH>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.user</TT></I></TD>
+    <TD>Time counted when in CPU_USER state.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.sys</TT></I></TD>
+    <TD>Time counted when in CPU_KERNEL state.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.intr</TT></I></TD>
+    <TD>Time counted when in CPU_INTR state.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.sxbrk</TT></I></TD>
+    <TD>Time counted when in CPU_SXBRK state (IRIX only).</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+    <TD>Time counted when in CPU_WAIT state (UNIX only).</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.idle</TT></I></TD>
+    <TD>Time counted when in CPU_IDLE state.</TD>
+  </TR>
+</TABLE>
+<P>
+These metrics are all "counters" in units of milliseconds
+(cumulative since system boot time), so when displayed with most PCP
+tools they are "rate converted" (sampled periodically, with the
+differences between consecutive values converted to time utilization in
+units of milliseconds per second over the sample interval). Since the
+raw values are aggregated across all CPUs, the time utilization for any
+of the metrics above is in the range 0 to N*1000 for an N CPU system;
+for some PCP tools this is reported as a percentage in the range 0 to
+N*100 percent.</P>
+
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
+  <TR><TD BGCOLOR="#e2e2e2" WIDTH=70%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> Using <I>pmchart</I> to display CPU activity (aggregated over all CPUs).<BR>
+<PRE><B>
+$ source /etc/pcp.conf
+$ tar xzf $PCP_DEMOS_DIR/tutorials/cpuperf.tgz
+$ pmchart -c CPU -t 2sec -O -0sec -a cpuperf/moomba.pmkstat
+</B></PRE>
+<BR>
+These commands will provide the interactive charts described here.
+</TD></TR>
+</TABLE>
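+
+<P>
+Outside of <I>pmchart</I>, the same rate conversion can be seen with
+<I>pmval</I>, which reports counter metrics in units per second by
+default. A minimal sketch, assuming <I>pmcd</I> is running on the
+local host (otherwise use <B>-a</B> with one of the tutorial archives):</P>
+<PRE><B>
+$ pmval -t 2sec -s 3 kernel.all.cpu.user
+</B></PRE>
+<P>
+On a 4 CPU system, each reported value lies between 0 and 4000
+milliseconds per second.</P>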
+
+<P>On IRIX, the CPU_WAIT state is further subdivided into components describing
+different types of "waiting":</P>
+<OL>
+  <LI>
+    If executing a kernel thread waiting for a graphics context switch,
+    then the waiting classification W_GFXC is true.
+  <LI>
+    If executing a kernel thread waiting for a graphics FIFO operation to
+    complete, then the waiting classification W_GFXF is true.
+  <LI>
+    If not executing any thread and an I/O involving a block device (most
+    likely associated with a file system, but independent of the CPU from
+    which the I/O was initiated) is pending, then the waiting
+    classification W_IO is true.
+  <LI>
+    If not executing any thread and an I/O involving a swap operation
+    (independent of the CPU from which the I/O was initiated) is pending,
+    then the waiting classification W_SWAP is true.
+  <LI>
+    If not executing any thread and an I/O involving a raw device is
+    pending (independent of the CPU from which the I/O was initiated), then
+    the waiting classification W_PIO is true.
+</OL>
+<P>
+More than one of the group { W_IO, W_SWAP, W_PIO } can be true at the
+same time; however, this group and W_GFXC and W_GFXF are all mutually
+exclusive. If the state is CPU_WAIT, then at least one of the
+classifications must be true.</P>
+<P>
+The IRIX agent for PCP exports the following CPU "wait"
+metrics, the sum of which approximately equals <I><TT>kernel.all.cpu.wait.total</TT></I>:</P>
+<TABLE BORDER="1">
+  <CAPTION ALIGN="BOTTOM"><B>Table 2: Raw PCP CPU wait metrics</B></CAPTION>
+  <TR VALIGN="TOP">
+    <TH>PCP Metric</TH>
+    <TH>Semantics</TH>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.wait.gfxc</TT></I></TD>
+    <TD>Time counted when W_GFXC is true.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.wait.gfxf</TT></I></TD>
+    <TD>Time counted when W_GFXF is true.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.wait.io</TT></I></TD>
+    <TD>Time counted when W_IO is true.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.wait.pio</TT></I></TD>
+    <TD>Time counted when W_PIO is true.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.cpu.wait.swap</TT></I></TD>
+    <TD>Time counted when W_SWAP is true.</TD>
+  </TR>
+</TABLE>
+<P>
+Like the metrics in Table 1, these are all "counters" in units of
+milliseconds (cumulative since system boot time), so most PCP tools
+rate convert them to milliseconds per second over the sample interval.
+Since the raw values are aggregated across all CPUs, the time
+utilization for any of the metrics above is again in the range 0 to
+N*1000 for an N CPU system (0 to N*100 percent when reported as a
+percentage).</P>
+<P>
+Note that for a multiprocessor system with one I/O pending, <B>all</B>
+otherwise idle CPUs will be assigned the CPU_WAIT state. This may lead
+to an over-estimate of the I/O wait time, as discussed in the
+companion <A HREF="howto.diskperf.html">How to understand measures of
+disk performance</A> document.</P>
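+
+<P>
+The relative sizes of these wait components can be watched side by
+side with <I>pmdumptext</I>. A sketch, assuming the tutorial archive
+<TT>cpuperf/moomba.pmkstat</TT> includes the IRIX wait metrics:</P>
+<PRE><B>
+$ pmdumptext -t 2sec -a cpuperf/moomba.pmkstat \
+    kernel.all.cpu.wait.io kernel.all.cpu.wait.swap kernel.all.cpu.wait.pio
+</B></PRE>
+<P>
+Each column is rate converted independently, and the sum of the
+columns approximately tracks <I><TT>kernel.all.cpu.wait.total</TT></I>.</P>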
+<P>
+In IRIX 6.5.2, additional instrumentation was added to help address this
+over-estimate in the wait time attribution, by counting the <B>number</B>
+of waiting processes in various states (the cardinality of the various
+sets of waiting processes) rather than classifying the state of each
+CPU. The wait I/O queue length is defined as the number of processes
+waiting on events corresponding to the classifications W_IO, W_SWAP or
+W_PIO. The metrics shown in the table below are computed on only
+<B>one</B> of the CPUs (the "clock-master") at each clock interrupt.</P>
+<TABLE BORDER="1">
+  <CAPTION ALIGN="BOTTOM"><B>Table 3: Raw PCP wait I/O queue length
+  metrics</B></CAPTION>
+  <TR VALIGN="TOP">
+    <TH>PCP Metric</TH>
+    <TH>Semantics</TH>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.waitio.queue</TT></I></TD>
+    <TD>Cumulative total of the wait I/O queue lengths, as observed
+    on each clock interrupt.</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>kernel.all.waitio.occ</TT></I></TD>
+    <TD>Cumulative total of the number of times the wait I/O queue
+    length is greater than zero, as observed on each clock interrupt.</TD>
+  </TR>
+</TABLE>
+<P>
+These metrics may be used with PCP tools as follows:</P>
+<UL>
+  <LI>
+    Displaying <I><TT>kernel.all.waitio.queue</TT></I> with <I>pmval</I>, <I>pmdumptext</I>, <I>pmchart</I>,
+    etc. will display the time average of the wait I/O queue length
+    multiplied by the frequency of clock interrupts, i.e. by 100.
+  <LI>
+    Displaying <I><TT>kernel.all.waitio.occ</TT></I> with <I>pmval</I>, <I>pmdumptext</I>, <I>pmchart</I>,
+    etc. will display the probability that the wait I/O queue is not empty
+    multiplied by the frequency of clock interrupts, i.e. by 100. This
+    value (converted to a percentage) is reported as <I><TT>%wioocc</TT></I>
+    by the <B>-q</B> option of <I>sar</I>.
+  <LI>
+    Using <I>pmie</I> with the expression<BR>
+    <I><TT>kernel.all.waitio.queue / kernel.all.waitio.occ</TT></I><BR>
+    will report the stochastic average of the wait I/O queue length,
+    conditional upon the queue not being empty (see the sketch below).
+    This value is reported as <I><TT>wioq-sz</TT></I>
+    by the <B>-q</B> option of <I>sar</I>.
+</UL>
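+
+<P>
+Building on the last point, a small <I>pmie</I> rule can turn the
+stochastic average into an alarm. A sketch (the threshold of 10 and
+the file name are arbitrary):</P>
+<PRE><B>
+// fire when the average wait I/O queue length, conditional on the
+// queue being non-empty, exceeds 10
+kernel.all.waitio.queue / kernel.all.waitio.occ > 10 ->
+    print "average wait I/O queue length: %v";
+</B></PRE>
+<P>
+Saved as <TT>wioq.pmie</TT>, this could be evaluated against an archive
+with <TT>pmie -a cpuperf/moomba.pmkstat wioq.pmie</TT>, again assuming
+the archive includes these metrics.</P>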
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+  <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>The per-CPU variants</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+Inside the kernel, most of the metrics described above are
+accumulated per-CPU for reasons of efficiency (to reduce the locking
+overheads and minimize dirty cache-line traffic).</P>
+<P>
+PCP exports the per-CPU versions of the system-wide metrics with metric
+names formed by replacing <B><I><TT>all</TT></I></B> by <B><I><TT>percpu</TT></I></B>,
+e.g. <I><TT>kernel.percpu.cpu.user</TT></I>.</P>
+
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=10 CELLSPACING=20>
+  <TR><TD BGCOLOR="#e2e2e2" WIDTH=70%><BR><IMG SRC="images/stepfwd_on.png" WIDTH=16 HEIGHT=16 BORDER=0> The <I>mpvis</I> tool provides 3-D visualization of these per-CPU metrics.<PRE><B>
+$ mpvis -a cpuperf/babylon.percpu
+</B></PRE>
+<BR>
+When the window is shown, use the <A HREF="timecontrol.html">PCP Archive Time Control</A> dialog to scroll through the archive (Fast Forward).
+</TD></TR>
+</TABLE>
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+  <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>Reconciling sar -u and PCP CPU performance metrics</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+The <I>sar</I> metrics are scaled based on the number of CPUs and
+expressed as percentages, while PCP metrics are in units of milliseconds
+per second after rate conversion; this explains the appearance of the
+PCP metric <I><TT>hinv.ncpu</TT></I> and the constants 100 and 1000 in
+the expressions below.</P>
+<P>
+When run with a <B>-u</B> option, <I>sar</I> reports the following:</P>
+<TABLE BORDER="1">
+  <CAPTION ALIGN="BOTTOM"><B>Table 4: PCP and sar metric equivalents</B></CAPTION>
+  <TR>
+    <TH><I>sar</I><BR>metric</TH>
+    <TH>PCP equivalent (assuming rate conversion)</TH>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%usr</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.user</TT></I> / (<I><TT>hinv.ncpu</TT></I> * 1000)</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%sys</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.sys</TT></I> / (<I><TT>hinv.ncpu</TT></I> * 1000)</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%intr</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.intr</TT></I> / (<I><TT>hinv.ncpu</TT></I> * 1000)</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%wio</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.wait.total</TT></I> / (<I><TT>hinv.ncpu</TT></I> * 1000)</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%idle</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.idle</TT></I> / (<I><TT>hinv.ncpu</TT></I> * 1000)</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%sbrk</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.sxbrk</TT></I> / (<I><TT>hinv.ncpu</TT></I> * 1000)</TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%wfs</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.wait.io</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%wswp</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.wait.swap</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%wphy</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.wait.pio</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%wgsw</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.wait.gfxc</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+  </TR>
+  <TR VALIGN="TOP">
+    <TD><I><TT>%wfif</TT></I></TD>
+    <TD>100 * <I><TT>kernel.all.cpu.wait.gfxf</TT></I> / <I><TT>kernel.all.cpu.wait.total</TT></I></TD>
+  </TR>
+</TABLE>
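+
+<P>
+With a version of PCP that supports derived metrics, an entry from this
+table can be turned into a metric in its own right. A sketch, assuming
+derived metric support is available; the metric name
+<TT>tutorial.pct_usr</TT> and the file name are arbitrary:</P>
+<PRE><B>
+$ echo 'tutorial.pct_usr = 100 * rate(kernel.all.cpu.user) / (hinv.ncpu * 1000)' > /tmp/derived.conf
+$ PCP_DERIVED_CONFIG=/tmp/derived.conf pmval -t 2sec tutorial.pct_usr
+</B></PRE>
+<P>
+Here <TT>rate()</TT> performs the rate conversion explicitly, so the
+derived metric reports the <I>sar</I> <I><TT>%usr</TT></I> value
+directly.</P>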
+
+<P><BR></P>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0 BGCOLOR="#e2e2e2">
+  <TR><TD WIDTH=100% BGCOLOR="#081c59"><P ALIGN=LEFT><FONT SIZE=5 COLOR="#ffffff"><B>The load average</B></FONT></P></TD></TR>
+</TABLE>
+<P>
+The "load average" is reported by <I>uptime</I>, <I>top</I>,
+etc., and by the PCP metric <I><TT>kernel.all.load</TT></I>.</P>
+<P>
+The load average is an indirect measure of the demand for CPU
+resources. It is calculated from the previous load average
+(<I>load</I>), the number of currently runnable processes
+(<I>nrun</I>), and an exponential dampening expression; for the
+"1 minute" average, the expression is:</P>
+<PRE>
+load = exp(-5/60) * load + (1 - exp(-5/60)) * nrun
+</PRE>
+<P>
+The three load averages use different exponential constants and are all
+re-computed every 5 seconds.</P>
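+
+<P>
+As a worked example of the dampening, suppose an otherwise idle system
+suddenly has exactly one runnable process, so <I>nrun</I> is fixed at 1.
+With a re-computation every 5 seconds, the "1 minute" average climbs
+towards 1 as follows:</P>
+<PRE>
+load(0 sec)  = 0
+load(5 sec)  = exp(-5/60) * 0 + (1 - exp(-5/60)) * 1  ≈ 0.08
+load(60 sec) = 1 - exp(-60/60)                        ≈ 0.63
+</PRE>
+<P>
+After one minute the "1 minute" average has covered only about 63% of
+the step in demand, which is why the load averages lag sudden changes
+in workload.</P>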
+<P>
+<I>nrun</I> is computed as follows:</P>
+<OL>
+  <LI>
+    Inspect every process.
+  <LI>
+    If the process is not likely to be runnable in the near future (state
+    not SRUN), ignore it.
+  <LI>
+    Inspect every thread of the process.
+  <LI>
+    If the thread is sleeping and not currently expanding its address space
+    (state not SXBRK) and not in a long-term sleep, increment <I>nrun</I>.
+  <LI>
+    If the thread is stopped, ignore it.
+  <LI>
+    Otherwise, if the thread is not "weightless" (being ignored by
+    the scheduler), increment <I>nrun</I>.
+</OL>
+<P>
+Note that the "run queue length" (a variant of which is
+reported by the <B>-q</B> option of <I>sar</I>) counts processes using
+a similar, but not identical, algorithm:</P>
+<OL>
+  <LI>
+    Inspect every process.
+  <LI>
+    If the process is not likely to be runnable in the near future (state
+    not SRUN), ignore it.
+  <LI>
+    Inspect every thread of the process.
+  <LI>
+    If the thread is sleeping and not currently expanding its address space
+    (state not SXBRK), ignore it.
+  <LI>
+    If the thread is stopped, ignore it.
+  <LI>
+    Otherwise increment the "run queue length".
+</OL>
+
+<P><BR></P>
+<HR>
+<CENTER>
+<TABLE WIDTH=100% BORDER=0 CELLPADDING=0 CELLSPACING=0>
+  <TR> <TD WIDTH=50%><P>Copyright © 2007-2010 <A HREF="http://www.aconex.com/"><FONT COLOR="#000060">Aconex</FONT></A><BR>Copyright © 2000-2004 <A HREF="http://www.sgi.com/"><FONT COLOR="#000060">Silicon Graphics Inc</FONT></A></P></TD>
+  <TD WIDTH=50%><P ALIGN=RIGHT><A HREF="http://pcp.io/"><FONT COLOR="#000060">PCP Site</FONT></A><BR>Copyright © 2012-2014 <A HREF="http://www.redhat.com/"><FONT COLOR="#000060">Red Hat</FONT></A></P></TD> </TR>
+</TABLE>
+</CENTER>
+</BODY>
+</HTML>