Tools: pmchart, mpvis, sar
This chapter of the Performance Co-Pilot tutorial provides some hints on how to interpret and understand the various measures of system-level processor (CPU) performance.
All modern operating systems collect processor resource utilization at both the process level and the system level. This tutorial relates specifically to the system-level metrics.
For an explanation of Performance Co-Pilot terms and acronyms, consult the PCP glossary.
How the system-level CPU time is computed
Both sar and Performance Co-Pilot (PCP) use a common collection of system-level CPU performance instrumentation from the kernel. This instrumentation is based upon statistical sampling of the state of each CPU in the kernel's software clock interrupt routine, which is commonly called 100 times per second (HZ) on every CPU.
At each observation, a quantum of 10 milliseconds of elapsed time is attributed to one of several counters for that CPU, based on the state of the code executing on the CPU at the time of the clock interrupt.
This sort of statistical sampling is subject to some anomalies, particularly when activity is strongly correlated with the clock interrupts; however, the distribution of observations over several seconds or minutes is often an accurate reflection of the true distribution of CPU time. The kernel profiling mechanisms offer higher resolution should that be required, but that is beyond the scope of this document.
The CPU state is determined by considering what the CPU was doing just before the clock interrupt. The possible states are CPU_USER, CPU_KERNEL, CPU_INTR, CPU_SXBRK, CPU_WAIT and CPU_IDLE, which correspond to the metrics in the table below.
These states are mutually exclusive and complete, so exactly one state is assigned for each CPU at each clock interrupt.
The kernel agent for PCP exports the following metrics:
PCP Metric | Semantics
---|---
kernel.all.cpu.user | Time counted when in CPU_USER state. |
kernel.all.cpu.sys | Time counted when in CPU_KERNEL state. |
kernel.all.cpu.intr | Time counted when in CPU_INTR state. |
kernel.all.cpu.sxbrk | Time counted when in CPU_SXBRK state (IRIX only). |
kernel.all.cpu.wait.total | Time counted when in CPU_WAIT state (UNIX only). |
kernel.all.cpu.idle | Time counted when in CPU_IDLE state. |
These metrics are all "counters" in units of milliseconds (cumulative since system boot time) so when displayed with most PCP tools they are "rate converted" (sampled periodically and the differences between consecutive values converted to time utilization in units of milliseconds per second over the sample interval). Since the raw values are aggregated across all CPUs, the time utilization for any of the metrics above is in the range 0 to N*1000 for an N CPU system; for some PCP tools this is reported as a percentage in the range 0 to N*100 percent.
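For example, the metric metadata and the effect of rate conversion can be inspected from the command line; this is a minimal sketch, assuming a live host exporting these metrics (the exact output will vary):

$ pminfo -d kernel.all.cpu.user
$ pmval -t 5sec kernel.all.cpu.user

pminfo -d reports the metric's descriptor (data type, counter semantics and millisecond units), while pmval samples the counter every 5 seconds and reports rate-converted values over each interval.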
$ source /etc/pcp.conf
$ tar xzf $PCP_DEMOS_DIR/tutorials/cpuperf.tgz
$ pmchart -c CPU -t 2sec -O -0sec -a cpuperf/moomba.pmkstat

These commands provide the interactive charts described here.
On IRIX, the CPU_WAIT state is further subdivided into components describing different types of "waiting", identified by the classifications W_IO, W_SWAP, W_PIO, W_GFXC and W_GFXF.
More than one of the group { W_IO, W_SWAP, W_PIO } can be true at each observation; however, this group, W_GFXC and W_GFXF are all mutually exclusive. If the state is CPU_WAIT, then at least one of the classifications must be true.
The IRIX agent for PCP exports the following CPU "wait" metrics, the sum of which approximately equals kernel.all.cpu.wait.total:
PCP Metric | Semantics
---|---
kernel.all.cpu.wait.gfxc | Time counted when W_GFXC is true. |
kernel.all.cpu.wait.gfxf | Time counted when W_GFXF is true. |
kernel.all.cpu.wait.io | Time counted when W_IO is true. |
kernel.all.cpu.wait.pio | Time counted when W_PIO is true.
kernel.all.cpu.wait.swap | Time counted when W_SWAP is true.
As with the metrics above, these are counters in units of milliseconds (cumulative since system boot time) that are rate converted by most PCP tools, giving time utilization in the range 0 to N*1000 milliseconds per second (or 0 to N*100 percent) for an N CPU system.
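To observe the breakdown alongside the total (the sum of the components should approximately equal kernel.all.cpu.wait.total), several of these metrics can be sampled side by side; a sketch only, assuming an IRIX host exporting these metrics and the pmdumptext tool being installed:

$ pmdumptext -t 10sec kernel.all.cpu.wait.total kernel.all.cpu.wait.io kernel.all.cpu.wait.swap kernel.all.cpu.wait.pio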
Note that for a multiprocessor system with one I/O pending, all otherwise idle CPUs will be assigned the CPU_WAIT state. This may lead to an over-estimate of the I/O wait time, as discussed in the companion How to understand measures of disk performance document.
In IRIX 6.5.2, additional instrumentation was added to help address the wait time attribution by counting the number of waiting processes in various states, rather than inferring wait time from the state of each CPU. The wait I/O queue length is defined as the number of processes waiting on events corresponding to the classifications W_IO, W_SWAP or W_PIO. The metrics shown in the table below are computed on only one of the CPUs (the "clock-master") at each clock interrupt.
PCP Metric | Semantics
---|---
kernel.all.waitio.queue | Cumulative total of the wait I/O queue lengths, as observed on each clock interrupt. |
kernel.all.waitio.occ | Cumulative total of the number of times the wait I/O queue length is greater than zero, as observed on each clock interrupt. |
These metrics may be used with PCP tools to characterize the wait I/O queue, for example as sketched below.
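Since kernel.all.waitio.queue accumulates the wait I/O queue length at every clock interrupt, while kernel.all.waitio.occ counts only the interrupts at which that queue was non-empty, the ratio of their rate-converted values gives the average wait I/O queue length over the periods when the queue was occupied. A minimal sketch, assuming a live IRIX host exporting these metrics:

$ pmval -t 10sec kernel.all.waitio.queue
$ pmval -t 10sec kernel.all.waitio.occ

Dividing the first rate by the second over the same interval yields the average number of processes in the wait I/O queue while it was non-empty.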
The per-CPU variants
Inside the kernel, most of the metrics described above are accumulated per-CPU for reasons of efficiency (to reduce the locking overheads and minimize dirty cache-line traffic).
PCP exports the per-CPU versions of the system-wide metrics with metric names formed by replacing all by percpu, e.g. kernel.percpu.cpu.user.
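The per-CPU instances and their current values can be listed with pminfo; a brief sketch, assuming a live host (instance names depend on the platform and number of CPUs):

$ pminfo -f kernel.percpu.cpu.user

The -f option fetches and reports the current value of the metric for every instance, i.e. one value per CPU.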
$ mpvis -a cpuperf/babylon.percpu

When the window is shown, use the PCP Archive Time Control dialog to scroll through the archive (Fast Forward).
Reconciling sar -u and PCP CPU performance metrics
The sar metrics are scaled based on the number of CPUs and expressed as percentages, while the PCP metrics are in units of milliseconds per second after rate conversion; this explains the use of the PCP metric hinv.ncpu and the constants 100 and 1000 in the expressions below (see also the derived metric sketch after the table).
When run with a -u option, sar reports the following:
sar metric | PCP equivalent (assuming rate conversion)
---|---
%usr | 100 * kernel.all.cpu.user / (hinv.ncpu * 1000) |
%sys | 100 * kernel.all.cpu.sys / (hinv.ncpu * 1000) |
%intr | 100 * kernel.all.cpu.intr / (hinv.ncpu * 1000) |
%wio | 100 * kernel.all.cpu.wait.total / (hinv.ncpu * 1000) |
%idle | 100 * kernel.all.cpu.idle / (hinv.ncpu * 1000) |
%sbrk | 100 * kernel.all.cpu.sxbrk / (hinv.ncpu * 1000) |
%wfs | 100 * kernel.all.cpu.wait.io / kernel.all.cpu.wait.total |
%wswp | 100 * kernel.all.cpu.wait.swap / kernel.all.cpu.wait.total |
%wphy | 100 * kernel.all.cpu.wait.pio / kernel.all.cpu.wait.total |
%wgsw | 100 * kernel.all.cpu.wait.gfxc / kernel.all.cpu.wait.total |
%wfif | 100 * kernel.all.cpu.wait.gfxf / kernel.all.cpu.wait.total |
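One way to evaluate these expressions directly with PCP tools is via a derived metric; this is a minimal sketch only (the file name sar-derived.conf and the metric name sar.pct_usr are hypothetical, and it assumes a PCP version whose derived metric language supports rate()):

$ cat > ./sar-derived.conf <<'EOF'
# hypothetical derived metric giving a sar-style %usr
sar.pct_usr = 100 * rate(kernel.all.cpu.user) / (hinv.ncpu * 1000)
EOF
$ PCP_DERIVED_CONFIG=./sar-derived.conf pmval -t 10sec sar.pct_usr

The exact handling of units by rate() may vary between PCP versions, so treat the scaling factors above as illustrative rather than definitive.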
The load average
The "load average" is reported by uptime, top, etc. and the PCP metric kernel.all.load.
The load average is an indirect measure of the demand for CPU resources. It is calculated using the previous load average (load) and the number of currently runnable processes (nrun) and an exponential dampening expression, e.g. for the "1 minute" average, the expression is:
load = exp(-5/60) * load + (1 - exp(-5/60)) * nrun
The three load averages use different exponential constants and are all re-computed every 5 seconds.
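As a worked example with assumed values: if the previous 1-minute load average is 0.50 and 4 processes are currently runnable, the new value is exp(-5/60) * 0.50 + (1 - exp(-5/60)) * 4 ≈ 0.92 * 0.50 + 0.08 * 4 ≈ 0.78, i.e. the average moves towards nrun but is dampened by its history.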
nrun is the number of currently runnable processes, as described above. Note that the "run queue length" (a variant of which is reported by the -q option of sar) is counted using a similar, but not identical, algorithm.
Copyright © 2007-2010 Aconex