From 47e6e7c84f008a53061e661f31ae96629bc694ef Mon Sep 17 00:00:00 2001
From: Igor Pashev
Date: Sun, 26 Oct 2014 12:33:50 +0400
Subject: Debian 3.9.10
---
 man/html/howto.cpuperf.html | 430 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 430 insertions(+)
 create mode 100644 man/html/howto.cpuperf.html

diff --git a/man/html/howto.cpuperf.html b/man/html/howto.cpuperf.html
new file mode 100644
index 0000000..31fc5d3
--- /dev/null
+++ b/man/html/howto.cpuperf.html
@@ -0,0 +1,430 @@

Understanding system-level processor performance

    


+

Understanding measures of system-level processor performance

Tools:
  pmchart
  mpvis
  sar
+
+

This chapter of the Performance Co-Pilot tutorial provides some hints +on how to interpret and understand the various measures of system-level +processor (CPU) performance.

+

All modern operating systems collect processor resource utilization at both the process level and the system level.  This tutorial relates specifically to the system-level metrics.

+

For an explanation of Performance Co-Pilot terms and acronyms, consult +the PCP glossary.

+ +


+ + +

How the system-level CPU time is computed

+

+Both sar and Performance Co-Pilot (PCP) use a common collection of system-level CPU performance instrumentation from the kernel. This instrumentation is based upon statistical sampling of the state of each CPU in the kernel's software clock interrupt routine, which is called HZ (commonly 100) times per second on every CPU.
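On some platforms the clock tick rate itself is exported by the kernel agent; a quick sketch (the metric name kernel.all.hz is an assumption here, and may not be present on every platform):

$ # the kernel's software clock interrupt rate, in ticks per second
$ pminfo -f kernel.all.hz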

+

+At each observation, a quantum of 10 milliseconds of elapsed time is attributed to one of several counters for that CPU, based on the current state of the code executing on the CPU.

+

+This sort of statistical sampling is subject to some anomalies, particularly when activity is strongly correlated with the clock interrupts; however, the distribution of observations over several seconds or minutes is often an accurate reflection of the true distribution of CPU time. The kernel profiling mechanisms offer higher resolution should that be required, but that is beyond the scope of this document.

+

+The CPU state is determined by considering what the CPU was doing just +before the clock interrupt, as follows:

+
    +
  1. + If executing a user thread (i.e. above the kernel system + call interface for some process) then the state is CPU_USER. +
  2. + If executing a kernel interrupt thread, then the state is CPU_INTR. +
  3. + If executing a kernel thread waiting for a graphics event, then the + state is CPU_WAIT. +
  4. + If otherwise executing a kernel thread, then the state is CPU_KERNEL. +
  5. + If not executing a kernel thread and some I/O is pending, then the + state is CPU_WAIT. +
  6. + If not executing a kernel thread and no I/O is pending and some user + thread is paused waiting for memory to become available, then the state + is CPU_SXBRK. +
  7. + Otherwise the state is CPU_IDLE. +
+

+These states are mutually exclusive and complete, so exactly one state +is assigned for each CPU at each clock interrupt.

+

+The kernel agent for PCP exports the following metrics:

Table 1: Raw PCP CPU metrics

  PCP Metric                   Semantics
  kernel.all.cpu.user          Time counted when in CPU_USER state.
  kernel.all.cpu.sys           Time counted when in CPU_KERNEL state.
  kernel.all.cpu.intr          Time counted when in CPU_INTR state.
  kernel.all.cpu.sxbrk         Time counted when in CPU_SXBRK state (IRIX only).
  kernel.all.cpu.wait.total    Time counted when in CPU_WAIT state (UNIX only).
  kernel.all.cpu.idle          Time counted when in CPU_IDLE state.
+

+These metrics are all "counters" in units of milliseconds (cumulative since system boot time), so when displayed with most PCP tools they are "rate converted": the metrics are sampled periodically and the differences between consecutive values are converted to time utilization, in units of milliseconds per second over the sample interval. Since the raw values are aggregated across all CPUs, the time utilization for any of the metrics above is in the range 0 to N*1000 for an N CPU system; some PCP tools report this as a percentage in the range 0 to N*100 percent.
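For example, the raw metric descriptor and the rate-converted values can be inspected with pminfo(1) and pmval(1); this is an illustrative sketch, assuming a live local host running pmcd:

$ # the descriptor shows a counter with units of milliseconds
$ pminfo -d kernel.all.cpu.user
$ # pmval rate converts the counter, reporting utilization over each 2 second sample
$ pmval -t 2sec -s 10 kernel.all.cpu.user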

+ + + +

   Using pmchart to display CPU activity (aggregated over all CPUs).
+

+$ source /etc/pcp.conf
+$ tar xzf $PCP_DEMOS_DIR/tutorials/cpuperf.tgz
+$ pmchart -c CPU -t 2sec -O -0sec -a cpuperf/moomba.pmkstat
+
+
+This command will provide the interactive charts described here. +
+ +

On IRIX, the CPU_WAIT state is further subdivided into components describing +different types of "waiting":

+
    +
  1. + If executing a kernel thread waiting for a graphics context switch, + then the waiting classification W_GFXC is true. +
  2. + If executing a kernel thread waiting for a graphics FIFO operation to + complete, then the waiting classification W_GFXF is true. +
  3. If not executing any thread and an I/O involving a block device is
     pending (most likely associated with a file system, but independent of
     the CPU from which the I/O was initiated), then the waiting
     classification W_IO is true.
  4. If not executing any thread and an I/O involving a swap operation is
     pending (independent of the CPU from which the I/O was initiated), then
     the waiting classification W_SWAP is true.
  5. + If not executing any thread and an I/O involving a raw device is + pending (independent of the CPU from which the I/O was initiated), then + the waiting classification W_PIO is true. +
+

+More than one of the group { W_IO, W_SWAP, W_PIO } can be true at each observation; however this group, W_GFXC and W_GFXF are all mutually exclusive. If the state is CPU_WAIT, then at least one of the classifications must be true.

+

+The IRIX agent for PCP exports the following CPU "wait" +metrics, the sum of which approximately equals kernel.all.cpu.wait.total:

Table 2: Raw PCP CPU wait metrics

  PCP Metric                   Semantics
  kernel.all.cpu.wait.gfxc     Time counted when W_GFXC is true.
  kernel.all.cpu.wait.gfxf     Time counted when W_GFXF is true.
  kernel.all.cpu.wait.io       Time counted when W_IO is true.
  kernel.all.cpu.wait.pio      Time counted when W_PIO is true.
  kernel.all.cpu.wait.swap     Time counted when W_SWAP is true.
+

+As with the metrics in Table 1, these are counters in units of milliseconds that are rate converted by most PCP tools, so the aggregate time utilization again lies in the range 0 to N*1000 milliseconds per second (or 0 to N*100 percent) for an N CPU system.
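These wait metrics can be sampled in the same way; a sketch, assuming an IRIX host (or an archive) where the metrics are available:

$ # rate-converted components of the wait time, sampled every 5 seconds
$ pmval -t 5sec kernel.all.cpu.wait.io
$ pmval -t 5sec kernel.all.cpu.wait.swap
$ # the aggregate, which the components should approximately sum to
$ pmval -t 5sec kernel.all.cpu.wait.total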

+

+Note that for a multiprocessor system with one I/O pending, all + otherwise idle CPUs will be assigned the CPU_WAIT state. This may lead +to an over-estimate of the I/O wait time, as discussed in the +companion How to understand measures of +disk performance document.

+

+In IRIX 6.5.2, additional instrumentation was added to help address the wait time attribution problem by looking at the number of waiting processes in various states (the cardinality of the various sets of waiting processes), rather than the state of each CPU. The wait I/O queue length is defined as the number of processes waiting on events corresponding to the classifications W_IO, W_SWAP or W_PIO. The metrics shown in the table below are computed on only one of the CPUs (the "clock-master") at each clock interrupt.

Table 3: Raw PCP wait I/O queue length metrics

  PCP Metric                   Semantics
  kernel.all.waitio.queue      Cumulative total of the wait I/O queue lengths, as observed on each clock interrupt.
  kernel.all.waitio.occ        Cumulative total of the number of times the wait I/O queue length is greater than zero, as observed on each clock interrupt.
+

+These metrics may be used with PCP tools as follows:
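For example (a sketch, not a prescription from the tutorial): since both counters are advanced on the same clock interrupts, the ratio of their rate-converted values approximates the average wait I/O queue length over the intervals in which it was non-zero.

$ # sample both counters over the same interval and compare the deltas;
$ # delta(queue) / delta(occ) gives the average non-zero wait I/O queue length
$ pmval -t 10sec kernel.all.waitio.queue
$ pmval -t 10sec kernel.all.waitio.occ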

+ + +


+ + +

The per-CPU variants

+

+Inside the kernel, most of the metrics described above are +accumulated per-CPU for reasons of efficiency (to reduce the locking +overheads and minimize dirty cache-line traffic).

+

+PCP exports the per-CPU versions of the system-wide metrics with metric names formed by replacing "all" with "percpu", e.g. kernel.percpu.cpu.user.
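For example (a sketch, assuming a live host running pmcd), the per-CPU variants can be listed and sampled directly:

$ # list the per-CPU CPU time metrics
$ pminfo kernel.percpu.cpu
$ # rate-converted user time, reported per CPU instance
$ pmval -t 2sec kernel.percpu.cpu.user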

+

+ + + +

   The mpvis tool provides 3-D visualization of these per-CPU metrics.

+$ mpvis -a cpuperf/babylon.percpu
+
+
+When the window is shown, use the PCP Archive Time Control dialog to scroll through the archive (Fast Forward). +
+ +


+ + +

Reconciling sar -u and PCP CPU performance metrics

+

+The sar metrics are scaled based on the number of CPUs and expressed in percentages, while the PCP metrics are in units of milliseconds per second after rate conversion; this explains the appearance of the PCP metric hinv.ncpu and the constants 100 and 1000 in the expressions below.

+

+When run with a -u option, sar reports the following:

Table 4: PCP and sar metric equivalents

  sar metric    PCP equivalent (assuming rate conversion)
  %usr          100 * kernel.all.cpu.user / (hinv.ncpu * 1000)
  %sys          100 * kernel.all.cpu.sys / (hinv.ncpu * 1000)
  %intr         100 * kernel.all.cpu.intr / (hinv.ncpu * 1000)
  %wio          100 * kernel.all.cpu.wait.total / (hinv.ncpu * 1000)
  %idle         100 * kernel.all.cpu.idle / (hinv.ncpu * 1000)
  %sbrk         100 * kernel.all.cpu.sxbrk / (hinv.ncpu * 1000)
  %wfs          100 * kernel.all.cpu.wait.io / kernel.all.cpu.wait.total
  %wswp         100 * kernel.all.cpu.wait.swap / kernel.all.cpu.wait.total
  %wphy         100 * kernel.all.cpu.wait.pio / kernel.all.cpu.wait.total
  %wgsw         100 * kernel.all.cpu.wait.gfxc / kernel.all.cpu.wait.total
  %wfif         100 * kernel.all.cpu.wait.gfxf / kernel.all.cpu.wait.total
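The %usr row can be checked by hand with a rough shell sketch like the one below; this is an illustration only, assuming pminfo(1) output of the usual "value N" form and a hand-timed 10 second interval:

$ ncpu=`pminfo -f hinv.ncpu | awk '/value/ { print $2 }'`
$ u1=`pminfo -f kernel.all.cpu.user | awk '/value/ { print $2 }'`
$ sleep 10
$ u2=`pminfo -f kernel.all.cpu.user | awk '/value/ { print $2 }'`
$ # counter delta in milliseconds over 10 seconds, against a ceiling of ncpu * 1000 ms/s
$ echo "$u1 $u2 $ncpu" | awk '{ printf "%%usr = %.1f%%\n", 100 * ($2 - $1) / 10 / ($3 * 1000) }'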
+ +


+ + +

The load average

+

+The "load average" is reported by uptime, top, etc., and by the PCP metric kernel.all.load.
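The PCP metric has one instance for each of the three averaging intervals; for example (a sketch, assuming a live host running pmcd):

$ # current 1, 5 and 15 minute load averages
$ pminfo -f kernel.all.load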

+

+The load average is an indirect measure of the demand for CPU resources. It is calculated from the previous load average (load), the number of currently runnable processes (nrun) and an exponential dampening expression; for the "1 minute" average, the expression is:

+
+load = exp(-5/60) * load + (1 - exp(-5/60)) * nrun
+
+

+The three load averages use different exponential constants and are all +re-computed every 5 seconds.
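By analogy with the expression above (an assumption; the tutorial only gives the "1 minute" constant), the per-interval dampening factors for the 1, 5 and 15 minute averages would be exp(-5/60), exp(-5/300) and exp(-5/900):

$ awk 'BEGIN { printf "1 min: %.4f  5 min: %.4f  15 min: %.4f\n",
               exp(-5/60), exp(-5/300), exp(-5/900) }'
1 min: 0.9200  5 min: 0.9835  15 min: 0.9945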

+

+nrun is computed as follows:

+
    +
  1. + Inspect every process. +
  2. + If the process is not likely to be runnable in the near future (state + not SRUN), ignore it. +
  3. + Inspect every thread of the process. +
  4. + If the thread is sleeping and not currently expanding its address space + (state not SXBRK) and not in a long-term sleep, increment nrun. + +
  5. + If the thread is stopped, ignore it. +
  6. + Otherwise if the thread is not "weightless" (being ignored by + the scheduler), increment nrun. +
+

+Note that the "run queue length" (a variant of which is +reported by the -q option of sar) counts processes using +a similar, but not identical algorithm:

+
    +
  1. + Inspect every process. +
  2. + If the process is not likely to be runnable in the near future (state + not SRUN), ignore it. +
  3. + Inspect every thread of the process. +
  4. + If the thread is sleeping and not currently expanding its address space + (state not SXBRK), ignore it +
  5. + If the thread is stopped, ignore it. +
  6. + Otherwise increment the "run queue length". +
+ +


+
+
+ + + +

Copyright © 2007-2010 Aconex
Copyright © 2000-2004 Silicon Graphics Inc

PCP Site
Copyright © 2012-2014 Red Hat

+