Performance Co-Pilot hotproc PMDA for Active Process Monitoring
===============================================================

This PMDA can be configured to monitor the processes that the
administrator deems "hot" or "interesting".  It is similar to the
proc PMDA but differs in two main respects:

(i) it extends the proc metric set with:
    hotproc.cpuburn,
    hotproc.control.*,
    hotproc.predicate.*,
    hotproc.total.*

(ii) it allows one to retrieve all the instances.

Retrieving all the instances is practical because the set of
instances is restricted by a predicate specified in a configuration
file.  The predicate specifies which processes are "interesting", for
example,

    (cpuburn > 0.1 && uname == "root")

and the PMDA re-evaluates this predicate every <refresh> seconds.
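
The effect of such a predicate can be pictured with a small shell
sketch.  This is an illustration only and not how the PMDA is
implemented: the helper names filter_hot and sample_procs and the
process table are made up, and the real predicate syntax is the one
described in pmdahotproc(1).

```shell
# Illustration only: mimic (cpuburn > 0.1 && uname == "root") on a
# made-up process table with the columns: pid uname cpuburn

# Hypothetical sample data, not real measurements.
sample_procs() {
    cat <<'EOF'
101 root 0.45
102 root 0.02
103 daemon 0.30
104 root 0.11
EOF
}

# Print the pid of every row that satisfies both conditions.
filter_hot() {
    awk '$3 > 0.1 && $2 == "root" { print $1 }'
}

sample_procs | filter_hot
```

Only the rows that satisfy both conditions survive, which is why the
resulting instance domain is small enough to fetch in full.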

Therefore, hotproc.nprocs now refers to the number of "interesting"
processes rather than the number of all processes.

The hotproc.total.* metrics show how well the configuration predicate
and refresh interval capture the activity on the system.  For example,
hotproc.total.cpuother.transient shows how much of the CPU is used by
transient processes (ones which do not live for the full refresh
interval).  If some of these processes are of interest, reducing the
refresh interval may catch them.  The metric
hotproc.total.cpuother.not_cpuburn indicates how much of the CPU the
"uninteresting" processes are using.  On the basis of this value, one
may decide to change what is "interesting" by modifying the
configuration predicate.  For a single indication of how much of the
CPU all of the transient and "uninteresting" processes are using
together, see hotproc.total.cpuother.percent.
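
The tuning reasoning above can be written down as a small shell
sketch.  The function name suggest_tuning and the threshold argument
are assumptions made for illustration; the PMDA itself defines no such
threshold, and the values passed in are expected to be ones read
from hotproc.total.cpuother.transient and
hotproc.total.cpuother.not_cpuburn.

```shell
# Hypothetical helper: turn two observed values into a tuning hint.
#   $1 = value of hotproc.total.cpuother.transient
#   $2 = value of hotproc.total.cpuother.not_cpuburn
#   $3 = threshold above which a value is treated as "too much"
#        (an arbitrary choice made by the administrator)
suggest_tuning() {
    awk -v t="$1" -v n="$2" -v lim="$3" 'BEGIN {
        if (t > lim) print "consider a shorter refresh interval"
        if (n > lim) print "consider changing the predicate"
        if (t <= lim && n <= lim) print "no change suggested"
    }'
}
```

For instance, suggest_tuning 0.30 0.05 0.20 points at the refresh
interval, because only the transient share exceeds the chosen limit.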

To see why the instances (processes) of the hotproc agent were
chosen, check the hotproc.predicate.* metrics.  These metrics show
the values used by the predicate evaluation at the last refresh of
the instance domain.  For example, with a predicate of
(syscalls > 100), running:
    $ pminfo -f hotproc.predicate.syscalls
will show the system call rates of the processes which satisfy the
predicate (i.e. rates greater than 100 per second over the last
refresh interval).

Metrics
=======

The file ./help contains descriptions for all of the metrics exported
by this PMDA.

Once the PMDA has been installed, the following command will list all
the available metrics and their explanatory "help" text:

	$ pminfo -fT hotproc

Installation of the hotproc PMDA
================================

 +  # cd $PCP_PMDAS_DIR/hotproc

 +  Check that the Performance Metrics Domain number defined in
    domain.h does not clash with that of any other PMDA currently in
    use (see $PCP_PMCDCONF_PATH).  If it does, edit domain.h and choose
    another domain number.

 +  Inspect the ./sample.conf file and either modify it or create a new
    configuration file that matches your notion of "interesting".  See
    pmdahotproc(1) for the configuration file specification.

 +  Then run the Install script (as root)

	# ./Install

    and choose both the "collector" and "monitor" installation
    configuration options.

    Answer the questions, which include the option to specify the
    configuration file that you created.  You will also need to specify
    a refresh interval, which determines how often the "hot" predicate
    is evaluated against the current set of processes.  A smaller
    interval lets the predicate pick up processes which have short
    lives or sporadic activity, but the PMDA will consume more CPU
    because the predicate is evaluated more often.

 +  At the end of installation a check is made to verify that the
    metrics of the agent can be retrieved.  The number of values
    reported by this check will be low because most of the hotproc
    metrics are not available until after the first refresh interval.
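
The domain number check in the second step above can be sketched in
shell.  The function name domain_clashes is hypothetical, and the
sketch assumes that each agent line in the pmcd configuration file
has the form "name domain-number type ..." and that comment lines
start with "#".

```shell
# Hypothetical helper: print the names of agents that already use a
# given domain number in a pmcd.conf-style file.
#   $1 = domain number to look for (the one defined in domain.h)
#   $2 = path of the pmcd configuration file
domain_clashes() {
    awk -v d="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        NF >= 3 && $2 == d { print $1 }      # agent using this number
    ' "$2"
}
```

Called with the number from domain.h and the file named by
$PCP_PMCDCONF_PATH, the helper prints nothing when the number is
free.  Any name it prints, other than hotproc itself on a
re-installation, is an agent that would clash.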

Special TRIX Installation Considerations
========================================

    On SGI TRIX systems, the hotproc PMDA needs the CAP_MAC_READ
    capability in addition to the default capability (CAP_SCHED_MGT)
    before it can interrogate the resource utilization of all processes.

    To achieve this, run the ./Install script as described above, then

    1. edit /etc/pmcd.conf and for the pmdahotproc line, replace the
       pmda invocation arguments
	    $PCP_PMDAS_DIR/hotproc/pmdahotproc ...
       by
	    /sbin/suattr -C CAP_SCHED_MGT,CAP_MAC_READ+ipe -c "$PCP_PMDAS_DIR/hotproc/pmdahotproc ..."

    2. restart pmcd
	    # /etc/init.d/pcp start

    Thanks to Roald Lygre for this recipe.

De-installation
===============

Simply use

	# cd $PCP_PMDAS_DIR/hotproc
	# ./Remove

Changing the settings
=====================

The refresh interval can be changed at run time by using
pmstore(1) on the metric hotproc.control.refresh.

To make permanent changes, re-run the Install script.
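
A minimal sketch of such a run-time change follows.  The wrapper name
set_hotproc_refresh and the PMSTORE override are made up for
illustration, and the check for a whole number of seconds is an
assumption about sensible input rather than a documented rule of the
PMDA.  The underlying call is simply "pmstore metric value".

```shell
# PMSTORE may be overridden (for example PMSTORE=echo) for a dry run.
PMSTORE=${PMSTORE:-pmstore}

# Hypothetical wrapper: validate the interval, then store it.
#   $1 = new refresh interval in seconds
set_hotproc_refresh() {
    case $1 in
        ''|*[!0-9]*)
            echo "refresh interval must be a whole number of seconds" >&2
            return 1 ;;
    esac
    if [ "$1" -le 0 ]; then
        echo "refresh interval must be greater than zero" >&2
        return 1
    fi
    "$PMSTORE" hotproc.control.refresh "$1"
}
```

Running set_hotproc_refresh 10 as a suitably privileged user asks the
agent to use a 10 second interval until the PMDA is next restarted.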

Troubleshooting
===============

 +  After installing or restarting the agent, the PMCD log file
    ($PCP_LOG_DIR/pmcd/pmcd.log) and the PMDA log file
    ($PCP_LOG_DIR/pmcd/hotproc.log) should be checked for any warnings
    or errors.

 +  If the Install script reports some warnings when checking the
    metrics, the problem should be listed in one of the log files.
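
Checking the two log files can be sketched as below.  The function
name scan_pcp_log is hypothetical, and matching on the words
"warning" and "error" is an assumption about how problems are
worded in the logs, so a clean result does not replace reading
the files.

```shell
# Hypothetical helper: list log lines that mention a warning or an
# error, prefixed with their line numbers.
#   $1 = path of the log file to scan
scan_pcp_log() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # grep exits non-zero when nothing matches, which here means
    # that no suspicious lines were found.
    grep -n -i -E 'warning|error' "$1"
}
```

With $PCP_LOG_DIR set (it is defined in /etc/pcp.conf on a standard
installation), the helper can be applied to
$PCP_LOG_DIR/pmcd/pmcd.log and $PCP_LOG_DIR/pmcd/hotproc.log in turn.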