summaryrefslogtreecommitdiff
path: root/src/pmdas/hotproc/README
diff options
context:
space:
mode:
Diffstat (limited to 'src/pmdas/hotproc/README')
-rw-r--r--src/pmdas/hotproc/README141
1 files changed, 141 insertions, 0 deletions
diff --git a/src/pmdas/hotproc/README b/src/pmdas/hotproc/README
new file mode 100644
index 0000000..be8c5a4
--- /dev/null
+++ b/src/pmdas/hotproc/README
@@ -0,0 +1,141 @@
+Performance Co-Pilot hotproc PMDA for Active Process Monitoring
+===============================================================
+
+This PMDA is designed to be configurable to monitor processes which
+the administrator deems "hot" or "interesting." The PMDA is similar
+to the proc PMDA except in two main aspects:
+
+(i) it extends the proc metric set by:
+ hotproc.cpuburn,
+ hotproc.control.*,
+ hotproc.predicate.*,
+ hotproc.total.* .
+
+(ii) it allows one to retrieve all the instances.
+
+It is allowed to retrieve all the instances because the set of
+instances is restricted by a predicate specified in a configuration
+file. The predicate specifies what processes are "interesting", for
+example,
+
+ (cpuburn > 0.1 && uname == "root")
+
+and it applies this predicate every <refresh> seconds.
+
+Therefore, hotproc.nprocs now refers to the number of "interesting"
+processes instead of the list of all the processes.
+
+To monitor how successful (according to activity) that the
+configuration predicate and refresh interval are, the hotproc.total.*
+metrics can be used. For example, hotproc.total.cpuother.transient
+shows how much of the cpu that transient processes (ones which do not
+live for the refresh interval) get. If one is interested in some of
+these processes then reducing the refresh interval may catch them.
+Hotproc.total.cpuother.not_cpuburn indicates how much of the cpu that
+the "uninteresting" processes are getting. On the basis of this value,
+one may decide to change what is "interesting" by modifying the
+configuration predicate. If one wants to get a simple indication of how
+much of the cpu that all of the transient and "uninteresting" processes
+are getting, then hotproc.total.cpuother.percent is the answer.
+
+In order to see why the instances (processes) of the hotproc agent
+were chosen, one can check the hotproc.predicate.* metrics. These
+metrics show the values used by the predicate evaluation at the
+last refresh of the instance domain. For example, if one used a
+predicate of (syscalls > 100), then doing:
+ $ pminfo -f hotproc.predicate.syscalls
+will show the values of the system call rates of the processes
+which satisfy the predicate (i.e. are greater than 100 per second
+over the last refresh interval).
+
+Metrics
+=======
+
+The file ./help contains descriptions for all of the metrics exported
+by this PMDA.
+
+Once the PMDA has been installed, the following command will list all
+the available metrics and their explanatory "help" text:
+
+ $ pminfo -fT hotproc
+
+Installation of the hotproc PMDA
+================================
+
+ + # cd $PCP_PMDAS_DIR/hotproc
+
+ + Check that there is no clash with the Performance Metrics Domain
+ number defined in domain.h and the other PMDAs currently in use
+ (see $PCP_PMCDCONF_PATH). If there is, edit domain.h and choose
+ another domain number.
+
+ + Inspect the ./sample.conf file and either modify it or create a new
+ configuration file that suits your need for "interesting". See
+ pmdahotproc(1) for configuration specification.
+
+ + Then run the Install script (as root)
+
+ # ./Install
+
+ and choose both the "collector" and "monitor" installation
+ configuration options.
+
+ Answer the questions, which include the option to specify the
+ configuration file that you created. You will also need to specify
+ a refresh interval which determines how often the "hot" predicate
+ is used over the current set of processes. A smaller number will
+ mean that the predicate will be able to choose processes which have
+ short lives or sporadic activity but will consume more CPU because
+ it is run more often.
+
+ + At the end of installation a check is made to verify that the
+ metrics of the agent can be retrieved. The reported number from this
+ check will be low because most of the hotproc metrics will not be
+ available until after the first refresh interval.
+
+Special TRIX Installation Considerations
+========================================
+
+ For SGI Trix systems, the hotproc PMDA needs the CAP_MAC_READ
+ capability in addition to the default capability (CAP_SCHED_MGT),
+ before it can interrogate the resource utilization of all processes.
+
+ To achieve this, run the ./Install script as described above, then
+
+ 1. edit /etc/pmcd.conf and for the pmdahotproc line, replace the
+ pmda invocation arguments
+ $PCP_PMDAS_DIR/hotproc/pmdahotproc ...
+ by
+ /sbin/suattr -C CAP_SCHED_MGT,CAP_MAC_READ+ipe -c "$PCP_PMDAS_DIR/hotproc/pmdahotproc ..."
+
+ 2. restart pmcd
+ # /etc/init.d/pcp start
+
+ Thanks to Roald Lygre for this recipe.
+
+De-installation
+===============
+
+Simply use
+
+ # cd $PCP_PMDAS_DIR/hotproc
+ # ./Remove
+
+Changing the settings
+=====================
+
+The refresh period can be dynamically modified using
+pmstore(1) for the metric hotproc.control.refresh.
+
+To make permanent changes, re-run the Install script.
+
+Troubleshooting
+===============
+
+ + After installing or restarting the agent, the PMCD log file
+ ($PCP_LOG_DIR/pmcd/pmcd.log) and the PMDA log file
+ ($PCP_LOG_DIR/pmcd/hotproc.log) should be checked for any warnings
+ or errors.
+
+ + If the Install script reports some warnings when checking the
+ metrics, the problem should be listed in one of the log files.