Diffstat (limited to 'src/pmdas/hotproc/README')
-rw-r--r--  src/pmdas/hotproc/README  141
1 files changed, 141 insertions, 0 deletions
diff --git a/src/pmdas/hotproc/README b/src/pmdas/hotproc/README
new file mode 100644
index 0000000..be8c5a4
--- /dev/null
+++ b/src/pmdas/hotproc/README
@@ -0,0 +1,141 @@
+Performance Co-Pilot hotproc PMDA for Active Process Monitoring
+===============================================================
+
+This PMDA is designed to be configurable to monitor processes which
+the administrator deems "hot" or "interesting." The PMDA is similar
+to the proc PMDA except in two main aspects:
+
+(i) it extends the proc metric set by:
+    hotproc.cpuburn,
+    hotproc.control.*,
+    hotproc.predicate.*,
+    hotproc.total.* .
+
+(ii) it allows one to retrieve all the instances.
+
+It is allowed to retrieve all the instances because the set of
+instances is restricted by a predicate specified in a configuration
+file. The predicate specifies what processes are "interesting", for
+example,
+
+    (cpuburn > 0.1 && uname == "root")
+
+and it applies this predicate every <refresh> seconds.
+
+Therefore, hotproc.nprocs now refers to the number of "interesting"
+processes instead of the list of all the processes.
+
+To monitor how successful (according to activity) that the
+configuration predicate and refresh interval are, the hotproc.total.*
+metrics can be used. For example, hotproc.total.cpuother.transient
+shows how much of the cpu that transient processes (ones which do not
+live for the refresh interval) get. If one is interested in some of
+these processes then reducing the refresh interval may catch them.
+Hotproc.total.cpuother.not_cpuburn indicates how much of the cpu that
+the "uninteresting" processes are getting. On the basis of this value,
+one may decide to change what is "interesting" by modifying the
+configuration predicate. If one wants to get a simple indication of how
+much of the cpu that all of the transient and "uninteresting" processes
+are getting, then hotproc.total.cpuother.percent is the answer.
+
+In order to see why the instances (processes) of the hotproc agent
+were chosen, one can check the hotproc.predicate.* metrics. These
+metrics show the values used by the predicate evaluation at the
+last refresh of the instance domain. For example, if one used a
+predicate of (syscalls > 100), then doing:
+    $ pminfo -f hotproc.predicate.syscalls
+will show the values of the system call rates of the processes
+which satisfy the predicate (i.e. are greater than 100 per second
+over the last refresh interval).
+
+Metrics
+=======
+
+The file ./help contains descriptions for all of the metrics exported
+by this PMDA.
+
+Once the PMDA has been installed, the following command will list all
+the available metrics and their explanatory "help" text:
+
+    $ pminfo -fT hotproc
+
+Installation of the hotproc PMDA
+================================
+
+ +  # cd $PCP_PMDAS_DIR/hotproc
+
+ +  Check that there is no clash with the Performance Metrics Domain
+    number defined in domain.h and the other PMDAs currently in use
+    (see $PCP_PMCDCONF_PATH). If there is, edit domain.h and choose
+    another domain number.
+
+ +  Inspect the ./sample.conf file and either modify it or create a new
+    configuration file that suits your need for "interesting". See
+    pmdahotproc(1) for configuration specification.
+
+ +  Then run the Install script (as root)
+
+        # ./Install
+
+    and choose both the "collector" and "monitor" installation
+    configuration options.
+
+    Answer the questions, which include the option to specify the
+    configuration file that you created. You will also need to specify
+    a refresh interval which determines how often the "hot" predicate
+    is used over the current set of processes. A smaller number will
+    mean that the predicate will be able to choose processes which have
+    short lives or sporadic activity but will consume more CPU because
+    it is run more often.
+
+ +  At the end of installation a check is made to verify that the
+    metrics of the agent can be retrieved. The reported number from this
+    check will be low because most of the hotproc metrics will not be
+    available until after the first refresh interval.
+
+Special TRIX Installation Considerations
+========================================
+
+    For SGI Trix systems, the hotproc PMDA needs the CAP_MAC_READ
+    capability in addition to the default capability (CAP_SCHED_MGT),
+    before it can interrogate the resource utilization of all processes.
+
+    To achieve this, run the ./Install script as described above, then
+
+    1. edit /etc/pmcd.conf and for the pmdahotproc line, replace the
+       pmda invocation arguments
+           $PCP_PMDAS_DIR/hotproc/pmdahotproc ...
+       by
+           /sbin/suattr -C CAP_SCHED_MGT,CAP_MAC_READ+ipe -c "$PCP_PMDAS_DIR/hotproc/pmdahotproc ..."
+
+    2. restart pmcd
+           # /etc/init.d/pcp start
+
+    Thanks to Roald Lygre for this recipe.
+
+De-installation
+===============
+
+Simply use
+
+    # cd $PCP_PMDAS_DIR/hotproc
+    # ./Remove
+
+Changing the settings
+=====================
+
+The refresh period can be dynamically modified using
+pmstore(1) for the metric hotproc.control.refresh.
+
+To make permanent changes, re-run the Install script.
+
+Troubleshooting
+===============
+
+ +  After installing or restarting the agent, the PMCD log file
+    ($PCP_LOG_DIR/pmcd/pmcd.log) and the PMDA log file
+    ($PCP_LOG_DIR/pmcd/hotproc.log) should be checked for any warnings
+    or errors.
+
+ +  If the Install script reports some warnings when checking the
+    metrics, the problem should be listed in one of the log files.
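
Note on the example predicate: the README's (cpuburn > 0.1 && uname == "root") selects processes by CPU usage and owner. As a rough illustration only, the shell sketch below applies the same test to made-up sample records with awk. The function name hot_filter and the three-column input layout (pid, user, cpuburn) are assumptions made for this sketch; the PMDA itself takes its predicate from the configuration file described in pmdahotproc(1), and nothing here reflects how it evaluates that predicate internally.

```shell
#!/bin/sh
# Sketch only: mimic the README's example predicate
#     (cpuburn > 0.1 && uname == "root")
# on hypothetical input lines of the form "<pid> <user> <cpuburn>".
# hot_filter is an illustrative name, not part of PCP.
hot_filter() {
    # Print the pid of every record that satisfies both conditions.
    awk '$3 > 0.1 && $2 == "root" { print $1 }'
}

# Made-up sample records; only pid 101 is a busy root process.
printf '%s\n' \
    '101 root 0.25' \
    '102 root 0.02' \
    '103 pcp 0.40' | hot_filter
```

Run as is, the sketch prints 101: pid 102 is owned by root but falls below the cpuburn threshold, and pid 103 is busy but not owned by root. On a host with the PMDA installed, the processes actually selected can be inspected with the pminfo commands quoted in the README.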