Performance Co-Pilot hotproc PMDA for Active Process Monitoring =============================================================== This PMDA is designed to be configurable to monitor processes which the administrator deems "hot" or "interesting." The PMDA is similar to the proc PMDA except in two main aspects: (i) it extends the proc metric set by: hotproc.cpuburn, hotproc.control.*, hotproc.predicate.*, hotproc.total.* . (ii) it allows one to retrieve all the instances. It is allowed to retrieve all the instances because the set of instances is restricted by a predicate specified in a configuration file. The predicate specifies what processes are "interesting", for example, (cpuburn > 0.1 && uname == "root") and it applies this predicate every seconds. Therefore, hotproc.nprocs now refers to the number of "interesting" processes instead of the list of all the processes. To monitor how successful (according to activity) that the configuration predicate and refresh interval are, the hotproc.total.* metrics can be used. For example, hotproc.total.cpuother.transient shows how much of the cpu that transient processes (ones which do not live for the refresh interval) get. If one is interested in some of these processes then reducing the refresh interval may catch them. Hotproc.total.cpuother.not_cpuburn indicates how much of the cpu that the "uninteresting" processes are getting. On the basis of this value, one may decide to change what is "interesting" by modifying the configuration predicate. If one wants to get a simple indication of how much of the cpu that all of the transient and "uninteresting" processes are getting, then hotproc.total.cpuother.percent is the answer. In order to see why the instances (processes) of the hotproc agent were chosen, one can check the hotproc.predicate.* metrics. These metrics show the values used by the predicate evaluation at the last refresh of the instance domain. For example, if one used a predicate of (syscalls > 100), then doing: $ pminfo -f hotproc.predicate.syscalls will show the values of the system call rates of the processes which satisfy the predicate (i.e. are greater than 100 per second over the last refresh interval). Metrics ======= The file ./help contains descriptions for all of the metrics exported by this PMDA. Once the PMDA has been installed, the following command will list all the available metrics and their explanatory "help" text: $ pminfo -fT hotproc Installation of the hotproc PMDA ================================ + # cd $PCP_PMDAS_DIR/hotproc + Check that there is no clash with the Performance Metrics Domain number defined in domain.h and the other PMDAs currently in use (see $PCP_PMCDCONF_PATH). If there is, edit domain.h and choose another domain number. + Inspect the ./sample.conf file and either modify it or create a new configuration file that suits your need for "interesting". See pmdahotproc(1) for configuration specification. + Then run the Install script (as root) # ./Install and choose both the "collector" and "monitor" installation configuration options. Answer the questions, which include the option to specify the configuration file that you created. You will also need to specify a refresh interval which determines how often the "hot" predicate is used over the current set of processes. A smaller number will mean that the predicate will be able to choose processes which have short lives or sporadic activity but will consume more CPU because it is run more often. + At the end of installation a check is made to verify that the metrics of the agent can be retrieved. The reported number from this check will be low because most of the hotproc metrics will not be available until after the first refresh interval. Special TRIX Installation Considerations ======================================== For SGI Trix systems, the hotproc PMDA needs the CAP_MAC_READ capability in addition to the default capability (CAP_SCHED_MGT), before it can interrogate the resource utilization of all processes. To achieve this, run the ./Install script as described above, then 1. edit /etc/pmcd.conf and for the pmdahotproc line, replace the pmda invocation arguments $PCP_PMDAS_DIR/hotproc/pmdahotproc ... by /sbin/suattr -C CAP_SCHED_MGT,CAP_MAC_READ+ipe -c "$PCP_PMDAS_DIR/hotproc/pmdahotproc ..." 2. restart pmcd # /etc/init.d/pcp start Thanks to Roald Lygre for this recipe. De-installation =============== Simply use # cd $PCP_PMDAS_DIR/hotproc # ./Remove Changing the settings ===================== The refresh period can be dynamically modified using pmstore(1) for the metric hotproc.control.refresh. To make permanent changes, re-run the Install script. Troubleshooting =============== + After installing or restarting the agent, the PMCD log file ($PCP_LOG_DIR/pmcd/pmcd.log) and the PMDA log file ($PCP_LOG_DIR/pmcd/hotproc.log) should be checked for any warnings or errors. + If the Install script reports some warnings when checking the metrics, the problem should be listed in one of the log files.