Diffstat (limited to 'man/man1/pmdaweblog.1')
-rw-r--r-- | man/man1/pmdaweblog.1 | 584 |
1 files changed, 584 insertions, 0 deletions
diff --git a/man/man1/pmdaweblog.1 b/man/man1/pmdaweblog.1
new file mode 100644
index 0000000..2558d2e
--- /dev/null
+++ b/man/man1/pmdaweblog.1
@@ -0,0 +1,584 @@
+'\"macro stdmacro
+.\"
+.\" Copyright (c) 2012 Red Hat.
+.\" Copyright (c) 2000 Silicon Graphics, Inc. All Rights Reserved.
+.\"
+.\" This program is free software; you can redistribute it and/or modify it
+.\" under the terms of the GNU General Public License as published by the
+.\" Free Software Foundation; either version 2 of the License, or (at your
+.\" option) any later version.
+.\"
+.\" This program is distributed in the hope that it will be useful, but
+.\" WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+.\" or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+.\" for more details.
+.\"
+.\"
+.TH PMDAWEBLOG 1 "PCP" "Performance Co-Pilot"
+.SH NAME
+\f3pmdaweblog\f1 \- performance metrics domain agent (PMDA) for Web server logs
+.\" literals use .B or \f3
+.\" arguments use .I or \f2
+.SH SYNOPSIS
+\f3$PCP_PMDAS_DIR/weblog/pmdaweblog\f1
+[\f3\-Cp\f1]
+[\f3\-d\f1 \f2domain\f1]
+[\f3\-h\f1 \f2helpfile\f1]
+[\f3\-i\f1 \f2port\f1]
+[\f3\-l\f1 \f2logfile\f1]
+[\f3\-n\f1 \f2idlesec\f1]
+[\f3\-S\f1 \f2num\f1]
+[\f3\-t\f1 \f2delay\f1]
+[\f3\-u\f1 \f2socket\f1]
+[\f3\-U\f1 \f2username\f1]
+\f2configfile\f1
+.SH DESCRIPTION
+.B pmdaweblog
+is a Performance Metrics Domain Agent
+.RB ( PMDA (3))
+that scans Web server logs
+to extract metrics characterizing Web server activity.
+These performance metrics are then made available through the infrastructure
+of the Performance Co-Pilot (PCP).
+.PP
+The
+.I configfile
+specifies which Web servers are to be monitored, their associated access
+logs and error logs, and a regular-expression based scheme for extracting
+detailed information about each Web access.
+This file is maintained as
+part of the PMDA installation and/or de-installation by the scripts
+.B Install
+and
+.B Remove
+in the directory
+.BR $PCP_PMDAS_DIR/weblog .
+For more details, refer to the INSTALLATION section below.
+.PP
+Once started,
+.B pmdaweblog
+monitors a set of log files and, in response to a request for information,
+processes any new information that has been appended to the log files,
+much like
+.BR tail (1).
+There is also a periodic ``catch up'' that processes new information from all
+log files, and a scheme to detect the rotation of log files.
+.PP
+Like all other PMDAs,
+.B pmdaweblog
+is launched by
+.BR pmcd (1)
+using command line options specified in
+.I $PCP_PMCDCONF_PATH
+\- the
+.B Install
+script will prompt for appropriate values for the command line options, and
+update
+.IR $PCP_PMCDCONF_PATH .
+.PP
+A brief description of the
+.B pmdaweblog
+command line options follows:
+.TP
+.B \-C
+Check the configuration and exit.
+.TP
+.BI \-d " domain"
+Specify the
+.I domain
+number. It is absolutely crucial that the performance metrics
+.I domain
+number specified here is unique and consistent. That is,
+.I domain
+should be different for every PMDA on the one host, and the same
+.I domain
+number should be used for the
+.B pmdaweblog
+PMDA on all hosts.
+.RS
+.P
+For most installations, the default
+.I domain
+as encapsulated in the file
+.B $PCP_PMDAS_DIR/weblog/domain.h
+will suffice. For alternate values, check
+.I $PCP_PMCDCONF_PATH
+for the
+.I domain
+values already in use on this host; the file
+.B $PCP_VAR_DIR/pmns/stdpmid
+contains a repository of ``well known''
+.I domain
+assignments that should probably be avoided.
+.RE
+.TP
+.BI \-h " helpfile"
+Get the help text from the supplied
+.I helpfile
+rather than from the default location.
+.TP
+.BI \-i " port"
+Communicate with
+.BR pmcd (1)
+on the specified Internet
+.I port
+(which may be a number or a name).
+.TP
+.BI \-l " logfile"
+Location of the log file.
+By default, a log file named
+.I weblog.log
+is written in the current directory of
+.BR pmcd (1)
+when
+.B pmdaweblog
+is started, that is,
+.BR $PCP_LOG_DIR/pmcd .
+If the log file cannot
+be created or is not writable, output is written to the standard error instead.
+.TP
+.BI \-n " idlesec"
+If a Web server log file has not been modified for
+.I idlesec
+seconds, then the file will be closed and re-opened.
+This is the only way
+.B pmdaweblog
+can detect any asynchronous rotation of the logs by Web server
+administrative scripts.
+The default period is 20 seconds.
+This value may be changed dynamically using
+.BR pmstore (1)
+to modify the value of the performance metric
+.BR web.config.check .
+.TP
+.B \-p
+Communicate with
+.BR pmcd (1)
+via a pipe.
+.TP
+.BI \-S " num"
+Specify the maximum number of Web servers per
+.IR sproc .
+It may be desirable (from a latency and load balancing perspective) or
+necessary (due to file descriptor limits) to delegate responsibility
+for scanning the Web server log files to several
+.IR sprocs .
+.B pmdaweblog
+will ensure that each
+.I sproc
+handles the log files for at most
+.I num
+Web servers.
+The default value is 80 Web servers per
+.IR sproc .
+.TP
+.BI \-t " delay"
+To avoid the need to scan a large amount of information from the Web
+server logs in response to a single request for performance
+metrics, all log files will be checked at least once
+every
+.I delay
+seconds.
+The default is 15 seconds.
+This value may be changed dynamically using
+.BR pmstore (1)
+to modify the value of the performance metric
+.BR web.config.catchup .
+.TP
+.BI \-u " socket"
+Communicate with
+.BR pmcd (1)
+via the given Unix domain
+.IR socket .
+.TP
+.BI \-U " username"
+User account under which to run the agent.
+The default is the unprivileged "pcp" account in current versions of PCP,
+but in older versions the superuser account ("root") was used by default.
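+.PP
+As a sketch of the dynamic reconfiguration described for the
+.B \-n
+and
+.B \-t
+options, the current values of both control metrics can be displayed with
+.BR pminfo (1)
+and then changed with
+.BR pmstore (1).
+The new values below (30 and 60 seconds) are arbitrary choices for
+illustration only:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pminfo \-f web.config.check web.config.catchup
+# pmstore web.config.check 30
+# pmstore web.config.catchup 60
+.in
+.fi
+.ft 1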
+.SH INSTALLATION
+The PCP framework allows metrics to be collected on one host
+and monitored from another. These hosts are referred to as
+.I collector
+and
+.I monitor
+hosts, respectively. A host may be both a collector and a monitor.
+.PP
+Collector hosts require the installation of the agent, while monitoring
+hosts require no agent installation at all.
+.PP
+On each collector host, do the following as root:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# cd $PCP_PMDAS_DIR/weblog
+# ./Install
+.in
+.fi
+.ft 1
+.PP
+The installation procedure prompts for a default or non-default installation.
+A default installation will search for known server configurations and
+automatically configure the PMDA for any server log files that are found.
+A non-default installation will step through each server, prompting the
+user for other server configurations and arguments to
+.BR pmdaweblog .
+The end result of a collector installation
+is a configuration file that is passed to
+.B pmdaweblog
+via the
+.I configfile
+argument.
+.PP
+To undo the installation, do the following as root:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# cd $PCP_PMDAS_DIR/weblog
+# ./Remove
+.in
+.fi
+.ft 1
+.PP
+.B pmdaweblog
+is launched by
+.BR pmcd (1)
+and should never be executed directly.
+The
+.B Install
+and
+.B Remove
+scripts notify
+.BR pmcd (1)
+when the agent is installed or removed.
+.SH CONFIGURATION
+The configuration file for the weblog PMDA is an ASCII file that can
+be easily modified.
+Empty lines and lines beginning with '\f3#\f1'
+are ignored.
+All other lines must be either a regular expression or a server
+specification.
+.PP
+Regular expressions, which are used on both the access and error log files,
+must be of the form:
+.PP
+.in +0.25i
+.B regex
+.I regexName regexp
+.in
+.I or
+.PP
+.in +0.25i
+.B regex_posix
+.I regexName ordering regexp_posix
+.in
+.PP
+The
+.I regexName
+is a word which uniquely identifies the regular expression.
+This is the reference used in the server specification.
+The
+.I regexp
+is in the format described for
+.BR regcmp (3).
+The
+.I regexp_posix
+is in the format described for
+.BR regcomp (3).
+The argument
+.I ordering
+is explained below. The
+.B Posix
+form should be available on all platforms.
+.PP
+The regular expression requires the specification of up to four arguments
+to be extracted from each line of a Web server access log, depending on the
+type of server. In the most common case there are two arguments representing
+the method and the size.
+.PP
+For the
+.RB non\- Posix
+version, argument
+.I $0
+should contain the method:
+.BR GET ,
+.BR HEAD ,
+.B POST
+or
+.BR PUT .
+The method
+.B PUT
+is treated as a synonym for
+.BR POST ,
+and anything else is categorized as
+.BR OTHER .
+.PP
+The second argument,
+.IR $1 ,
+should contain the size of the request.
+A size of ``\f3\-\f1'' or `` '' is treated as unknown.
+.PP
+Argument
+.I $2
+should contain the status code returned to the client browser and argument
+.I $3
+should contain the status code returned to the server from a remote host.
+These latter two arguments are used for caching servers and must be specified
+as a pair (or
+.I $2
+will be ignored). For further information on status codes, refer to
+.B http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
+.PP
+Some legal
+.RB non\- Posix
+regular expression specifications for monitoring an access log are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for CERN, NCSA, Netscape etc Access Logs
+regex CERN ] "([A\-Za\-z][\-A\-Za\-z]+)$0 .*" [\-0\-9]+ ([\-0\-9]+)$1
+
+# pattern for FTP Server access logs (normally in SYSLOG)
+regex SYSLOG_FTP ftpd[.*]: ([gp][\-A\-Za\-z]+)$0( )$1
+.in
+.fi
+.ft 1
+.PP
+There is one special type of access log, identified by the
+.I regexName
+.IR SQUID .
+This format extracts four arguments, but since the
+.B Squid
+log file uses text-based status codes, it is handled as a special case.
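+.PP
+As an illustration of the two-argument case, the
+.I CERN
+pattern above would be expected to match a Common Log Format line like the
+following (the address, date and request are invented for this example),
+setting
+.I $0
+to
+.B GET
+and
+.I $1
+to 2326:
+.PP
+.ft CW
+.nf
+.in +0.25i
+127.0.0.1 \- \- [10/Oct/2000:13:55:36 \-0700] "GET /index.html HTTP/1.0" 200 2326
+.in
+.fi
+.ft 1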
+.PP
+In the examples below,
+.I NS_PROXY
+parses the Netscape/W3C
+.I Common Extended Log Format
+and
+.I SQUID
+parses the default Squid Object Cache format log file.
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for Netscape Proxy Server Extended Logs
+regex NS_PROXY ] "([A\-Za\-z][\-A\-Za\-z]+)$0 .*" ([\-0\-9]+)$2 \\
+.in +0.5i
+([\-0\-9]+)$1 ([\-0\-9]+)$3
+.in
+
+# pattern for Squid Cache logs
+regex SQUID [0\-9]+\e.[0\-9]+[ ]+[0\-9]+ [a\-zA\-Z0\-9\e.]+ \\
+.in +0.5i
+([_A\-Z]+)$3\e/([0\-9]+)$2 ([0\-9]+)$1 ([A\-Z]+)$0
+.in
+.in
+.fi
+.ft 1
+.PP
+The
+.I regexp
+for the error logs does not require any arguments, only a match.
+Some legal expressions are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for CERN, NCSA, Netscape etc Error Logs
+regex CERN_err .
+
+# pattern for FTP Server error logs (normally in SYSLOG)
+regex SYSLOG_FTP_err FTP LOGIN FAILED
+.in
+.fi
+.ft 1
+.PP
+If
+.BR POSIX \-compliant
+regular expressions are used, additional information is required
+since the order of arguments cannot be specified in the regular expression
+itself.
+For backwards compatibility, in the common case of two arguments the order
+may be specified as
+.I method,size
+or
+.IR size,method .
+In the general case, the ordering is specified in one of the following
+ways:
+.TP 0.5in
+n1,n2,n3,n4
+where each nX is a digit between 1 and 4. Each comma-separated field gives
+(in order) the argument number for
+.IR method,size,client_status,server_status .
+.TP 0.5in
+\-
+Used for cases like the error logs where the content is ignored.
+.PP
+As for the
+.RB non\- Posix
+format, the
+.I SQUID
+.I regexName
+is treated as a special case to match the non-numeric status codes.
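+.PP
+To illustrate how an ordering maps parenthesized groups to arguments,
+consider a Squid native format log line such as the following
+(all values are invented for this example):
+.PP
+.ft CW
+.nf
+.in +0.25i
+1066036250.603 1 192.168.0.1 TCP_MISS/200 1234 GET http://www.example.com/ \- DIRECT/10.0.0.1 text/html
+.in
+.fi
+.ft 1
+.PP
+When this line is matched by the
+.B Posix
+.I SQUID
+pattern shown below, group 1 captures
+.BR TCP_MISS ,
+group 2 captures 200, group 3 captures 1234 and group 4 captures
+.BR GET .
+The ordering 4,3,2,1 is read left to right as
+.IR method,size,client_status,server_status ,
+so the method is taken from group 4, the size from group 3, the client
+status from group 2 and the server status from group 1.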
+.PP
+Some legal
+.B Posix
+regular expression specifications for monitoring access and error logs are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for CERN, NCSA, Netscape, Apache etc Access Logs
+regex_posix CERN method,size ][ \\]+"([A\-Za\-z][\-A\-Za\-z]+) \\
+.in +0.5i
+[^"]*" [\-0\-9]+ ([\-0\-9]+)
+.in
+
+# pattern for CERN, NCSA, Netscape, Apache etc Access Logs
+regex_posix CERN 1,2 ][ \\]+"([A\-Za\-z][\-A\-Za\-z]+) \\
+.in +0.5i
+[^"]*" [\-0\-9]+ ([\-0\-9]+)
+.in
+
+# pattern for FTP Server access logs (normally in SYSLOG)
+regex_posix SYSLOG_FTP method,size ftpd[.*]: \\
+.in +0.5i
+([gp][\-A\-Za\-z]+)( )
+.in
+
+# pattern for Netscape Proxy Server Extended Logs
+regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A\-Za\-z][\-A\-Za\-z]+) \\
+.in +0.5i
+[^"]*" ([\-0\-9]+) ([\-0\-9]+) ([\-0\-9]+)
+.in
+
+# pattern for Squid Cache logs
+regex_posix SQUID 4,3,2,1 [0\-9]+\e.[0\-9]+[ ]+[0\-9]+ \\
+.in +0.5i
+[a\-zA\-Z0\-9\e.]+ ([_A\-Z]+)\e/([0\-9]+) ([0\-9]+) ([A\-Z]+)
+.in
+
+# pattern for CERN, NCSA, Netscape etc Error Logs
+regex_posix CERN_err \- .
+
+# pattern for FTP Server error logs (normally in SYSLOG)
+regex_posix SYSLOG_FTP_err \- FTP LOGIN FAILED
+.in
+.fi
+.ft 1
+.PP
+A Web server can be specified using this syntax:
+.PP
+.ft CW
+.nf
+.in +0.25i
+\f3server \f2serverName \f3on\f2|\f3off \f2accessRegex accessFile errorRegex errorFile
+.in
+.fi
+.ft 1
+.PP
+The
+.I serverName
+must be unique for each server, and is the name given to the instance
+for the associated performance metrics.
+See
+.BR PMAPI (3)
+for a discussion of PCP instance domains.
+The
+.B on
+or
+.B off
+flag indicates whether the server is to be monitored when the PMDA is
+installed.
+This can be altered dynamically using
+.BR pmstore (1)
+for the metric
+.BR web.perserver.watched ,
+which has one instance for each Web server named in
+.IR configfile .
+.PP
+Two files are monitored for each Web server: the access log and the error log.
+Each file requires the name of a previously declared regular expression
+and a file name.
+The log files specified for each server do not
+have to exist when the weblog PMDA is installed.
+The PMDA will continue
+to check for non-existent log files and open them when possible.
+Some legal server specifications are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# Netscape Server on Port 80 at IP address 127.55.555.555
+server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors
+
+# FTP Server.
+server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
+.in
+.fi
+.ft 1
+.SH CAVEATS
+Specifying regular expressions with an incorrect number of arguments may
+cause the PMDA to behave incorrectly or even crash: access log expressions
+need two arguments (four for caching servers) and error log expressions
+need none. This is due to limitations in the interface of
+.BR regex (3).
+.SH FILES
+.TP 10
+.B $PCP_PMDAS_DIR/weblog
+installation directory for the weblog PMDA
+.TP
+.B $PCP_PMDAS_DIR/weblog/Install
+installation script for the weblog PMDA
+.TP
+.B $PCP_PMDAS_DIR/weblog/Remove
+de-installation script for the weblog PMDA
+.TP
+.B $PCP_LOG_DIR/pmcd/weblog.log
+default log file for error reporting
+.TP
+.I $PCP_PMCDCONF_PATH
+.B pmcd
+configuration file that specifies the command line options
+to be used when
+.B pmdaweblog
+is launched
+.TP
+.B $PCP_LOG_DIR/NOTICES
+log of PMDA installations and removals
+.TP
+.B $PCP_VAR_DIR/config/web/weblog.conf
+likely location of the weblog PMDA configuration file
+.TP
+.B $PCP_DOC_DIR/pcpweb/index.html
+the online HTML documentation for PCPWEB
+.SH "PCP ENVIRONMENT"
+Environment variables with the prefix
+.B PCP_
+are used to parameterize the file and directory names
+used by PCP.
+On each installation, the file
+.B /etc/pcp.conf
+contains the local values for these variables.
+The
+.B $PCP_CONF
+variable may be used to specify an alternative
+configuration file,
+as described in
+.BR pcp.conf (5).
+.SH SEE ALSO
+.BR pmcd (1),
+.BR pmchart (1),
+.BR pmdawebping (1),
+.BR pminfo (1),
+.BR pmstore (1),
+.BR pmview (1),
+.BR tail (1),
+.BR weblogvis (1),
+.BR webvis (1),
+.BR PMAPI (3),
+.BR PMDA (3),
+.BR regcmp (3)
+and
+.BR regcomp (3).