summaryrefslogtreecommitdiff
path: root/man/man1/pmdaweblog.1
diff options
context:
space:
mode:
Diffstat (limited to 'man/man1/pmdaweblog.1')
-rw-r--r--man/man1/pmdaweblog.1584
1 files changed, 584 insertions, 0 deletions
diff --git a/man/man1/pmdaweblog.1 b/man/man1/pmdaweblog.1
new file mode 100644
index 0000000..2558d2e
--- /dev/null
+++ b/man/man1/pmdaweblog.1
@@ -0,0 +1,584 @@
+'\"macro stdmacro
+.\"
+.\" Copyright (c) 2012 Red Hat.
+.\" Copyright (c) 2000 Silicon Graphics, Inc. All Rights Reserved.
+.\"
+.\" This program is free software; you can redistribute it and/or modify it
+.\" under the terms of the GNU General Public License as published by the
+.\" Free Software Foundation; either version 2 of the License, or (at your
+.\" option) any later version.
+.\"
+.\" This program is distributed in the hope that it will be useful, but
+.\" WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+.\" or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+.\" for more details.
+.\"
+.\"
+.TH PMDAWEBLOG 1 "PCP" "Performance Co-Pilot"
+.SH NAME
+\f3pmdaweblog\f1 \- performance metrics domain agent (PMDA) for Web server logs
+.\" literals use .B or \f3
+.\" arguments use .I or \f2
+.SH SYNOPSIS
+\f3$PCP_PMDAS_DIR/weblog/pmdaweblog\f1
+[\f3\-Cp\f1]
+[\f3\-d\f1 \f2domain\f1]
+[\f3\-h\f1 \f2helpfile\f1]
+[\f3\-i\f1 \f2port\f1]
+[\f3\-l\f1 \f2logfile\f1]
+[\f3\-n\f1 \f2idlesec\f1]
+[\f3\-S\f1 \f2num\f1]
+[\f3\-t\f1 \f2delay\f1]
+[\f3\-u\f1 \f2socket\f1]
+[\f3\-U\f1 \f2username\f1]
+\f2configfile\f1
+.SH DESCRIPTION
+.B pmdaweblog
+is a Performance Metrics Domain Agent
+.RB ( PMDA (3))
+that scans Web server logs
+to extract metrics characterizing Web server activity.
+These performance metrics are then made available through the infrastructure
+of the Performance Co-Pilot (PCP).
+.PP
+The
+.I configfile
+specifies which Web servers are to be monitored, their associated access
+logs and error logs, and a regular-expression based scheme for extracting
+detailed information about each Web access. This file is maintained as
+part of the PMDA installation and/or de-installation by the scripts
+.B Install
+and
+.B Remove
+in the directory
+.BR $PCP_PMDAS_DIR/weblog .
+For more details, refer to the section below covering installation.
+.PP
+Once started,
+.B pmdaweblog
+monitors a set of log files and in response to a request for information,
+will process any new information that has been appended to the log files,
+similar to a
+.BR tail (1).
+There is also periodic "catch up" to process new information from all
+log files, and a scheme to detect the rotation of log files.
+.PP
+Like all other PMDAs,
+.B pmdaweblog
+is launched by
+.BR pmcd (1)
+using command line options specified in
+.I $PCP_PMCDCONF_PATH
+\- the
+.B Install
+script will prompt for appropriate values for the command line options, and
+update
+.IR $PCP_PMCDCONF_PATH .
+.PP
+A brief description of the
+.B pmdaweblog
+command line options follows:
+.TP
+.B \-C
+Check the configuration and exit.
+.TP
+.BI \-d " domain"
+Specify the
+.I domain
+number. It is absolutely crucial that the performance metrics
+.I domain
+number specified here is unique and consistent. That is,
+.I domain
+should be different for every PMDA on the one host, and the same
+.I domain
+number should be used for the
+.B pmdaweblog
+PMDA on all hosts.
+.RS
+.P
+For most installations, the default
+.I domain
+as encapsulated in the file
+.B $PCP_PMDAS_DIR/weblog/domain.h
+will suffice. For alternate values, check
+.I $PCP_PMCDCONF_PATH
+for the
+.I domain
+values already in use on this host, and the file
+.B $PCP_VAR_DIR/pmns/stdpmid
+contains a repository of ``well known''
+.I domain
+assignments that probably should be avoided.
+.RE
+.TP
+.BI \-h " helpfile"
+Get the help text from the supplied
+.I helpfile
+rather than from the default location.
+.TP
+.BI \-i " port"
+Communicate with
+.BR pmcd (1)
+on the specified Internet
+.I port
+(which may be a number or a name).
+.TP
+.BI \-l " logfile"
+Location of the log file. By default, a log file named
+.I weblog.log
+is written in the current directory of
+.BR pmcd (1)
+when
+.B pmdaweblog
+is started, i.e.
+.BR $PCP_LOG_DIR/pmcd .
+If the log file cannot
+be created or is not writable, output is written to the standard error instead.
+.TP
+.BI \-n " idlesec"
+If a Web server log file has not been modified for
+.IR idlesec
+seconds, then the file will be closed and re-opened.
+This is the only way
+.B pmdaweblog
+can detect any asynchronous rotation of the logs by Web server
+administrative scripts.
+The default period is 20 seconds.
+This value may be changed dynamically using
+.BR pmstore (1)
+to modify the value of the performance metric
+.BR web.config.check .
+.I
+.TP
+.B \-p
+Communicate with
+.BR pmcd (1)
+via a pipe.
+.TP
+.BI \-S " num"
+Specify the maximum number of Web servers per
+.IR sproc .
+It may be desirable (from a latency and load balancing perspective) or
+necessary (due to file descriptor limits) to delegate responsibility
+for scanning the Web server log files to several
+.IR sprocs .
+.B pmdaweblog
+will ensure that each
+.I sproc
+handles the log files for at most
+.I num
+Web servers.
+The default value is 80 Web servers per
+.IR sproc .
+.TP
+.BI \-t " delay"
+To avoid the need to scan a lot of information from the Web
+server logs in response to a single request for performance
+metrics, all log files will be checked at least once
+every
+.I delay
+seconds.
+The default is 15 seconds.
+This value may by changed dynamically using
+.BR pmstore (1)
+to modify the value of the performance metric
+.BR web.config.catchup .
+.TP
+.BI \-u " socket"
+Communicate with
+.BR pmcd (1)
+via the given Unix domain
+.IR socket .
+.TP
+.B \-U
+User account under which to run the agent.
+The default is the unprivileged "pcp" account in current versions of PCP,
+but in older versions the superuser account ("root") was used by default.
+.SH INSTALLATION
+The PCP framework allows metrics to be collected on one host
+and monitored from another. These hosts are referred to as
+.I collector
+and
+.I monitor
+hosts, respectively. A host may be both a collector and a monitor.
+.PP
+Collector hosts require the installation of the agent, while monitoring
+hosts require no agent installation at all.
+.PP
+For collector hosts do the following as root:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# cd $PCP_PMDAS_DIR/weblog
+# ./Install
+.in
+.fi
+.ft 1
+.PP
+The installation procedure prompts for a default or non-default installation.
+A default installation will search for known server configurations and
+automatically configure the PMDA for any server log files that are found.
+A non-default installation will step through each server, prompting the
+user for other server configurations and arguments to
+.BR pmdaweblog .
+The end result of a collector installation
+is to build a configuration file that is passed to
+.B pmdaweblog
+via the
+.I configfile
+argument.
+.PP
+If you want to undo the installation, do the following as root:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# cd $PCP_PMDAS_DIR/weblog
+# ./Remove
+.in
+.fi
+.ft 1
+.PP
+.B pmdaweblog
+is launched by
+.BR pmcd (1)
+and should never be executed directly.
+The
+.B Install
+and
+.B Remove
+scripts notify
+.BR pmcd (1)
+when the agent is installed or removed.
+.SH CONFIGURATION
+The configuration file for the weblog PMDA is an ASCII file that can
+be easily modified.
+Empty lines and lines beginning with '\f3#\f1'
+are ignored.
+All other lines must be either a regular expression or server
+specification.
+.PP
+Regular expressions, which are used on both the access and error log files,
+must be of the form:
+.PP
+.in +0.25i
+.B regex
+.I regexName regexp
+.in
+.I or
+.PP
+.in +0.25i
+.B regex_posix
+.I regexName ordering regexp_posix
+.in
+.PP
+The
+.I regexName
+is a word which uniquely identifies the regular expression.
+This is the reference used in the server specification.
+The
+.I regexp
+for access logs is in the format described for
+.BR regcmp (3).
+The
+.I regexp_posix
+for access logs is in the format described for
+.BR regcomp (3).
+The argument
+.I ordering
+is explained below. The
+.B Posix
+form should be available on all platforms.
+.PP
+The regular expression requires the specification of up to four arguments
+to be extracted from each line of a Web server access log, depending on the
+type of server. In the most common case there are two arguments representing
+the method and the size.
+.PP
+For the non\-
+.B Posix
+version, argument
+.I $0
+should contain the method:
+.BR GET ,
+.B HEAD ,
+.B POST
+or
+.BR PUT .
+The method
+.B PUT
+is treated as a synonym for
+.BR POST ,
+and anything else is categorized as
+.BR OTHER .
+.PP
+The second argument,
+.IR $1 ,
+should contain the size of the request.
+A size of ``\f3\-\f1'' or `` '' is treated as unknown.
+.PP
+Argument
+.I $3
+should contain the status code returned to the client browser and argument
+.I $4
+should contain the status code returned to the server from a remote host.
+These latter two arguments are used for caching servers and must be specified
+as a pair (or
+.I $3
+will be ignored). For further information on status codes, refer to the
+web site
+.B http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
+.PP
+Some legal non\-
+.B Posix
+regex expression specifications for monitoring an access log are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for CERN, NCSA, Netscape etc Access Logs
+regex CERN ] "([A\-Za\-z][\-A\-Za\-z]+)$0 .*" [\-0\-9]+ ([\-0\-9]+)$1
+
+# pattern for FTP Server access logs (normally in SYSLOG)
+regex SYSLOG_FTP ftpd[.*]: ([gp][\-A\-Za\-z]+)$0( )$1
+.in
+.fi
+.ft 1
+.PP
+There is 1 special types of access logs with the
+.I RegexName
+.I SQUID.
+This formats extract 4 parameters but since the
+.B Squid
+log file uses text-based status codes, it is handled as a special case.
+.PP
+In the examples below,
+.I NS_PROXY
+parses the Netscape/W3C
+.I Common Extended Log Format
+and
+.I SQUID
+parses the default Squid Object Cache format log file.
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for Netscape Proxy Server Extended Logs
+regex NS_PROXY ] "([A\-Za\-z][\-A\-Za\-z]+)$0 .*" ([\-0\-9]+)$2 \\
+.in +0.5i
+([\-0\-9]+)$1 ([\-0\-9]+)$3
+.in
+
+# pattern for Squid Cache logs
+regex SQUID [0\-9]+\.[0\-9]+[ ]+[0\-9]+ [a\-zA\-Z0\-9\.]+ \\
+.in +0.5i
+([_A\-Z]+)$3\/([0\-9]+)$2 ([0\-9]+)$1 ([A\-Z]+)$0
+.in
+.in
+.fi
+.ft 1
+.PP
+The
+.I regexp
+for the error logs does not require any arguments, only a match.
+Some legal
+expressions are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for CERN, NCSA, Netscape etc Error Logs
+regex CERN_err .
+
+# pattern for FTP Server error logs (normally in SYSLOG)
+regex SYSLOG_FTP_err FTP LOGIN FAILED
+.in
+.fi
+.ft 1
+.PP
+If
+.B POSIX
+compliant regular expressions are used, additional information is required
+since the order of parameters cannot be specified in the regular expression.
+For backwards compatibility, the common case of two parameters the order
+may be specified as
+.I method,size
+or
+.I size,method
+In the general case, the ordering is specified by one of the following
+methods:
+.TP 0.5in
+n1,n2,n3,n4
+where nX is a digit between 1 and 4. Each comma-seperated field represents
+(in order) the argument number for
+.I method,size,client_status,server_status
+.TP 0.5in
+-
+Used for cases like the error logs where the content is ignored.
+.PP
+As for the non-
+.B Posix
+format, the
+.I SQUID
+RegexName is treated as a special case to match the non-numerical status codes.
+.PP
+Some legal
+.B Posix
+regex expression specifications for monitoring an access log are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# pattern for CERN, NCSA, Netscape, Apache etc Access Logs
+regex_posix CERN method,size ][ \\]+"([A\-Za\-z][\-A\-Za\-z]+) \\
+.in +0.5i
+[^"]*" [\-0\-9]+ ([\-0\-9]+)
+.in
+
+# pattern for CERN, NCSA, Netscape, Apache etc Access Logs
+regex_posix CERN 1,2 ][ \\]+"([A\-Za\-z][\-A\-Za\-z]+) \\
+.in +0.5i
+[^"]*" [\-0\-9]+ ([\-0\-9]+)
+.in
+
+# pattern for FTP Server access logs (normally in SYSLOG)
+regex_posix SYSLOG_FTP method,size ftpd[.*]: \\
+.in +0.5i
+([gp][\-A\-Za\-z]+)( )
+.in
+
+# pattern for Netscape Proxy Server Extended Logs
+regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A\-Za\-z][\-A\-Za\-z]+) \\
+.in +0.5i
+[^"]*" ([\-0\-9]+) ([\-0\-9]+) ([\-0\-9]+)
+.in
+
+# pattern for Squid Cache logs
+regex_posix SQUID 4,3,2,1 [0\-9]+\.[0\-9]+[ ]+[0\-9]+ \\
+.in +0.5i
+[a\-zA\-Z0\-9\.]+ ([_A\-Z]+)\/([0\-9]+) ([0\-9]+) ([A\-Z]+)
+.in
+
+# pattern for CERN, NCSA, Netscape etc Error Logs
+regex_posix CERN_err \- .
+
+# pattern for FTP Server error logs (normally in SYSLOG)
+regex_posix SYSLOG_FTP_err \- FTP LOGIN FAILED
+.in
+.fi
+.ft 1
+
+.PP
+A Web server can be specified using this syntax:
+.PP
+.ft CW
+.nf
+.in +0.25i
+\f3server \f2serverName \f3on\f2|\f3off \f2accessRegex accessFile errorRegex errorFile
+.in
+.fi
+.ft 1
+.PP
+The
+.I serverName
+must be unique for each server, and is the name given to the instance
+for the associated performance metrics.
+See
+.BR PMAPI (3)
+for a discussion of PCP instance domains.
+The
+.B on
+or
+.B off
+flag indicates whether the server is to be monitored when the PMDA is
+installed.
+This can altered dynamically using
+.BR pmstore (1)
+for the metric
+.BR web.perserver.watched ,
+which has one instance for each Web server named in
+.IR configfile .
+.PP
+Two files are monitored for each Web server, the access and the error log.
+Each file requires the name of a previously declared regular expression,
+and a file name.
+The log files specified for each server do not
+have to exist when the weblog PMDA is installed.
+The PMDA will continue
+to check for non-existent log files and open them when possible.
+Some legal server specifications are:
+.PP
+.ft CW
+.nf
+.in +0.25i
+# Netscape Server on Port 80 at IP address 127.55.555.555
+server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors
+
+# FTP Server.
+server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
+.in
+.fi
+.ft 1
+.SH CAVEATS
+Specifying regular expressions with an incorrect number of arguments, anything other
+than 2 for access logs, and none for error logs, may cause the PMDA to behave
+incorrectly and even crash. This is due to limitations in the interface of
+.BR regex (3).
+.SH FILES
+.TP 10
+.B $PCP_PMDAS_DIR/weblog
+installation directory for the weblog PMDA
+.TP
+.B $PCP_PMDAS_DIR/weblog/Install
+installation script for the weblog PMDA
+.TP
+.B $PCP_PMDAS_DIR/weblog/Remove
+de-installation script for the weblog PMDA
+.TP
+.B $PCP_LOG_DIR/pmcd/weblog.log
+default log file for error reporting
+.TP
+.I $PCP_PMCDCONF_PATH
+.B pmcd
+configuration file that specifies the command line options
+to be used when
+.B pmdaweblog
+is launched
+.TP
+.B $PCP_LOG_DIR/NOTICES
+log of PMDA installations and removals
+.TP
+.B $PCP_VAR_DIR/config/web/weblog.conf
+likely location of the weblog PMDA configuration file
+.TP
+.B $PCP_DOC_DIR/pcpweb/index.html
+the online HTML documentation for PCPWEB
+.SH "PCP ENVIRONMENT"
+Environment variables with the prefix
+.B PCP_
+are used to parameterize the file and directory names
+used by PCP.
+On each installation, the file
+.B /etc/pcp.conf
+contains the local values for these variables.
+The
+.B $PCP_CONF
+variable may be used to specify an alternative
+configuration file,
+as described in
+.BR pcp.conf (5).
+.SH SEE ALSO
+.BR pmcd (1),
+.BR pmchart (1),
+.BR pmdawebping (1),
+.BR pminfo (1),
+.BR pmstore (1),
+.BR pmview (1),
+.BR tail (1),
+.BR weblogvis (1),
+.BR webvis (1),
+.BR PMAPI (3),
+.BR PMDA (3)
+and
+.BR regcmp (3).