$NetBSD: patch-ae,v 1.1.1.1 2007/04/30 20:53:54 heinz Exp $ Create monit.1.in so we can replace @sysconfdir@ in the configure script. --- monit.1.in.orig 2007-04-16 23:45:44.000000000 +0200 +++ monit.1.in @@ -0,0 +1,3893 @@ +.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.32 +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sh \" Subsection heading +.br +.if t .Sp +.ne 5 +.PP +\fB\\$1\fR +.PP +.. +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" Set up some character translations and predefined strings. \*(-- will +.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left +.\" double quote, and \*(R" will give a right double quote. \*(C+ will +.\" give a nicer C++. Capital omega is used to do unbreakable dashes and +.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, +.\" nothing in troff, for use with C<>. +.tr \(*W- +.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' +.ie n \{\ +. ds -- \(*W- +. ds PI pi +. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch +. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch +. ds L" "" +. ds R" "" +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds -- \|\(em\| +. ds PI \(*p +. ds L" `` +. ds R" '' +'br\} +.\" +.\" If the F register is turned on, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. nr % 0 +. rr F +.\} +.\" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. 
+.hy 0 +.if n .na +.\" +.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). +.\" Fear. Run. Save yourself. No user-serviceable parts. +. \" fudge factors for nroff and troff +.if n \{\ +. ds #H 0 +. ds #V .8m +. ds #F .3m +. ds #[ \f1 +. ds #] \fP +.\} +.if t \{\ +. ds #H ((1u-(\\\\n(.fu%2u))*.13m) +. ds #V .6m +. ds #F 0 +. ds #[ \& +. ds #] \& +.\} +. \" simple accents for nroff and troff +.if n \{\ +. ds ' \& +. ds ` \& +. ds ^ \& +. ds , \& +. ds ~ ~ +. ds / +.\} +.if t \{\ +. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" +. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' +. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' +. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' +. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' +. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' +.\} +. \" troff and (daisy-wheel) nroff accents +.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' +.ds 8 \h'\*(#H'\(*b\h'-\*(#H' +.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] +.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' +.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' +.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] +.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] +.ds ae a\h'-(\w'a'u*4/10)'e +.ds Ae A\h'-(\w'A'u*4/10)'E +. \" corrections for vroff +.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' +.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' +. \" for low resolution devices (crt and lpr) +.if \n(.H>23 .if \n(.V>19 \ +\{\ +. ds : e +. ds 8 ss +. ds o a +. ds d- d\h'-1'\(ga +. ds D- D\h'-1'\(hy +. ds th \o'bp' +. ds Th \o'LP' +. ds ae ae +. ds Ae AE +.\} +.rm #[ #] #H #V #F C +.\" ======================================================================== +.\" +.IX Title "MONIT 1" +.TH MONIT 1 "www.tildeslash.com" "February 19. 
2007" "User Commands" +.SH "NAME" +monit \- utility for monitoring services on a Unix system +.SH "SYNOPSIS" +.IX Header "SYNOPSIS" +\&\fBmonit\fR [options] {arguments} +.SH "DESCRIPTION" +.IX Header "DESCRIPTION" +\&\fBmonit\fR is a utility for managing and monitoring processes, +files, directories and devices on a Unix system. Monit conducts +automatic maintenance and repair and can execute meaningful +causal actions in error situations. For example, monit can start a +process if it does not run, restart a process if it does not +respond and stop a process if it uses too many resources. You may +use monit to monitor files, directories and devices for changes, +such as timestamp changes, checksum changes or size changes. +.PP +Monit is controlled via an easy to configure control file based +on a free\-format, token-oriented syntax. Monit logs to syslog or +to its own log file and notifies you about error conditions via +customizable alert messages. Monit can perform various \s-1TCP/IP\s0 +network checks, protocol checks and can utilize \s-1SSL\s0 for such +checks. Monit provides a http(s) interface and you may use a +browser to access the monit program. +.SH "GENERAL OPERATION" +.IX Header "GENERAL OPERATION" +The behavior of monit is controlled by command-line options +\&\fIand\fR a run control file, \fI~/.monitrc\fR, the syntax of which we +describe in a later section. Command-line options override +\&\fI.monitrc\fR declarations. +.PP +The following options are recognized by monit. However, it is +recommended that you set options (when applicable) directly in +the \fI.monitrc\fR control file.
+.Sh "General Options and Arguments" +.IX Subsection "General Options and Arguments" +\&\fB\-c\fR \fIfile\fR + Use this control file +.PP +\&\fB\-d\fR \fIn\fR + Run as a daemon once per \fIn\fR seconds +.PP +\&\fB\-g\fR + Set group name for start, stop, restart and status +.PP +\&\fB\-l\fR \fIlogfile\fR + Print log information to this file +.PP +\&\fB\-p\fR \fIpidfile\fR + Use this lock file in daemon mode +.PP +\&\fB\-s\fR \fIstatefile\fR + Write state information to this file +.PP +\&\fB\-I\fR + Do not run in background (needed for run from init) +.PP +\&\fB\-t\fR + Run syntax check for the control file +.PP +\&\fB\-v\fR + Verbose mode, work noisy (diagnostic output) +.PP +\&\fB\-H\fR \fI[filename]\fR + Print \s-1MD5\s0 and \s-1SHA1\s0 hashes of the file or of stdin if the + filename is omitted; monit will exit afterwards +.PP +\&\fB\-V\fR + Print version number and patch level +.PP +\&\fB\-h\fR + Print a help text +.PP +In addition to the options above, monit can be started with one +of the following action arguments; monit will then execute the +action and exit without transforming itself to a daemon. +.PP +\&\fBstart all\fR + Start all services listed in the control file and + enable monitoring for them. If the group option is + set, only start and enable monitoring of services in + the named group. +.PP +\&\fBstart name\fR + Start the named service and enable monitoring for + it. The name is a service entry name from the + monitrc file. +.PP +\&\fBstop all\fR + Stop all services listed in the control file and + disable their monitoring. If the group option is + set, only stop and disable monitoring of the services + in the named group. +.PP +\&\fBstop name\fR + Stop the named service and disable its monitoring. + The name is a service entry name from the monitrc + file. +.PP +\&\fBrestart all\fR + Stop and start \fIall\fR services. If the group option + is set, only restart the services in the named group. 
+.PP +\&\fBrestart name\fR + Restart the named service. The name is a service entry + name from the monitrc file. +.PP +\&\fBmonitor all\fR + Enable monitoring of all services listed in the + control file. If the group option is set, only start + monitoring of services in the named group. +.PP +\&\fBmonitor name\fR + Enable monitoring of the named service. The name is + a service entry name from the monitrc file. Monit will + also enable monitoring of all services this service + depends on. +.PP +\&\fBunmonitor all\fR + Disable monitoring of all services listed in the + control file. If the group option is set, only disable + monitoring of services in the named group. +.PP +\&\fBunmonitor name\fR + Disable monitoring of the named service. The name is + a service entry name from the monitrc file. Monit + will also disable monitoring of all services that + depend on this service. +.PP +\&\fBstatus\fR + Print full status information for each service. +.PP +\&\fBsummary\fR + Print short status information for each service. +.PP +\&\fBreload\fR + Reinitialize a running monit daemon; the daemon will + reread its configuration, close and reopen log files. +.PP +\&\fBquit\fR + Kill a monit daemon process +.PP +\&\fBvalidate\fR + Check all services listed in the control file. This + action is also the default behavior when monit runs + in daemon mode. +.SH "WHAT TO MONITOR" +.IX Header "WHAT TO MONITOR" +You may use monit to monitor daemon processes or similar programs +running on localhost. Monit is particularly useful for monitoring +daemon processes, such as those started at system boot time from +/etc/init.d/, for instance sendmail, sshd, apache and mysql. Unlike +many monitoring systems, monit can act if an error +situation should occur, e.g. if sendmail is not running, monit +can start sendmail or if apache is using too many system resources +(e.g. if a DoS attack is in progress) monit can stop or restart +apache and send you an alert message.
Monit also monitors +process characteristics, such as whether a process has become a +zombie and how much memory or cpu cycles a process is using. +.PP +You may also use monit to monitor files, directories and devices +on localhost. Monit can monitor these items for changes, such as +timestamp changes, checksum changes or size changes. This is +also useful for security reasons \- you can monitor the md5 +checksum of files that should not change. +.PP +You may even use monit to monitor remote hosts. First and +foremost monit is a utility for monitoring and mending services +on localhost, but if a service depends on a remote service, +e.g. a database server or an application server, it might be +useful to be able to test a remote host as well. +.PP +You may monitor the general system-wide resources such as cpu +usage, memory and load average. +.SH "HOW TO MONITOR" +.IX Header "HOW TO MONITOR" +monit is configured and controlled via a control file called +\&\fBmonitrc\fR. The default location for this file is ~/.monitrc. If +this file does not exist, monit will try /etc/monitrc, then +@sysconfdir@/monitrc and finally ./monitrc. +.PP +A monit control file consists of a series of service entries and +global option statements in a free\-format, token-oriented syntax. +Comments begin with a # and extend through the end of the line. +There are three kinds of tokens in the control file: grammar +keywords, numbers and strings. +.PP +On a semantic level, the control file consists of three types of +statements: +.IP "1. Global set-statements" 4 +.IX Item "1. Global set-statements" +A global set-statement starts with the keyword \fIset\fR and the +item to configure. +.IP "2. Global include-statement" 4 +.IX Item "2. Global include-statement" +The include statement consists of the keyword \fIinclude\fR and +a glob string. +.IP "3. One or more service entry statements." 4 +.IX Item "3. One or more service entry statements."
+A service entry starts with the keyword \fIcheck\fR followed by the +service type. +.PP +This is the hello galaxy version of a monit control file: +.PP +.Vb 3 +\& # +\& # monit control file +\& # +.Ve +.PP +.Vb 6 +\& set daemon 120 # Poll at 2\-minute intervals +\& set logfile syslog facility log_daemon +\& set alert foo@bar.baz +\& set httpd port 2812 and use address localhost +\& allow localhost # Allow localhost to connect +\& allow admin:monit # Allow Basic Auth +.Ve +.PP +.Vb 7 +\& check system myhost.mydomain.tld +\& if loadavg (1min) > 4 then alert +\& if loadavg (5min) > 2 then alert +\& if memory usage > 75% then alert +\& if cpu usage (user) > 70% then alert +\& if cpu usage (system) > 30% then alert +\& if cpu usage (wait) > 20% then alert +.Ve +.PP +.Vb 11 +\& check process apache +\& with pidfile "/usr/local/apache/logs/httpd.pid" +\& start program = "/etc/init.d/httpd start" +\& stop program = "/etc/init.d/httpd stop" +\& if 2 restarts within 3 cycles then timeout +\& if totalmem > 100 Mb then alert +\& if children > 255 for 5 cycles then stop +\& if cpu usage > 95% for 3 cycles then restart +\& if failed port 80 protocol http then restart +\& group server +\& depends on httpd.conf, httpd.bin +.Ve +.PP +.Vb 5 +\& check file httpd.conf +\& with path /usr/local/apache/conf/httpd.conf +\& # Reload apache if the httpd.conf file was changed +\& if changed checksum +\& then exec "/usr/local/apache/bin/apachectl graceful" +.Ve +.PP +.Vb 7 +\& check file httpd.bin +\& with path /usr/local/apache/bin/httpd +\& # Run /watch/dog in the case that the binary was changed +\& # and alert in the case that the checksum value recovered +\& # later +\& if failed checksum then exec "/watch/dog" +\& else if recovered then alert +.Ve +.PP +.Vb 2 +\& include /etc/monit/mysql.monitrc +\& include /etc/monit/mail/*.monitrc +.Ve +.PP +This example illustrate a service entry for monitoring the apache +web server process as well as related files. 
The meaning of the +various statements will be explained in the following sections. +.SH "LOGGING" +.IX Header "LOGGING" +monit will log status and error messages to a log file. Use the +\&\fIset logfile\fR statement in the monitrc control file. To setup +monit to log to its own logfile, use e.g. \fIset logfile +/var/log/monit.log\fR. If \fBsyslog\fR is given as a value for the +\&\fI\-l\fR command-line switch (or the keyword \fIset logfile syslog\fR +is found in the control file) monit will use the \fBsyslog\fR system +daemon to log messages. The priority is assigned to each message +based on the context. To turn off logging, simply do not set +the logfile in the control file (and of course, do not use the \-l +switch) +.SH "DAEMON MODE" +.IX Header "DAEMON MODE" +The \fI\-d interval\fR command-line switch runs monit in daemon +mode. You must specify a numeric argument which is a polling +interval in seconds. +.PP +In daemon mode, monit detaches from the console, puts itself in +the background and runs continuously, monitoring each specified +service and then goes to sleep for the given poll interval. +.PP +.Vb 1 +\& Simply invoking +.Ve +.PP +.Vb 1 +\& monit \-d 300 +.Ve +.PP +will poll all services described in your \fI~/.monitrc\fR file every +5 minutes. +.PP +It is strongly recommended to set the poll interval in your +~/.monitrc file instead, by using \fIset daemon \f(BIn\fI\fR, where \fBn\fR +is an integer number of seconds. If you do this, monit will +always start in daemon mode (as long as no action arguments are +given). +.PP +Monit makes a per-instance lock-file in daemon mode. If you need +more monit instances, you will need more configuration files, +each pointing to its own lock\-file. +.PP +Calling \fImonit\fR with a monit daemon running in the background +sends a wake-up signal to the daemon, forcing it to check +services immediately. +.PP +The \fIquit\fR argument will kill a running daemon process instead +of waking it up. 
+.SH "INIT SUPPORT" +.IX Header "INIT SUPPORT" +Monit can run and be controlled from \fIinit\fR. If monit should +crash, \fIinit\fR will re-spawn a new monit process. Using init to +start monit is probably the best way to run monit if you want to +be certain that you always have a running monit daemon on your +system. (It's obvious, but never the less worth to stress; Make +sure that the control file does not have any syntax errors before +you start monit from init. Also, make sure that if you run monit +from init, that you do not start monit from a startup scripts as +well). +.PP +To setup monit to run from init, you can either use the 'set +init' statement in monit's control file or use the \-I option from +the command line and here is what you must add to /etc/inittab: +.PP +.Vb 2 +\& # Run monit in standard run\-levels +\& mo:2345:respawn:/usr/local/bin/monit \-Ic /etc/monitrc +.Ve +.PP +After you have modified init's configuration file, you can run +the following command to re-examine /etc/inittab and start monit: +.PP +.Vb 1 +\& telinit q +.Ve +.PP +For systems without telinit: +.PP +.Vb 1 +\& kill \-1 1 +.Ve +.PP +If monit is used to monitor services that are also started at +boot time (e.g. services started via \s-1SYSV\s0 init rc scripts or via +inittab) then, in some cases, a race condition could occur. That +is; if a service is slow to start, monit can assume that the +service is not running and possibly try to start it and raise an +alert, while, in fact the service is already about to start or +already in its startup sequence. Please see the \s-1FAQ\s0 for solutions +to this problem. +.SH "INCLUDE FILES" +.IX Header "INCLUDE FILES" +The monit control file, \fImonitrc\fR, can include additional +configuration files. This feature helps to maintain a certain +structure or to place repeating settings into one file. Include +statements can be placed at virtually any spot. 
The syntax is the +following: +.PP +.Vb 1 +\& INCLUDE globstring +.Ve +.PP +The globstring is any kind of string as defined in \fIglob\fR\|(7). +Thus, you can refer to a single file or you can load several +files at once. In case you want to use whitespace in your string +the globstring need to be embedded into quotes (') or double +quotes ("). For example, +.PP +.Vb 1 +\& INCLUDE "/etc/monit/monit configuration files/printer.*.monitrc" +.Ve +.PP +loads any file matching the single globstring. If the globstring +matches a directory instead of a file, it is silently ignored. +.PP +\&\fI\s-1INCLUDE\s0\fR statements in included files are parsed as in the main +control file. +.PP +If the globstring matches several results, the files are included +in a non sorted manner. If you need to rely on a certain order, +you might need to use single \fIinclude\fR statements. +.SH "GROUP SUPPORT" +.IX Header "GROUP SUPPORT" +Service entries in the control file, \fImonitrc\fR, can be grouped +together by the \fIgroup\fR statement. The syntax is simply (keyword +in capital): +.PP +.Vb 1 +\& GROUP groupname +.Ve +.PP +With this statement it is possible to group similar service +entries together and manage them as a whole. Monit provides +functions to start, stop and restart a group of services, like +so: +.PP +To start a group of services from the console: +.PP +.Vb 1 +\& monit \-g start +.Ve +.PP +To stop a group of services: +.PP +.Vb 1 +\& monit \-g stop +.Ve +.PP +To restart a group of services: +.PP +.Vb 1 +\& monit \-g restart +.Ve +.SH "MONITORING MODE" +.IX Header "MONITORING MODE" +Monit supports three monitoring modes per service: \fIactive\fR, +\&\fIpassive\fR and \fImanual\fR. See also the example section below for +usage of the mode statement. +.PP +In \fIactive\fR mode, monit will monitor a service and in case of +problems monit will act and raise alerts, start, stop or restart +the service. Active mode is the default mode. 
+.PP +In \fIpassive\fR mode, monit will passively monitor a service and +specifically \fBnot\fR try to fix a problem, but it will still raise +alerts in case of a problem. +.PP +For use in clustered environments there is also a \fImanual\fR +mode. In this mode, monit will enter \fIactive\fR mode \fBonly\fR if a +service was brought under monit's control, for example by +executing the following command in the console: +.PP +.Vb 2 +\& monit start sybase +\& (monit will call sybase's start method and enable monitoring) +.Ve +.PP +If a service was not started by monit or was stopped or disabled +for example by: +.PP +.Vb 2 +\& monit stop sybase +\& (monit will call sybase's stop method and disable monitoring) +.Ve +.PP +monit will not monitor the service. This allows for having +services configured in monitrc and start it with monit only if it +should run. This feature can be used to build a simple failsafe +cluster. To see how, read more about how to setup a cluster with +monit using the \fIheartbeat\fR system in the examples sections +below. +.SH "ALERT MESSAGES" +.IX Header "ALERT MESSAGES" +Monit will raise an email alert in the following situations: +.PP +.Vb 14 +\& o A service timed out +\& o A service does not exist +\& o A service related data access problem +\& o A service related program execution problem +\& o A service is of invalid object type +\& o A icmp problem +\& o A port connection problem +\& o A resource statement match +\& o A file checksum problem +\& o A file size problem +\& o A file/directory timestamp problem +\& o A file/directory/device permission problem +\& o A file/directory/device uid problem +\& o A file/directory/device gid problem +.Ve +.PP +Monit will send an alert each time a monitored object changed. 
+This involves: +.PP +.Vb 5 +\& o Monit started, stopped or reloaded +\& o A file checksum changed +\& o A file size changed +\& o A file content match +\& o A file/directory timestamp changed +.Ve +.PP +You use the alert statement to notify monit that you want alert +messages sent to an email address. If you do not specify an alert +statement, monit will not send alert messages. +.PP +There are two forms of alert statement: +.PP +.Vb 2 +\& o Global \- common for all services +\& o Local \- per service +.Ve +.PP +In both cases you can use more than one alert statement. In other +words, you can send many different emails to many different +addresses. (In case you just got a new business idea: monit is not +really suitable for sending spam.) +.PP +Recipients in the global and in the local lists are alerted when +a service failed, recovered or changed. If the same email address +is in the global and in the local list, monit will send only one +alert. Local (per service) defined alert email addresses override +global addresses in case of a conflict. Finally, you may choose +to only use a global alert list (recommended), a local per +service list or both. +.PP +It is also possible to disable the global alerts locally for +particular service(s) and recipients. +.Sh "Setting a global alert statement" +.IX Subsection "Setting a global alert statement" +If a change occurred on a monitored service, monit will send an +alert to all recipients in the global list who have registered +interest in the event type.
Here is the syntax for the global +alert statement: +.IP "\s-1SET\s0 \s-1ALERT\s0 mail-address [ [\s-1NOT\s0] {events}] [\s-1MAIL\-FORMAT\s0 {mail\-format}] [\s-1REMINDER\s0 number]" 4 +.IX Item "SET ALERT mail-address [ [NOT] {events}] [MAIL-FORMAT {mail-format}] [REMINDER number]" +.PP +Simply using the following in the global section of monitrc: +.PP +.Vb 1 +\& set alert foo@bar +.Ve +.PP +will send a default email to the address foo@bar whenever an +event occurs on any service. Such an event may be that a +service timed out, a service doesn't exist or a service does +exist (on recovery) and so on. If you want to send alert messages +to more email addresses, add a \fIset alert 'email'\fR statement for +each address. +.PP +For explanations of the \fIevents, MAIL-FORMAT and \s-1REMINDER\s0\fR +keywords above, please see below. +.PP +To enable a global alert recipient that receives +all event alerts except some types, you can use the \s-1NOT\s0 negation +option ahead of the events list, which lets you set the recipient +for \*(L"all but specified events\*(R" (see below for more details). +.Sh "Setting a local alert statement" +.IX Subsection "Setting a local alert statement" +Each service can also have its own recipient list. +.IP "\s-1ALERT\s0 mail-address [ [\s-1NOT\s0] {events}] [\s-1MAIL\-FORMAT\s0 {mail\-format}] [\s-1REMINDER\s0 number]" 4 +.IX Item "ALERT mail-address [ [NOT] {events}] [MAIL-FORMAT {mail-format}] [REMINDER number]" +.PP +or +.IP "\s-1NOALERT\s0 mail-address" 4 +.IX Item "NOALERT mail-address" +.PP +If you only want an alert message sent for certain events for +certain service(s), for example only for timeout events or only +if a service died, then postfix the alert-statement with a filter +block: +.PP +.Vb 3 +\& check process myproc with pidfile /var/run/my.pid +\& alert foo@bar only on { timeout, nonexist } +\& ... +.Ve +.PP +(\fIonly\fR and \fIon\fR are noise keywords, ignored by monit.
As a +side note, noise keywords are used in the control file grammar to +make an entry resemble English and thus make it easier to read +(or so goes the philosophy). The full set of available noise +keywords is listed below in the Control File section). +.PP +You can also set the alert to send all events except those specified, +using the list negation \- the word \fInot\fR ahead of the event +list. For example, when you want to receive alerts for all events +except those related to the monit instance, you can write (note that the +noise words 'but' and 'on' are optional): +.PP +.Vb 3 +\& check system myserver +\& alert foo@bar but not on { instance } +\& ... +.Ve +.PP +instead of: +.PP +.Vb 13 +\& alert foo@bar on { change +\& checksum +\& data +\& exec +\& gid +\& icmp +\& invalid +\& match +\& nonexist +\& permission +\& size +\& timeout +\& timestamp } +.Ve +.PP +This will enable all alerts for foo@bar, except the monit instance +related alerts. +.PP +Event filtering can be used to send a mail to different email +addresses depending on the events that occurred. For instance: +.PP +.Vb 3 +\& alert foo@bar { nonexist, timeout, resource, icmp, connection } +\& alert security@bar on { checksum, permission, uid, gid } +\& alert manager@bar +.Ve +.PP +This will send an alert message to foo@bar whenever a nonexist, +timeout, resource, icmp or connection problem occurs and a message to +security@bar if a checksum, permission, uid or gid problem +occurs. And finally, a message to manager@bar whenever any error +event occurs. +.PP +This is the list of events you can use in a mail\-filter: \fIuid, +gid, size, nonexist, data, icmp, instance, invalid, exec, +changed, timeout, resource, checksum, match, timestamp, +connection, permission\fR +.PP +You can also disable the alerts locally using the \s-1NOALERT\s0 statement.
+This is useful, for example, when you monitor a lot of services and +use the global alert statement, but don't want to receive alerts +for some minor subset of services: +.PP +.Vb 1 +\& noalert appadmin@bar +.Ve +.PP +For example, if you place the noalert statement in the +\&'check system' entry, the given user won't receive the system related +alerts (such as monit instance started/stopped/reloaded alert, +system overloaded alert, etc.) but will receive the alerts for +all other monitored services. +.PP +The following example will alert foo@bar on all events on all +services by default, except the service mybar which will send an +alert only on timeout. The trick is based on the fact that a local +definition of the same recipient overrides the global setting +(including registered events and mail format): +.PP +.Vb 1 +\& set alert foo@bar +.Ve +.PP +.Vb 4 +\& check process myfoo with pidfile /var/run/myfoo.pid +\& ... +\& check process mybar with pidfile /var/run/mybar.pid +\& alert foo@bar only on { timeout } +.Ve +.PP +The 'instance' alert type reports events related to monit +internals, such as when a monit instance was started, stopped or +reloaded. +.PP +If the \s-1MTA\s0 (mailserver) for sending alerts is not available, +monit \fIcan\fR queue events on the local file-system until the \s-1MTA\s0 +recovers. Monit will then post queued events in order with their +original timestamp so the events are not lost. This feature is +most useful if monit is used together with e.g. m/monit and when +event history is important. +.Sh "Alert message layout" +.IX Subsection "Alert message layout" +monit provides a default mail message layout that is short and to +the point.
Here's an example of a standard alert mail sent by +monit: +.PP +.Vb 4 +\& From: monit@tildeslash.com +\& Subject: monit alert \-\- Does not exist apache +\& To: hauk@tildeslash.com +\& Date: Thu, 04 Sep 2003 02:33:03 +0200 +.Ve +.PP +.Vb 1 +\& Does not exist Service apache +.Ve +.PP +.Vb 3 +\& Date: Thu, 04 Sep 2003 02:33:03 +0200 +\& Action: restart +\& Host: www.tildeslash.com +.Ve +.PP +.Vb 2 +\& Your faithful employee, +\& monit +.Ve +.PP +If you want to, you can change the format of this message with +the optional \fImail-format\fR statement. The syntax for this +statement is as follows: +.PP +.Vb 7 +\& mail\-format { +\& from: monit@localhost +\& subject: $SERVICE $EVENT at $DATE +\& message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION. +\& Yours sincerely, +\& monit +\& } +.Ve +.PP +Where the keyword \fIfrom:\fR is the email address monit should +pretend it is sending from. It does not have to be a real mail +address, but it must be a properly formatted mail address, of the +form name@domain. The keyword \fIsubject:\fR is for the email +subject line. The subject must be on only \fIone\fR line. The +\&\fImessage:\fR keyword denotes the mail body. If used, this keyword +should always be the last in a mail-format statement. The mail +body can be as long as you want and must \fBnot\fR contain the '}' +character. +.PP +All of these format keywords are optional but you must provide at +least one. Thus if you only want to change the from address monit +is using you can do: +.PP +.Vb 1 +\& set alert foo@bar with mail\-format { from: bofh@bar.baz } +.Ve +.PP +From the previous example you will notice that some special \f(CW$XXX\fR +variables were used. If used, they will be substituted and +expanded into the text with these values: +.IP "* \fI$EVENT\fR" 4 +.IX Item "$EVENT" +.Vb 2 +\& A string describing the event that occurred.
The values are +\& fixed and are: +.Ve +.Sp +.Vb 19 +\& Event: | Failure state: | Recovery state: +\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- +\& CHANGED | "Changed" | "Changed back" +\& CHECKSUM | "Checksum failed" | "Checksum passed" +\& CONNECTION| "Connection failed" | "Connection passed" +\& DATA | "Data access error" | "Data access succeeded" +\& EXEC | "Execution failed" | "Execution succeeded" +\& GID | "GID failed" | "GID passed" +\& ICMP | "ICMP failed" | "ICMP passed" +\& INSTANCE | "Monit instance changed"| "Monit instance changed not" +\& INVALID | "Invalid type" | "Type passed" +\& MATCH | "Regex match" | "No regex match" +\& NONEXIST | "Does not exist" | "Exists" +\& PERMISSION| "Permission failed" | "Permission passed" +\& RESOURCE | "Resource limit matched"| "Resource limit passed" +\& SIZE | "Size failed" | "Size passed" +\& TIMEOUT | "Timeout" | "Timeout recovery" +\& TIMESTAMP | "Timestamp failed" | "Timestamp passed" +\& UID | "UID failed" | "UID passed" +.Ve +.IP "* \fI$SERVICE\fR" 4 +.IX Item "$SERVICE" +.Vb 1 +\& The service entry name in monitrc +.Ve +.IP "* \fI$DATE\fR" 4 +.IX Item "$DATE" +.Vb 1 +\& The current time and date (RFC 822 date style). +.Ve +.IP "* \fI$HOST\fR" 4 +.IX Item "$HOST" +.Vb 1 +\& The name of the host monit is running on +.Ve +.IP "* \fI$ACTION\fR" 4 +.IX Item "$ACTION" +.Vb 2 +\& The name of the action which was done. 
Action names are fixed +\& and are: +.Ve +.Sp +.Vb 9 +\& Action: | Name: +\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- +\& ALERT | "alert" +\& EXEC | "exec" +\& MONITOR | "monitor" +\& RESTART | "restart" +\& START | "start" +\& STOP | "stop" +\& UNMONITOR| "unmonitor" +.Ve +.IP "* \fI$DESCRIPTION\fR" 4 +.IX Item "$DESCRIPTION" +.Vb 1 +\& The description of the error condition +.Ve +.Sh "Setting a global mail format" +.IX Subsection "Setting a global mail format" +It is possible to set a standard mail format with the +following global set-statement (keywords are in capitals): +.IP "\s-1SET\s0 MAIL-FORMAT {mail\-format}" 4 +.IX Item "SET MAIL-FORMAT {mail-format}" +.PP +The format set with this statement will apply to every alert +statement that does \fInot\fR have its own specified mail\-format. +This statement is most useful for setting a default from address +for messages sent by monit, like so: +.PP +.Vb 1 +\& set mail\-format { from: monit@foo.bar.no } +.Ve +.Sh "Setting an error reminder" +.IX Subsection "Setting an error reminder" +Monit by default sends just one error notification when a +service fails and another one when it has recovered. If you want +to be notified more than once in the case that the service +remains failed, you can use the reminder option of the alert +statement (keywords are in capitals): +.IP "\s-1ALERT\s0 ... [\s-1WITH\s0] \s-1REMINDER\s0 [\s-1ON\s0] number [\s-1CYCLES\s0]" 4 +.IX Item "ALERT ...
[WITH] REMINDER [ON] number [CYCLES]" +.PP +For example, if you want to be notified every tenth cycle while the +service remains failed, you can use: +.PP +.Vb 1 +\& alert foo@bar with reminder on 10 cycles +.Ve +.PP +If you want to be notified on each failed cycle, you can use: +.PP +.Vb 1 +\& alert foo@bar with reminder on 1 cycle +.Ve +.Sh "Setting a mail server for alert messages" +.IX Subsection "Setting a mail server for alert messages" +The mail server monit should use to send alert messages is +defined with a global set statement (keywords are in capital and +optional statements in [brackets]): +.PP +.Vb 2 +\& SET MAILSERVER {host name [PORT port]|ip\-address [PORT port]}+ +\& [with TIMEOUT X SECONDS] +.Ve +.PP +The port statement lets you use \s-1SMTP\s0 servers other than those +listening on port 25. If omitted, port 25 is used for the +connection. +.PP +As you can see, it is possible to set several \s-1SMTP\s0 servers. If +monit cannot connect to the first server in the list it will try +the second server and so on. Monit has a default connection +timeout of 5 seconds, and if the \s-1SMTP\s0 server is slow, monit could +time out when connecting or reading from the server. You can use +the optional timeout statement to explicitly set the timeout to a +higher value if needed. Here is an example for setting several +mail servers: +.PP +.Vb 2 +\& set mailserver mail.tildeslash.com, mail.foo.bar port 10025, +\& localhost with timeout 15 seconds +.Ve +.PP +Here monit will first try to connect to the server +\&\*(L"mail.tildeslash.com\*(R"; if this server is down monit will try +\&\*(L"mail.foo.bar\*(R" on port 10025 and finally \*(L"localhost\*(R". We also +set an explicit connect and read timeout: if monit cannot connect +to the first \s-1SMTP\s0 server in the list within 15 seconds it will +try the next server and so on. The \fIset mailserver ..\fR statement +is optional and if not defined monit defaults to using localhost as +the \s-1SMTP\s0 server.
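+.PP
+As a combined sketch of the mail format, reminder and mail server
+statements described above, a control file could contain the
+following. The host names, mail addresses and the checked file are
+placeholders chosen for illustration only; they are not defaults:
+.PP
+.Vb 6
+\& set mailserver mail.example.org port 10025, localhost
+\&     with timeout 15 seconds
+\& set mail\-format { from: monit@example.org }
+\& check file monit.bin with path "/usr/local/bin/monit"
+\&     if failed permission 0555 then alert
+\&     alert admin@example.org with reminder on 10 cycles
+.Ve
+.PP
+With settings like these, an alert for the permission test would
+carry the sender monit@example.org, be sent through
+mail.example.org on port 10025 or, failing that, through localhost,
+and be repeated every tenth cycle while the test keeps failing.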
+.Sh "Event queue" +.IX Subsection "Event queue" +Monit optionally provides queueing of event alerts that cannot be +sent. For example, if no mail-server is available at the moment, +monit can store events in a queue and try to reprocess them at +the next cycle. As soon as the mail-server recovers, monit will +post the queued events. The queue is persistent across monit +restarts and, provided that the back-end filesystem is persistent +too, across system restarts as well. +.PP +By default, the queue is disabled and if the alert handler fails, +monit will simply drop the alert message. To enable the event +queue, add the following statement to the monit control file: +.PP +.Vb 1 +\& SET EVENTQUEUE BASEDIR <path> [SLOTS <number>] +.Ve +.PP +The <path> is the path to the directory where events will be +stored. If you want to limit the queue size (maximum +events count), use the optional slots option. If the slots option is not +used, monit will store as many events as the backend filesystem +allows. +.PP +Example: +.PP +.Vb 3 +\& set eventqueue +\& basedir /var/monit +\& slots 5000 +.Ve +.PP +The events are stored in binary format, one file per event. The +file size is ca. 130 bytes or a bit more (depending on the +message length). The file name is composed of the unix timestamp, +an underscore and the service name, for example: +.PP +.Vb 1 +\& /var/monit/1131269471_apache +.Ve +.PP +If you are running more than one monit instance on the same +machine, you \fBmust\fR use separate event queue directories to +avoid sending alerts to the wrong addresses. +.PP +If you want to purge the queue by hand (remove queued +event\-files), monit should be stopped before the removal. +.SH "SERVICE TIMEOUT" +.IX Header "SERVICE TIMEOUT" +\&\fBmonit\fR provides a service timeout mechanism for situations +where a service simply refuses to start or respond over a longer +period.
In cases like this, and particularly if monit's poll-cycle +is short, monit will simply increase the machine load by trying to +restart the service. +.PP +The timeout mechanism monit provides is based on two variables, +i.e. the number of times the service has been started and the number of +poll\-cycles. For example, if a service had \fIx\fR restarts within +\&\fIy\fR poll-cycles (where \fIx\fR <= \fIy\fR) then monit will time out and +not (re)start the service on the next cycle. If a timeout occurs +monit will send you an alert message if you have registered +interest in this event. +.PP +The syntax for the timeout statement is as follows (keywords +are in capital): +.IP "\s-1IF\s0 \s-1NUMBER\s0 \s-1RESTART\s0 \s-1NUMBER\s0 \s-1CYCLE\s0(S) \s-1THEN\s0 \s-1TIMEOUT\s0" 4 +.IX Item "IF NUMBER RESTART NUMBER CYCLE(S) THEN TIMEOUT" +.PP +Where the first number is the number of service restarts and the +second, the number of poll\-cycles. If the number of cycles was +reached without a timeout, the service start-counter is reset to +zero. This provides some granularity to catch exceptional cases +and do a service timeout, but lets occasional service starts and +restarts happen without accumulating into a timeout. +.PP +Here is an example where monit will time out (not check the +service) if the service was restarted 2 times within 3 cycles: +.PP +.Vb 1 +\& if 2 restarts within 3 cycles then timeout +.Ve +.PP +To have monit check the service again after a timeout, run 'monit +monitor service' from the command line. This will remove the +timeout lock in the daemon and make the daemon start and check +the service again. +.SH "SERVICE TESTS" +.IX Header "SERVICE TESTS" +Monit provides several tests you may utilize in a service entry +to test a service. Basically there are two classes of tests: +variable and constant object tests. +.PP +Constant object tests are related to failed/passed state.
In the +case of error, monit will watch whether the failed parameter +recovers \- in that case it will handle the recovery related +action. General format: +.IP "\s-1IF\s0 <\s-1TEST\s0> [[<X>] [\s-1TIMES\s0 \s-1WITHIN\s0] <Y> \s-1CYCLES\s0] \s-1THEN\s0 \s-1ACTION\s0 [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[<X>] [\s-1TIMES\s0 \s-1WITHIN\s0] <Y> \s-1CYCLES\s0] \s-1THEN\s0 \s-1ACTION\s0]" 4 +.IX Item "IF <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION [ELSE IF PASSED [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION]" +.PP +For constant object tests, if the <\s-1TEST\s0> evaluates to true, +then the selected action is executed each cycle the condition +remains true. The value for comparison is constant. The recovery +action is evaluated only once (on failed\->passed state change +only). The '\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0' part is optional \- if omitted, +monit will do the alert action on recovery by default. The alert is +delivered only once on each state change unless overridden by the +\&'reminder' alert option. +.PP +Variable object tests begin with an '\s-1IF\s0 \s-1CHANGED\s0' statement and +serve for monitoring an object whose property can change legally +\&\- monit watches whether the value will change again. You can use +it just for an alert or to invoke some automatic action, for +example to reload a monitored process after its configuration file +was changed. Variable tests are supported for 'checksum', +\&'size', 'pid', 'ppid' and 'timestamp' tests only. If you consider +that other tests can be useful in variable form too, please let +us know. +.IP "\s-1IF\s0 \s-1CHANGED\s0 <\s-1TEST\s0> [[<X>] [\s-1TIMES\s0 \s-1WITHIN\s0] <Y> \s-1CYCLES\s0] \s-1THEN\s0 \s-1ACTION\s0" 4 +.IX Item "IF CHANGED <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION" +.PP +For variable object tests, if the <\s-1TEST\s0> evaluates to true, +then the selected action is executed once and monit will watch +for another change.
The value for comparison is a variable where +the last result becomes the actual value, which is compared in +future cycles. The alert is delivered each time the condition +becomes true. +.PP +You can restrict the event ratio needed to change the state: +.IP "... [[<X>] [\s-1TIMES\s0 \s-1WITHIN\s0] <Y> \s-1CYCLES\s0] ..." 4 +.IX Item "... [[<X>] [TIMES WITHIN] <Y> CYCLES] ..." +.PP +This part is optional and is supported by all testing rules. +It defines how many event occurrences during how many cycles +are needed to trigger the following action. You can use it +in several ways \- the core syntax is: +.PP +.Vb 1 +\& [<X>] <Y> CYCLES +.Ve +.PP +It is possible to use filler words which make the rule easier +to read at first sight. You can use any filler words such as: \s-1FOR\s0, +\&\s-1TIMES\s0, \s-1WITHIN\s0, thus for example: +.PP +.Vb 1 +\& if failed port 80 for 3 times within 5 cycles then alert +.Ve +.PP +or +.PP +.Vb 1 +\& if failed port 80 for 10 cycles then unmonitor +.Ve +.PP +When you don't specify <X>, it defaults to <Y>, +thus the rule applies when the inverse event has occurred in <Y> +consecutive cycles (relative to the current service state). +.PP +When you omit this part altogether, monit will by default change state +on the first inverse event, which is equivalent to this notation: +.PP +.Vb 1 +\& 1 times within 1 cycles +.Ve +.PP +It is possible to use this option for failed, passed/recovered +or changed rules. More complex examples: +.PP +.Vb 7 +\& check device rootfs with path /dev/hda1 +\& if space usage > 80% 5 times within 15 cycles +\& then alert +\& else if passed for 10 cycles then alert +\& if space usage > 90% for 5 cycles then +\& exec '/try/to/free/the/space' +\& if space usage > 99% then exec '/stop/processess' +.Ve +.PP +Note that the maximum cycle count which can be used in the rule +is limited by the size of the 'long long' data type on your platform. +This provides 64 cycles on usual platforms currently.
If +you use an unsupported value, the configuration parser will +tell you the limits during monit startup. +.PP +You must select an action to be executed from this list: +.IP "\(bu" 4 +\&\fB\s-1ALERT\s0\fR sends the user an alert event on each state change (for +constant object tests) or on each change (for variable object +tests). +.IP "\(bu" 4 +\&\fB\s-1RESTART\s0\fR restarts the service \fIand\fR sends an alert. Restart is +conducted by first calling the service's registered stop method +and then the service's start method. +.IP "\(bu" 4 +\&\fB\s-1START\s0\fR starts the service by calling the service's registered +start method \fIand\fR sends an alert. +.IP "\(bu" 4 +\&\fB\s-1STOP\s0\fR stops the service by calling the service's registered +stop method \fIand\fR sends an alert. If monit stops a service it +will not be checked by monit anymore nor restarted again +later. To reactivate monitoring of the service you must +explicitly enable monitoring from the web interface or from the +console, e.g. 'monit monitor apache'. +.IP "\(bu" 4 +\&\fB\s-1EXEC\s0\fR may be used to execute an arbitrary program \fIand\fR send +an alert. If you choose this action you must state the program to +be executed and if the program requires arguments you must enclose +the program and its arguments in a quoted string. You may +optionally specify the uid and gid the executed program should +switch to upon start. For instance: +.Sp +.Vb 2 +\& exec "/usr/local/tomcat/bin/startup.sh" +\& as uid nobody and gid nobody +.Ve +.Sp +This may be useful if the program to be started cannot change to +a lesser privileged user and group. This is typically needed for +Java Servers. Remember, if monit is run by the superuser, then +all programs executed by monit will be started with superuser +privileges unless the uid and gid extension was used. +.IP "\(bu" 4 +\&\fB\s-1MONITOR\s0\fR will enable monitoring of the service \fIand\fR send +an alert.
+.IP "\(bu" 4 +\&\fB\s-1UNMONITOR\s0\fR will disable monitoring of the service \fIand\fR send +an alert. The service will not be checked by monit anymore nor +restarted again later. To reactivate monitoring of the service +you must explicitly enable monitoring from monit's web interface +or from the console using the monitor argument. +.Sh "\s-1RESOURCE\s0 \s-1TESTING\s0" +.IX Subsection "RESOURCE TESTING" +Monit can examine how many system resources a service is +using. This test may only be used within a system or process +service entry in the monit control file. +.PP +Depending on the system or process characteristics, services +can be stopped or restarted and alerts can be generated. Thus +it is possible to utilize systems which are idle and to spare +systems under high load. +.PP +The full syntax for the resource-statements used for resource +testing is as follows (keywords are in capital and optional +statements in [brackets]): +.IP "\s-1IF\s0 resource operator value [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF resource operator value [[<X>] <Y> CYCLES] THEN action [ELSE IF PASSED [[<X>] <Y> CYCLES] THEN action]" +.PP +\&\fIresource\fR is a choice of \*(L"\s-1CPU\s0\*(R", \*(L"\s-1CPU\s0([user|system|wait])\*(R", +\&\*(L"\s-1MEMORY\s0\*(R", \*(L"\s-1CHILDREN\s0\*(R", \*(L"\s-1TOTALMEMORY\s0\*(R", \*(L"\s-1LOADAVG\s0([1min|5min|15min])\*(R". +Some resources can be used inside a system service container, +some in a process service container and some in both: +.PP +System only resource tests: +.PP +\&\s-1CPU\s0([user|system|wait]) is the percent of time that the system +spends in user or system/kernel space. Some systems such as linux +2.6 support the 'wait' indicator as well. +.PP +Process only resource tests: +.PP +\&\s-1CPU\s0 is the \s-1CPU\s0 usage of the process and its children in +percent. +.PP +\&\s-1CHILDREN\s0 is the number of child processes of the process.
+.PP +\&\s-1TOTALMEMORY\s0 is the memory usage of the process and its child +processes in either percent or as an amount (Byte, kB, \s-1MB\s0, \s-1GB\s0). +.PP +System and process resource tests: +.PP +\&\s-1MEMORY\s0 is the memory usage of the system or in the process context +of the process without its child processes in either percent +(of the systems total) or as an amount (Byte, kB, \s-1MB\s0, \s-1GB\s0). +.PP +\&\s-1LOADAVG\s0([1min|5min|15min]) refers to the system's load average. +The load average is the number of processes in the system run +queue, averaged over the specified time period. +.PP +\&\fIoperator\fR is a choice of \*(L"<\*(R", \*(L">\*(R", \*(L"!=\*(R", \*(L"==\*(R" in C notation, +\&\*(L"gt\*(R", \*(L"lt\*(R", \*(L"eq\*(R", \*(L"ne\*(R" in shell sh notation and \*(L"greater\*(R", +\&\*(L"less\*(R", \*(L"equal\*(R", \*(L"notequal\*(R" in human readable form (if not +specified, default is \s-1EQUAL\s0). +.PP +\&\fIvalue\fR is either an integer or a real number (except for +\&\s-1CHILDREN\s0). For \s-1CPU\s0, \s-1MEMORY\s0 and \s-1TOTALMEMORY\s0 you need to specify a +\&\fIunit\fR. This could be \*(L"%\*(R" or if applicable \*(L"B\*(R" (Byte), \*(L"kB\*(R" +(1024 Byte), \*(L"\s-1MB\s0\*(R" (1024 KiloByte) or \*(L"\s-1GB\s0\*(R" (1024 MegaByte). +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +To calculate the cycles, a counter is raised whenever the +expression above is true and it is lowered whenever it is false +(but not below 0). All counters are reset in case of a restart. +.PP +The following is an example to check that the \s-1CPU\s0 usage of a +service is not going beyond 50% during five poll cycles. If it +does, monit will restart the service: +.PP +.Vb 1 +\& if cpu is greater than 50% for 5 cycles then restart +.Ve +.PP +See also the example section below. 
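+.PP
+As a short sketch of how the resource tests fit into system and
+process service entries (the service names, the pidfile path and
+all thresholds are assumptions made for illustration, not
+recommended values):
+.PP
+.Vb 7
+\& check system myhost.example.org
+\&     if loadavg(5min) > 4 for 8 cycles then alert
+\&     if memory > 75% then alert
+\& check process myapp with pidfile /var/run/myapp.pid
+\&     if cpu > 50% for 5 cycles then alert
+\&     if totalmemory > 200 MB for 5 cycles then alert
+\&     if children > 250 then alert
+.Ve
+.PP
+The sketch uses only the alert action, so it does not depend on
+start or stop methods being registered for the service.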
+.Sh "\s-1FILE\s0 \s-1CHECKSUM\s0 \s-1TESTING\s0" +.IX Subsection "FILE CHECKSUM TESTING" +The checksum statement may only be used in a file service +entry. If specified in the control file, monit will compute +an md5 or sha1 checksum for a file. +.PP +The checksum test in constant form is used to verify that a +file does not change. Syntax (keywords are in capital): +.IP "\s-1IF\s0 \s-1FAILED\s0 [MD5|SHA1] \s-1CHECKSUM\s0 [\s-1EXPECT\s0 checksum] [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF FAILED [MD5|SHA1] CHECKSUM [EXPECT checksum] [[<X>] <Y> CYCLES] THEN action [ELSE IF PASSED [[<X>] <Y> CYCLES] THEN action]" +.PP +The checksum test in variable form is used to watch for +file changes. Syntax (keywords are in capital): +.IP "\s-1IF\s0 \s-1CHANGED\s0 [MD5|SHA1] \s-1CHECKSUM\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF CHANGED [MD5|SHA1] CHECKSUM [[<X>] <Y> CYCLES] THEN action" +.PP +The choice of \s-1MD5\s0 or \s-1SHA1\s0 is optional. \s-1MD5\s0 produces a 128 bit +and \s-1SHA1\s0 a 160 bit checksum (32 and 40 hexadecimal characters +respectively). If this option is omitted monit +tries to guess the method from the \s-1EXPECT\s0 string or uses \s-1MD5\s0 as +default. +.PP +\&\fIexpect\fR is optional and if used it specifies an md5 or sha1 +string monit should expect when testing a file's checksum. If +\&\fIexpect\fR is used, monit will not compute an initial checksum for +the file, but instead use the string you submit. For example: +.PP +.Vb 3 +\& if failed checksum and +\& expect the sum 8f7f419955cefa0b33a2ba316cba3659 +\& then alert +.Ve +.PP +You can, for example, use the \s-1GNU\s0 utility \fI\fImd5sum\fI\|(1)\fR or +\&\fI\fIsha1sum\fI\|(1)\fR to create a checksum string for a file and +use this string in the expect\-statement.
+.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +The checksum statement in variable form may be used to check a +file for changes and, if it changed, do a specified action, for +instance to reload a server if its configuration file was +changed. The following illustrates this for the apache web server: +.PP +.Vb 3 +\& check file httpd.conf path /usr/local/apache/conf/httpd.conf +\& if changed sha1 checksum +\& then exec "/usr/local/apache/bin/apachectl graceful" +.Ve +.PP +If you plan to use the checksum statement for security reasons +(a very good idea, by the way) and to monitor a file or files +which should not change, then please use the constant form and also +read the \s-1DEPENDENCY\s0 \s-1TREE\s0 section below to see a detailed example +on how to do this properly. +.PP +Monit can also test the checksum for files on a remote host via +the \s-1HTTP\s0 protocol. See the \s-1CONNECTION\s0 \s-1TESTING\s0 section below. +.Sh "\s-1TIMESTAMP\s0 \s-1TESTING\s0" +.IX Subsection "TIMESTAMP TESTING" +The timestamp statement may only be used in a file, fifo or directory +service entry. +.PP +The timestamp test in constant form is used to verify various +timestamp conditions. Syntax (keywords are in capital): +.IP "\s-1IF\s0 \s-1TIMESTAMP\s0 [[operator] value [unit]] [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF TIMESTAMP [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF PASSED [[<X>] <Y> CYCLES] THEN action]" +.PP +The timestamp statement in variable form simply tests an +existing file or directory for timestamp changes and if changed, +execute an action.
Syntax (keywords are in capital): +.IP "\s-1IF\s0 \s-1CHANGED\s0 \s-1TIMESTAMP\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF CHANGED TIMESTAMP [[<X>] <Y> CYCLES] THEN action" +.PP +\&\fIoperator\fR is a choice of \*(L"<\*(R", \*(L">\*(R", \*(L"!=\*(R", \*(L"==\*(R" in C notation, +\&\*(L"\s-1GT\s0\*(R", \*(L"\s-1LT\s0\*(R", \*(L"\s-1EQ\s0\*(R", \*(L"\s-1NE\s0\*(R" in shell sh notation and \*(L"\s-1GREATER\s0\*(R", +\&\*(L"\s-1LESS\s0\*(R", \*(L"\s-1EQUAL\s0\*(R", \*(L"\s-1NOTEQUAL\s0\*(R" in human readable form (if not +specified, default is \s-1EQUAL\s0). +.PP +\&\fIvalue\fR is a time watermark. +.PP +\&\fIunit\fR is either \*(L"\s-1SECOND\s0\*(R", \*(L"\s-1MINUTE\s0\*(R", \*(L"\s-1HOUR\s0\*(R" or \*(L"\s-1DAY\s0\*(R" (it is also +possible to use \*(L"\s-1SECONDS\s0\*(R", \*(L"\s-1MINUTES\s0\*(R", \*(L"\s-1HOURS\s0\*(R", or \*(L"\s-1DAYS\s0\*(R"). +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +The variable timestamp statement is useful for checking a file +for changes and then executing an action. This version was written +particularly with configuration files in mind. For instance, if +you monitor the apache web server you can use this statement to +reload apache if the \fIhttpd.conf\fR (apache's configuration file) +was changed. Like so: +.PP +.Vb 3 +\& check file httpd.conf with path /usr/local/apache/conf/httpd.conf +\& if changed timestamp +\& then exec "/usr/local/apache/bin/apachectl graceful" +.Ve +.PP +The constant timestamp version is useful for monitoring systems +able to report their state by changing the timestamp of certain +state files. For instance the \fIiPlanet Messaging server stored +process\fR system updates the timestamp of: +.PP +.Vb 3 +\& o stored.ckp +\& o stored.lcu +\& o stored.per +.Ve +.PP +If a task should fail, the system keeps the timestamp.
To report +stored problems you can use the following statements: +.PP +.Vb 2 +\& check file stored.ckp with path /msg\-foo/config/stored.ckp +\& if timestamp > 1 minute then alert +.Ve +.PP +.Vb 2 +\& check file stored.lcu with path /msg\-foo/config/stored.lcu +\& if timestamp > 5 minutes then alert +.Ve +.PP +.Vb 2 +\& check file stored.per with path /msg\-foo/config/stored.per +\& if timestamp > 1 hour then alert +.Ve +.PP +As mentioned above, you can also use the timestamp statement for +monitoring directories for changes. If files are added or removed +from a directory, its timestamp is changed: +.PP +.Vb 2 +\& check directory mydir path /foo/directory +\& if timestamp > 1 hour then alert +.Ve +.PP +or +.PP +.Vb 2 +\& check directory myotherdir path /foo/secure/directory +\& if timestamp < 1 hour then alert +.Ve +.PP +The following example is a hack for restarting a process after a +certain time. Sometimes this is a necessary workaround for some +third-party applications, until the vendor fixes a problem: +.PP +.Vb 3 +\& check file server.pid path /var/run/server.pid +\& if timestamp > 7 days +\& then exec "/usr/local/server/restart\-server" +.Ve +.Sh "\s-1FILE\s0 \s-1SIZE\s0 \s-1TESTING\s0" +.IX Subsection "FILE SIZE TESTING" +The size statement may only be used in a file service entry. +If specified in the control file, monit will determine the size +of the file. +.PP +The size test in constant form is used to verify various +size conditions. Syntax (keywords are in capital): +.IP "\s-1IF\s0 \s-1SIZE\s0 [[operator] value [unit]] [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF SIZE [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF PASSED [[<X>] <Y> CYCLES] THEN action]" +.PP +The size statement in variable form simply tests an existing +file for size changes and if changed, executes an action.
Syntax +(keywords are in capital): +.IP "\s-1IF\s0 \s-1CHANGED\s0 \s-1SIZE\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF CHANGED SIZE [[<X>] <Y> CYCLES] THEN action" +.PP +\&\fIoperator\fR is a choice of \*(L"<\*(R", \*(L">\*(R", \*(L"!=\*(R", \*(L"==\*(R" in C notation, +\&\*(L"\s-1GT\s0\*(R", \*(L"\s-1LT\s0\*(R", \*(L"\s-1EQ\s0\*(R", \*(L"\s-1NE\s0\*(R" in shell sh notation and \*(L"\s-1GREATER\s0\*(R", +\&\*(L"\s-1LESS\s0\*(R", \*(L"\s-1EQUAL\s0\*(R", \*(L"\s-1NOTEQUAL\s0\*(R" in human readable form (if not +specified, default is \s-1EQUAL\s0). +.PP +\&\fIvalue\fR is a size watermark. +.PP +\&\fIunit\fR is a choice of \*(L"B\*(R",\*(L"\s-1KB\s0\*(R",\*(L"\s-1MB\s0\*(R",\*(L"\s-1GB\s0\*(R" or long alternatives +\&\*(L"byte\*(R", \*(L"kilobyte\*(R", \*(L"megabyte\*(R", \*(L"gigabyte\*(R". If it is not +specified, the \*(L"byte\*(R" unit is assumed by default. +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +The variable size test form is useful for checking a file for +changes and sending an alert or executing an action. Monit will +register the size of the file at startup and monitor the file for +changes. As soon as the value changes, monit will do the specified +action, reset the registered value to the new result and continue to +monitor whether the size changes again. +.PP +One example of use for this statement is to conduct security +checks, for instance: +.PP +.Vb 2 +\& check file su with path /bin/su +\& if changed size then exec "/sbin/ifconfig eth0 down" +.Ve +.PP +which will \*(L"cut the cable\*(R" and stop a possible intruder from +compromising the system further. This test is just one of many +you may use to increase the security awareness on a system.
If +you plan to use monit for security reasons we recommend that you +use this test in combination with other supported tests like +checksum, timestamp, and so on. +.PP +The constant size test form may be useful in similar or different +contexts. It can, for instance, be used to test if a certain file +size was exceeded and then alert you, or monit may execute a +certain action specified by you. An example is to use this +statement to rotate log files after they have reached a certain +size or to check that a database file does not grow beyond a +specified threshold. +.PP +To rotate a log file: +.PP +.Vb 3 +\& check file myapp.log with path /var/log/myapp.log +\& if size > 50 MB then +\& exec "/usr/local/bin/rotate /var/log/myapp.log myapp" +.Ve +.PP +where /usr/local/bin/rotate may be a simple script, such as: +.PP +.Vb 3 +\& #!/bin/bash +\& /bin/mv $1 $1.`date +%y\-%m\-%d` +\& /usr/bin/pkill \-HUP $2 +.Ve +.PP +Or you may use this statement to trigger the \fIlogrotate\fR\|(8) +program, to do an \*(L"emergency\*(R" rotate. Or to send an alert if a +file becomes a known bottleneck when it grows beyond a certain size +because of limits in a database engine: +.PP +.Vb 2 +\& check file mydb with path /data/mydatabase.db +\& if size > 1 GB then alert +.Ve +.PP +This is a more restrictive form of the first example where the +size is explicitly defined (note that the real su size is system +dependent): +.PP +.Vb 2 +\& check file su with path /bin/su +\& if size != 95564 then exec "/sbin/ifconfig eth0 down" +.Ve +.Sh "\s-1FILE\s0 \s-1CONTENT\s0 \s-1TESTING\s0" +.IX Subsection "FILE CONTENT TESTING" +The match statement allows you to test the content of a text +file by using regular expressions. This is a great feature if +you need to periodically test files, such as log files, for +certain patterns. If a pattern matches, monit by default +raises an alert; other actions are also possible.
+.PP +The syntax (keywords in capital) for using this function is: +.IP "\s-1IF\s0 [\s-1NOT\s0] \s-1MATCH\s0 {regex|path} [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF [NOT] MATCH {regex|path} [[<X>] <Y> CYCLES] THEN action" +.PP +\&\fIregex\fR is a string containing the extended regular expression. +See also \fIregex\fR\|(7). +.PP +\&\fIpath\fR is an absolute path to a file containing an extended +regular expression on every line. See also \fIregex\fR\|(7). +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +You can use the \fI\s-1NOT\s0\fR statement to invert a match. +.PP +The content is checked only once per cycle. If content is +added and removed between two checks, the change goes unnoticed. +.PP +On startup the read position is set to the end of the file +and monit continues to scan to the end of file on each cycle. +But if the file size should decrease or the inode change, the read +position is set to the start of the file. +.PP +Only lines ending with a newline character are inspected. Thus, +lines are ignored until they have been completed with this +character. Also note that only the first 511 characters of a +line are inspected. +.IP "\s-1IGNORE\s0 [\s-1NOT\s0] \s-1MATCH\s0 {regex|path}" 4 +.IX Item "IGNORE [NOT] MATCH {regex|path}" +.PP +Lines matching an \fI\s-1IGNORE\s0\fR are not inspected during later +evaluations. \fI\s-1IGNORE\s0 \s-1MATCH\s0\fR always has precedence over +\&\fI\s-1IF\s0 \s-1MATCH\s0\fR. +.PP +All \fI\s-1IGNORE\s0 \s-1MATCH\s0\fR statements are evaluated first, in the +order of their appearance. Thereafter, all the \fI\s-1IF\s0 \s-1MATCH\s0\fR +statements are evaluated.
+.PP +A real life example might look like this: +.PP +.Vb 7 +\& check file syslog with path /var/log/syslog +\& ignore match +\& "^\ew{3} [ :0\-9]{11} [._[:alnum:]\-]+ monit\e[[0\-9]+\e]:" +\& ignore match /etc/monit/ignore.regex +\& if match +\& "^\ew{3} [ :0\-9]{11} [._[:alnum:]\-]+ mrcoffee\e[[0\-9]+\e]:" +\& if match /etc/monit/active.regex then alert +.Ve +.Sh "\s-1FILESYSTEM\s0 \s-1FLAGS\s0 \s-1TESTING\s0" +.IX Subsection "FILESYSTEM FLAGS TESTING" +Monit tests the filesystem flags of devices for change. This +test is implicit and monit will send an alert in the case of +failure by default. +.PP +You may override the default action using the rule below (it may only +be used within a device service entry in the monit control file). +.PP +This test is useful for detecting changes of the filesystem flags +such as when the filesystem became read-only based on disk errors +or the mount flags were changed (such as nosuid). Each platform +provides a different set of flags. \s-1POSIX\s0 defines the \s-1RDONLY\s0 and \s-1NOSUID\s0 +flags which should work on all platforms. Some platforms (such as +FreeBSD) present other flags in addition. +.PP +The syntax for the fsflags statement is: +.IP "\s-1IF\s0 \s-1CHANGED\s0 \s-1FSFLAGS\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF CHANGED FSFLAGS [[<X>] <Y> CYCLES] THEN action" +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +Example: +.PP +.Vb 3 +\& check device rootfs with path / +\& if changed fsflags then exec "/my/script" +\& alert root@localhost +.Ve +.Sh "\s-1SPACE\s0 \s-1TESTING\s0" +.IX Subsection "SPACE TESTING" +Monit can test devices/file systems and check for space +usage. This test may only be used within a device service entry +in the monit control file. +.PP +Monit will check a device's total space usage.
If you only want +to check the space available to non\-superusers, you must set the +watermark appropriately (i.e. total space minus reserved blocks +for the superuser). +.PP +You can obtain (and set) the superuser's reserved blocks size, +for example by using the tune2fs utility on Linux. On Linux 5% of +available blocks are reserved for the superuser by default. To +list the reserved blocks for the superuser: +.PP +.Vb 4 +\& [root@berry monit]# tune2fs \-l /dev/hda1| grep "Reserved block" +\& Reserved block count: 319994 +\& Reserved blocks uid: 0 (user root) +\& Reserved blocks gid: 0 (group root) +.Ve +.PP +On solaris 10% of the blocks are reserved. You can also use +tunefs on solaris to change values on a live filesystem. +.PP +The full syntax for the space statement is: +.IP "\s-1IF\s0 \s-1SPACE\s0 operator value unit [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[<X>] <Y> \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF SPACE operator value unit [[<X>] <Y> CYCLES] THEN action [ELSE IF PASSED [[<X>] <Y> CYCLES] THEN action]" +.PP +\&\fIoperator\fR is a choice of \*(L"<\*(R",\*(L">\*(R",\*(L"!=\*(R",\*(L"==\*(R" in c notation, \*(L"gt\*(R", +\&\*(L"lt\*(R", \*(L"eq\*(R", \*(L"ne\*(R" in shell sh notation and \*(L"greater\*(R", \*(L"less\*(R", +\&\*(L"equal\*(R", \*(L"notequal\*(R" in human readable form (if not specified, +default is \s-1EQUAL\s0). +.PP +\&\fIunit\fR is a choice of \*(L"B\*(R",\*(L"\s-1KB\s0\*(R",\*(L"\s-1MB\s0\*(R",\*(L"\s-1GB\s0\*(R", \*(L"%\*(R" or long +alternatives \*(L"byte\*(R", \*(L"kilobyte\*(R", \*(L"megabyte\*(R", \*(L"gigabyte\*(R", +\&\*(L"percent\*(R". +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.Sh "\s-1INODE\s0 \s-1TESTING\s0" +.IX Subsection "INODE TESTING" +If supported by the file\-system, you can use monit to test for +inode usage.
This test may only be used within a device service +entry in the monit control file. +.PP +If the device becomes unavailable, monit will call the entry's +registered start method, if it is defined and if monit is running +in active mode. If monit runs in passive mode or the start +methods is not defined, monit will just send an error alert. +.PP +The syntax for the inode statement is: +.IP "\s-1IF\s0 \s-1INODE\s0(S) operator value [unit] [[] \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF INODE(S) operator value [unit] [[] CYCLES] THEN action [ELSE IF PASSED [[] CYCLES] THEN action]" +.PP +\&\fIoperator\fR is a choice of \*(L"<\*(R",\*(L">\*(R",\*(L"!=\*(R",\*(L"==\*(R" in c notation, \*(L"gt\*(R", +\&\*(L"lt\*(R", \*(L"eq\*(R", \*(L"ne\*(R" in shell sh notation and \*(L"greater\*(R", \*(L"less\*(R", +\&\*(L"equal\*(R", \*(L"notequal\*(R" in human readable form (if not specified, +default is \s-1EQUAL\s0). +.PP +\&\fIunit\fR is optional. If not specified, the value is an absolute +count of inodes. You can use the \*(L"%\*(R" character or the longer +alternative \*(L"percent\*(R" as a unit. +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.Sh "\s-1PERMISSION\s0 \s-1TESTING\s0" +.IX Subsection "PERMISSION TESTING" +Monit can monitor the permissions. This test may only be used +within a file, fifo, directory or device service entry in the +monit control file. 
+.PP +The syntax for the permission statement is: +.IP "\s-1IF\s0 \s-1FAILED\s0 \s-1PERM\s0(\s-1ISSION\s0) octalnumber [[] \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF FAILED PERM(ISSION) octalnumber [[] CYCLES] THEN action [ELSE IF PASSED [[] CYCLES] THEN action]" +.PP +\&\fIoctalnumber\fR defines permissions for a file, a directory or a +device. +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +The web interface will show a permission warning if the test +failed. +.PP +We recommend that you use the \s-1UNMONITOR\s0 action in a permission +statement. The rationale for this feature is security and that +monit does not start a possible cracked program or +script. Example: +.PP +.Vb 3 +\& check file monit.bin with path "/usr/local/bin/monit" +\& if failed permission 0555 then unmonitor +\& alert foo@bar +.Ve +.PP +If the test fails, monit will simply send an alert and stop +monitoring the file and propagate an unmonitor action upward in +a depend tree. +.Sh "\s-1UID\s0 \s-1TESTING\s0" +.IX Subsection "UID TESTING" +monit can monitor the owner user id (uid). This test may only be +used within a file, fifo, directory or device service entry in +the monit control file. +.PP +The syntax for the uid statement is: +.IP "\s-1IF\s0 \s-1FAILED\s0 \s-1UID\s0 user [[] \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF FAILED UID user [[] CYCLES] THEN action [ELSE IF PASSED [[] CYCLES] THEN action]" +.PP +\&\fIuser\fR defines a user id either in numeric or in string form. 
+.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +The web interface will show a uid warning if the test should +fail. +.PP +We recommend that you use the \s-1UNMONITOR\s0 action in a uid +statement. The rationale for this feature is security and that +monit does not start a possible cracked program or +script. Example: +.PP +.Vb 3 +\& check file passwd with path /etc/passwd +\& if failed uid root then unmonitor +\& alert root@localhost +.Ve +.PP +If the test fails, monit will simply send an alert and stop +monitoring the file and propagate an unmonitor action upward in +a depend tree. +.Sh "\s-1GID\s0 \s-1TESTING\s0" +.IX Subsection "GID TESTING" +monit can monitor the owner group id (gid). This test may only +be used within a file, fifo, directory or device service entry +in the monit control file. +.PP +The syntax for the gid statement is: +.IP "\s-1IF\s0 \s-1FAILED\s0 \s-1GID\s0 user [[] \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF FAILED GID user [[] CYCLES] THEN action [ELSE IF PASSED [[] CYCLES] THEN action]" +.PP +\&\fIuser\fR defines a group id either in numeric or in string form. +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +The web interface will show a gid warning if the test should +fail. +.PP +We recommend that you use the \s-1UNMONITOR\s0 action in a gid +statement. The rationale for this feature is security and that +monit does not start a possible cracked program or +script. 
Example: +.PP +.Vb 3 +\& check file shadow with path /etc/shadow +\& if failed gid root then unmonitor +\& alert root@localhost +.Ve +.PP +If the test fails, monit will simply send an alert and stop +monitoring the file and propagate an unmonitor action upward in +a depend tree. +.Sh "\s-1PID\s0 \s-1TESTING\s0" +.IX Subsection "PID TESTING" +monit tests the process id (pid) of processes for change. This +test is implicit and monit will send an alert in the case of failure +by default. +.PP +You may override the default action using the rule below (it may only +be used within a process service entry in the monit control +file). +.PP +The syntax for the pid statement is: +.IP "\s-1IF\s0 \s-1CHANGED\s0 \s-1PID\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF CHANGED PID [[] CYCLES] THEN action" +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +This test is useful for detecting process restarts that +have occurred in the timeframe between two monit testing cycles. +If the restart was fast and the process provides +the expected service (i.e. all tests passed) you will be notified +that the process was replaced. +.PP +For example, the sshd daemon can restart very quickly, so if someone +changes its configuration and restarts sshd outside of monit +control, you will be notified that the process was replaced by +a new instance (or you can optionally take some other action such as +preventively stopping sshd). +.PP +Another example is MySQL Cluster, which has its own watchdog with +process restart ability. You can use monit for redundant +monitoring. Monit will just send an alert if the MySQL +cluster restarted the node quickly.
+.PP +Example: +.PP +.Vb 3 +\& check process sshd with pidfile /var/run/sshd.pid +\& if changed pid then exec "/my/script" +\& alert root@localhost +.Ve +.Sh "\s-1PPID\s0 \s-1TESTING\s0" +.IX Subsection "PPID TESTING" +monit tests the process parent id (ppid) of processes for change. +This test is implicit and monit will send alert in the case of +failure by default. +.PP +You may override the default action using below rule (it may only +be used within a process service entry in the monit control file). +.PP +The syntax for the ppid statement is: +.IP "\s-1IF\s0 \s-1CHANGED\s0 \s-1PPID\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action" 4 +.IX Item "IF CHANGED PPID [[] CYCLES] THEN action" +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +This test is useful for detecting changes of a process parent. +.PP +Example: +.PP +.Vb 3 +\& check process myproc with pidfile /var/run/myproc.pid +\& if changed ppid then exec "/my/script" +\& alert root@localhost +.Ve +.Sh "\s-1CONNECTION\s0 \s-1TESTING\s0" +.IX Subsection "CONNECTION TESTING" +Monit is able to perform connection testing via networked ports +or via Unix sockets. A connection test may only be used within a +process or within a host service entry in the monit control file. +.PP +If a service listens on one or more sockets, monit can connect to +the port (using either tcp or udp) and verify that the service +will accept a connection and that it is possible to write and +read from the socket. If a connection is not accepted or if there +is a problem with socket read/write, monit will assume that +something is wrong and execute a specified action. If monit is +compiled with openssl, then ssl based network services can also +be tested. 
+.PP +The full syntax for the statement used for connection testing is +as follows (keywords are in capitals and optional statements in +[brackets]), +.IP "\s-1IF\s0 \s-1FAILED\s0 [host] port [type] [protocol|{send/expect}+] [timeout] [[] \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF FAILED [host] port [type] [protocol|{send/expect}+] [timeout] [[] CYCLES] THEN action [ELSE IF PASSED [[] CYCLES] THEN action]" +.PP +or for Unix sockets, +.IP "\s-1IF\s0 \s-1FAILED\s0 [unixsocket] [type] [protocol|{send/expect}+] [timeout] [[] \s-1CYCLES\s0] \s-1THEN\s0 action [\s-1ELSE\s0 \s-1IF\s0 \s-1PASSED\s0 [[] \s-1CYCLES\s0] \s-1THEN\s0 action]" 4 +.IX Item "IF FAILED [unixsocket] [type] [protocol|{send/expect}+] [timeout] [[] CYCLES] THEN action [ELSE IF PASSED [[] CYCLES] THEN action]" +.PP +\&\fBhost:HOST hostname\fR. Optionally specify the host to connect to. +If the host is not given then localhost is assumed if this test +is used inside a process entry. If this test is used inside a +remote host entry then the entry's remote host is assumed. +Although \fIhost\fR is intended for testing name-based virtual hosts +in an \s-1HTTP\s0 server running on a local or remote host, it does allow +the connection statement to be used to test a server running on +another machine. This may be useful. For instance, if you use +Apache httpd as a front-end and an application-server as the +back-end running on another machine, this statement may be used +to test that the back-end server is running and, if not, raise an +alert. +.PP +\&\fBport:PORT number\fR. The port number to connect to. +.PP +\&\fBunixsocket:UNIXSOCKET \s-1PATH\s0\fR. Specifies the path to a Unix +socket. Servers based on Unix sockets always run on the local +machine and do not use a port. +.PP +\&\fBtype:TYPE {TCP|UDP|TCPSSL}\fR. Optionally specify the socket type +monit should use when trying to connect to the port.
The +different socket types are; \s-1TCP\s0, \s-1UDP\s0 or \s-1TCPSSL\s0, where \s-1TCP\s0 is a +regular stream based socket, \s-1UDP\s0 is a datagram socket and \s-1TCPSSL\s0 +specify that monit should use a \s-1TCP\s0 socket with \s-1SSL\s0 when +connecting to a port. The default socket type is \s-1TCP\s0. If \s-1TCPSSL\s0 +is used you may optionally specify the \s-1SSL/TLS\s0 protocol to be +used and the md5 sum of the server's certificate. The \s-1TCPSSL\s0 +options are: +.PP +.Vb 1 +\& TCPSSL [SSLAUTO|SSLV2|SSLV3|TLSV1] [CERTMD5 md5sum] +.Ve +.PP +\&\fBproto(col):PROTO {protocols}\fR. Optionally specify the protocol +monit should speak when a connection is established. At the +moment monit knows how to speak: + \fIAPACHE-STATUS\fR + \fI\s-1DNS\s0\fR + \fI\s-1DWP\s0\fR + \fI\s-1FTP\s0\fR + \fI\s-1HTTP\s0\fR + \fI\s-1IMAP\s0\fR + \fI\s-1CLAMAV\s0\fR + \fI\s-1LDAP2\s0\fR + \fI\s-1LDAP3\s0\fR + \fI\s-1MYSQL\s0\fR + \fI\s-1NNTP\s0\fR + \fI\s-1NTP3\s0\fR + \fI\s-1POP\s0\fR + \fIPOSTFIX-POLICY\fR + \fI\s-1RDATE\s0\fR + \fI\s-1RSYNC\s0\fR + \fI\s-1SMTP\s0\fR + \fI\s-1SSH\s0\fR + \fI\s-1TNS\s0\fR + \fI\s-1PGSQL\s0\fR +If you have compiled monit with ssl support, monit can also speak +the \s-1SSL\s0 variants such as: + \fI\s-1HTTPS\s0\fR + \fI\s-1FTPS\s0\fR + \fI\s-1POPS\s0\fR + \fI\s-1IMAPS\s0\fR +To use the \s-1SSL\s0 protocol support you need to define the socket as +\&\s-1SSL\s0 and use the general protocol name (for example in the case of +\&\s-1HTTPS\s0) : + \s-1TYPE\s0 \s-1TCPSSL\s0 \s-1PROTOCOL\s0 \s-1HTTP\s0 +If the server's protocol is not found in this list, simply do not +specify the protocol and monit will utilize a default test, +including testing if it is possible to read and write to the +port. This default test is in most cases more than good enough to +deduce if the server behind the port is up or not. 
+.PP +The protocol statement is: +.PP +.Vb 1 +\& [PROTO(COL) {name} [REQUEST {"/path"} [with CHECKSUM checksum]] +.Ve +.PP +As you can see, you may specify a request after the protocol, at +the moment only the \s-1HTTP\s0 protocol supports the request option. +See also below for an example. +.PP +In addition to the standard protocols, the \fIAPACHE-STATUS\fR +protocol is a test of a specific server type, rather than a +generic protocol. Server performance is examined using the status +page generated by Apache's mod_status, which is expected to be at +its default address of http://www.example.com/server\-status. +Currently the \fIAPACHE-STATUS\fR protocol examines the percentage +of Apache child processes which are +.PP +.Vb 10 +\& o logging (loglimit) +\& o closing connections (closelimit) +\& o performing DNS lookups (dnslimit) +\& o in keepalive with a client (keepalivelimit) +\& o replying to a client (replylimit) +\& o receiving a request (requestlimit) +\& o initialising (startlimit) +\& o waiting for incoming connections (waitlimit) +\& o gracefully closing down (gracefullimit) +\& o performing cleanup procedures (cleanuplimit) +.Ve +.PP +Each of these quantities can be compared against a value relative +to the total number of active Apache child processes. If the +comparison expression is true the chosen action is performed. +.PP +The apache-status protocol statement is formally defined as +(keywords in uppercase): +.PP +.Vb 1 +\& PROTO(COL) {limit} OP PERCENT [OR {limit} OP PERCENT]* +.Ve +.PP +where {limit} is one or more of: loglimit, closelimit, dnslimit, +keepalivelimit, replylimit, requestlimit, startlimit, waitlimit +gracefullimit or cleanuplimit. The operator \s-1OP\s0 is one of: +[<|=|>]. +.PP +You can combine all of these test into one expression or you can +choose to test a certain limit. If you combine the limits you +must or' them together using the \s-1OR\s0 keyword. 
+.PP +Here's an example where we test for a loglimit of more than 10 +percent, a dnslimit over 50 percent and a waitlimit of less than 20 +percent of processes. See also more examples below in the example +section. +.PP +.Vb 5 +\& protocol apache\-status +\& loglimit > 10% or +\& dnslimit > 50% or +\& waitlimit < 20% +\& then alert +.Ve +.PP +Obviously, do not use this test unless the httpd server you are +testing is Apache Httpd and mod_status is activated on the +server. +.PP +\&\fBsend/expect: {SEND|EXPECT} \*(L"string\*(R" ...\fR. If monit does not +support the protocol spoken by the server, you can write your own +protocol-test using \fIsend\fR and \fIexpect\fR strings. The \fI\s-1SEND\s0\fR +statement sends a string to the server port and the \fI\s-1EXPECT\s0\fR +statement compares a string read from the server with the string +given in the expect statement. If your system supports \s-1POSIX\s0 +regular expressions, you can use regular expressions in the +expect string; see \fIregex\fR\|(7) to learn more about the types of +regular expressions you can use in an expect string. Otherwise +the string is used as it is. The send/expect statement is: +.PP +.Vb 1 +\& [{SEND|EXPECT} "string"]+ +.Ve +.PP +Note that monit will send a string as it is, and you \fBmust\fR +remember to include \s-1CR\s0 and \s-1LF\s0 in the string sent to the server if +the protocol expects such characters to terminate a string (most +text-based protocols used over the Internet do). Likewise monit +will read up to 256 bytes from the server and use this string +when comparing the expect string. If the server sends strings +terminated by \s-1CRLF\s0 (i.e. \*(L"\er\en\*(R"), you \fImay\fR need to add the +same terminating characters to the string you expect from the +server. +.PP +You can use non-printable characters in a send string if +needed. Use the hex notation, \e0xHEXHEX to send any char in the +range \e0x00\-\e0xFF, that is, 0\-255 in decimal.
This may be useful +when testing some network protocols, particularly those over +\&\s-1UDP\s0. For example, to test a Quake 3 server you can use the +following: +.PP +.Vb 2 +\& send "\e0xFF\e0xFF\e0xFF\e0xFFgetstatus" +\& expect "sv_floodProtect|sv_maxPing" +.Ve +.PP +Finally, send/expect can be used with any socket type, such as +\&\s-1TCP\s0 sockets, \s-1UNIX\s0 sockets and \s-1UDP\s0 sockets. +.PP +\&\fBtimeout:with \s-1TIMEOUT\s0 x \s-1SECONDS\s0\fR. Optionally specifies the +connect and read timeout for the connection. If monit cannot +connect to the server within this time it will assume that the +connection failed and execute the specified action. The default +connect timeout is 5 seconds. +.PP +\&\fIaction\fR is a choice of \*(L"\s-1ALERT\s0\*(R", \*(L"\s-1RESTART\s0\*(R", \*(L"\s-1START\s0\*(R", \*(L"\s-1STOP\s0\*(R", +\&\*(L"\s-1EXEC\s0\*(R", \*(L"\s-1MONITOR\s0\*(R" or \*(L"\s-1UNMONITOR\s0\*(R". +.PP +Connection testing using the \s-1URL\s0 notation +.IX Subsection "Connection testing using the URL notation" +.PP +You can test an \s-1HTTP\s0 server using the compact \s-1URL\s0 syntax. This +test also allows you to use \s-1POSIX\s0 regular expressions to test the +content returned by the \s-1HTTP\s0 server. +.PP +The full syntax for the \s-1URL\s0 statement is as follows (keywords are +in capitals and optional statements in [brackets]): +.PP +.Vb 5 +\& IF FAILED URL URL\-spec +\& [CONTENT {==|!=} "regular\-expression"] +\& [TIMEOUT number SECONDS] [[] CYCLES] +\& THEN action +\& [ELSE IF PASSED [[] CYCLES] THEN action] +.Ve +.PP +Where URL-spec is a \s-1URL\s0 in the standard form as specified in \s-1RFC\s0 +2396: +.PP +.Vb 1 +\& <scheme>://<authority><path>?<query> +.Ve +.PP +Here is an example of a \s-1URL\s0 where all components are used: +.PP +.Vb 1 +\& http://user:password@www.foo.bar:8080/document/?querystring#ref +.Ve +.PP +If a username and password are included in the \s-1URL\s0, monit will +attempt to log in to the server using \fBBasic Authentication\fR.
+.PP +Testing the content returned by the server is optional. If used, +you can test whether the content \fBmatches\fR or does \fBnot match\fR a +regular expression. Here's an example of how the \s-1URL\s0 statement +can be used in a \fIcheck service\fR: +.PP +.Vb 5 +\& check host FOO with address www.foo.bar +\& if failed url +\& http://user:password@www.foo.bar:8080/?querystring +\& and content == 'action="j_security_check"' +\& then ... +.Ve +.PP +Monit will look at the content-length header returned by the +server and download this amount before testing the content. However, +if the content-length is more than 1 Mb or this header is not +set by the server, monit will download up to 1 Mb and +not more. +.PP +Only the http(s) protocol is supported in a \s-1URL\s0 statement. If +the protocol is \fBhttps\fR monit will use \s-1SSL\s0 when connecting to +the server. +.PP +Remote host ping test +.IX Subsection "Remote host ping test" +.PP +In addition monit can perform \s-1ICMP\s0 Echo tests in remote host +checks. The icmp test may only be used in a check host entry and +monit must run with super user privileges, that is, the root user +must run monit. The reason is that the icmp test uses a raw +socket to send the icmp packet and only the super user is allowed +to create a raw socket. +.PP +The full syntax for the \s-1ICMP\s0 Echo statement used for ping testing +is as follows (keywords are in capitals and optional statements in +[brackets]): +.PP +.Vb 5 +\& IF FAILED ICMP TYPE ECHO +\& [COUNT number] [WITH] [TIMEOUT number SECONDS] +\& [[] CYCLES] +\& THEN action +\& [ELSE IF PASSED [[] CYCLES] THEN action] +.Ve +.PP +The rules for action and timeout are the same as those mentioned +above in the \s-1CONNECTION\s0 \s-1TESTING\s0 section. The count parameter +specifies how many consecutive echo requests will be sent to the +host in one cycle. If no reply arrives within the timeout +frame, monit reports an error.
If at least one reply is received, +the test passes. By default monit sends three echo requests in +one cycle to prevent random packet loss from generating a false +alarm (i.e. up to 66% packet loss is tolerated). You can set the +count option to a different value, which can serve as an error ratio. +For example, if you require 100% ping success, you +can set the count to 1 (i.e. just one attempt will be sent; if +the packet is lost, an error will be reported). +.PP +An icmp ping test is useful for testing if a host is up, before +testing ports at the host. If an icmp ping test is used in a +check host entry, this test is run first and if the ping test +should fail we assume that the connection to the host is down and +monit does \fInot\fR continue to test any ports. Here's an example: +.PP +.Vb 6 +\& check host xyzzy with address xyzzy.org +\& if failed icmp type echo count 5 with timeout 15 seconds +\& then alert +\& if failed port 80 proto http then alert +\& if failed port 443 type TCPSSL proto http then alert +\& alert foo@bar +.Ve +.PP +In this case, if the icmp test should fail you will get \fIone\fR +alert and only one alert as long as the host is down, and equally +important, monit will \fInot\fR test port 80 and port 443. Likewise +if the icmp ping test should succeed (again) monit will continue +to test both port 80 and 443. +.PP +Keep in mind though that some firewalls can block icmp packets +and thus render the test useless. +.PP +Examples +.IX Subsection "Examples" +.PP +To check a port connection and receive an alert if monit cannot +connect to the port, use the following statement: +.PP +.Vb 1 +\& if failed port 80 then alert +.Ve +.PP +In this case the machine in question is assumed to be the default +host. For a process entry it's \fIlocalhost\fR and for a remote host +entry it's the \fIaddress\fR of the remote host. Monit will open +a connection to the host at port 80 and use tcp by default.
+If you want to connect with udp, you can specify this after the +port\-statement: +.PP +.Vb 1 +\& if failed port 53 type udp protocol dns then alert +.Ve +.PP +Monit will stop trying to connect to the port after 5 seconds and +assume that the server behind the port is down. You may increase +or decrease the connect timeout by explicitly adding a connection +timeout. In the following example the timeout is increased to 15 +seconds and if monit cannot connect to the server within 15 +seconds the test will fail and an alert message is sent. +.PP +.Vb 1 +\& if failed port 80 with timeout 15 seconds then alert +.Ve +.PP +If a server is listening on a Unix socket the following statement +can be used: +.PP +.Vb 1 +\& if failed unixsocket /var/run/sophie then alert +.Ve +.PP +A Unix socket is used by some servers for fast (interprocess) +communication on localhost only. A Unix socket is specified by a +path and in the example above the path, /var/run/sophie, +specifies a Unix socket. +.PP +If your machine answers for several virtual hosts you can prefix +the port statement with a host-statement like so: +.PP +.Vb 3 +\& if failed host www.sol.no port 80 then alert +\& if failed host 80.69.226.133 port 443 then alert +\& if failed host kvasir.sol.no port 80 then alert +.Ve +.PP +And as mentioned above, if you do not specify a host\-statement, +\&\fIlocalhost\fR or \fIaddress\fR is assumed. +.PP +Monit also knows how to speak some of the more popular Internet +protocols. So, besides testing for connections, monit can also +speak with the server in question to verify that the server +works. For example, the following is used to test an http server: +.PP +.Vb 2 +\& if failed host www.tildeslash.com port 80 proto http +\& then restart +.Ve +.PP +Some protocols also support a request statement. This statement +can be used to ask the server for a specific document entity.
+.PP +Currently \fBonly\fR the \fI\s-1HTTP\s0\fR protocol module supports the +request statement, such as: +.PP +.Vb 3 +\& if failed host www.myhost.com port 80 protocol http +\& and request "/data/show.php?a=b&c=d" +\& then restart +.Ve +.PP +The request must contain a \s-1URI\s0 string specifying a document from +the http server. The string will be \s-1URL\s0 encoded by monit before +it sends the request to the http server, so it's okay to use \s-1URL\s0 +unsafe characters in the request. If the request statement isn't +specified, the default web server page will be requested. +.PP +You can also test the checksum for documents returned by an http +server. You can use \s-1MD5\s0 sums: +.PP +.Vb 4 +\& if failed port 80 protocol http +\& and request "/page.html" +\& with checksum 8f7f419955cefa0b33a2ba316cba3659 +\& then alert +.Ve +.PP +Or you can use \s-1SHA1\s0 sums: +.PP +.Vb 4 +\& if failed port 80 protocol http +\& and request "/page.html" +\& with checksum e428302e260e0832007d82de853aa8edf19cd872 +\& then alert +.Ve +.PP +monit will compute a checksum (either \s-1MD5\s0 or \s-1SHA1\s0 is used, +depending on the length of the hash) for the document (in the above +case, /page.html) and compare the computed checksum with the +expected checksum. If the sums do not match then the if-test's +action is performed, in this case alert. Note that monit will +\&\fBnot\fR test the checksum for a document if the server does not +set the \s-1HTTP\s0 \fIContent-Length\fR header. An \s-1HTTP\s0 server should set +this header when it serves a static document (i.e. a file). A +server will often use chunked transfer encoding instead when +serving dynamic content (e.g. a document created by a CGI-script +or a Servlet), but testing the checksum of dynamic content is +not very useful.
There is no limit on the document size, +but keep in mind that monit needs time to download the +document over the network so it's probably smart not to ask monit +to compute a checksum for documents larger than 1Mb or so, +depending on your network connection of course. Tip: if you get a +checksum error even if the document has the correct sum, the +reason may be that the download timed out. In this case, explicitly +set a longer timeout than the default 5 seconds. +.PP +As mentioned above, if the server protocol is not supported by +monit you can write your own protocol test using send/expect +strings. Here we show a protocol test using send/expect for an +imaginary \*(L"Ali Baba and the Forty Thieves\*(R" protocol: +.PP +.Vb 6 +\& if failed host cave.persia.ir port 4040 +\& send "Open, Sesame!\er\en" +\& expect "Please enter the cave\er\en" +\& send "Shut, Sesame!\er\en" +\& expect "See you later [A\-Za\-z ]+\er\en" +\& then restart +.Ve +.PP +The \fI\s-1TCPSSL\s0\fR statement can optionally test the md5 sum of the +server's certificate. You must state the md5 certificate string +you expect the server to deliver and, upon connecting to the +server, the server's actual md5 sum certificate string is tested. +Any symbol other than [A\-Fa\-f0\-9] is ignored in that string. +Thus it is possible to copy and paste the output of e.g. openssl. +If the two sums do not match, the connection test fails. If the ssl +version handshake does not work properly you can also force a +specific ssl version, as we demonstrate in this example: +.PP +.Vb 10 +\& if failed host shop.sol.no port 443 +\& type TCPSSL SSLV3 # Force monit to use ssl version 3 +\& # We expect the server to return this md5 certificate sum +\& # as either 12\-34\-56\-78\-90\-AB\-CD\-EF\-12\-34\-56\-78\-90\-AB\-CD\-EF +\& # or e.g. 1234567890ABCDEF1234567890ABCDEF +\& # or e.g.
1234567890abcdef1234567890abcdef +\& # what ever come in more handy (see text above) +\& CERTMD5 12\-34\-56\-78\-90\-AB\-CD\-EF\-12\-34\-56\-78\-90\-AB\-CD\-EF +\& protocol http +\& then restart +.Ve +.PP +Here's an example where a connection test is used inside a +process entry: +.PP +.Vb 4 +\& check process apache with pidfile /var/run/apache.pid +\& start program = "/etc/init.d/httpd start" +\& stop program = "/etc/init.d/httpd stop" +\& if failed host www.tildeslash.com port 80 then restart +.Ve +.PP +Here, a connection test is used in a remote host entry: +.PP +.Vb 2 +\& check host up2date with address ftp.redhat.com +\& if failed port 21 and protocol ftp then alert +.Ve +.PP +Since we did not explicit specify a host in the above test, monit +will connect to port 21 at ftp.redhat.com. Apropos, the host +address can be specified as a dotted \s-1IP\s0 address string or as +hostname in the \s-1DNS\s0. The following is exactly[*] the same test, +but here an ip address is used instead: +.PP +.Vb 2 +\& check host up2date with address 66.187.232.30 +\& if failed port 21 and protocol ftp then alert +.Ve +.PP +[*] Well, not quite, since we specify an ip-address directly we +will bypass any \s-1DNS\s0 round-robin setup, but that's another story. +.PP +For more examples, see the example section below. +.SH "MONIT HTTPD" +.IX Header "MONIT HTTPD" +If specified in the control file, monit will start a monit daemon +with http support. From a Browser you can then start and stop +services, disable or enable service monitoring as well as view +the status of each service. Also, if monit logs to its own file, +you can view the content of this logfile in a Browser. +.PP +The control file statement for starting a monit daemon with http +support is a global set\-statement: +.IP "set httpd port 2812" 4 +.IX Item "set httpd port 2812" +.PP +And you can use this \s-1URL\s0, \fIhttp://localhost:2812/\fR, to access +the daemon from a browser. 
The port number, in this case 2812, +can be any number that you are allowed to bind to. +.PP +If you have compiled monit with openssl, you can also start the +httpd server with ssl support, using the following expression: +.PP +.Vb 3 +\& set httpd port 2812 +\& ssl enable +\& pemfile /etc/certs/monit.pem +.Ve +.PP +And you can use this \s-1URL\s0, \fIhttps://localhost:2812/\fR, to access +the monit web server over an ssl encrypted connection. +.PP +The pemfile, in the example above, holds both the server's +private key and certificate. This file should be stored in a safe +place on the filesystem and should have strict permissions, that +is, no more than 0700. +.PP +In addition, if you want to check for client certificates you can +use the \s-1CLIENTPEMFILE\s0 statement. In this case, a connecting +client has to provide a certificate known by monit in order to +connect. This file also needs to contain all necessary \s-1CA\s0 +certificates. A configuration could look like: +.PP +.Vb 4 +\& set httpd port 2812 +\& ssl enable +\& pemfile /etc/certs/monit.pem +\& clientpemfile /etc/certs/monit\-client.pem +.Ve +.PP +By default, self-signed client certificates are not allowed. If +you want to use a self-signed certificate from a client it has to +be allowed explicitly with the \s-1ALLOWSELFCERTIFICATION\s0 statement. +.PP +For more information on how to use monit with \s-1SSL\s0 and for more +information about certificates and generating pem files, please +consult the \s-1README\s0.SSL file accompanying the software. +.PP +If you only want the http server to accept connection requests on +one host address, you can specify the bind address either as an +\&\s-1IP\s0 number string or as a hostname. In the following example we +bind the http server to the loopback device.
In other words the +http server will only be reachable from localhost: +.PP +.Vb 1 +\& set httpd port 2812 and use the address 127.0.0.1 +.Ve +.PP +or +.PP +.Vb 1 +\& set httpd port 2812 and use the address localhost +.Ve +.PP +If you do not use the \s-1ADDRESS\s0 statement the http server will +accept connections on any/all local addresses. +.PP +It is possible to hide monit's httpd server version, which +usually is available in httpd header responses and in error +pages. +.PP +.Vb 3 +\& set httpd port 2812 +\& ... +\& signature {enable|disable} +.Ve +.PP +Use \fIdisable\fR to hide the server signature \- monit will only +report its name (e.g. 'monit' instead of for example 'monit +4.2'). By default the version signature is enabled. It is worth +to stress that this option provides no security advantage and +falls into the \*(L"security through obscurity\*(R" category. +.PP +If you remove the httpd statement from the config file, monit +will stop the httpd server on configuration reload. Likewise if +you change the port number, monit will restart the http server +using the new specified port number. +.PP +The status page displayed by the monit web server is +automatically refreshed with the same poll time set for the monit +daemon. +.PP +\&\fBNote:\fR +.PP +We strongly recommend that you start monit with http support (and +bind the server to localhost, only, unless you are behind a +firewall). The built-in web-server is small and does not use much +resources, and more \fIimportantly\fR, monit can use the http server +for interprocess communication between a monit client and a monit +daemon. +.PP +For instance, you \fImust\fR start a monit daemon with http support +if you want to be able to use the following console commands. +(That is; most of the available console commands). 
+.PP +.Vb 12 +\& 'monit stop all' +\& 'monit start all' +\& 'monit stop service' +\& 'monit start service' +\& 'monit restart service' +\& 'monit monitor service' +\& 'monit unmonitor service' +\& 'monit \-g groupname stop all' +\& 'monit \-g groupname start all' +\& 'monit \-g groupname restart all' +\& 'monit \-g groupname monitor all' +\& 'monit \-g groupname unmonitor all' +.Ve +.PP +If a monit daemon is running in the background we will ask the +daemon (via the \s-1HTTP\s0 protocol) to execute the above commands. +That is, the daemon is requested to start and stop the services. +This ensures that a daemon will not restart a service that you +requested to stop and that (any) timeout lock will be removed +from a service when you start it. +.Sh "Monit \s-1HTTPD\s0 Authentication" +.IX Subsection "Monit HTTPD Authentication" +monit supports two types of authentication schemes for +connecting to the httpd server (three, if you count \s-1SSL\s0 client +certificate validation). Both schemes can be used together or on +their own. You \fBmust\fR choose at least one. +.PP +Host and network allow list +.IX Subsection "Host and network allow list" +.PP +The http server maintains an access-control list of hosts and +networks allowed to connect to the server. You can add as many +hosts as you want to, but only hosts with a valid domain name or +\s-1IP\s0 address are allowed. If you specify a hostname that does +not resolve, monit will write an error message to the console and +not start. Networks require a network \s-1IP\s0 and a netmask to be +accepted. +.PP +The http server will query a name server to check any hosts +connecting to the server. If a host (client) is trying to connect +to the server, but cannot be found in the access list or cannot +be resolved, the server will shut down the connection to the +client promptly.
+.PP +Control file example: +.PP +.Vb 6 +\& set httpd port 2812 +\& allow localhost +\& allow my.other.work.machine.com +\& allow 10.1.1.1 +\& allow 192.168.1.0/255.255.255.0 +\& allow 10.0.0.0/8 +.Ve +.PP +Clients not mentioned in the allow list that try to connect to +the server are logged with their \s-1IP\s0 address. +.PP +Basic Authentication +.IX Subsection "Basic Authentication" +.PP +This authentication scheme is \s-1HTTP\s0 specific and described in more +detail in \s-1RFC\s0 2617. +.PP +In short, a server challenges a client (e.g. a browser) to send +authentication information (username and password) and if +accepted, the server will allow the client access to the +requested document. +.PP +The biggest weakness with Basic Authentication is that the +username and password are sent in clear-text (i.e. base64 encoded) +over the network. It is therefore recommended that you do not use +this authentication method unless you run the monit http server +with \fIssl\fR support. With ssl support it is completely safe to +use Basic Authentication since \fBall\fR http data, including Basic +Authentication headers, will be encrypted. +.PP +monit will use Basic Authentication if an allow statement +contains a username and a password separated with a single ':' +character, like so: \fIallow username:password\fR. The username and +password must be written in clear\-text. +.PP +Alternatively you can use files in \*(L"htpasswd\*(R" format (one +user:passwd entry per line), like so: \fIallow +[cleartext|crypt|md5] /path [users]\fR. By default cleartext +passwords are read. In case the passwords are digested it is +necessary to specify the cryptographic method. In order to select +the users their names can be added to the allow statement. +Otherwise all users are added.
+.PP +Example: +.PP +.Vb 3 +\& set httpd port 2812 +\& allow hauk:password +\& allow md5 /etc/httpd/htpasswd john paul ringo george +.Ve +.PP +If you use this method together with a host list, then only +clients from the listed hosts will be allowed to connect to the +monit http server and each client will be asked to provide a +username and a password. +.PP +Example: +.PP +.Vb 4 +\& set httpd port 2812 +\& allow localhost +\& allow 10.1.1.1 +\& allow hauk:password +.Ve +.PP +If you only want to use Basic Authentication, then just provide +allow entries with username and password, like so: +.PP +.Vb 3 +\& set httpd port 2812 +\& allow hauk:password +\& allow admin:password +.Ve +.PP +Finally it is possible to define some users as read\-only. A +read-only user can read the monit web pages but will \fInot\fR get +access to push-buttons and cannot change a service from the web +interface. +.PP +.Vb 3 +\& set httpd port 2812 +\& allow admin:password +\& allow hauk:password read\-only +.Ve +.PP +A user is set to read-only by using the \fIread-only\fR keyword +\&\fBafter\fR username:password. In the above example the user \fIhauk\fR +is defined as a read-only user, while the \fIadmin\fR user has all +access rights. +.PP +\&\s-1NB\s0! A monit client will use the \fIfirst\fR username:password pair +in an allow list and you should \fBnot\fR define the first user as a +read-only user. If you do, monit console commands will not work. +.PP +If you use Basic Authentication it is a good idea to set the +access permissions for the control file (~/.monitrc) so that it is +readable and writable only by the user running monit, because the +password is written in clear\-text. (Use this command: /bin/chmod +600 ~/.monitrc.) In fact, since monit \fBversion 3.0\fR, monit will +complain and exit if the control file is readable by others. +.PP +Clients that try to connect to the server but supply the wrong +username and/or password are logged with their \s-1IP\s0 address.
+.PP +If the monit command line interface is being used at least one +cleartext password is necessary. Otherwise, the monit command +line interface will not be able to connect to the monit daemon +server. +.SH "DEPENDENCIES" +.IX Header "DEPENDENCIES" +If specified in the control file, monit can do dependency +checking before start, stop, monitoring or unmonitoring of +services. The dependency statement may be used within any service +entry in the monit control file. +.PP +The syntax for the depend statement is simply: +.IP "\s-1DEPENDS\s0 on service[, service [,...]]" 4 +.IX Item "DEPENDS on service[, service [,...]]" +.PP +Where \fBservice\fR is a service entry name, for instance \fBapache\fR +or \fBdatafs\fR. +.PP +You may add more than one service name of any type or use more +than one depend statement in an entry. +.PP +Services specified in a \fIdepend\fR statement will be checked +during stop/start/monitor/unmonitor operations. If a service is +stopped or unmonitored it will stop/unmonitor any services that +depend on it. Likewise, if a service is started, it will +first stop any services that depend on it and after it is +started, start all depending services again. If the service is to +be monitored (enable monitoring), all services which this service +depends on will be monitored before enabling monitoring of this +service. +.PP +Here is an example where we set up an apache service entry to +depend on the underlying apache binary. If the binary should +change, an alert is sent and apache is no longer monitored. The +rationale is security: monit should not execute a +possibly cracked apache binary. +.PP +.Vb 7 +\& (1) check process apache +\& (2) with pidfile "/usr/local/apache/logs/httpd.pid" +\& (3) ...
+\& (4) depends on httpd +\& (5) +\& (6) check file httpd with path /usr/local/apache/bin/httpd +\& (7) if failed checksum then unmonitor +.Ve +.PP +The first entry is the process entry for apache shown before +(abbreviated for clarity). The fourth line sets up a dependency +between this entry and the service entry named httpd in line 6. A +depend tree works as follows: if an action is conducted in a +lower branch it will propagate upward in the tree and for every +dependent entry execute the same action. In this case, if the +checksum should fail in line 7 then an unmonitor action is +executed and the apache binary is not checked anymore. But since +the apache process entry depends on the httpd entry this entry +will also execute the unmonitor action. In short, if the checksum +test for the httpd binary file should fail, both the check file +httpd entry and the check process apache entry are set in +un-monitoring mode. +.PP +A dependency tree is a general construct and can be used between +all types of service entries and span many levels and propagate +any supported action (except the exec action which will not +propagate upward in a dependency tree for obvious reasons). +.PP +Here is another example. Consider the following common +server setup: +.PP +.Vb 2 +\& WEB\-SERVER \-> APPLICATION\-SERVER \-> DATABASE \-> FILESYSTEM +\& (a) (b) (c) (d) +.Ve +.PP +You can set dependencies so that the web-server depends on the +application server to run before the web-server starts and the +application server depends on the database server and the +database depends on the file-system to be mounted before it +starts. See also the example section below for examples using the +depend statement.
+.PP +Here we describe how monit will function with the above +dependencies: +.IP "If no servers are running" 4 +.IX Item "If no servers are running" +monit will start the servers in the following order: \fId\fR, \fIc\fR, +\&\fIb\fR, \fIa\fR +.IP "If all servers are running" 4 +.IX Item "If all servers are running" +When you run 'monit stop all' this is the stop order: \fIa\fR, \fIb\fR, +\&\fIc\fR, \fId\fR. If you run 'monit stop d' then \fIa\fR, \fIb\fR and \fIc\fR +are also stopped because they depend on \fId\fR and finally \fId\fR is +stopped. +.IP "If \fIa\fR does not run" 4 +.IX Item "If a does not run" +When monit runs it will start \fIa\fR +.IP "If \fIb\fR does not run" 4 +.IX Item "If b does not run" +When monit runs it will first stop \fIa\fR then start \fIb\fR and +finally start \fIa\fR again. +.IP "If \fIc\fR does not run" 4 +.IX Item "If c does not run" +When monit runs it will first stop \fIa\fR and \fIb\fR then start \fIc\fR +and finally start \fIb\fR then \fIa\fR. +.IP "If \fId\fR does not run" 4 +.IX Item "If d does not run" +When monit runs it will first stop \fIa\fR, \fIb\fR and \fIc\fR then start +\&\fId\fR and finally start \fIc\fR, \fIb\fR then \fIa\fR. +.IP "If the control file contains a depend loop." 4 +.IX Item "If the control file contains a depend loop." +A depend loop is for example; a\->b and b\->a or a\->b\->c\->a. +.Sp +When monit starts it will check for such loops and complain and +exit if a loop was found. It will also exit with a complaint if a +depend statement was used that does not point to a service in the +control file. +.SH "THE RUN CONTROL FILE" +.IX Header "THE RUN CONTROL FILE" +The preferred way to set up monit is to write a \fI.monitrc\fR file +in your home directory. When there is a conflict between the +command-line arguments and the arguments in this file, the +command-line arguments take precedence. 
To protect the security +of your control file and passwords the control file must have +permissions \fIno more than 0700\fR (u=xrw,g=,o=); monit will +complain and exit otherwise. +.Sh "Run Control Syntax" +.IX Subsection "Run Control Syntax" +Comments begin with a '#' and extend through the end of the line. +Otherwise the file consists of a series of service entries or +global option statements in a free\-format, token-oriented syntax. +.PP +There are three kinds of tokens: grammar keywords, numbers (i.e. +decimal digit sequences) and strings. Strings can be either +quoted or unquoted. A quoted string is bounded by double quotes +and may contain whitespace (and quoted digits are treated as a +string). An unquoted string is any whitespace-delimited token, +containing characters and/or numbers. +.PP +On a semantic level, the control file consists of two types of +entries: +.IP "1. Global set-statements" 4 +.IX Item "1. Global set-statements" +A global set-statement starts with the keyword \fIset\fR and the +item to configure. +.IP "2. One or more service entry statements." 4 +.IX Item "2. One or more service entry statements." +Each service entry consists of the keyword `check', followed by +the service type. Each entry requires a descriptive +name, which may be freely chosen. This name is used by monit +to refer to the service internally and in all interactions +with the user. +.PP +Currently, seven types of check statements are supported: +.IP "1. \s-1CHECK\s0 \s-1PROCESS\s0 \s-1PIDFILE\s0 " 4 +.IX Item "1. CHECK PROCESS PIDFILE " +The path argument is the absolute path to the program's pidfile. If the +pidfile does not exist or does not contain the pid number of a +running process, monit will call the entry's start method if +defined. If monit runs in passive mode or the start method is +not defined, monit will just send alerts on errors. +.IP "2. \s-1CHECK\s0 \s-1FILE\s0 \s-1PATH\s0 " 4 +.IX Item "2. CHECK FILE PATH " +The path argument is the absolute path to the file.
If the file does not +exist or has disappeared, monit will call the entry's start method if +defined. If the path does not point to a regular file (for +instance a directory), monit will disable monitoring of this +entry. If monit runs in passive mode or the start method is not +defined, monit will just send alerts on errors. +.IP "3. \s-1CHECK\s0 \s-1FIFO\s0 \s-1PATH\s0 " 4 +.IX Item "3. CHECK FIFO PATH " +The path argument is the absolute path to the fifo. If the fifo does not +exist or has disappeared, monit will call the entry's start method if +defined. If the path does not point to a fifo (for +instance a directory), monit will disable monitoring of this +entry. If monit runs in passive mode or the start method is not +defined, monit will just send alerts on errors. +.IP "4. \s-1CHECK\s0 \s-1DEVICE\s0 \s-1PATH\s0 " 4 +.IX Item "4. CHECK DEVICE PATH " +The path argument is the path to the device block special file, mount point, +file or a directory which is part of a filesystem. It is +recommended to use a block special file directly (for example +/dev/hda1 on Linux or /dev/dsk/c0t0d0s1 on Solaris, etc.) If you +use a mount point (for example /data), be careful, because if the +device is unmounted the test will still be true because the mount +point exists. +.Sp +If the device becomes unavailable, monit will call the entry's +start method if defined. If the path does not point to a device, +monit will disable monitoring of this entry. If monit runs in +passive mode or the start method is not defined, monit will just +send alerts on errors. +.IP "5. \s-1CHECK\s0 \s-1DIRECTORY\s0 \s-1PATH\s0 " 4 +.IX Item "5. CHECK DIRECTORY PATH " +The path argument is the absolute path to the directory. If the directory +does not exist or has disappeared, monit will call the entry's start +method if defined. If the path does not point to a directory, monit +will disable monitoring of this entry. If monit runs in passive +mode or the start method is not defined, monit will just send +alerts on errors. +.IP "6.
\s-1CHECK\s0 \s-1HOST\s0 \s-1ADDRESS\s0 " 4 +.IX Item "6. CHECK HOST ADDRESS " +The host address can be specified as a hostname string or as an +ip-address string in dotted decimal format, such as +tildeslash.com or \*(L"64.87.72.95\*(R". +.IP "7. \s-1CHECK\s0 \s-1SYSTEM\s0 " 4 +.IX Item "7. CHECK SYSTEM " +The system name is usually the hostname, but any descriptive name can be +used. This test allows checking general system resources such as +\&\s-1CPU\s0 usage (percent of time spent in user, system and wait), total +memory usage or load average. +.PP +You can use noise keywords like 'if', `and', `with(in)', `has', +`using', 'use', 'on(ly)', `usage' and `program(s)' anywhere in an +entry to make it resemble English. They're ignored, but can make +entries much easier to read at a glance. The punctuation +characters ';' ',' and '=' are also ignored. Keywords are case +insensitive. +.PP +.Vb 1 +\& Here are the legal global keywords: +.Ve +.PP +.Vb 50 +\& Keyword Function +\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- +\& set daemon Set a background poll interval in seconds. +\& set init Set monit to run from init. monit will not +\& transform itself into a daemon process. +\& set logfile Name of a file to dump error\- and status\- +\& messages to. If syslog is specified as the +\& file, monit will utilize the syslog daemon +\& to log messages. This can optionally be +\& followed by 'facility ' where +\& facility is 'log_local0' \- 'log_local7' or +\& 'log_daemon'. If no facility is specified, +\& LOG_USER is used. +\& set mailserver The mailserver used for sending alert +\& notifications. If the mailserver is not +\& defined, monit will try to use 'localhost' +\& as the smtp\-server for sending mail. You +\& can add more mail servers, if monit cannot +\& connect to the first server it will try the +\& next server and so on.
+\& set mail\-format Set a global mail format for all alert +\& messages emitted by monit. +\& set pidfile Explicitly set the location of the monit lock +\& file. E.g. set pidfile /var/run/xyzmonit.pid. +\& set statefile Explicitly set the location of the file monit +\& will write state data to. If not set, the +\& default is $HOME/.monit.state. +\& set httpd port Activates monit http server at the given +\& port number. +\& ssl enable Enables ssl support for the httpd server. +\& Requires the use of the pemfile statement. +\& ssl disable Disables ssl support for the httpd server. +\& It is equal to omitting any ssl statement. +\& pemfile Set the pemfile to be used with ssl. +\& clientpemfile Set the pemfile to be used when client +\& certificates should be checked by monit. +\& address If specified, the http server will only +\& accept connect requests to this address. +\& This statement is an optional part of the +\& set httpd statement. +\& allow Specifies a host or IP address allowed to +\& connect to the http server. Can also specify +\& a username and password allowed to connect +\& to the server. More than one allow statement +\& is allowed. This statement is also an +\& optional part of the set httpd statement. +\& read\-only Set the user defined in username:password +\& to read only. A read\-only user cannot change +\& a service from the monit web interface. +\& include Include a file or files matching the globstring +.Ve +.PP +.Vb 1 +\& Here are the legal service entry keywords: +.Ve +.PP +.Vb 174 +\& Keyword Function +\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- +\& check Starts an entry and must be followed by the type +\& of monitored service {device|directory|file|host +\& process|system} and a descriptive name for the +\& service. +\& pidfile Specify the process pidfile. Every +\& process must create a pidfile with its +\& current process id.
This statement should only +\& be used in a process service entry. +\& path Must be followed by a path to the block +\& special file for filesystem (device), regular +\& file, directory or a process's pidfile. +\& group Specify a groupname for a service entry. +\& start The program used to start the specified +\& service. Full path is required. This +\& statement is optional, but recommended. +\& stop The program used to stop the specified +\& service. Full path is required. This +\& statement is optional, but recommended. +\& pid and ppid These keywords may be used as standalone +\& statements in a process service entry to +\& override the alert action for change of +\& process pid and ppid. +\& uid and gid These keywords are either 1) an optional part of +\& a start, stop or exec statement. They may be +\& used to specify a user id and a group id the +\& program (process) should switch to upon start. +\& This feature can only be used if the superuser +\& is running monit. 2) uid and gid may also be +\& used as standalone statements in a file service +\& entry to test a file's uid and gid attributes. +\& host The hostname or IP address to test the port +\& at. This keyword can only be used together +\& with a port statement or in the check host +\& statement. +\& port Specify a TCP/IP service port number which +\& a process is listening on. This statement +\& is also optional. If this statement is not +\& prefixed with a host\-statement, localhost is +\& used as the hostname to test the port at. +\& type Specifies the socket type monit should use when +\& testing a connection to a port. If the type +\& keyword is omitted, tcp is used. This keyword +\& must be followed by either tcp, udp or tcpssl. +\& tcp Specifies that monit should use a TCP +\& socket type (stream) when testing a port. +\& tcpssl Specifies that monit should use a TCP socket +\& type (stream) and the secure socket layer (ssl) +\& when testing a port connection. 
+\& udp Specifies that monit should use a UDP socket +\& type (datagram) when testing a port. +\& certmd5 The md5 sum of a certificate a ssl forged +\& server has to deliver. +\& proto(col) This keyword specifies the type of service +\& found at the port. monit knows at the moment +\& how to speak HTTP, SMTP, FTP, POP, IMAP, MYSQL, +\& NNTP, SSH, DWP, LDAP2, LDAP3, RDATE, NTP3, DNS, +\& POSTFIX\-POLICY, APACHE\-STATUS, TNS, PGSQL and +\& RSYNC. +\& You're welcome to write new protocol test +\& modules. If no protocol is specified monit will +\& use a default test which in most cases is good +\& enough. +\& request Specifies a server request and must come +\& after the protocol keyword mentioned above. +\& \- for http it can contain a URL and an +\& optional query string. +\& \- other protocols do not support this +\& statement yet +\& send/expect These keywords specify a generic protocol. +\& Both require a string, either to be sent or +\& to be matched against (as extended regex if +\& supported). Send/expect cannot be used +\& together with the proto(col) statement. +\& unix(socket) Specifies a Unix socket file and used like +\& the port statement above to test a Unix +\& domain network socket connection. +\& URL Specify a URL string which monit will use for +\& connection testing. +\& content Optional sub\-statement for the URL statement. +\& Specifies that monit should test the content +\& returned by the server against a regular +\& expression. +\& timeout x sec. Define a network port connection timeout. Must +\& be followed by a number in seconds and the +\& keyword, seconds. +\& timeout Define a service timeout. Must be followed by +\& two digits. The first digit is max number of +\& restarts for the service. The second digit +\& is the cycle interval to test restarts. +\& This statement is optional. +\& alert Specifies an email address for notification +\& if a service event occurs.
Alert can also +\& be postfixed, to only send a message for +\& certain events. See the examples above. More +\& than one alert statement is allowed in an +\& entry. This statement is also optional. +\& noalert Specifies an email address that should not +\& receive alerts. This statement is also +\& optional. +\& restart, stop These keywords may be used as actions for +\& unmonitor, various test statements. The exec statement is +\& start and special in that it requires a following string +\& exec specifying the program to be executed. You may +\& also specify a UID and GID for the exec +\& statement. The program executed will then run +\& using the specified user id and group id. +\& mail\-format Specifies a mail format for an alert message. +\& This statement is an optional part of the +\& alert statement. +\& checksum Specify that monit should compute and monitor a +\& file's md5/sha1 checksum. May only be used in a +\& check file entry. +\& expect Specifies a md5/sha1 checksum string monit +\& should expect when testing the checksum. This +\& statement is an optional part of the checksum +\& statement. +\& timestamp Specifies an expected timestamp for a file +\& or directory. More than one timestamp statement +\& is allowed. May only be used in a check file or +\& check directory entry. +\& changed Part of a timestamp statement and used as an +\& operator to simply test for a timestamp change. +\& every Validate this entry only at every n poll cycle. +\& Useful in daemon mode when the cycle is short +\& and a service takes some time to start. +\& mode Must be followed either by the keyword active, +\& passive or manual. If active, monit will restart +\& the service if it is not running (this is the +\& default behavior). If passive, monit will not +\& (re)start the service if it is not running \- it +\& will only monitor and send alerts (resource +\& related restart and stop options are ignored +\& in this mode also).
If manual, monit will enter +\& active mode only if a service was started under +\& monit's control otherwise the service isn't +\& monitored. +\& cpu Must be followed by a compare operator, a number +\& with "%" and an action. This statement is used +\& to check the cpu usage in percent of a process +\& with its children over a number of cycles. If +\& the compare expression matches then the +\& specified action is executed. +\& mem The equivalent to the cpu token for memory of a +\& process (w/o children!). This token must be +\& followed by a compare operator, a number with +\& unit {B|KB|MB|GB|%|byte|kilobyte|megabyte| +\& gigabyte|percent} and an action. +\& loadavg Must be followed by [1min,5min,15min] in (), a +\& compare operator, a number and an action. This +\& statement is used to check the system load +\& average over a number of cycles. If the compare +\& expression matches then the specified action is +\& executed. +\& children This is the number of child processes spawned by a +\& process. The syntax is the same as above. +\& totalmem The equivalent of mem, except totalmem is an +\& aggregation of memory, not only used by a +\& process but also by all its child +\& processes. The syntax is the same as above. +\& space Must be followed by a compare operator, a +\& number, unit {B|KB|MB|GB|%|byte|kilobyte| +\& megabyte|gigabyte|percent} and an action. +\& inode(s) Must be followed by a compare operator, integer +\& number, optionally by percent sign (if not, the +\& limit is absolute) and an action. +\& perm(ission) Must be followed by an octal number describing +\& the permissions. +\& size Must be followed by a compare operator, a +\& number, unit {B|KB|MB|GB|byte|kilobyte| +\& megabyte|gigabyte} and an action. +\& depends (on) Must be followed by the name of a service this +\& service depends on.
+.Ve +.PP +Here's the complete list of reserved \fBkeywords\fR used by monit: +.PP +\&\fIif\fR, \fIthen\fR, \fIelse\fR, \fIset\fR, \fIdaemon\fR, \fIlogfile\fR, +\&\fIsyslog\fR, \fIaddress\fR, \fIhttpd\fR, \fIssl\fR, \fIenable\fR, \fIdisable\fR, +\&\fIpemfile\fR, \fIallow\fR, \fIread-only\fR, \fIcheck\fR, \fIinit\fR, \fIcount\fR, +\&\fIpidfile\fR, \fIstatefile\fR, \fIgroup\fR, \fIstart\fR, \fIstop\fR, \fIuid\fR, +\&\fIgid\fR, \fIconnection\fR, \fIport(number)\fR, \fIunix(socket)\fR, \fItype\fR, +\&\fIproto(col)\fR, \fItcp\fR, \fItcpssl\fR, \fIudp\fR, \fIalert\fR, \fInoalert\fR, +\&\fImail-format\fR, \fIrestart\fR, \fItimeout\fR, \fIchecksum\fR, \fIresource\fR, +\&\fIexpect\fR, \fIsend\fR, \fImailserver\fR, \fIevery\fR, \fImode\fR, \fIactive\fR, +\&\fIpassive\fR, \fImanual\fR, \fIdepends\fR, \fIhost\fR, \fIdefault\fR, \fIhttp\fR, +\&\fIftp\fR, \fIsmtp\fR, \fIpop\fR, \fIntp3\fR, \fInntp\fR, \fIimap\fR, \fIclamav\fR, +\&\fIssh\fR, \fIdwp\fR, \fIldap2\fR, \fIldap3\fR, \fItns\fR, \fIrequest\fR, \fIcpu\fR, +\&\fImem\fR, \fItotalmem\fR, \fIchildren\fR, \fIloadavg\fR, \fItimestamp\fR, +\&\fIchanged\fR, \fIsecond(s)\fR, \fIminute(s)\fR, \fIhour(s)\fR, \fIday(s)\fR, +\&\fIspace\fR, \fIinode\fR, \fIpid\fR, \fIppid\fR, \fIperm(ission)\fR, \fIicmp\fR, +\&\fIprocess\fR, \fIfile\fR, \fIdirectory\fR, \fIdevice\fR, \fIsize\fR, +\&\fIunmonitor\fR, \fIrdate\fR, \fIrsync\fR, \fIdata\fR, \fIinvalid\fR, \fIexec\fR, +\&\fInonexist\fR, \fIpolicy\fR, \fIreminder\fR, \fIinstance\fR, \fIeventqueue\fR, + \fIbasedir\fR, \fIslot(s)\fR, \fIsystem\fR and \fIfailed\fR +.PP +And here is a complete list of \fBnoise keywords\fR ignored by +monit: +.PP +\&\fIis\fR, \fIas\fR, \fIare\fR, \fIon(ly)\fR, \fIwith(in)\fR, \fIand\fR, \fIhas\fR, +\&\fIusing\fR, \fIuse\fR, \fIthe\fR, \fIsum\fR, \fIprogram(s)\fR, \fIthan\fR, \fIfor\fR, +\&\fIusage\fR, \fIwas\fR, \fIbut\fR. 
+.PP +\&\fBNote:\fR If the \fIstart\fR or \fIstop\fR programs are shell scripts, +then the script must begin with \f(CW\*(C`#!\*(C'\fR and the remainder of the +first line must specify an interpreter for the program. E.g. +\&\f(CW\*(C`#!/bin/sh\*(C'\fR +.PP +It's possible to write scripts directly into the \fIstart\fR and +\&\fIstop\fR entries by using a string of shell\-commands. Like so: +.PP +.Vb 2 +\& start="/bin/bash \-c 'echo $$ > pidfile; exec program'" +\& stop="/bin/bash \-c 'kill \-s SIGTERM `cat pidfile`'" +.Ve +.Sh "\s-1CONFIGURATION\s0 \s-1EXAMPLES\s0" +.IX Subsection "CONFIGURATION EXAMPLES" +The simplest form is just the check statement. In this example we +check to see if the server is running and log a message if not: +.PP +.Vb 1 +\& check process resin with pidfile /usr/local/resin/srun.pid +.Ve +.PP +To have monit start the server if it's not running, add a start +statement: +.PP +.Vb 2 +\& check process resin with pidfile /usr/local/resin/srun.pid +\& start program = "/usr/local/resin/bin/srun.sh start" +.Ve +.PP +Here's a more advanced example for monitoring an apache +web-server listening on the default port number for \s-1HTTP\s0 and +\&\s-1HTTPS\s0. In this example monit will restart apache if it's not +accepting connections at the port numbers. The method monit uses +for a process restart is to first execute the stop\-program, wait +for the process to stop and then execute the start\-program. (If +monit was unable to stop or start the service a failed alert +message will be sent if you have requested alert messages to be +sent). +.PP +.Vb 5 +\& check process apache with pidfile /var/run/httpd.pid +\& start program = "/etc/init.d/httpd start" +\& stop program = "/etc/init.d/httpd stop" +\& if failed port 80 then restart +\& if failed port 443 with timeout 15 seconds then restart +.Ve +.PP +This example demonstrates how you can run a program as a specified +user (uid) and with a specified group (gid).
Many daemon programs +will do the uid and gid switch by themselves, but for those +programs that do not (e.g. Java programs), monit's ability to +start a program as a certain user can be very useful. In this +example we start the Tomcat Java Servlet Engine as the standard +\&\fInobody\fR user and group. Please note that monit will only switch +uid and gid for a program if the super-user is running monit, +otherwise monit will simply ignore the request to change uid and +gid. +.PP +.Vb 7 +\& check process tomcat with pidfile /var/run/tomcat.pid +\& start program = "/etc/init.d/tomcat start" +\& as uid nobody and gid nobody +\& stop program = "/etc/init.d/tomcat stop" +\& # You can also use id numbers instead and write: +\& as uid 99 and with gid 99 +\& if failed port 8080 then alert +.Ve +.PP +In this example we use udp for connection testing to check if the +name-server is running and also use timeout and alert: +.PP +.Vb 5 +\& check process named with pidfile /var/run/named.pid +\& start program = "/etc/init.d/named start" +\& stop program = "/etc/init.d/named stop" +\& if failed port 53 use type udp protocol dns then restart +\& if 3 restarts within 5 cycles then timeout +.Ve +.PP +The following example illustrates how to check if the service +\&'sophie' is answering connections on its Unix domain socket: +.PP +.Vb 4 +\& check process sophie with pidfile /var/run/sophie.pid +\& start program = "/etc/init.d/sophie start" +\& stop program = "/etc/init.d/sophie stop" +\& if failed unix /var/run/sophie then restart +.Ve +.PP +In this example we check an apache web-server running on +localhost that answers for several IP-based virtual hosts or +vhosts, hence the host statement before port: +.PP +.Vb 7 +\& check process apache with pidfile /var/run/httpd.pid +\& start "/etc/init.d/httpd start" +\& stop "/etc/init.d/httpd stop" +\& if failed host www.sol.no port 80 then alert +\& if failed host shop.sol.no port 443 then alert +\& if failed host chat.sol.no port 80 then
alert +\& if failed host www.tildeslash.com port 80 then alert +.Ve +.PP +To make sure that monit is communicating with an HTTP server, a +protocol test can be added: +.PP +.Vb 6 +\& check process apache with pidfile /var/run/httpd.pid +\& start "/etc/init.d/httpd start" +\& stop "/etc/init.d/httpd stop" +\& if failed host www.sol.no port 80 +\& protocol HTTP +\& then alert +.Ve +.PP +This example shows a different way to check a webserver using +the send/expect mechanism: +.PP +.Vb 7 +\& check process apache with pidfile /var/run/httpd.pid +\& start "/etc/init.d/httpd start" +\& stop "/etc/init.d/httpd stop" +\& if failed host www.sol.no port 80 +\& send "GET / HTTP/1.0\er\enHost: www.sol.no\er\en\er\en" +\& expect "HTTP/[0\-9\e.]{3} 200 .*\er\en" +\& then alert +.Ve +.PP +To make sure that Apache is logging successfully (i.e. no more +than 60 percent of child servers are logging), use its mod_status +page at www.sol.no/server\-status with this special protocol test: +.PP +.Vb 5 +\& check process apache with pidfile /var/run/httpd.pid +\& start "/etc/init.d/httpd start" +\& stop "/etc/init.d/httpd stop" +\& if failed host www.sol.no port 80 +\& protocol apache\-status loglimit > 60% then restart +.Ve +.PP +This configuration can be used to alert you if more than 25 percent +of Apache child processes are stuck performing \s-1DNS\s0 lookups: +.PP +.Vb 5 +\& check process apache with pidfile /var/run/httpd.pid +\& start "/etc/init.d/httpd start" +\& stop "/etc/init.d/httpd stop" +\& if failed host www.sol.no port 80 +\& protocol apache\-status dnslimit > 25% then alert +.Ve +.PP +Here we use an icmp ping test to check if a remote host is up and, +if not, send an alert: +.PP +.Vb 3 +\& check host www.tildeslash.com with address www.tildeslash.com +\& if failed icmp type echo count 5 with timeout 15 seconds +\& then alert +.Ve +.PP +In the following example we ask monit to compute and verify the +checksum for the underlying apache binary used by the start and +stop programs.
If the checksum test fails, monitoring +will be disabled to prevent possibly starting a compromised +binary: +.PP +.Vb 5 +\& check process apache with pidfile /var/run/httpd.pid +\& start program = "/etc/init.d/httpd start" +\& stop program = "/etc/init.d/httpd stop" +\& if failed host www.tildeslash.com port 80 then restart +\& depends on apache_bin +.Ve +.PP +.Vb 2 +\& check file apache_bin with path /usr/local/apache/bin/httpd +\& if failed checksum then unmonitor +.Ve +.PP +In this example we ask monit to test the checksum for a document +on a remote server. If the checksum has changed, we send an alert: +.PP +.Vb 7 +\& check host tildeslash with address www.tildeslash.com +\& if failed port 80 protocol http +\& and request "/monit/dist/monit\-4.0.tar.gz" +\& with checksum f9d26b8393736b5dfad837bb13780786 +\& then alert +\& alert hauk@tildeslash.com with mail\-format {subject: +\& Aaaalarm! } +.Ve +.PP +Some servers are slow starters, for example Java-based +Application Servers. So if we want to keep the poll-cycle low +(i.e. < 60 seconds) but allow some services to take their time to +start, the \fBevery\fR statement is handy: +.PP +.Vb 5 +\& check process dynamo with pidfile /etc/dynamo.pid +\& start program = "/etc/init.d/dynamo start" +\& stop program = "/etc/init.d/dynamo stop" +\& if failed port 8840 then alert +\& every 2 cycles +.Ve +.PP +Here is an example where we group together two database entries +so you can manage them together, e.g. 'monit \-g database start +all'.
The mode statement is also illustrated in the first entry +and has the effect that monit will not try to (re)start this +service if it is not running: +.PP +.Vb 5 +\& check process sybase with pidfile /var/run/sybase.pid +\& start = "/etc/init.d/sybase start" +\& stop = "/etc/init.d/sybase stop" +\& mode passive +\& group database +.Ve +.PP +.Vb 6 +\& check process oracle with pidfile /var/run/oracle.pid +\& start program = "/etc/init.d/oracle start" +\& stop program = "/etc/init.d/oracle stop" +\& mode active # Not necessary really, since it's the default +\& if failed port 9001 then restart +\& group database +.Ve +.PP +Here is an example to show the usage of the resource checks. It +will send an alert when the \s-1CPU\s0 usage of the http daemon and its +child processes rises above 60% for two cycles. Apache is +restarted if the \s-1CPU\s0 usage is over 80% for five cycles, and stopped if +the memory usage is over 100 MB for five cycles or if the machine's +5\-minute load average is more than 10 for eight cycles: +.PP +.Vb 7 +\& check process apache with pidfile /var/run/httpd.pid +\& start program = "/etc/init.d/httpd start" +\& stop program = "/etc/init.d/httpd stop" +\& if cpu > 60% for 2 cycles then alert +\& if cpu > 80% for 5 cycles then restart +\& if mem > 100 MB for 5 cycles then stop +\& if loadavg(5min) greater than 10.0 for 8 cycles then stop +.Ve +.PP +This example demonstrates the timestamp statement with exec and +how you may restart apache if its configuration file was +changed.
+.PP +.Vb 3 +\& check file httpd.conf with path /etc/httpd/httpd.conf +\& if changed timestamp +\& then exec "/etc/init.d/httpd graceful" +.Ve +.PP +In this example we demonstrate usage of the extended alert +statement and a file check dependency: +.PP +.Vb 15 +\& check process apache with pidfile /var/run/httpd.pid +\& start = "/etc/init.d/httpd start" +\& stop = "/etc/init.d/httpd stop" +\& if failed host www.tildeslash.com port 80 then restart +\& alert admin@bar on {nonexist, timeout} +\& with mail\-format { +\& from: bofh@$HOST +\& subject: apache $EVENT \- $ACTION +\& message: This event occurred on $HOST at $DATE. +\& Your faithful employee, +\& monit +\& } +\& if 3 restarts within 5 cycles then timeout +\& depend httpd_bin +\& group apache +.Ve +.PP +.Vb 12 +\& check file httpd_bin with path /usr/local/apache/bin/httpd +\& if failed checksum +\& and expect 8f7f419955cefa0b33a2ba316cba3659 +\& then unmonitor +\& if failed permission 755 then unmonitor +\& if failed uid root then unmonitor +\& if failed gid root then unmonitor +\& if changed timestamp then alert +\& alert security@bar on {checksum, timestamp, +\& permission, uid, gid} +\& with mail\-format {subject: Alaaarrm! on $HOST} +\& group apache +.Ve +.PP +In this example, we demonstrate usage of the depend statement. In +this case, we want to start oracle and apache. However, we've set +up apache to use oracle as a back end, and if oracle is +restarted, apache must be restarted as well. +.PP +.Vb 4 +\& check process apache with pidfile /var/run/httpd.pid +\& start = "/etc/init.d/httpd start" +\& stop = "/etc/init.d/httpd stop" +\& depends on oracle +.Ve +.PP +.Vb 4 +\& check process oracle with pidfile /var/run/oracle.pid +\& start = "/etc/init.d/oracle start" +\& stop = "/etc/init.d/oracle stop" +\& if failed port 9001 then restart +.Ve +.PP +Next, we have 2 services, oracle-import and oracle-export that +need to be restarted if oracle is restarted, but are independent +of each other. 
+.PP +.Vb 4 +\& check process oracle with pidfile /var/run/oracle.pid +\& start = "/etc/init.d/oracle start" +\& stop = "/etc/init.d/oracle stop" +\& if failed port 9001 then restart +.Ve +.PP +.Vb 5 +\& check process oracle\-import +\& with pidfile /var/run/oracle\-import.pid +\& start = "/etc/init.d/oracle\-import start" +\& stop = "/etc/init.d/oracle\-import stop" +\& depends on oracle +.Ve +.PP +.Vb 5 +\& check process oracle\-export +\& with pidfile /var/run/oracle\-export.pid +\& start = "/etc/init.d/oracle\-export start" +\& stop = "/etc/init.d/oracle\-export stop" +\& depends on oracle +.Ve +.PP +Finally an example with all statements: +.PP +.Vb 23 +\& check process apache with pidfile /var/run/httpd.pid +\& start program = "/etc/init.d/httpd start" +\& stop program = "/etc/init.d/httpd stop" +\& if 3 restarts within 5 cycles then timeout +\& if failed host www.sol.no port 80 protocol http +\& and use the request "/login.cgi" +\& then alert +\& if failed host shop.sol.no port 443 type tcpssl +\& protocol http and with timeout 15 seconds +\& then restart +\& if cpu is greater than 60% for 2 cycles then alert +\& if cpu > 80% for 5 cycles then restart +\& if totalmem > 100 MB then stop +\& if children > 200 then alert +\& alert bofh@bar with mail\-format {from: monit@foo.bar.no} +\& every 2 cycles +\& mode active +\& depends on weblogic +\& depends on httpd.pid +\& depends on httpd.conf +\& depends on httpd_bin +\& depends on datafs +\& group server +.Ve +.PP +.Vb 6 +\& check file httpd.pid with path /usr/local/apache/logs/httpd.pid +\& group server +\& if timestamp > 7 days then restart +\& every 2 cycles +\& alert bofh@bar with mail\-format {from: monit@foo.bar.no} +\& depends on datafs +.Ve +.PP +.Vb 7 +\& check file httpd.conf with path /etc/httpd/httpd.conf +\& group server +\& if timestamp was changed +\& then exec "/usr/local/apache/bin/apachectl graceful" +\& every 2 cycles +\& alert bofh@bar with mail\-format {from: monit@foo.bar.no} +\& depends on 
datafs +.Ve +.PP +.Vb 13 +\& check file httpd_bin with path /usr/local/apache/bin/httpd +\& group server +\& if failed checksum and expect the sum +\& 8f7f419955cefa0b33a2ba316cba3659 then unmonitor +\& if failed permission 755 then unmonitor +\& if failed uid root then unmonitor +\& if failed gid root then unmonitor +\& if changed size then alert +\& if changed timestamp then alert +\& every 2 cycles +\& alert bofh@bar with mail\-format {from: monit@foo.bar.no} +\& alert foo@bar on { checksum, size, timestamp, uid, gid } +\& depends on datafs +.Ve +.PP +.Vb 12 +\& check device datafs with path /dev/sdb1 +\& group server +\& start program = "/bin/mount /data" +\& stop program = "/bin/umount /data" +\& if failed permission 660 then unmonitor +\& if failed uid root then unmonitor +\& if failed gid disk then unmonitor +\& if space usage > 80 % then alert +\& if space usage > 94 % then stop +\& if inode usage > 80 % then alert +\& if inode usage > 94 % then stop +\& alert root@localhost +.Ve +.PP +.Vb 7 +\& check host ftp.redhat.com with address ftp.redhat.com +\& if failed icmp type echo with timeout 15 seconds +\& then alert +\& if failed port 21 protocol ftp +\& then exec "/usr/X11R6/bin/xmessage \-display +\& :0 ftp connection failed" +\& alert foo@bar.com +.Ve +.PP +.Vb 7 +\& check host www.gnu.org with address www.gnu.org +\& if failed port 80 protocol http +\& and request "/pub/gnu/bash/bash\-2.05b.tar.gz" +\& with checksum 8f7f419955cefa0b33a2ba316cba3659 +\& then alert +\& alert rms@gnu.org with mail\-format { +\& subject: The gnu server may be hacked again! } +.Ve +.PP +Note: only the \fBcheck type\fR and \fBpidfile/path/address\fR statements +are mandatory; the other statements are optional, and the order of +the optional statements is not important. +.SH "MONIT WITH HEARTBEAT" +.IX Header "MONIT WITH HEARTBEAT" +You can download \fIheartbeat\fR from +http://www.linux\-ha.org/download/.
It might be useful to have a +look at The Heartbeat Getting Started Guide at: +http://www.linux\-ha.org/GettingStarted.html +.PP +\&\fBStarting up a Node\fR +.PP +This is the normal start sequence for a cluster\-node. With this +sequence, there should be no error case that is not handled +by either heartbeat or monit. For example, if monit dies, +initd restarts it. If heartbeat dies, monit restarts it. If the +node dies, the heartbeat instance on the other node detects it +and restarts the services there. +.IP "1. initd starts monit with group local" 4 +.IX Item "1. initd starts monit with group local" +.PD 0 +.IP "2. monit starts heartbeat in local group" 4 +.IX Item "2. monit starts heartbeat in local group" +.IP "3. heartbeat requests monit to start the node group" 4 +.IX Item "3. heartbeat requests monit to start the node group" +.IP "4. monit starts the node group" 4 +.IX Item "4. monit starts the node group" +.PD +.PP +\&\fBMonit: \f(BI/etc/monitrc\fB\fR +.PP +This example describes a cluster with 2 nodes. Services running +on Node 1 are in the group \fInode1\fR and Node 2 services are in +the \fInode2\fR group. +.PP +The local group entries are mode \fIactive\fR; the node group +entries are mode \fImanual\fR and are controlled by heartbeat.
+.PP +.Vb 3 +\& # +\& # local services on both hosts +\& # +.Ve +.PP +.Vb 6 +\& check process heartbeat with pidfile /var/run/heartbeat.pid +\& start program = "/etc/init.d/heartbeat start" +\& stop program = "/etc/init.d/heartbeat stop" +\& mode active +\& alert foo@bar +\& group local +.Ve +.PP +.Vb 6 +\& check process postfix with pidfile /var/run/postfix/master.pid +\& start program = "/etc/init.d/postfix start" +\& stop program = "/etc/init.d/postfix stop" +\& mode active +\& alert foo@bar +\& group local +.Ve +.PP +.Vb 3 +\& # +\& # node1 services +\& # +.Ve +.PP +.Vb 7 +\& check process apache with pidfile /var/apache/logs/httpd.pid +\& start program = "/etc/init.d/apache start" +\& stop program = "/etc/init.d/apache stop" +\& depends named +\& alert foo@bar +\& mode manual +\& group node1 +.Ve +.PP +.Vb 6 +\& check process named with pidfile /var/tmp/named.pid +\& start program = "/etc/init.d/named start" +\& stop program = "/etc/init.d/named stop" +\& alert foo@bar +\& mode manual +\& group node1 +.Ve +.PP +.Vb 3 +\& # +\& # node2 services +\& # +.Ve +.PP +.Vb 6 +\& check process named\-slave with pidfile /var/tmp/named\-slave.pid +\& start program = "/etc/init.d/named\-slave start" +\& stop program = "/etc/init.d/named\-slave stop" +\& mode manual +\& alert foo@bar +\& group node2 +.Ve +.PP +.Vb 7 +\& check process squid with pidfile /var/squid/logs/squid.pid +\& start program = "/etc/init.d/squid start" +\& stop program = "/etc/init.d/squid stop" +\& depends named\-slave +\& alert foo@bar +\& mode manual +\& group node2 +.Ve +.PP +\&\fBinitd: \f(BI/etc/inittab\fB\fR +.PP +Monit is started on both nodes with initd. You will need to add +an entry in \fI/etc/inittab\fR to start monit with the same local +group that heartbeat is a member of.
+.PP +.Vb 2 +\& #/etc/inittab +\& mo:2345:respawn:/usr/local/bin/monit \-d 10 \-c /etc/monitrc \-g local +.Ve +.PP +\&\fBheartbeat: \f(BI/etc/ha.d/haresources\fB\fR +.PP +When heartbeat starts, it looks up the node entry and +starts the script \fI/etc/init.d/monit\-node1\fR or +\&\fI/etc/init.d/monit\-node2\fR. The script calls monit to start the +specific group per node. +.PP +.Vb 3 +\& # /etc/ha.d/haresources +\& node1 IPaddr::172.16.100.1 monit\-node1 +\& node2 IPaddr::172.16.100.2 monit\-node2 +.Ve +.PP +\&\fB\f(BI/etc/init.d/monit\-node1\fB\fR +.PP +.Vb 11 +\& #!/bin/bash +\& # +\& # sample script for starting/stopping all services on node1 +\& # +\& prog="/usr/local/bin/monit \-g node1" +\& start() +\& { +\& echo \-n $"Starting $prog:" +\& $prog start all +\& echo +\& } +.Ve +.PP +.Vb 6 +\& stop() +\& { +\& echo \-n $"Stopping $prog:" +\& $prog stop all +\& echo +\& } +.Ve +.PP +.Vb 10 +\& case "$1" in +\& start) +\& start;; +\& stop) +\& stop;; +\& *) +\& echo $"Usage: $0 {start|stop}" +\& RETVAL=1 +\& esac +\& exit $RETVAL +.Ve +.Sh "Handling state" +.IX Subsection "Handling state" +As mentioned elsewhere, monit saves its state to a state file. If +the monit process should die, upon restart monit will read its +last known state from this file. This can be a problem if monit +is used in a cluster, as illustrated in this scenario: +.IP "1" 4 +.IX Item "1" +The active node fails and the second takes over. +.IP "2" 4 +.IX Item "2" +After a reboot, the failed node comes back, monit reads its state +file and starts all the services (even manual ones) as they were +running before the failure. This is a problem because services +will now run on both nodes. +.PP +The solution to this problem is to remove the monit.state file in +an rc-script called at boot time and before monit is started.
+.SH "FILES" +.IX Header "FILES" +\&\fI~/.monitrc\fR + Default run control file +.PP +\&\fI/etc/monitrc\fR + If the control file is not found in the default + location and /etc contains a \fImonitrc\fR file, this + file will be used instead. +.PP +\&\fI./monitrc\fR + If the control file is not found in either of the + previous two locations, and the current working + directory contains a \fImonitrc\fR file, this file is + used instead. +.PP +\&\fI~/.monitrc.pid\fR + Lock file to help prevent concurrent runs (non\-root + mode). +.PP +\&\fI/var/run/monit.pid\fR + Lock file to help prevent concurrent runs (root mode, + Linux systems). +.PP +\&\fI/etc/monit.pid\fR + Lock file to help prevent concurrent runs (root mode, + systems without /var/run). +.PP +\&\fI~/.monit.state\fR + monit saves its state to this file and utilizes + information found in this file to recover from + a crash. This is a binary file and its content is + only of interest to monit. You may set the location + of this file in the monit control file or by using + the \-s switch when monit is started. +.SH "ENVIRONMENT" +.IX Header "ENVIRONMENT" +No environment variables are used by monit. However, when monit +executes a script or a program, monit will set several environment +variables which can be utilized by the executable. The following +and \fIonly\fR the following environment variables are available: +.IP "\s-1MONIT_EVENT\s0" 4 +.IX Item "MONIT_EVENT" +The event that occurred on the service +.IP "\s-1MONIT_SERVICE\s0" 4 +.IX Item "MONIT_SERVICE" +The name of the service (from monitrc) on which the event +occurred. +.IP "\s-1MONIT_DATE\s0" 4 +.IX Item "MONIT_DATE" +The time and date (\s-1RFC\s0 822 style) the event occurred +.IP "\s-1MONIT_HOST\s0" 4 +.IX Item "MONIT_HOST" +The host the event occurred on +.PP +The following environment variables are only available for +process service entries: +.IP "\s-1MONIT_PROCESS_PID\s0" 4 +.IX Item "MONIT_PROCESS_PID" +The process pid.
This may be 0 if the process was (re)started. +.IP "\s-1MONIT_PROCESS_MEMORY\s0" 4 +.IX Item "MONIT_PROCESS_MEMORY" +Process memory. This may be 0 if the process was (re)started. +.IP "\s-1MONIT_PROCESS_CHILDREN\s0" 4 +.IX Item "MONIT_PROCESS_CHILDREN" +Process children. This may be 0 if the process was (re)started. +.IP "\s-1MONIT_PROCESS_CPU_PERCENT\s0" 4 +.IX Item "MONIT_PROCESS_CPU_PERCENT" +Process cpu%. This may be 0 if the process was (re)started. +.PP +In addition the following spartan \s-1PATH\s0 environment variable is +available: +.IP "PATH=/bin:/usr/bin:/sbin:/usr/sbin" 4 +.IX Item "PATH=/bin:/usr/bin:/sbin:/usr/sbin" +.PP +Scripts or programs that depend on other environment variables +or on a more verbose \s-1PATH\s0 must provide means to set these +variables by themselves. +.SH "SIGNALS" +.IX Header "SIGNALS" +If a monit daemon is running, \s-1SIGUSR1\s0 wakes it up from its sleep +phase and forces a poll of all services. \s-1SIGTERM\s0 and \s-1SIGINT\s0 will +gracefully terminate a monit daemon. The \s-1SIGTERM\s0 signal is sent +to a monit daemon if monit is started with the \fIquit\fR action +argument. +.PP +Sending a \s-1SIGHUP\s0 signal to a running monit daemon will force +the daemon to reinitialize itself; specifically, it will reread its +configuration and close and reopen log files. +.PP +Running monit in foreground while a background monit daemon is +running will wake up the daemon. +.SH "NOTES" +.IX Header "NOTES" +This is a very silent program. Use the \-v switch if you want to +see what monit is doing, and tail \-f the logfile. Optionally, for +testing purposes, you can start monit with the \-Iv switch. Monit +will then print debug information to the console. To stop monit +in this mode, simply press CTRL+C (i.e. \s-1SIGINT\s0) in the same +console. +.PP +The syntax (and parser) of the control file is inspired by Eric +S. Raymond et al.'s excellent fetchmail program.
Some portions of +this man page also draw inspiration from the same +authors. +.SH "AUTHORS" +.IX Header "AUTHORS" +Jan-Henrik Haukeland, +Martin Pala, +Christian Hopp, +Rory Toma +.PP +See also http://www.tildeslash.com/monit/who.html +.SH "COPYRIGHT" +.IX Header "COPYRIGHT" +Copyright (C) 2000\-2007 by the monit project group. All Rights +Reserved. This product is distributed in the hope that it will be +useful, but \s-1WITHOUT\s0 any warranty; without even the implied +warranty of \s-1MERCHANTABILITY\s0 or \s-1FITNESS\s0 for a particular purpose. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +\&\s-1GNU\s0 text utilities; \fImd5sum\fR\|(1); \fIsha1sum\fR\|(1); \fIopenssl\fR\|(1); \fIglob\fR\|(7); +\&\fIregex\fR\|(7)