Diffstat (limited to 'mail/bmf/files/bmf.1')
-rw-r--r-- mail/bmf/files/bmf.1 153
1 files changed, 0 insertions, 153 deletions
diff --git a/mail/bmf/files/bmf.1 b/mail/bmf/files/bmf.1
deleted file mode 100644
index 62a911c93ad..00000000000
--- a/mail/bmf/files/bmf.1
+++ /dev/null
@@ -1,153 +0,0 @@
-.\"Generated by db2man.xsl. Don't modify this, modify the source.
-.de Sh \" Subsection
-.br
-.if t .Sp
-.ne 5
-.PP
-\fB\\$1\fR
-.PP
-..
-.de Sp \" Vertical space (when we can't use .PP)
-.if t .sp .5v
-.if n .sp
-..
-.de Ip \" List item
-.br
-.ie \\n(.$>=3 .ne \\$3
-.el .ne 3
-.IP "\\$1" \\$2
-..
-.TH "BMF" 1 "" "" ""
-.SH NAME
-bmf \- efficient Bayesian mail filter
-.SH "SYNOPSIS"
-
-.nf
-\fBbmf\fR [-t] [-n] [-s] [-N] [-S] [-f fmt] [-d db] [-i file] [-p] [-v] [-h]
-.fi
-
-.SH "DESCRIPTION"
-
-.PP
-bmf is a Bayesian mail filter. In its normal mode of operation, it takes an email message or other text on standard input, does a statistical check against lists of "good" and "spam" words, registers the new data, and returns a status code indicating whether or not the message is spam. bmf is written in C with fast, zero-copy algorithms and is tuned for speed. It aims to be faster, smaller, and more versatile than similar applications.
-
-.PP
-NOTE: The input is assumed to be a single message. If you need to process multiple messages (such as an mbox folder), use formail(1) with the -s option.
-
-.SH "OPTIONS"
-
-.PP
-Without command-line options, bmf processes the input, registers it as either "good" or "spam", and returns the appropriate exit code. The wordlist directory and any missing wordfiles are created automatically.
-
-.PP
-\fB-t\fR Test to see if the input is spam. The word lists are not updated. A report is written to stdout showing the final score and the tokens whose probabilities deviate furthest from the mean of 0.5.
-
-.PP
-\fB-n\fR Register the input as non-spam.
-
-.PP
-\fB-s\fR Register the input as spam.
-
-.PP
-\fB-N\fR Register the input as non-spam and undo a prior registration as spam.
-
-.PP
-\fB-S\fR Register the input as spam and undo a prior registration as non-spam.
-
-.PP
-\fB-f fmt\fR Specify database format. Valid formats are text, db, and mysql. Text is always valid. The others may not be available if the corresponding option was not enabled at compile time. The default is db if available, else text.
-
-.PP
-\fB-d db\fR Specify database or directory for loading and saving word lists. The default is \fI~/.bmf\fR in text mode.
-
-.PP
-\fB-i file\fR Use file for input instead of stdin.
-
-.PP
-\fB-p\fR Copy the input to the output (passthrough) and insert spam headers in the style of SpamAssassin. An X-Spam-Status header is always inserted with processing details. If the input is judged to be spam, an X-Spam-Flag header is also inserted.
-
-.PP
-\fB-v\fR Display version information.
-
-.PP
-\fB-h\fR Display usage information.
-
-.SH "THEORY OF OPERATION"
-
-.PP
-bmf treats its input as a bag of tokens. Each token is checked against "good" and "bad" wordlists, which record the number of times it has occurred in non-spam and spam mail. These counts are used to compute the probability that a mail in which the token occurs is spam. After probabilities for all input tokens have been computed, a fixed number of the probabilities that deviate furthest from 0.5 (15 by default) are combined using Bayes's theorem on conditional probabilities.
-
-.PP
-While this method sounds crude compared to the more usual pattern-matching approach, it turns out to be extremely effective. Paul Graham's paper "A Plan for Spam" (\fIhttp://www.paulgraham.com/spam.html\fR) is recommended reading.
-
-.PP
-bmf improves on Graham's proposal by doing smarter lexical analysis with an exact reimplementation of bogofilter's lexer. In particular, hostnames and IP addresses are retained as recognition features rather than broken up. Various kinds of MTA cruft, such as dates and message-IDs, are discarded so as not to bloat the word lists.
-
-.PP
-MIME and other attachments are not decoded. Experience from watching the token streams suggests that spam with enclosures invariably gives itself away through cues in the headers and non-enclosure parts. Nonetheless, I would like to add the ability to decode quoted-printable and perhaps base64 encodings for textual attachments.
-
-.SH "INTEGRATION WITH OTHER TOOLS"
-
-.PP
-The following procmail rule will analyze the input on stdin, update the wordlists, and direct it to \fIMail/spam\fR if bmf thinks it's spam:
-
-.nf
-
-:0HB:
-* ? bmf
-$MAILDIR/spam
-
-.fi
-
-.PP
-If bmf fails (returning 2), the message will be treated as non-spam.
-
-.PP
-The following \fI.muttrc\fR lines will create mutt macros for dispatching mail to bmf.
-
-.nf
-
-macro index \\ed "<enter-command>unset wait_key\\n<pipe-entry>bmf -S\\n<enter-command>set wait_key\\n<change-folder>=spam"
-macro index \\eu "<enter-command>unset wait_key\\n<pipe-entry>bmf -N\\n<enter-command>set wait_key\\n<change-folder>=inbox"
-
-.fi
-
-.SH "RETURN VALUES"
-
-.PP
-In passthrough mode: zero for success, nonzero for failure.
-
-.PP
-In non-passthrough mode: 0 for spam; 1 for non-spam; 2 for I/O or other errors.
-
-.SH "FILES"
-
-.TP
-\fI~/.bmf/goodlist.txt\fR
-List of good tokens for text mode.
-
-.TP
-\fI~/.bmf/spamlist.txt\fR
-List of bad tokens for text mode.
-
-.TP
-\fI~/.bmf/goodlist.db\fR
-List of good tokens for libdb mode.
-
-.TP
-\fI~/.bmf/spamlist.db\fR
-List of bad tokens for libdb mode.
-
-.SH "BUGS"
-
-.PP
-The lexer should recognize MIME header lines and attachments.
-
-.PP
-Content-Transfer-Encoding is not decoded.
-
-.SH "AUTHOR"
-
-.PP
-Tom Marshall <tommy@tig-grr.com>. The Bayes algorithm is from bogofilter by Eric S. Raymond <esr@thyrsus.com>. bogofilter can be found at the bogofilter project page: \fIhttp://bogofilter.sourceforge.net/\fR.
-
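The THEORY OF OPERATION section describes bmf's scoring only in prose. The C sketch below illustrates that pipeline (per-token probability, selection of the 15 tokens deviating furthest from 0.5, Bayesian combination) under stated assumptions: the token-probability and combining formulas are borrowed from Graham's "A Plan for Spam" and are not taken from bmf's source, and the names `token_prob`, `combine`, `is_spam`, `exit_status`, `UNKNOWN_PROB`, and `SPAM_CUTOFF` are hypothetical. Only the default of 15 tokens, the 0.5 reference point, and the 0 = spam / 1 = non-spam exit convention come from the page itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* MAX_TOKENS is the default given in the man page. UNKNOWN_PROB and
 * SPAM_CUTOFF are hypothetical values borrowed from Graham's paper. */
#define MAX_TOKENS   15
#define UNKNOWN_PROB 0.4
#define SPAM_CUTOFF  0.9

static double clamp(double x, double lo, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Spam probability for one token, given its counts in good and spam mail
 * and the number of good (ngood) and spam (nspam) messages registered.
 * Assumption: Graham-style formula, not bmf's actual code. */
double token_prob(unsigned good, unsigned spam, unsigned ngood, unsigned nspam)
{
    double g = 2.0 * good;      /* good counts doubled to bias against false positives */
    double b = (double)spam;
    if (g + b < 5.0 || ngood == 0 || nspam == 0)
        return UNKNOWN_PROB;    /* too little evidence to judge this token */
    double fb = clamp(b / nspam, 0.0, 1.0);
    double fg = clamp(g / ngood, 0.0, 1.0);
    return clamp(fb / (fg + fb), 0.01, 0.99);
}

/* qsort comparator: the probability with the larger |p - 0.5| sorts first. */
static int by_deviation(const void *a, const void *b)
{
    double da = *(const double *)a - 0.5;
    double db = *(const double *)b - 0.5;
    if (da < 0) da = -da;
    if (db < 0) db = -db;
    return (da < db) - (da > db);
}

/* Sorts probs in place, keeps the MAX_TOKENS most deviant values, and
 * combines them with Bayes's theorem, assuming token independence:
 *   P = prod(p) / (prod(p) + prod(1 - p)).
 * With no tokens the result is the neutral 0.5. */
double combine(double *probs, size_t n)
{
    double prod = 1.0, inv = 1.0;
    qsort(probs, n, sizeof *probs, by_deviation);
    if (n > MAX_TOKENS)
        n = MAX_TOKENS;
    for (size_t i = 0; i < n; i++) {
        assert(probs[i] > 0.0 && probs[i] < 1.0);
        prod *= probs[i];
        inv *= 1.0 - probs[i];
    }
    return prod / (prod + inv);
}

int is_spam(double *probs, size_t n)
{
    return combine(probs, n) > SPAM_CUTOFF;
}

/* Maps the verdict to the non-passthrough exit status documented under
 * RETURN VALUES: 0 for spam, 1 for non-spam. */
int exit_status(double *probs, size_t n)
{
    return is_spam(probs, n) ? 0 : 1;
}
```

A caller would compute `token_prob` for each token of the message, pass the resulting array to `exit_status`, and exit with that value, which is the 0-for-spam convention the procmail rule relies on. Sorting the whole array with `qsort` is the simplest choice here; a partial selection of the top 15 would avoid sorting tokens that are discarded anyway.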