author     mjl <mjl>  2002-10-25 14:18:13 +0000
committer  mjl <mjl>  2002-10-25 14:18:13 +0000
commit     6225cd67c655017779a91d4fb09ae767a8c4fc99
tree       e483feaab53564db5a1f8cc470d3a2cf0399c8a6 /mail/bmf
parent     7732c20d4c5d723f9981e47a96378c63ccee737c
Update to 0.9.4 (note that the author changed versioning system,
our last version was 0.84).

* Update documentation.
* Move Bayes stuff into its own file.
* Fix NaN exception: if list is empty, use zero for probability.
* Make extrema array (keepers) variable size. Needs more work.
* Add SYSLIBS to the makefile.
* Fix gcc-ism in dbg.c (ptr arithmetic on void*).
* Fix off-by-one in html tag check.
* Fix unaligned access in libdb.
* Fix bug in -d handling for text and libdb.
* Autodetect mailbox type and deprecate the -m option.
* Ditch the builtin libdb locks, use fcntl instead.
* Fix memory leak in dbtext.
* Fix some trivial issues with the lexer:
  - Be more strict about recognizing IP addresses.
  - Do case-insensitive header name comparisons.
* Fix multiple database closure with mbox format.
* Fix a bogus assert in passthrough.
* Add heap checking in debug mode.
* Fix bug in -N mode which made it act the same as -S.
* Support maildir style folders.
* Fix bug in multiple message registration.
* Improve error reporting and clarify some messages.
* Package preformatted manpage instead of XML.
* Remove single message per invocation restriction.
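Two items in the commit message, the gcc-ism in dbg.c (pointer arithmetic on void*) and the unaligned access in libdb, are classic C portability problems. The block below is a minimal sketch of the usual remedies, not bmf's actual code; `byte_offset` and `read_u32` are hypothetical helper names. Arithmetic on `void *` only compiles as a GNU extension, so the pointer is cast to `char *` first; a multi-byte integer at an arbitrary offset (as in a value buffer returned by a database library) is read with `memcpy` instead of through a cast pointer, which can fault on strict-alignment CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: advance a generic pointer by n bytes.
 * Writing `(void *)p + n` relies on a GNU extension that treats
 * sizeof(void) as 1; ISO C requires a character pointer instead. */
static void *byte_offset(void *p, size_t n)
{
    return (char *)p + n;
}

/* Hypothetical helper: read a 32-bit counter stored at an arbitrary,
 * possibly misaligned address. Dereferencing `*(const uint32_t *)buf`
 * there is undefined behaviour and traps on strict-alignment CPUs;
 * copying the bytes into an aligned local variable is portable. */
static uint32_t read_u32(const void *buf)
{
    uint32_t v;

    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}
```

For example, copying a count into `byte_offset(rec, 1)` of an `unsigned char rec[8]` buffer with `memcpy` and reading it back with `read_u32(rec + 1)` yields the same value on any architecture, misaligned or not.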
Diffstat (limited to 'mail/bmf')
-rw-r--r--  mail/bmf/Makefile          13
-rw-r--r--  mail/bmf/distinfo           8
-rw-r--r--  mail/bmf/files/bmf.1      153
-rw-r--r--  mail/bmf/files/bmfconv.1   81
-rw-r--r--  mail/bmf/patches/patch-ab  11
5 files changed, 10 insertions, 256 deletions
diff --git a/mail/bmf/Makefile b/mail/bmf/Makefile
index a610dc4edd9..420c81b033d 100644
--- a/mail/bmf/Makefile
+++ b/mail/bmf/Makefile
@@ -1,7 +1,7 @@
-# $NetBSD: Makefile,v 1.2 2002/10/12 18:18:22 mjl Exp $
+# $NetBSD: Makefile,v 1.3 2002/10/25 14:18:13 mjl Exp $
#
-DISTNAME= bmf-0.84
+DISTNAME= bmf-0.9.4
CATEGORIES= mail
MASTER_SITES= $(MASTER_SITE_SOURCEFORGE:=bmf/)
@@ -14,13 +14,4 @@ HAS_CONFIGURE= YES
CONFIGURE_ARGS+= --without-mysql
MAKE_ENV+= DESTDIR=${PREFIX} BINDIR=/bin MANDIR=/man
-ALL_TARGET= apps
-
-# Note: Until we figure out how to build the docs (it needs some
-# xml files I cannot figure out where they can be found), we just
-# ship pre-build man pages (snarfed from the Debian package).
-
-post-build:
- ${CP} files/*.1 ${WRKSRC}
-
.include "../../mk/bsd.pkg.mk"
diff --git a/mail/bmf/distinfo b/mail/bmf/distinfo
index abe7c4fd363..845c14c9e50 100644
--- a/mail/bmf/distinfo
+++ b/mail/bmf/distinfo
@@ -1,6 +1,6 @@
-$NetBSD: distinfo,v 1.1.1.1 2002/10/10 11:05:31 mjl Exp $
+$NetBSD: distinfo,v 1.2 2002/10/25 14:18:13 mjl Exp $
-SHA1 (bmf-0.84.tar.gz) = 9ac63586403f0984e0569b11b3f81354469b7a56
-Size (bmf-0.84.tar.gz) = 40101 bytes
+SHA1 (bmf-0.9.4.tar.gz) = d0b7ab253a531a533fefc6bc0691cc7af7ccea79
+Size (bmf-0.9.4.tar.gz) = 36740 bytes
SHA1 (patch-aa) = 2dccc61f00373e67868bdc8e49e791e7b41fa8c0
-SHA1 (patch-ab) = ee28061f04567832d7497cf50f2678c2df327bfe
+SHA1 (patch-ab) = 453971a7be6d6e2b53a0e99a3b1ab64f7b161416
diff --git a/mail/bmf/files/bmf.1 b/mail/bmf/files/bmf.1
deleted file mode 100644
index 62a911c93ad..00000000000
--- a/mail/bmf/files/bmf.1
+++ /dev/null
@@ -1,153 +0,0 @@
-.\"Generated by db2man.xsl. Don't modify this, modify the source.
-.de Sh \" Subsection
-.br
-.if t .Sp
-.ne 5
-.PP
-\fB\\$1\fR
-.PP
-..
-.de Sp \" Vertical space (when we can't use .PP)
-.if t .sp .5v
-.if n .sp
-..
-.de Ip \" List item
-.br
-.ie \\n(.$>=3 .ne \\$3
-.el .ne 3
-.IP "\\$1" \\$2
-..
-.TH "BMF" 1 "" "" ""
-.SH NAME
-bmf \- efficient Bayesian mail filter
-.SH "SYNOPSIS"
-
-.nf
-\fBbmf\fR [-t] [-n] [-s] [-N] [-S] [-f fmt] [-d db] [-i file] [-p] [-v] [-h]
-.fi
-
-.SH "DESCRIPTION"
-
-.PP
-bmf is a Bayesian mail filter. In its normal mode of operation, it takes an email message or other text on standard input, does a statistical check against lists of "good" and "spam" words, registers the new data, and returns a status code indicating whether or not the message is spam. BMF is written with fast, zero-copy algorithms, coded directly in C, and tuned for speed. It aims to be faster, smaller, and more versatile than similar applications.
-
-.PP
-NOTE: The input is assumed to be a single message. If you need to process multiple messages (such as an mbox folder), use formail(1) with the -s option.
-
-.SH "OPTIONS"
-
-.PP
-Without command-line options, bmf processes the input, registers it as either "good" or "spam", and returns the appropriate error code. The wordlist directory and nonexistent wordfiles are created if absent.
-
-.PP
-\fB-t\fR Test to see if the input is spam. The word lists are not updated. A report is written to stdout showing the final score and the tokens with the highest deviation form a mean of 0.5.
-
-.PP
-\fB-n\fR Register the input as non-spam.
-
-.PP
-\fB-s\fR Register the input as spam.
-
-.PP
-\fB-N\fR Register the input as non-spam and undo a prior registration as spam.
-
-.PP
-\fB-S\fR Register the input as spam and undo a prior registration as non-spam.
-
-.PP
-\fB-f fmt\fR Specify database format. Valid formats are text, db, and mysql. Text is always valid. The others may not be available if the corresponding option was not enabled at compile time. The default is db if available, else text.
-
-.PP
-\fB-d db\fR Specify database or directory for loading and saving word lists. The default is \fI~/.bmf\fR in text mode.
-
-.PP
-\fB-i file\fR Use file for input instead of stdin.
-
-.PP
-\fB-p\fR Copy the input to the output (passthrough) and insert spam headers in the style of SpamAssassin. An X-Spam-Status header is always inserted with processing details. If the input is judged to be spam, an X-Spam-Flag header is also inserted.
-
-.PP
-\fB-v\fR Display version information.
-
-.PP
-\fB-h\fR Display usage information.
-
-.SH "THEORY OF OPERATION"
-
-.PP
-bmf treats its input as a bag of tokens. Each token is checked against "good" and "bad" wordlists, which maintain counts of the numbers of times it has occurred in non-spam and spam mails. These numbers are used to compute the probability that a mail in which the token occurs is spam. After probabilities for all input tokens have been computed, a fixed number of the probabilities that deviate furthest from average (15 by default) are combined using Bayes's theorem on conditional probabilities.
-
-.PP
-While this method sounds crude compared to the more usual pattern-matching approach, it turns out to be extremely effective. Paul Graham's paper A Plan For Spam: \fIhttp://www.paulgraham.com/spam.html\fR is recommended reading.
-
-.PP
-bmf improves on Paul's proposal by doing smarter lexical analysis with an exact reimplementation of the lexer in bogofilter. In particular, hostames and IP addresses are retained as regognition features rather than broken up. Various kinds of MTA cruft such as dates and message-IDs are discarded so as not to bloat the word lists.
-
-.PP
-MIME and other attachments are not decoded. Experience from watching the token streams suggests that spam with enclosures invariably gives itself away through cues in the headers and non-enclosure parts. Nonetheless, I would like to add the ability to decode quoted-printable and perhaps base64 encodings for textual attachments.
-
-.SH "INTEGRATION WITH OTHER TOOLS"
-
-.PP
-The following procmail rule will analyze the input on stdin, update the wordlists, and direct it to \fIMail/spam\fR if bmf thinks it's spam:
-
-.nf
-
-:0HB:
-* ? bmf
-$MAILDIR/spam
-
-.fi
-
-.PP
-If bmf fails (returning 2) the message will be treated as non-spam.
-
-.PP
-The following \fI.muttrc\fR lines will create mutt macros for dispatching mail to bmf.
-
-.nf
-
-macro index \\ed "enter-commandunset wait_key\\npipe-entrybmf -S\\nenter-commandset wait_key\\nchange-folder=spam"
-macro index \\eu "enter-commandunset wait_key\\npipe-entrybmf -N\\nenter-commandset wait_key\\nchange-folder=inbox"
-
-.fi
-
-.SH "RETURN VALUES"
-
-.PP
-In passthrough mode: zero for success, nonzero for failure.
-
-.PP
-In non-passthrough mode: 0 for spam; 1 for non-spam; 2 for I/O or other errors.
-
-.SH "FILES"
-
-.TP
-\fI~/.bmf/goodlist.txt\fR
-List of good tokens for text mode.
-
-.TP
-\fI~/.bmf/spamlist.db\fR
-List of bad tokens for text mode.
-
-.TP
-\fI~/.bmf/goodlist.db\fR
-List of good tokens for libdb mode.
-
-.TP
-\fI~/.bmf/spamlist.db\fR
-List of bad tokens for libdb mode.
-
-.SH "BUGS"
-
-.PP
-The lexer should recognize MIME header lines and attachments.
-
-.PP
-Content-Transfer-Encoding is not decoded.
-
-.SH "AUTHOR"
-
-.PP
-Tom Marshall <tommy@tig-grr.com>. The Bayes algorithm is from bogofilter by Eric S. Raymond <esr@thyrsus.com>. bogofilter can be found at the bogofilter project page: \fIhttp://bogofilter.sourceforge.net/\fR.
-
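The THEORY OF OPERATION section of the removed bmf(1) page describes keeping the token probabilities that deviate furthest from 0.5 (15 by default) and combining them with Bayes's theorem, and the commit message mentions a NaN fix for an empty list. The following is a minimal C sketch of that combination step, assuming the combining formula from Paul Graham's "A Plan For Spam", which the page cites; whether bmf uses exactly this form is an assumption. `combine_extrema`, `by_deviation` and `deviation` are hypothetical names, and the per-token probability computed from the good/spam counts is left out because the page does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Distance of a token probability from the neutral value 0.5. */
static double deviation(double p)
{
    double d = p - 0.5;
    return d < 0.0 ? -d : d;
}

/* qsort comparator: largest deviation from 0.5 first. */
static int by_deviation(const void *a, const void *b)
{
    double da = deviation(*(const double *)a);
    double db = deviation(*(const double *)b);
    return (da < db) - (da > db);
}

/* Hypothetical helper: sort `probs` in place, keep at most `keep` of
 * the most extreme values and combine them as
 *     P = prod(p) / (prod(p) + prod(1 - p)).
 * Assumes every p lies strictly between 0 and 1 (Graham's paper clamps
 * them to [0.01, 0.99]). An empty list is handled up front and scores
 * 0, as the commit message describes. */
static double combine_extrema(double *probs, size_t n, size_t keep)
{
    double prod_p = 1.0, prod_q = 1.0;
    size_t i;

    assert(probs != NULL || n == 0);
    if (n == 0)
        return 0.0;
    qsort(probs, n, sizeof *probs, by_deviation);
    if (keep > n)
        keep = n;
    for (i = 0; i < keep; i++) {
        prod_p *= probs[i];
        prod_q *= 1.0 - probs[i];
    }
    return prod_p / (prod_p + prod_q);
}
```

With probabilities {0.99, 0.5, 0.2, 0.9} and `keep = 2`, only 0.99 and 0.9 are combined, giving 0.891 / 0.892, roughly 0.999. Sorting the whole array is the simplest way to pick the extrema; a variable-size keepers array, as the commit message mentions, would avoid sorting tokens that are never used.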
diff --git a/mail/bmf/files/bmfconv.1 b/mail/bmf/files/bmfconv.1
deleted file mode 100644
index 895258fff1f..00000000000
--- a/mail/bmf/files/bmfconv.1
+++ /dev/null
@@ -1,81 +0,0 @@
-.\"Generated by db2man.xsl. Don't modify this, modify the source.
-.de Sh \" Subsection
-.br
-.if t .Sp
-.ne 5
-.PP
-\fB\\$1\fR
-.PP
-..
-.de Sp \" Vertical space (when we can't use .PP)
-.if t .sp .5v
-.if n .sp
-..
-.de Ip \" List item
-.br
-.ie \\n(.$>=3 .ne \\$3
-.el .ne 3
-.IP "\\$1" \\$2
-..
-.TH "BMFCONV" 1 "" "" ""
-.SH NAME
-bmfconv \- Database converter for bmf
-.SH "SYNOPSIS"
-
-.nf
-\fBbmfconv\fR [-f fmt] [-d db] [-e] [-i] [-v] [-h]
-.fi
-
-.SH "DESCRIPTION"
-
-.PP
-bmfconv converts bmf token databases between the supported formats. It can import flat text files into databases and export databases into flat text files.
-
-.PP
-PLEASE NOTE that the text files used in import and export operations are read and written in the current directory.
-
-.SH "OPTIONS"
-
-.PP
-\fB-f fmt\fR Specify database format. Supported formats are "db" for libdb and "mysql" for MySQL.
-
-.PP
-\fB-d db\fR Specify database name.
-
-.PP
-\fB-e\fR Export the database to text files.
-
-.PP
-\fB-i\fR Import the database from text files.
-
-.PP
-\fB-v\fR Display version information.
-
-.PP
-\fB-h\fR Display usage information.
-
-.SH "RETURN VALUES"
-
-.PP
-0 if conversion succeeds, nonzero if conversion fails.
-
-.SH "FILES"
-
-.TP
-\fIgoodlist.txt\fR
-Text file for import or export of good tokens.
-
-.TP
-\fIspamlist.txt\fR
-Text file for import or export of spam tokens.
-
-.SH "BUGS"
-
-.PP
-Should be more robust.
-
-.SH "AUTHOR"
-
-.PP
-Tom Marshall <tommy@tig-grr.com>. bmfconv is a part of the bmf package.
-
diff --git a/mail/bmf/patches/patch-ab b/mail/bmf/patches/patch-ab
index 6ec17b87dd5..906eefb06f5 100644
--- a/mail/bmf/patches/patch-ab
+++ b/mail/bmf/patches/patch-ab
@@ -1,14 +1,11 @@
-$NetBSD: patch-ab,v 1.1.1.1 2002/10/10 11:05:31 mjl Exp $
+$NetBSD: patch-ab,v 1.2 2002/10/25 14:18:14 mjl Exp $
---- Makefile.in.orig Thu Oct 10 12:04:54 2002
-+++ Makefile.in Thu Oct 10 12:05:18 2002
-@@ -1,7 +1,7 @@
- # Makefile for bmf
+--- Makefile.in.orig Sun Oct 20 22:27:56 2002
++++ Makefile.in Fri Oct 25 16:01:40 2002
+@@ -2,4 +2,4 @@
-BINDIR=/usr/bin
-MANDIR=/usr/share/man
+BINDIR?=/usr/bin
+MANDIR?=/usr/share/man
- VERSION=0.84
-