author     hubertf <hubertf>  2004-11-18 12:46:53 +0000
committer  hubertf <hubertf>  2004-11-18 12:46:53 +0000
commit     21ded16f81534a2ff2836353fc933df552dfba94
tree       e1371e54bfbd7d0ccbccd8eb2a2dafbf0bec8961 /mail/spamprobe
parent     d14df0b4bc363d958cdce0fc5cdaf08d65e37afd
Update spamprobe to 1.0a, patch sent via IRC by the maintainer.
Changes:
* MimeLineReader.cc: 1.0 branch - fixed MBX record header regex.
* spamprobe.cc (main): Added exec and exec-shared commands.
  (import_words): Modified the import command to allow negative values to be specified in the import file.
* Applied patches for configure.in and aclocal.m4 contributed by Siggy Brentrup for Debian compatibility.
* FrequencyDBImpl_pbl.cc: Invokes new WordData methods to allow storing data in big endian format.
* WordData.h: Added optional support for storing counts/flags in big endian order for data portability.
* MimeLineReader.cc (readMBXFileHeader): UW IMAP MBX file format is now auto-detected from the first line of the mailbox file.
* spamprobe.cc (process_extended_options): Removed -o imap-mbx option.
* spamprobe.cc (process_extended_options): Added -o imap-mbx option to process files as UW-IMAP MBX files rather than mbox files.
* MimeLineReader.cc (readLine): Added support for UW-IMAP MBX file format.
* spamprobe.cc (process_stream): Added -o tokenized option to allow people to use an external tokenizer with spamprobe.
* SpamFilter.cc (scoreToken): Reduced sorting overhead by pre-computing an integer sort value with sorting priorities reflected in the value. This eliminates several calculations inside the sort routine.
* SpamFilter.cc (computeRatio): Capped ratios in calculations to within MIN_PROB and MAX_PROB. Widened that range. This avoids problems with division by zero and makes it easier to sort terms.
* spamprobe.cc (dump_words): The dump command can now optionally accept a regular expression as an argument and will only dump terms matching the regular expression.
  (purge_terms): Added purge-terms command to purge from the database all terms matching a regular expression.
* spamprobe.cc (main): Fixed bug in command line processing. Thanks to Jem for the bug report.
* spamprobe.cc (train_on_message): Code simplified. Eliminated redundant recalculation of scores.
  (train_on_message): Timestamps are no longer updated by the train-spam and train-good commands. They are still updated by the train command.
  (main): Fixed an assertion failure when the -P option is specified in a read-only operation.
* spamprobe.cc (main): Added -C command line option to allow users to specify their own minimum word count.
* SpamFilter.cc (SpamFilter): Set default minimum word count back to 5 (was 3).
* spamprobe.cc (process_extended_options): Removed "alt-score" from the -o options list because it distributes scores poorly. The new formula achieves the same end with better accuracy. Added "orig-score" option to allow people to continue using the old formula. Added "honor-xstatus-header" option for people whose mail server uses X-Status: rather than Status: for the deleted flag.
  (main): Added -l command line option to allow people to set their own spam threshold if they don't like the default value.
* SpamFilter.cc (scoreMessage): Added a new scoring formula based on Paul's but taking the nth root of the spam and good probabilities to produce more evenly distributed scores. Lowered the spam threshold to 0.6 to keep accuracy about the same as the original formula. The highest score seen so far for a ham (non-spam) message in tests is 0.44, so 0.6 seems safe. Made the new formula the default instead of Paul's.
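The scoring entries above (clamping in computeRatio, the nth-root formula in scoreMessage, and the 0.6 default threshold) can be illustrated with a small sketch. This is one possible reading, not spamprobe's code: the MIN_PROB/MAX_PROB values and all helper names are hypothetical, and "taking the nth root" is interpreted here as replacing each product of per-term probabilities with its geometric mean before combining them the way Paul's formula does.

```python
import math

# Hypothetical clamp bounds; the real MIN_PROB/MAX_PROB in SpamFilter.cc are not given.
MIN_PROB = 0.000001
MAX_PROB = 0.999999
SPAM_THRESHOLD = 0.6  # default threshold stated in the changelog


def clamp(p: float) -> float:
    """Keep a per-term probability inside [MIN_PROB, MAX_PROB] (avoids div/0)."""
    return min(MAX_PROB, max(MIN_PROB, p))


def graham_score(probs):
    """Original combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(clamp(p) for p in probs)
    good = math.prod(1.0 - clamp(p) for p in probs)
    return spam / (spam + good)


def nth_root_score(probs):
    """Assumed variant: nth root (geometric mean) of both products, then combine.

    Averaging logarithms keeps long messages from underflowing to 0.0.
    """
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    log_spam = sum(math.log(clamp(p)) for p in probs) / n
    log_good = sum(math.log(1.0 - clamp(p)) for p in probs) / n
    spam, good = math.exp(log_spam), math.exp(log_good)
    return spam / (spam + good)


def is_spam(probs, threshold=SPAM_THRESHOLD):
    """Classify with the (assumed) nth-root score against a threshold like -l sets."""
    return nth_root_score(probs) >= threshold
```

With terms [0.99, 0.99, 0.2], the plain products give a score very close to 1, while the geometric-mean version lands noticeably lower, which matches the changelog's stated goal of more evenly distributed scores.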
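The scoreToken entry describes folding several sort priorities into one precomputed integer so the sort routine only compares integers. A minimal sketch of that idea follows; the priorities chosen here (distance of a term's probability from a neutral 0.5 first, then its count) and the bit widths are assumptions for illustration, since the changelog does not say which priorities spamprobe encodes.

```python
from typing import List, Tuple

COUNT_BITS = 20  # hypothetical width reserved for the secondary priority


def sort_key(prob: float, count: int) -> int:
    """Pack two sort priorities into one integer (hypothetical layout)."""
    # Primary priority: distance from neutral 0.5, scaled to an integer.
    distance = int(round(abs(prob - 0.5) * 1_000_000))
    # Secondary priority: term count, capped so it fits in the low bits.
    capped_count = min(count, (1 << COUNT_BITS) - 1)
    return (distance << COUNT_BITS) | capped_count


def top_terms(terms: List[Tuple[str, float, int]], n: int) -> List[str]:
    """Return the n most significant terms, computing each key only once."""
    keyed = [(sort_key(prob, count), word) for word, prob, count in terms]
    keyed.sort(reverse=True)  # plain integer comparisons inside the sort
    return [word for _, word in keyed[:n]]
```

Computing the key once per term moves the floating-point work out of the comparison function, which is the overhead the changelog says was removed.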
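The WordData.h and FrequencyDBImpl_pbl.cc entries describe storing counts and flags in big endian order so a database file can move between machines with different byte orders. A minimal sketch with Python's struct module, assuming a hypothetical record of two unsigned 32-bit counts plus a 32-bit flags word (the real WordData layout is not described here):

```python
import struct

# Hypothetical layout: good count, spam count, flags; each unsigned 32-bit.
# ">" forces big endian; "=" uses the host's native byte order.
BIG_ENDIAN_RECORD = struct.Struct(">III")
NATIVE_RECORD = struct.Struct("=III")


def pack_word_data(good: int, spam: int, flags: int, big_endian: bool = True) -> bytes:
    """Serialize one term's counts and flags."""
    record = BIG_ENDIAN_RECORD if big_endian else NATIVE_RECORD
    return record.pack(good, spam, flags)


def unpack_word_data(blob: bytes, big_endian: bool = True):
    """Decode a record written by pack_word_data with the same byte order."""
    record = BIG_ENDIAN_RECORD if big_endian else NATIVE_RECORD
    return record.unpack(blob)
```

A record written in the big endian layout decodes to the same counts on any host, whereas the native layout only round-trips reliably between machines that share the writer's byte order.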
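The readMBXFileHeader entry says the MBX format is now detected from the mailbox's first line instead of through the removed -o imap-mbx option. A sketch of that kind of sniffing, assuming a UW IMAP MBX file starts with the line `*mbx*` and a traditional mbox file starts with a `From ` line; the helper names are hypothetical and the actual check in MimeLineReader.cc may differ.

```python
MBX_MAGIC = b"*mbx*"  # assumed first-line marker of a UW IMAP MBX file


def detect_format(first_line: bytes) -> str:
    """Classify a mailbox by its first line (hypothetical helper)."""
    line = first_line.rstrip(b"\r\n")  # MBX headers use CRLF line endings
    if line.startswith(MBX_MAGIC):
        return "mbx"
    if line.startswith(b"From "):
        return "mbox"
    return "unknown"


def detect_file_format(path: str) -> str:
    """Read only the first line of the file and classify it."""
    with open(path, "rb") as f:
        return detect_format(f.readline())
```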
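The dump and purge-terms entries both filter the term database with a regular expression. The sketch below uses an in-memory dict as a stand-in for the real database. Whether spamprobe anchors the pattern or searches anywhere in the term is not stated, so this assumes an unanchored search (`re.search`).

```python
import re
from typing import Dict, List, Optional


def dump_terms(db: Dict[str, int], pattern: Optional[str] = None) -> List[str]:
    """List all terms, or only those matching the regex when one is given."""
    regex = re.compile(pattern) if pattern is not None else None
    return sorted(term for term in db if regex is None or regex.search(term))


def purge_terms(db: Dict[str, int], pattern: str) -> int:
    """Delete every term matching the regex and return how many were removed."""
    regex = re.compile(pattern)
    doomed = [term for term in db if regex.search(term)]  # collect first, then delete
    for term in doomed:
        del db[term]
    return len(doomed)
```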
Diffstat (limited to 'mail/spamprobe')
-rw-r--r-- mail/spamprobe/Makefile | 5
-rw-r--r-- mail/spamprobe/distinfo | 6
2 files changed, 5 insertions, 6 deletions
diff --git a/mail/spamprobe/Makefile b/mail/spamprobe/Makefile
index 39ecda2c741..e465bac369b 100644
--- a/mail/spamprobe/Makefile
+++ b/mail/spamprobe/Makefile
@@ -1,7 +1,6 @@
-# $NetBSD: Makefile,v 1.10 2004/10/03 00:12:54 tv Exp $
+# $NetBSD: Makefile,v 1.11 2004/11/18 12:46:53 hubertf Exp $
-DISTNAME= spamprobe-0.9h
-PKGREVISION= 1
+DISTNAME= spamprobe-1.0a
CATEGORIES= mail
MASTER_SITES= ${MASTER_SITE_SOURCEFORGE:=spamprobe/}
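The Makefile hunk only changes DISTNAME; the download location is derived from it. In bmake, `${VAR:=suffix}` is the System V substitution modifier with an empty "from" part, which appends the suffix to every word of VAR, and pkgsrc names the distfile DISTNAME plus EXTRACT_SUFX (`.tar.gz` by default). The Python sketch below mimics that expansion; the mirror URLs are made-up placeholders, not the real contents of MASTER_SITE_SOURCEFORGE.

```python
from typing import List


def append_suffix(words: List[str], suffix: str) -> List[str]:
    """Mimic bmake's ${VAR:=suffix}: append suffix to each word."""
    return [word + suffix for word in words]


def candidate_urls(master_sites: List[str], distname: str,
                   extract_sufx: str = ".tar.gz") -> List[str]:
    """Full fetch URLs: each master site followed by the distfile name."""
    distfile = distname + extract_sufx
    return [site + distfile for site in master_sites]


# Hypothetical placeholder mirrors standing in for MASTER_SITE_SOURCEFORGE.
SOURCEFORGE_MIRRORS = [
    "https://mirror-a.example/sourceforge/",
    "https://mirror-b.example/sourceforge/",
]
```

For this commit, `candidate_urls(append_suffix(SOURCEFORGE_MIRRORS, "spamprobe/"), "spamprobe-1.0a")` yields URLs ending in `spamprobe/spamprobe-1.0a.tar.gz`, the same file name that the distinfo hunk lists.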
diff --git a/mail/spamprobe/distinfo b/mail/spamprobe/distinfo
index f55be504145..1d226d2407f 100644
--- a/mail/spamprobe/distinfo
+++ b/mail/spamprobe/distinfo
@@ -1,4 +1,4 @@
-$NetBSD: distinfo,v 1.5 2004/02/03 20:49:34 hubertf Exp $
+$NetBSD: distinfo,v 1.6 2004/11/18 12:46:53 hubertf Exp $
-SHA1 (spamprobe-0.9h.tar.gz) = 34a4d5dc622570cc109a92f1a4b2222d4d3b08ff
-Size (spamprobe-0.9h.tar.gz) = 161164 bytes
+SHA1 (spamprobe-1.0a.tar.gz) = 4077b4b5280b29fa08b31b3131ee5cf005faefd7
+Size (spamprobe-1.0a.tar.gz) = 165747 bytes
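The distinfo hunk records the new tarball's SHA1 digest and byte size, which pkgsrc checks after fetching (for example via `make checksum`). A rough Python equivalent of that check, parsing entries in the form shown above and verifying bytes held in memory; the helper names are hypothetical, and only the SHA1 and SHA512 algorithm names are handled.

```python
import hashlib
import re
from typing import Dict

# Matches lines such as: SHA1 (spamprobe-1.0a.tar.gz) = 4077b4...
ENTRY_RE = re.compile(r"^(\w+) \((.+)\) = (.+)$")
HASH_NAMES = {"SHA1": "sha1", "SHA512": "sha512"}  # distinfo name -> hashlib name


def parse_distinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Map each distfile name to its {algorithm: value} entries."""
    entries: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:  # the "$NetBSD: ... $" RCS Id line does not match and is skipped
            algo, filename, value = match.groups()
            entries.setdefault(filename, {})[algo] = value
    return entries


def verify_blob(data: bytes, expected: Dict[str, str]) -> bool:
    """Check a distfile's bytes against its distinfo entries."""
    for algo, value in expected.items():
        if algo == "Size":
            if len(data) != int(value.split()[0]):  # value looks like "165747 bytes"
                return False
        elif algo in HASH_NAMES:
            if hashlib.new(HASH_NAMES[algo], data).hexdigest() != value:
                return False
        else:
            raise ValueError(f"unsupported checksum algorithm: {algo}")
    return True
```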