author: shannonjr <shannonjr> 2008-10-13 11:29:53 +0000
committer: shannonjr <shannonjr> 2008-10-13 11:29:53 +0000
commit: 90bfda788faa89b1cb2f9c4a2b0afecd0f57f78b
tree: acffac4ce9b9c4dbfce6cfb6348a63a4eb1a158e /mail/OSBF-lua/DESCR
parent: b93590833e1dafaec827a30edc218ec646dd9a5e
Rename lua-OSBF to OSBF-lua for consistency with package name
Diffstat (limited to 'mail/OSBF-lua/DESCR')
-rw-r--r-- mail/OSBF-lua/DESCR | 21
1 files changed, 21 insertions, 0 deletions
diff --git a/mail/OSBF-lua/DESCR b/mail/OSBF-lua/DESCR
new file mode 100644
index 00000000000..b9a121a32f0
--- /dev/null
+++ b/mail/OSBF-lua/DESCR
@@ -0,0 +1,21 @@
+OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module
+for text classification. It is a port of the OSBF classifier implemented in
+the CRM114 project. This implementation focuses on the classification task
+itself and uses Lua, a powerful yet lightweight and fast scripting language,
+which makes it easier to build and test more elaborate filters and training
+methods.
+
+The OSBF algorithm is a typical Bayesian classifier enhanced with two
+techniques originally developed for the CRM114 project: Orthogonal Sparse
+Bigrams (OSB) for feature extraction, and Exponential Differential Document
+Count (EDDC, a.k.a. Confidence Factor) for automatic feature selection.
+Combined, these two techniques produce a highly accurate classifier. OSBF
+was developed with two classes in mind, SPAM and NON-SPAM, so performance
+with more than two classes may not be the same.
+
+spamfilter.lua is an anti-spam filter written in Lua using the OSBF-Lua
+module. It takes special advantage of EDDC to introduce TONE-HR, a highly
+effective training method. The combination of OSB, EDDC and TONE-HR to
+enhance a classical Bayesian classifier resulted in the best spam filtering
+performance in TREC's Spam Track 2006 and the CEAS 2008 Live Spam Filter
+Challenge.
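
The Orthogonal Sparse Bigrams feature extraction named in the description above can be illustrated with a minimal Python sketch. It is not taken from OSBF-Lua or CRM114: the function name `osb_features`, the `<skip>` placeholder, whitespace tokenization, and the default window of 5 tokens are all assumptions chosen for illustration. The idea shown is that each token is paired with each of the next few tokens in a sliding window, and the distance between the two is kept as part of the feature.

```python
from typing import Iterator, List


def osb_features(tokens: List[str], window: int = 5) -> Iterator[str]:
    """Yield sparse bigram features from a token list.

    Illustrative sketch only: each token is paired with every later token
    that lies within `window` positions, and skipped positions are recorded
    with a hypothetical "<skip>" marker so that pairs at different
    distances stay distinct features.
    """
    for i, head in enumerate(tokens):
        for gap in range(1, window):
            j = i + gap
            if j >= len(tokens):
                break  # ran past the end of the text
            skips = " <skip>" * (gap - 1)
            yield f"{head}{skips} {tokens[j]}"


if __name__ == "__main__":
    # Hypothetical input; a real filter would use its own tokenizer.
    for feature in osb_features("buy cheap pills now".split(), window=3):
        print(feature)
```

In a classifier of this kind, such features would typically be hashed and counted per class; that step, and the EDDC weighting, are left out of this sketch.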