This note explains how to use dbacl with the TREC 2005 Spam Filter
Evaluation Toolkit (or spamjig for short). The spamjig is a system you
can install to test and compare several spam filters with either
public data or your own private data. It was developed as part of
the NIST TREC 2005 conference.

The TREC Spam Filter Evaluation Toolkit can be downloaded from

    http://plg.uwaterloo.ca/~trlynam/spamjig/

The spamjig has a similar purpose to dbacl's mailcross testsuite
commands (see the man page for mailcross(1)), but uses a different
methodology with a possibly different selection of open and closed
source spam filters, and may be more up to date than the mailcross
wrappers for some filters.

This README file only covers the spamjig aspects directly related to
dbacl; please refer to the spamjig's documentation for other
installation and usage instructions.

If you have downloaded dbacl as part of the spamjig, then you
already have a self-extracting archive, named something like this:

    dbacl-1.9.1.TREC.sfx.sh

In that case, you can skip the next section. Otherwise, you will have
to create the file above from scratch, as explained below.

PREPARING THE DBACL SELF-EXTRACTING SHELL SCRIPT

The spamjig expects dbacl to come as a self-extracting shell script.
Creating this script from the normal dbacl-1.xxx.tar.gz is very easy.
Suppose you have downloaded the file dbacl-1.9.1.tar.gz; then type

    tar xfz dbacl-1.9.1.tar.gz
    cd dbacl-1.9.1
    ./configure && make trec

This will automatically create a self-extracting script named
dbacl-1.9.1.TREC.sfx.sh and place it into the dbacl-1.9.1 directory.
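
The build steps can be wrapped in a small shell function. This is only
a sketch: the names sfx_name_for and build_trec_sfx are hypothetical and
not part of dbacl, the naming rule is inferred from the example file
names in this note, and the tarball is assumed to unpack into a
directory named after it (e.g. dbacl-1.9.1).

```shell
# Hypothetical helper: derive the expected script name from a tarball name,
# e.g. dbacl-1.9.1.tar.gz -> dbacl-1.9.1.TREC.sfx.sh (pattern inferred from this note).
sfx_name_for() {
    base=$(basename "$1" .tar.gz)
    printf '%s.TREC.sfx.sh\n' "$base"
}

# Hypothetical helper: unpack the tarball, build the TREC target, and
# verify that the self-extracting script appeared where this note says it will.
build_trec_sfx() {
    tarball=$1
    dir=$(basename "$tarball" .tar.gz)   # assumed name of the unpacked directory
    tar xfz "$tarball" || return 1
    ( cd "$dir" && ./configure && make trec ) || return 1
    script="$dir/$(sfx_name_for "$tarball")"
    if [ -f "$script" ]; then
        echo "created $script"
    else
        echo "expected $script was not created" >&2
        return 1
    fi
}
```

With these definitions loaded, "build_trec_sfx dbacl-1.9.1.tar.gz" would
run the whole sequence and stop at the first step that fails.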

USING THE SELF-EXTRACTING SCRIPT WITH THE SPAMJIG

To use the spamjig with a self-extracting archive, first create
a directory where you would like to run the spamjig test. Normally,
this is a subdirectory of the spamjig working directory itself.

Next, you should copy the file dbacl-1.xxx.TREC.sfx.sh into your
chosen working directory, and type from within that directory

    ./dbacl-1.xxx.TREC.sfx.sh

You will obtain a list of instructions as well as a set of possible
optional parameters. Follow these instructions to create (in the
current working directory) all the necessary programs and scripts.
If something goes wrong, it should be printed on your terminal, so
please read the messages.

Upon success, you will have several scripts named initialize,
classify, train and finalize in the directory containing the
self-extracting archive. These scripts are used by the spamjig;
consult the spamjig documentation for details.
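
A quick way to confirm that extraction succeeded is to check for the
four scripts named above. The following is a minimal sketch; the
function name check_jig_scripts is hypothetical and is not provided by
dbacl or the spamjig.

```shell
# Hypothetical check: report any of the four scripts that are missing
# from the given directory (default: the current directory).
# Returns 0 if all four are present, 1 otherwise.
check_jig_scripts() {
    dir=${1:-.}
    missing=0
    for s in initialize classify train finalize; do
        if [ ! -f "$dir/$s" ]; then
            echo "missing: $s" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running "check_jig_scripts" with no argument from within the working
directory prints nothing when all four scripts are present.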

Note: The self-extracting archive checks for a local file named
OPTIONS.default. If this file is found in the current directory,
then you will not see instructions, but instead all the test jig
files will be extracted directly.
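
When driving the archive from another script, it can help to know in
advance which of the two behaviours to expect. The sketch below only
mirrors the rule stated in the note above; the function name
extraction_mode and its output words are hypothetical.

```shell
# Hypothetical helper: predict how the self-extracting archive will behave
# in the given directory (default: current directory), based solely on
# whether a file named OPTIONS.default is present there.
extraction_mode() {
    if [ -f "${1:-.}/OPTIONS.default" ]; then
        echo direct          # files are extracted without showing instructions
    else
        echo instructions    # the archive prints instructions and parameters
    fi
}
```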

The dbacl program has several switches and options which can result in
different classification performance. The spamjig scripts supplied
with dbacl are designed to allow you to experiment with different
settings.

The switches and settings used for a simulation are defined in a
file called OPTIONS which exists in the share/dbacl/TREC subdirectory,
i.e. the same directory containing this README file. This file is
recreated every time initialize is called, so you cannot make changes
to it directly.

To change the simulation options, you have two choices: you can either select
a predefined OPTIONS file among the variants which are bundled with dbacl, or
you can write your own.

The initialize script accepts the name of an OPTIONS file on the command line, e.g.

    initialize OPTIONS.simple

Here OPTIONS.simple is one of the OPTIONS.* files which are found in the
dbacl-xxx/TREC/ source directory, where the program was compiled.

Possible options are more or less as follows:

    OPTIONS.bi-adp-unif-d

Remember that initialize will recreate the share/dbacl/TREC/OPTIONS file by
overwriting it with one of the above.

Each OPTIONS.* file is a text file and contains descriptions of the
algorithmic choices it mandates and other relevant information.
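
To see which variants your copy of dbacl actually ships, you can list
the OPTIONS.* files in the source TREC directory and then read the one
you are interested in with any pager. A minimal sketch follows; the
function name list_option_variants is hypothetical, and the directory
you pass is whatever dbacl-xxx/TREC corresponds to on your system.

```shell
# Hypothetical helper: print the names of the OPTIONS.* files found in
# the given directory, one per line. Prints nothing if there are none.
list_option_variants() {
    for f in "$1"/OPTIONS.*; do
        # If the glob matched nothing, $f is the literal pattern and is skipped.
        [ -f "$f" ] && basename "$f"
    done
    return 0
}
```

For example, "list_option_variants dbacl-1.9.1/TREC" would print the
bundled variant names for a 1.9.1 source tree.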

For the actual TREC conference, a specially named set of OPTIONS.* files
exists, but dbacl is packaged with several others for your convenience.

You can also create your own OPTIONS.xxx file if the predefined variants are
not to your liking. To do so, simply create a file named OPTIONS.custom
and place it in the directory which contains the self-extracting archive
(i.e. where the initialize script is also created). Then you can type

    initialize OPTIONS.custom

and the initialize script will look for the file OPTIONS.custom first among its
predefined variants, and then in the current working directory if not found.
The OPTIONS.custom file will overwrite the share/dbacl/TREC/OPTIONS file, and
the simulation will use your custom settings.
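
The lookup order described above can be illustrated with a short shell
function. This is a sketch of the documented behaviour only, not the
code of the initialize script itself; the function name resolve_options
and the explicit variants directory argument are hypothetical.

```shell
# Sketch of the documented lookup order (not the actual initialize code):
# look for the named file among the predefined variants first, and fall
# back to the current working directory only if it is not found there.
resolve_options() {
    name=$1
    variants_dir=$2    # directory holding the predefined OPTIONS.* files
    if [ -f "$variants_dir/$name" ]; then
        printf '%s\n' "$variants_dir/$name"
    elif [ -f "./$name" ]; then
        printf '%s\n' "./$name"
    else
        echo "no such options file: $name" >&2
        return 1
    fi
}
```

One consequence of this order is that a custom file which shares its
name with a predefined variant would never be picked up, so it is
safest to give custom files a name that no bundled variant uses.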