<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry id="bogoutil.1">
<refentrytitle>bogoutil</refentrytitle>
<manvolnum>1</manvolnum>
<refname>bogoutil</refname>
<refpurpose>Dumps, loads, and maintains bogofilter database files</refpurpose>
<refsynopsisdiv id="synopsis">
<command>bogoutil</command>
<arg choice="opt">options</arg>
<arg choice="plain">-d <replaceable>file</replaceable></arg>
<arg choice="plain">-H</arg>
<arg choice="plain">-l <replaceable>file</replaceable></arg>
<arg choice="plain">-m</arg>
<arg choice="plain">-w <replaceable>file_or_dir</replaceable></arg>
<arg choice="plain">-p <replaceable>file_or_dir</replaceable></arg>
<arg choice="plain"><replaceable>file.db</replaceable></arg>
<command>bogoutil</command>
<arg choice="plain">-r</arg>
<arg choice="plain">-R</arg>
<arg choice="plain"><replaceable>directory</replaceable></arg>
<command>bogoutil</command>
<arg choice="plain">-h</arg>
<arg choice="plain">-V</arg>
<para>where <option>options</option> is</para>
<arg choice="opt">-v</arg>
<arg choice="opt">-n</arg>
<arg choice="opt">-D</arg>
<arg choice="opt">-a <replaceable>age</replaceable></arg>
<arg choice="opt">-c <replaceable>count</replaceable></arg>
<arg choice="opt">-s <replaceable>min,max</replaceable></arg>
<arg choice="opt">-y <replaceable>date</replaceable></arg>
<arg choice="opt">-I <replaceable>file</replaceable></arg>
<arg choice="opt">-x <replaceable>flags</replaceable></arg>
<refsect1 id="description">
<title>DESCRIPTION</title>
<para><application>Bogoutil</application> is part of the bogofilter Bayesian spam filter package.</para>
<para>It is used to dump and load bogofilter's Berkeley DB databases to and
from text files, to perform database maintenance functions, and to display the
values for specific words.</para>
<refsect1 id="options">
<title>OPTIONS</title>
<para>The <option>-d <replaceable>file</replaceable></option>
option tells <application>bogoutil</application> to print
the contents of the database file to <option>stdout</option>.</para>
<para>The <option>-H <replaceable>file_or_dir</replaceable></option>
option tells <application>bogoutil</application> to print
a histogram of the specified database file to
<option>stdout</option>. The output is similar to
<application>bogofilter -vv</application>. Finally,
hapaxes (tokens which were only seen once) and pure tokens
(tokens which were encountered only in ham or only in
spam) are counted.</para>
<para>The <option>-l <replaceable>file</replaceable></option>
option tells <application>bogoutil</application>
to load the data from <option>stdin</option> into the database file.</para>
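<para>As an illustration only, the dump and load options can be chained to copy
a word list through its text form. The following Python sketch assumes that
<command>bogoutil</command> is on the search path. The helper names
<function>dump_argv</function>, <function>load_argv</function>, and
<function>copy_wordlist</function> are hypothetical and are not part of
bogofilter.</para>
<programlisting><![CDATA[
```python
import subprocess


def dump_argv(db_path):
    """Command line that prints the contents of db_path to stdout (-d)."""
    return ["bogoutil", "-d", db_path]


def load_argv(db_path):
    """Command line that loads text from stdin into db_path (-l)."""
    return ["bogoutil", "-l", db_path]


def copy_wordlist(src_db, dst_db):
    """Dump src_db as text, then feed that text to a second bogoutil run.

    Hypothetical helper: it only runs when called, and it raises
    subprocess.CalledProcessError if either command exits with a
    nonzero status.
    """
    dump = subprocess.run(dump_argv(src_db), stdout=subprocess.PIPE, check=True)
    subprocess.run(load_argv(dst_db), input=dump.stdout, check=True)
```
]]></programlisting>
<para>Calling <function>copy_wordlist</function> with two file names, for
example a source <filename>wordlist.db</filename> and a new destination file,
would run both commands in sequence. The file names are placeholders.</para>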
<para>The <option>-m</option> option tells <application>bogoutil</application>
to perform maintenance functions on the specified database, i.e. discard tokens
that are older than desired, have counts that are too small, or sizes (lengths)
that are too long or too short.</para>
<para>The <option>-w <replaceable>file_or_dir</replaceable></option>
option tells <application>bogoutil</application> to
display token information from the database. The option
takes an argument, which is either the name of the
wordlist (usually wordlist.db) or the name of the directory
containing it. Tokens can be listed on the command line
or piped to <application>bogoutil</application>. When
there are extra arguments on the command line,
<application>bogoutil</application> will use them as the
tokens to look up. If there are no extra arguments,
<application>bogoutil</application> will read tokens from
<option>stdin</option>.</para>
<para>The <option>-p <replaceable>file_or_dir</replaceable></option>
option tells <application>bogoutil</application> to
display the database information for one or more tokens.
The display includes a probability column with the
token's spam score (computed using
<application>bogofilter</application>'s default values).
Option <option>-p</option> takes the same arguments as
option <option>-w</option>.</para>
<para>The <option>-r</option> option tells
<application>bogoutil</application> to recalculate the ROBX
value and print it as a six-digit fraction.</para>
<para>The <option>-R</option> option does the same as <option>-r</option>, but prints more
information and saves the result in the training database.</para>
<para>The <option>-I <replaceable>file</replaceable></option> option tells
<application>bogoutil</application> to read its input from
<replaceable>file</replaceable> rather than stdin.</para>
<para>The <option>-v</option> option produces verbose output on <option>stderr</option>.
This option is primarily useful for debugging.</para>
<para>The <option>-D</option> option redirects debug output to stdout (it
usually goes to stderr).</para>
<para>The <option>-x <replaceable>flags</replaceable></option>
option sets debugging flags.</para>
<para>Option <option>-n</option> stands for "replace non-ASCII characters".
It replaces characters that have the high bit (0x80) set with question marks.
This can be useful if a word list has lots of unreadable tokens, for example from Asian spam.
The "bad" characters are converted to question marks, and matching tokens are combined
when the option is used with <option>-m</option> or <option>-l</option>, but not with <option>-d</option>.</para>
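<para>The effect of the replacement can be pictured with a short Python
sketch. It is not bogoutil's implementation. It assumes that tokens are byte
strings and that "combined" means the counts of tokens that become identical
are summed; both points are assumptions made for the illustration, and the
function names are hypothetical.</para>
<programlisting><![CDATA[
```python
from collections import Counter


def asciify(token: bytes) -> bytes:
    """Replace every byte that has the high bit (0x80) set with b'?'."""
    return bytes(b if b < 0x80 else ord("?") for b in token)


def merge_counts(pairs):
    """Sum the counts of tokens that become identical after replacement.

    Summing is an assumption about what "combined" means.
    """
    merged = Counter()
    for token, count in pairs:
        merged[asciify(token)] += count
    return dict(merged)


# Two tokens that differ only in one high-bit byte collapse into one entry.
pairs = [(b"caf\xe9", 3), (b"caf\xe8", 2), (b"plain", 5)]
merged = merge_counts(pairs)
```
]]></programlisting>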
<para>Option <option>-a age</option> specifies the maximum acceptable token age; older tokens are discarded.
The age can be a date (in the form YYYYMMDD) or a day count, in which case tokens older than
<option>age</option> days are discarded.</para>
<para>Option <option>-c value</option> indicates that tokens with counts less than or equal to <option>value</option>
will be discarded.</para>
<para>Option <option>-s min,max</option> is used to discard tokens based on their size, i.e. length.
All tokens shorter than <option>min</option> or longer than <option>max</option> will be discarded.</para>
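<para>Taken together, the age, count, and size rules amount to a filter over
the tokens. The Python sketch below is a minimal illustration of those rules,
not bogoutil's code. It assumes that every token carries a date, that the age
is given as a YYYYMMDD cutoff rather than a day count, that a token dated
exactly on the cutoff is kept, and that size is measured with
<function>len</function>. The name <function>keep_token</function> and its
parameters are hypothetical.</para>
<programlisting><![CDATA[
```python
def keep_token(word, count, date, *, cutoff=None, min_count=None, size=None):
    """Return True if a token survives the discard rules.

    cutoff    -- YYYYMMDD integer; tokens dated earlier are discarded (-a)
    min_count -- tokens with count <= min_count are discarded (-c)
    size      -- (min, max) lengths; tokens outside the range are discarded (-s)
    """
    if cutoff is not None and date < cutoff:
        return False
    if min_count is not None and count <= min_count:
        return False
    if size is not None:
        shortest, longest = size
        if len(word) < shortest or len(word) > longest:
            return False
    return True


tokens = [
    ("hello", 7, 20040115),
    ("x", 9, 20040115),      # shorter than the minimum size of 3
    ("rare", 1, 20040115),   # count is less than or equal to 1
    ("stale", 5, 20030101),  # dated before the cutoff
]
kept = [t[0] for t in tokens
        if keep_token(*t, cutoff=20040101, min_count=1, size=(3, 30))]
```
]]></programlisting>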
<para>Option <option>-y date</option> specifies the date to give to tokens that don't have dates.</para>
<para>The <option>-h</option> option prints the help message and exits.</para>
<para>The <option>-V</option> option prints the version number and exits.</para>
<refsect1 id="dataformat">
<title>DATA FORMAT</title>
<para><application>Bogoutil</application> reads and writes text files where each nonblank
line consists of a word, any amount of horizontal whitespace, a numeric word count,
more whitespace, and (optionally) a date in the form YYYYMMDD.
Blank lines are skipped.</para>
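<para>A reader for this format can be sketched in a few lines of Python. The
sketch assumes that a word never contains whitespace and that the count and
date are plain decimal integers; the names <function>Record</function> and
<function>parse_dump</function> are hypothetical and not part of
bogofilter.</para>
<programlisting><![CDATA[
```python
from typing import Iterator, NamedTuple, Optional


class Record(NamedTuple):
    word: str
    count: int
    date: Optional[int]  # YYYYMMDD as an integer, or None when absent


def parse_dump(lines) -> Iterator[Record]:
    """Yield one Record per nonblank line of the text format."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank lines are skipped
        if len(fields) not in (2, 3):
            raise ValueError(f"malformed line: {line!r}")
        word, count = fields[0], int(fields[1])
        date = int(fields[2]) if len(fields) == 3 else None
        yield Record(word, count, date)


# Made-up sample lines: one with a date, one blank, one without a date.
sample = ["viagra   42  20040115\n", "\n", "hello\t7\n"]
records = list(parse_dump(sample))
```
]]></programlisting>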
<refsect1 id="returns">
<title>RETURN VALUES</title>
<para>0 for successful operation.</para>
<para>3 for I/O or other errors.
Error 3 usually means that something is seriously wrong with the database files.</para>
<refsect1 id="author">
<title>AUTHOR</title>
<para>Gyepi Sam <email>gyepi@praxis-sw.com</email>.</para>
<para>Matthias Andree <email>matthias.andree@gmx.de</email>.</para>
<para>David Relson <email>relson@osagesoftware.com</email>.</para>
<para>For updates, see <ulink url="http://bogofilter.sourceforge.net/">
the bogofilter project page</ulink>.</para>