dbacl 1.12:
	* some tests in make check need C locale (found by Szperacz).
	* swap order of #include "util.h" and #include<math.h> for gcc 4.0
	* defined dummy yywrap() in risk-parser.y
	* added the TREC2005 options files to the distribution.
	* when using -T html:styles, now also shows CSS class.
	* bug fix: some MMAP calls incorrectly tested NULL instead of MAP_FAILED.
	* bug fix: std_tokenizer lost tokens with M_OPTION_NGRAM_STRADDLE_NL.
	* bug fix: xml_character_filter, XTAG attributes straddling newlines.
	* changed -T email:xheaders semantics slightly.
	* re-added EXIT_STATUS section in man page (was lost after rewrite).
	* fixed exit status when learning, to conform with Unix convention,
	  but classifying exit status continues to be nonstandard.
	* new option -e char for single characters.
	* updated code in mailinspect.c so that it compiles with slang2.
	  (thanks Clint Adams)
	* changed util.h and tests/Makefile.am for IRIX make compilation.
	  (thanks to anonymous submitter)
	* todo: find unchecked null pointers
	* todo: fix flex linking problem
	* new hypex command, a Chernoff exponent calculator for dbacl.
	* optionally scan sources with splint (tricky, incomplete)
	  (thanks to Markus Elfring for pointing out the need for it).
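
The MAP_FAILED fix above can be illustrated with a minimal sketch. This is
not the dbacl source; map_file_readonly() is a hypothetical helper written
for this note. mmap() reports failure by returning MAP_FAILED, which POSIX
defines as (void *)-1, so a test against NULL never detects the error.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of dbacl: map a whole file read-only.
   Returns NULL on any failure, so callers can test against NULL. */
static void *map_file_readonly(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) /* correct test; "p == NULL" would miss the error */
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

Translating MAP_FAILED into NULL in one place is only one possible design;
it lets the rest of the code keep a single failure convention.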
dbacl 1.11:
	* bug fix in mailcross.
	* bug fix: SIGACTION changed to HAVE_SIGACTION.
	* add slowdown warning on STDERR when learner hash won't grow any more.
	* set extra_lines = 0 in process_file when U_OPTION_FILTER applies.
	* add new switch -S to be used with -w, and make -w more standard.
	* replace tmpfile() call with mytmpfile() + unlink().
	* change the formula for score renormalizations and make complexity
	  into a fractional quantity.
	* update/edit the manpages and tutorials.
	* add new function fast_partial_save_learner().
dbacl 1.10:
	* small changes to the documentation.
	* add new TREC directory containing spamjig scripts.
	* apply vpath changes, thanks to Clint Adams, see contrib directory.
	* change is_binline() to be more robust across locales.
	* bug fix in test scripts in case program doesn't run in C locale.
	* -m switch now applies to learning with -o switch.
	* add "b" to all the fopen() calls, including stdin handling.
	* fix typos and updated documentation (thanks Keith Briggs).
	* bug fix: add check for q = NULL pointer in std_tokenizer.
	* implicit parentheses around regexes with -g switch.
	* -0 switch is now default, added -1 switch to force preloading.
	* -X switch no longer default for learning.
	* convert some E_ERROR messages into E_FATAL.
	* remove hacks in make_dirichlet_digrams(), agrees with dbacl.ps again.
	* bug fix: handle style attribute in XTAGS if M_OPTION_SHOW_STYLE.
	* parsing improvement: detect MIME headers with missing boundary.
	* loophole fix: IGNORE_MIME_PREAMBLE in mbw.c.
dbacl 1.9:
	* bug fix: bayesol expects '^scores', dbacl writes '^# scores', found
	  by Darryl Luff (thanks).
	* dbacl -l now accepts directory names as well as mboxes.
	* add new -U switch, to measure MAP ambiguity.
	* change "n/a" type confidence value (-X switch) from 101 to 0,
	  which is friendlier to mailinspect.
	* bug fix: -vnX displayed wrong percentage.
	* new hmine command.
	* add two new scoring types in mailinspect (-o switch).
	* fix interactive compilation of mailinspect, which got broken by
	  previous automake redesign.
	* new -T email:theaders option.
	* bug fix: portable categories had been disabled in dbacl.h
dbacl 1.8.1:
	* reformat output scores.
	* fix -d switch to work during classification.
	* new -m switch to speed up learning/classifying.
	* bug fix: is_adp_char/is_cef_char were buggy, now ok + more modular.
	* stop printing control codes with -D and -d switches.
	* handle RFC 1153 format.
	* add -T html:forms switch.
	* bug fix: add extra newlines to input and flush filter caches.
	* bug fix: uri encoding.
	* bug fix: message/rfc822 mime type.
	* where possible, write portable (byte order) category files.
	* new make check autotest scripts (finally!).
	* standardize error messages a bit more.
	* rework automake system (after doing the RTFM).
	* remove boost regex code.
	* move some common functions to new file util.c.
	* fix bug in get_token_type() when MBOX mode is off.
	* redesign the process_file() functions, fixing bugs.
	* limit single token sizes to prevent numeric overflows in digitization.
	* replace wcsncasecmp with mystrncasecmp as former is broken on glibc. 
	* cosmetic change to "summarize" testsuite commands.
dbacl 1.8:
	* revise dbacl.ps: fixes typos (thanks to Keith Briggs) and brings
	  theory up to date.
	* change html:links option to display full unparsed url.
	* email mode now defaults to -e adp and -L uniform, unlike previous
	  version, whose behaviour can be obtained with -e cef -L dirichlet.
	* new -e switch parameter (adp).
	* new support for token classes. Not used much for now.
	* rework reference measure estimations, modified -L switch.	
	* change default model policy from multinomial to hierarchical. To
	  obtain multinomial from now on, -M must always be used.
	* add _GNU_SOURCE to shut up posix_memalign warnings on Linux.
	* bug fix: decode_html_entity. This function just keeps bugging me. 
	* fix slightly the handling of HTML comments.
	* fix some more options/bugs in the testsuite wrappers.
	* bug fix: in digitizing digrams, because format has changed.
dbacl 1.7:
	* add -q switch to control learning quality/speed.
	* rework entropy optimization to reduce variability, and preload
	  weights if category already exists (-0 switch).
	* improve buffering behaviour when using -f switch. Discovered
	  by Yoav Aner. 
	* add signal handlers with notification to stderr.
	* add new costs.ps document describing the bayesol cost calculations.
	* add new -N switch to bayesol.
	* add basic "plot" commands for mailtoe/mailfoot (needs gnuplot).
	* add new -o switch. Useful for faster mailtoe/mailfoot simulations.
	* modify -x switch to skip full messages when used with -T "email".
	* add madvise calls for hash tables.
	* bug fix: memset address was wrong in grow_learner_hash().
	* bug fix: more robust quote parser inside xml tags. Discovered
	  by spammers. Thanks, whoever you are ;-)
	* saving category files is now atomic, and cannot be corrupted.
	* save category files with 440 permissions only.
	* remove some deprecated bogofilter wrappers.
	* check sscanf return values (previously forgotten).
	* bug fix: no plain_text_filter() for non-plaintext body parts.
	* bug fix: decode_html_entity.
	* bug fix: use 64 bit hashes when defining "huge" memory model.
	* fix dbacl display bug with -N switch and single category.
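
The atomic save and 440 permissions entries above can be pictured with the
usual POSIX pattern: write a temporary file in the same directory, then
rename() it over the target. The sketch below is an assumption about the
general technique, not the code dbacl actually uses; save_atomically() is a
hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the dbacl implementation: write buf to path so
   that readers see either the old file or the complete new one. */
static int save_atomically(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int fd, ok;
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);

    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    fd = mkstemp(tmp); /* temporary file next to the target */
    if (fd < 0)
        return -1;
    /* a short write is treated as a failure in this sketch */
    ok = write(fd, buf, len) == (ssize_t)len
         && fsync(fd) == 0
         && fchmod(fd, 0440) == 0; /* read-only for owner and group */
    if (close(fd) != 0)
        ok = 0;
    if (ok && rename(tmp, path) == 0) /* atomic replacement on POSIX */
        return 0;
    unlink(tmp); /* clean up the partial file on any failure */
    return -1;
}
```

The temporary file must live on the same filesystem as the target, since
rename() cannot replace a file atomically across filesystems.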
dbacl 1.6:
	* add new testsuite wrappers for crm114, SpamAssassin, SpamOracle.
	* new autoconf check for wcstol (in case C library is incomplete). 
	  Thanks to Marian Steinbach.
	* new mailtoe and mailfoot commands similar to mailcross.
	* merge the mailcross and mailcross.testsuite commands, and invert 
	  their exit codes to get normal shell conventions.
	* new -T switch to scan attachments somewhat like strings(1).
	* fix a crash in mailinspect when viewing mailboxes with more than
	  1024 messages.
	* remove "growing hashtable" warning - it's distracting and useless.
dbacl 1.5.1:
	* fix a trivial, but serious bug: mail messages were not parsed
	  if they didn't start with From_.
	* streamline html entity decoding inside XTAGS
	* add new testsuite wrapper script for popfile (tested with 0.20.1 only).
	* cosmetic changes to the testsuite scripts (e.g. /bin/sh --> /bin/bash
	  everywhere, at least until the scripts become truly portable).
dbacl 1.5:	
	* new mailcross.testsuite command.
	* use poor man's templates (in mbw.c) for parsing functions. 
	* add several new -T switches.
	* completely rework the xml filter.
	* add new base64 and quoted-printable decoders.
	* fix hang bug with signed chars during learning.
	* fix bugs in mbox parser. 
	* slight reorganization and new features in mailcross.
	* make digramic transitions semireversible.
	* rework email.html tutorial.
	* invert the readline and ncurses tests in configure.in
dbacl 1.4:
	* add the -mieee switch on Alpha processors 
	  (to prevent incorrect divide by zero errors - we use IEEE fp).
	* add a tutorial for email classification.
	* add dependency on ncurses to satisfy libreadline, following
	  a suggestion by Christian Loitsch.
	* change slightly the bayesol risk calculation and parse risk
	  spec costs directly on log scale.
	* add an -e switch to replace alpha character class tokenization.
	* add a -L switch to replace digramic measure with Laplacian measure. 
	* add -fsigned-char compilation switch to Makefile.in for portability. 
	  Discovered by Kerry Todyruik.
	* change slightly the mbox/MIME parsing algorithm to fix
	  bugs where Base64 encoded attachments aren't skipped.
	* change some of the sample*.txt files to preempt copyright issues,
	  following a suggestion by Johannes Huesing.
	* fix misplaced post_line_fun() call in *process_file().
	* use AC_FUNC_MBRTOWC in configure script instead of 
	  manually checking headers.
dbacl 1.3.1:
	* add basic vi cursor key support in mailinspect.
	* don't pack structs if not GNU C compiler.
	* remove hh modifier in fprintf calls.
	* disable wide character support for BSD style machines. 
	* fix solaris compilation problem with sys/types.h. Thanks to
	  Wes Groleau for pointing out the bug.
	* fix a typo in tutorial.
	* fix for gcc-3.x compilation problem. '^M' is now '\r'. Thanks to 
	  Mike Frysinger <vapier@gentoo.org>
dbacl 1.3:	
	* add a paragraph to the README
	* dbacl now counts the number of emails found during learning.
	* mailinspect permits (interactively) sorting an mbox by category
	* refactored functions to allow sharing in dbacl.c and mailinspect.c
dbacl 1.2.1:	
	* new -H switch allows dynamically growing hash tables during learning
	* bayesol now warns if complexities are disparate + warning in tutorial
	* new command mailcross to perform cross validation
	* new -A switch as a companion for -a switch
	* remove empirical.track_features limitation on line length
dbacl 1.2:
	* add simple-minded feature decimation for memory-constrained operation
	* add new Bayes solution calculator (bayesol)
	* add a tutorial
dbacl 1.1:
	* add handler for regex submatch bitmaps.
	* add a new dump model switch (-d).
	* add new code for hierarchical type models (incl. -w switch). 
	* speed up hash macros
	* insert typedefs for fine grained portability control.
	* reformat the usage strings.
	* properly separate components in n-grams.
	* document the theoretical aspects of the design.
dbacl 1.0:
	* add support for regular expressions
	* add support for internationalization
	* fix a bug (miscalculation of lambdas) in previous version. 
dbacl 0.9:
	* initial stable release.