1
This chronicle briefly describes the progress of agrep.
3
Feb/91: The approximate pattern matching algorithm called 'bitap'
4
(bit-parallel approximate pattern matching) is designed.
5
The algorithm is a generalization of Baeza-Yates' "shift-or"
6
algorithm for exact matching.
8
Mar/91: Many extensions of the algorithm 'bitap' are found, especially
9
for approximate regular expression pattern matching. Preliminary
10
implementation of the algorithm showed a strong promise for
11
a general-purpose fast approximate pattern-matching tool.
13
Apr/91: Approximate regular expression pattern matching was implemented.
14
The result is even better than expected.
15
The design of the software tool is pinned down.
16
(For example, record oriented, multi-pattern, AND/OR logic queries.)
17
A partition technique for approximate pattern matching is used.
19
May/91: The prototype of "agrep" is completed.
20
A lot of debugging/optimization in this month.
22
Jun/91: The first version of agrep is released.
23
agrep 1.0 was announced and made available by anonymous ftp
26
Jul/91: A sub-linear expected-time algorithm, called "amonkey" for
27
approximate pattern matching (for simple pattern) is designed.
28
The algorithm has the same time complexity as that of
29
Chang&Lawler but is much much faster in practice.
30
The algorithm is based on a variation of Boyer-Moore technique,
31
which we call "block-shifting."
32
A sub-linear expected-time algorithm, called "mgrep" for
33
matching a set of patterns is designed based on the "block-shifting"
34
technique with a hashing technique.
36
Aug/91: "amonkey" is implemented and incorporated into agrep.
37
It is very fast for long patterns like DNA patterns.
38
(But roughly the same for matching English words as the bitap
39
algorithm using the partition technique.)
40
Prototype of "mgrep" is implemented.
42
Sep/91: "mgrep" is incorporated into agrep to support the -f option.
43
An algorithm for approximate pattern matching that combines the
44
'partition' technique with the sub-linear expected-time algorithm
45
for multi-patterns is designed.
46
Implementation shows it to be the fastest for ASCII text (and pattern).
47
Boyer-moore technique for exact matching is incorporated.
49
Nov/91: The final paper of "agrep" that is to appear in USENIX
50
conference (Jan 1992) is finished.
52
Jan/92: Some new options are added, such as find best matches (-B),
53
and file outputs (-G).
54
The man pages are revised.
55
agrep version 2.0 is released.
56
Fixed the following bugs and change the version to be 2.01.
57
1. -G option doesn't work correctly.
58
2. multiple definition of some global variables.
59
3. -# with -w forced the first character of the pattern to be matched
61
Mar/92: Fixed the following bugs and change the version to be 2.02.
62
1. agrep sometimes misses some matches for pipeline input.
63
2. the word delimiter was not defined consistantly.