This is version 2.04 of agrep - a new tool for fast
text searching allowing errors.
agrep is similar to egrep (or grep or fgrep), but it is much more general
(and usually faster).
The main changes from version 1.1 are 1) incorporating Boyer-Moore
type filtering to speed up search considerably, 2) allowing multiple patterns
via the -f option; this is similar to fgrep, but from our experience
agrep is much faster, 3) searching for "best match" without having to
specify the number of errors allowed, and 4) no longer requiring ASCII.
Several more options were added.

To compile, simply run make in the agrep directory after untar'ing
the tar file (tar -xf agrep-2.04.tar will do it).

The three most significant features of agrep that are not supported by
the grep family are
1) the ability to search for approximate patterns;
for example, "agrep -2 homogenos foo" will find homogeneous as well
as any other word that can be obtained from homogenos with at most
2 substitutions, insertions, or deletions.
"agrep -B homogenos foo" will generate a message of the form
best match has 2 errors, there are 5 matches, output them? (y/n)
2) agrep is record oriented rather than just line oriented; a record
is by default a line, but it can be user defined;
for example, "agrep -d '^From ' 'pizza' mbox"
outputs all mail messages that contain the keyword "pizza".
Another example: "agrep -d '$$' pattern foo" will output all
paragraphs (separated by an empty line) that contain pattern.
3) multiple patterns with AND (or OR) logic queries.
For example, "agrep -d '^From ' 'burger,pizza' mbox"
outputs all mail messages containing at least one of the
two keywords (, stands for OR).
"agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages
containing both keywords (; stands for AND).
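To make the notion of "at most k errors" in feature 1 concrete, here is a
minimal Python sketch. It is not agrep's code and not its algorithm (see
agrep.algorithms for that); it uses plain dynamic programming, and the names
min_errors, approx_grep and best_matches are hypothetical, chosen only for
this illustration. best_matches loosely mirrors what -B reports.

```python
def min_errors(pattern, text):
    """Fewest substitutions, insertions, or deletions needed to turn
    pattern into some substring of text."""
    # prev[i]: fewest errors to match pattern[:i] against a substring of
    # the text ending at the previous position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any text position at no cost
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (pc != ch),  # exact match or substitution
                prev[i] + 1,               # extra character in the text
                cur[i - 1] + 1,            # pattern character missing
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def approx_grep(pattern, lines, k):
    """Lines that contain pattern with at most k errors."""
    return [line for line in lines if min_errors(pattern, line) <= k]


def best_matches(pattern, lines):
    """Smallest error count over all lines, and the lines achieving it."""
    scores = [min_errors(pattern, line) for line in lines]
    best = min(scores)
    return best, [l for l, s in zip(lines, scores) if s == best]
```

With this sketch, approx_grep("homogenos", lines, 2) keeps a line containing
"homogeneous", while the same call with k = 1 does not.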

Putting these options together one can ask queries like

agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between
1985 and 1989 by TheAuthor dealing with curriculum.
Two errors are allowed, but they cannot be in either CACM or the year
(the <> brackets forbid errors in the pattern between them).
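The record-oriented search and the AND/OR queries can be pictured with the
following Python sketch. It is a simplification under stated assumptions:
matching is exact (no errors, no <> handling), a query uses only one of the
two operators, and split_records and search_records are hypothetical helper
names, not part of agrep.

```python
import re


def split_records(text, delimiter):
    """Split text into records, each beginning at a delimiter match."""
    starts = [m.start() for m in re.finditer(delimiter, text, flags=re.MULTILINE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first delimiter
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if s != e]


def search_records(records, query):
    """Filter records with an 'a;b' (AND) or 'a,b' (OR) query."""
    if ";" in query:
        words, combine = query.split(";"), all
    else:
        words, combine = query.split(","), any
    return [r for r in records if combine(w in r for w in words)]


mbox = (
    "From alice\nLunch: pizza and burger\n"
    "From bob\nThe pizza was good\n"
    "From carol\nNo food today\n"
)
records = split_records(mbox, r"^From ")
either = search_records(records, "burger,pizza")  # first two messages
both = search_records(records, "good;pizza")      # only the second message
```

Each record here is a whole mail message, so a hit returns the complete
message rather than a single matching line.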

Other features include searching for regular expressions (with or
without errors), unlimited wild cards, limiting the errors to only
insertions or only substitutions or any combination,
allowing each deletion, for example, to be counted as, say,
2 substitutions or 3 insertions, restricting parts of the query
to be exact and parts to be approximate, and many more.
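The idea of weighting error types can be sketched as a weighted edit cost.
This Python sketch is illustrative only and does not reproduce agrep's
implementation or its option names; min_cost and its parameters are
hypothetical. It assumes that an "insertion" is an extra character in the
text and a "deletion" is a pattern character missing from the text.

```python
def min_cost(pattern, text, sub_cost=1, ins_cost=1, del_cost=1):
    """Cheapest weighted cost of matching pattern to a substring of text."""
    # prev[i]: cheapest cost to match pattern[:i] against a substring of
    # the text ending at the previous position.
    prev = [i * del_cost for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any text position at no cost
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (0 if pc == ch else sub_cost),  # substitution
                prev[i] + ins_cost,                           # insertion
                cur[i - 1] + del_cost,                        # deletion
            ))
        best = min(best, cur[-1])
        prev = cur
    return best
```

Setting sub_cost=3, ins_cost=2, del_cost=6 makes one deletion count as
2 substitutions or 3 insertions; giving one error type a very large cost
effectively forbids it.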

agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5)
as agrep/agrep-2.04.tar.Z (or in uncompressed form as agrep/agrep-2.04.tar).
The tar file contains the source code (in C), man pages (agrep.1),
and two additional files, agrep.algorithms and agrep.chronicle,
giving more information.
The agrep directory also includes two PostScript files:
agrep.ps.1 is a technical report from June 1991
describing the design and implementation of agrep;
agrep.ps.2 is a copy of the paper as it appeared in the 1992
Winter USENIX conference.

Please mail bug reports (or any other comments)
to sw@cs.arizona.edu or to udi@cs.arizona.edu.

We would appreciate it if users notified us (at the addresses above)
of any extensions, improvements, or interesting uses of this software.

January 17, 1992


BUGS_fixed/option_update

1. remove multiple definitions of some global variables.
2. fix a bug in -G option.
3. fix a bug in -w option.
January 23, 1992

4. fix a bug in pipeline input.
5. make the definition of word-delimiter consistent.
March 16, 1992

6. add option '-y' which, if specified with -B option, will always
output the best-matches without a prompt.
April 10, 1992

7. fix a bug regarding exit status.
April 15, 1992