agrep - a new tool for text searching with errors

Udi Manber udi at cs.arizona.edu
Mon Jun 17 17:10:19 AEST 1991


We are proud to announce the release of version 1.0 of agrep - a new tool
for fast text searching with errors.
agrep is similar to egrep (or grep or fgrep), but it is much more general.
It is based on an entirely different algorithm.
The three most significant features of agrep that are not supported by
the grep family are 
1) the ability to search for approximate patterns;
	for example, "agrep -2 homogenos foo" will find homogeneous as well 
	as any other word that can be obtained from homogenos with at most 
	2 substitutions, insertions, or deletions.
2) record-oriented rather than just line-oriented searching;  a record
is by default a line, but it can be user defined;
	for example, "agrep -d '^From ' 'pizza' mbox"
	outputs all mail messages that contain the keyword "pizza".
	Another example:  "agrep -d '$$' pattern foo" will output all
	paragraphs (separated by an empty line) that contain pattern.
3) queries with multiple patterns combined by AND or OR logic.
	For example, "agrep -d '^From ' 'burger,pizza' mbox" 
	outputs all mail messages containing at least one of the 
	two keywords ("," stands for OR), and
	"agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages
	containing both keywords (";" stands for AND).
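
To make the notion of "at most k errors" in feature 1 concrete, here is a
minimal sketch in Python. It uses the textbook dynamic-programming method
for approximate substring matching, which is NOT the algorithm agrep itself
uses (that is described in the technical report). The names min_errors and
approx_lines are made up for this illustration.

```python
def min_errors(pattern, text):
    """Fewest substitutions, insertions, or deletions needed to turn
    `pattern` into some substring of `text` (0 means an exact match)."""
    # prev[i] = cost of matching pattern[:i] against a substring that
    # ends at the current position in the text.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start anywhere in the text at no cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # match or substitution
                prev[i] + 1,              # extra character in the text
                cur[i - 1] + 1,           # pattern character missing
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def approx_lines(pattern, lines, k):
    """Keep the lines that contain `pattern` with at most k errors."""
    return [line for line in lines if min_errors(pattern, line) <= k]


print(approx_lines("homogenos", ["a homogeneous mixture", "nothing here"], 2))
```

Under these assumptions, the call on the last line plays roughly the role of
"agrep -2 homogenos foo" for a two-line file.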

Putting these options together one can ask queries like

agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between 
1985 and 1989 by TheAuthor dealing with curriculum.  
Two errors are allowed (e.g., one in TheAuthor and one in Curriculum,
or two in one of them), but they cannot be in either CACM or the year 
(the <> brackets forbid errors in the pattern between them).  
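
The way this query combines records, AND logic, exact parts, and a shared
error budget can be sketched as follows. This is only an illustration of
the semantics described above, under simplifying assumptions: records are
paragraphs separated by an empty line, only ";" is handled, parts in <>
are treated as Python regular expressions that must match exactly, and the
errors of the remaining parts are summed against the budget. The function
names and the sample bibliography (including the author "Smith") are
invented for the example, and the matching method is the textbook
dynamic-programming one, not agrep's.

```python
import re


def min_errors(pattern, text):
    """Fewest edits turning `pattern` into some substring of `text`."""
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start anywhere in the text
        for i, p in enumerate(pattern, start=1):
            cur.append(min(prev[i - 1] + (p != ch),
                           prev[i] + 1,
                           cur[i - 1] + 1))
        best = min(best, cur[-1])
        prev = cur
    return best


def paragraphs(text):
    """Split text into records separated by an empty line."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def matches(record, query, k):
    """True if every ';'-separated part is found in the record, with the
    parts outside <> sharing a total budget of k errors."""
    spent = 0
    for part in query.split(";"):
        if part.startswith("<") and part.endswith(">"):
            # No errors allowed inside the brackets.
            if not re.search(part[1:-1], record):
                return False
        else:
            spent += min_errors(part, record)
            if spent > k:
                return False
    return True


def search(text, query, k):
    return [p for p in paragraphs(text) if matches(p, query, k)]


# Hypothetical sample data; note the misspelled "Curriculm" in the first entry.
bib = ("Smith, J. Curriculm design.\nCACM 1987.\n\n"
       "Smith, J. Curriculum design.\nCACM 1992.\n\n"
       "Jones, K. Sorting.\nCACM 1986.\n")

for record in search(bib, "<CACM>;Smith;Curriculum;<198[5-9]>", 2):
    print(record)
```

With this sample, only the first entry is printed: the second fails the
exact year test, and the third would need more than two errors to contain
"Smith".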

Other features include searching for regular expressions (with or
without errors), unlimited wild cards, limiting the errors to only 
insertions or only substitutions or any combination, 
allowing each deletion, for example, to be counted as, say, 
2 substitutions or 3 insertions, restricting parts of the query 
to be exact and parts to be approximate, and many more.
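
The idea of counting one kind of error as more expensive than another can
be illustrated with a weighted variant of the textbook dynamic-programming
method (again, not agrep's own algorithm). The function name, the
parameter names sub, ins, and dele, and the convention for which edit is
an "insertion" and which a "deletion" are assumptions made for this sketch.

```python
def weighted_errors(pattern, text, sub=1, ins=1, dele=1):
    """Cheapest way to turn `pattern` into some substring of `text`.

    Assumed convention: an insertion is an extra character in the text,
    a deletion is a pattern character that is absent from the text.
    """
    prev = [i * dele for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start anywhere in the text
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (0 if p == ch else sub),  # match / substitution
                prev[i] + ins,                          # insertion
                cur[i - 1] + dele,                      # deletion
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


print(weighted_errors("pizza", "piza"))          # every edit costs 1
print(weighted_errors("pizza", "piza", dele=3))  # each deletion counts as 3
```

Setting one cost higher than the total error budget effectively limits the
search to the other kinds of errors.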

agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5)
as agrep/agrep.tar.Z (or in uncompressed form as agrep/agrep.tar).
The tar file contains the source code (in C), man pages (agrep.1), 
and a postscript file (agrep.ps) of a technical report (TR #91-11) 
describing the design and implementation of agrep.

This is the first version of agrep.  There may be some bugs, especially with
complicated patterns and combinations of options.
Please mail bug reports (or any other comments) 
to sw at cs.arizona.edu or to udi at cs.arizona.edu.

We would appreciate it if users would notify us (at the addresses above)
of any extensions, improvements, or interesting uses of this software.

June 16, 1991.


