agrep - a new tool for approximate text search

Udi Manber udi at cs.arizona.edu
Fri Jun 28 06:22:00 AEST 1991


We are proud to announce the release of version 1.0 of agrep - a new tool
for text searching with errors.
agrep is similar to egrep (or grep or fgrep), but it is much more general.
It is also usually faster than egrep (on a SUN SparcStation II it is
about twice as fast for typical queries even with errors).
It is based on an entirely different algorithm.
The two most significant features of agrep that are not supported by
the grep family are 
1) the ability to search for approximate patterns;
	for example, "agrep -2 homogenos foo" will find homogeneous as well 
	as any other word that can be obtained from homogenos with at most 
	2 substitutions, insertions, or deletions.
2) record-oriented rather than just line-oriented searching;  a record
is by default a line, but it can be user defined;
	for example, "agrep -d '^From ' 'breakdown; (inter|arpa|bit)net' mbox"
	outputs all mail messages that contain breakdown and one
	of either internet, arpanet, or bitnet.
	Another example:  "agrep -d '$$' pattern foo" will output all
	paragraphs (separated by an empty line) that contain pattern.
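
To make the two features above concrete, here is a minimal Python sketch.
It is not agrep's algorithm (the technical report describes that); it is
the textbook dynamic-programming method for approximate substring
matching, used here only to illustrate what "at most k substitutions,
insertions, or deletions" and "record oriented" mean.  The names
min_errors and search_records, and the blank-line record splitter, are
hypothetical and chosen for this illustration.

```python
import re


def min_errors(pattern: str, text: str) -> int:
    """Fewest substitutions, insertions, or deletions needed to turn
    `pattern` into some substring of `text` (plain dynamic programming,
    NOT the algorithm agrep itself uses)."""
    m = len(pattern)
    # col[i] = fewest edits matching pattern[:i] against a substring of
    # the text that ends at the current position.  col[0] stays 0
    # because a match may start anywhere in the text.
    col = list(range(m + 1))
    best = col[m]
    for ch in text:
        prev_diag = col[0]  # col[i - 1] as it was in the previous column
        for i in range(1, m + 1):
            step = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + step,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character missing
            prev_diag = col[i]
            col[i] = new
        best = min(best, col[m])
    return best


def search_records(text: str, pattern: str, k: int) -> list[str]:
    """Return the records containing `pattern` with at most k errors.
    Assumption: a record is a paragraph, i.e. records are separated by
    empty lines."""
    records = re.split(r"\n\s*\n", text)
    return [r for r in records if r.strip() and min_errors(pattern, r) <= k]


if __name__ == "__main__":
    sample = ("the mixture is homogeneous\nand stable\n\n"
              "nothing here\n\n"
              "a homogenous blend")
    for record in search_records(sample, "homogenos", 2):
        print(repr(record))
```

With k = 2 the sketch reports the first and third paragraphs of the
sample, since "homogeneous" is two insertions away from "homogenos" and
"homogenous" is one.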

Other features include searching for regular expressions (with or
without errors), unlimited wild cards, AND and OR operations,
limiting the errors to insertions only, substitutions only, or
any combination, weighting the error types differently (so that
each deletion counts as, say, 2 substitutions or 3 insertions),
restricting parts of the query to be exact and parts
to be approximate, and many more.
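
As an illustration of counting error types differently, the following
Python sketch gives each kind of error its own cost.  It is a generic
weighted edit-distance computation, not agrep's implementation, and the
function and parameter names are hypothetical.  It also assumes one
particular convention: a "deletion" is a pattern character missing from
the text, and an "insertion" is an extra character in the text.

```python
def weighted_min_cost(pattern: str, text: str,
                      sub_cost: int = 1,
                      ins_cost: int = 1,
                      del_cost: int = 1) -> int:
    """Cheapest way to turn `pattern` into some substring of `text`
    when each kind of error has its own cost (illustrative only)."""
    m = len(pattern)
    # Matching pattern[:i] against an empty substring costs i deletions.
    col = [i * del_cost for i in range(m + 1)]
    best = col[m]
    for ch in text:
        prev_diag = col[0]  # col[i - 1] as it was in the previous column
        for i in range(1, m + 1):
            step = 0 if pattern[i - 1] == ch else sub_cost
            new = min(prev_diag + step,       # match or substitution
                      col[i] + ins_cost,      # extra character in the text
                      col[i - 1] + del_cost)  # pattern character missing
            prev_diag = col[i]
            col[i] = new
        best = min(best, col[m])
    return best


if __name__ == "__main__":
    # With unit costs a single substitution (c -> x) is cheapest.
    print(weighted_min_cost("abcd", "abxd"))
    # With expensive substitutions, one deletion plus one insertion wins.
    print(weighted_min_cost("abcd", "abxd", sub_cost=5))
```

Setting a cost above the allowed error budget effectively forbids that
kind of error, which is one way to limit a search to, say, insertions
only.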

agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5)
as pub/agrep/agrep.tar.Z (or in uncompressed form as pub/agrep/agrep.tar).
The tar file contains the source code (in C), man pages (agrep.1), 
and a postscript file (agrep.ps) of a technical report (TR #91-11) 
describing the design and implementation of agrep.

This is the first version of agrep.  There may be some bugs, especially with
complicated patterns and combinations of options.
Please mail bug reports (or any other comments) 
to sw at cs.arizona.edu or to udi at cs.arizona.edu.

We would appreciate it if users notified us (at the addresses above)
of any extensions, improvements, or interesting uses of this software.

Prof. Udi Manber  (udi at cs.arizona.edu) 
Dept. of Computer Science
University of Arizona
Tucson, AZ 85721

June 10, 1991.




