the many greps

Wed Nov 16 08:32:50 AEST 1983

From:  Dan Franklin <dan at bbncd>

Each time the three greps are discussed, and people point out that they use
different algorithms, each best for different kinds of regular expressions, I
am puzzled by the leap to the conclusion that they must therefore be different
programs.  Some UNIX C compilers have several different algorithms for the
'switch' statement, choosing either an indexed table, a hashed table with
linear rehash, or the obvious if/then/else structure for the output, depending
on the properties of the input.  These compilers do not provide 'switch1',
'switch2', and 'switch3' statements; the compiler examines the properties of
the case list and chooses the best representation.  If the only difference
between the three greps were the space-time performance of each algorithm, the
sensible thing to do would be to have one 'grep' that chose the most efficient
algorithm for the regular expression--with, perhaps, a switch so the user could
override grep's choice on special occasions (no heuristic can be perfect).
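
To make the idea concrete, here is a minimal sketch of such a front end, in Python rather than C for brevity. It assumes the hypothetical case described above, where every strategy accepts the same pattern syntax (here Python's own `re` syntax, not that of any historical grep). The names `choose_strategy`, `make_matcher`, `grep` and the `override` parameter are invented for illustration, and the two strategies are stand-ins for the real algorithms.

```python
import re

# Characters with special meaning in Python's `re` syntax. Assumption: the
# sketch uses this one syntax for every strategy.
_META = set(".^$*+?{}[]\\|()")


def choose_strategy(pattern, override=None):
    """Pick a matching strategy from the properties of the pattern."""
    has_meta = any(ch in _META for ch in pattern)
    if override is not None:
        # The user may force a strategy, but only one that keeps the
        # meaning of the pattern unchanged.
        if override == "literal" and has_meta:
            raise ValueError("pattern has metacharacters; cannot search literally")
        if override not in ("literal", "regex"):
            raise ValueError("unknown strategy: " + override)
        return override
    # Heuristic: a pattern with no metacharacters is a plain string, so a
    # substring search is enough.
    return "regex" if has_meta else "literal"


def make_matcher(pattern, override=None):
    """Return a function that tells whether a line matches the pattern."""
    if choose_strategy(pattern, override) == "literal":
        return lambda line: pattern in line
    compiled = re.compile(pattern)
    return lambda line: compiled.search(line) is not None


def grep(pattern, lines, override=None):
    """Return the lines that match, whichever strategy was chosen."""
    match = make_matcher(pattern, override)
    return [line for line in lines if match(line)]
```

The caller sees one command and one pattern syntax; the choice of algorithm is an internal detail, just as it is for the 'switch' statement.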

So why doesn't somebody do just that?  Consider how much new-user puzzlement
(and excess unix-wizards mail) would be eliminated!  There is a reason: the
three greps interpret three different forms of regular expression.  You can't
take an arbitrary shell script which uses, say, 'grep' and substitute 'egrep'
everywhere without first scrutinizing each regular expression to make sure it
doesn't have parentheses, vertical bars, etc.  So even if 'egrep' could use a
variant of the 'grep' algorithm in the right circumstances, you couldn't throw
away 'grep'.  (Each command also accepts a different subset of options, but
that problem could be solved.) Too bad.
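
The scrutiny described above can be partly mechanized. The following is a rough sketch, not a complete checker: it flags a 'grep' pattern that contains an unescaped character which 'egrep' would treat as an operator. The function name `needs_scrutiny` is invented here, and the inclusion of '+' and '?' alongside parentheses and vertical bars is my assumption about what the "etc." covers. Backslash-escaped characters are skipped and would need a separate check.

```python
# Characters that are ordinary in 'grep' patterns but operators in 'egrep'
# patterns. Assumption: '+' and '?' belong on the list with '(', ')' and '|'.
EXTENDED_ONLY = set("()|+?")


def needs_scrutiny(pattern):
    """Return True if the pattern might change meaning under 'egrep'."""
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if ch == "[":
            # Skip a bracket expression; its contents are literal.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading ']' is an ordinary member of the set
            while j < n and pattern[j] != "]":
                j += 1
            i = j + 1
            continue
        if ch in EXTENDED_ONLY:
            return True
        i += 1
    return False
```

For example, `needs_scrutiny("foo(bar)")` is true, while `needs_scrutiny("^foo.*bar$")` is false. A pattern that passes this check is only a candidate for substitution; it still has to be read by a person.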

	Dan Franklin