egrep string too big

tim at ucf-cs.UUCP tim at ucf-cs.UUCP
Thu Nov 17 15:09:25 AEST 1983


egrep allows full regular expression recognition and the search method
requires a deterministic search string.  When you ask for the pattern
"a.c", you are really asking for "a(a+b+c+...+z+A+B+C+...+any ascii char)c"
adding a question mark (?) after a don't care character (.) makes it even worse
by now looking for the regular expression "a.?c" you actually search for:
                                           +
a(a+b+c+...+z+A+B+C+...+any ascii character)c

The four character short hand really represents a non-deterministic
regular expression of 130 characters plus the operators.  In this example,
the non-determinism is only represented by what to do when a "c" is
encountered.  In the example ".?.", you introduce non-determinism in
all the 128 characters of ascii.  That expression must now be converted from
non-deterministic to deterministic which has a potential exponential explosion
in the number of states required.  I don't know right off the top of my head
what the growth is for this example (if you spent the time you could calculate
it).  After knowing this, it shouldn't suprise you that the pattern
"^.........?.?.?..$" (or whatever it was) was considered too big for egrep
to handle.
--
Tim Curry
USENET:  decvax!ucf-cs!tim
ARPANET: tim.ucf-cs at rand-relay



More information about the Comp.unix.wizards mailing list