s/A*./B/g editor bug?

dmr at alice.UUCP dmr at alice.UUCP
Fri Oct 20 13:39:35 AEST 1989


The behavior of ed's
	s/b*a/c/g
on
	aaaaa
to yield
	cacacacacac
is doubtless a bug, and quite old; it occurs in
the Seventh Edition ed.  It relates to handling of null strings.

The problem ed is trying to overcome is shown by the command
	s/b*/c/g
on
	aaaaa
where the most desirable result is presumably
	cacacacacac
where each null string before, inside, and after the subject
matches /b*/ exactly once.  But "each null string" is a bit
slippery in practice.

Ed generally handles the 'g' option of substitution by iterating the
's' command, with an advancing starting position, until it reaches
the end of the subject string.  The new starting position is just
between the end of the replacement string from the last iteration and
the end of the unconsumed part of the subject string (if the r.e. match
succeeded), or one character further along in the subject (if the
match at this position failed).

When the algorithm suggested by this scheme is tried naively,
the second example, 's/b*/c/g', blows up; it puts a 'c'
at the beginning, then again finds the null string after
the inserted 'c' and before the same, initial 'a'; the r.e.
didn't consume any of the subject.

The remedy for this misbehavior that ed chose was
to force * expressions to fail to match empty strings
at the beginning of the subject string on the second
and subsequent iterations of the 's/.../.../g' processing.
(If you have the code, this is the meaning of the 'locs'
variable in advance().)  So in the 's/b*a/c/g' example,
a 'c' is put at the start; on the second iteration,
the subject still begins with the first 'a', and the 'b*' fails on this
initial 'a'.  Then it advances; this time the substitution
succeeds, because it is looking for 'b*a' not at the beginning.

What you've discovered is that this remedy is also naive.
It looks wiser to attack the problem head-on, by permitting
the r.e. null string match, but recasting the 's/.../.../g' loop
so that if the entire r.e. matches a null string immediately to the
right of a string replaced in the preceding iteration,
an advance through the subject string is forced.


	Dennis Ritchie
	dmr at research.att.com
	att!research!dmr



More information about the Comp.bugs.4bsd.ucb-fixes mailing list