sed - match newlines on input

hoey at nrl-aic.arpa
Sat Mar 14 14:55:25 AEST 1987


>>      s/one\ntwo\nthree/one, two, three/g

It is amazing how many answers you get to questions about sed, and even
more amazing how many turn out to be wrong.  Having seen four persons'
incorrect answers, and no correct answer yet, I suppose you ought to
see one.

If the "g" at the end of the problem is to be believed, the file may
have to be completely read into memory.  This is because a file
containing

    threeone\ntwo\nthreeone\ntwo\n...\nthreeone\ntwo\nthreeone

must be written as a single line

    threeone, two, threeone, two, ..., threeone, two, threeone

and sed refuses to write part of a line.  The following script will
suffice to solve the problem:

    H;$!d;x;s/.//
    s/one\ntwo\nthree/one, two, three/g
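
For a quick check of the script above, here is a minimal sketch that
wraps it in a shell function.  The name join_all and the sample input
are made up for illustration, and the sketch assumes a modern sed (GNU
or BSD) whose pattern space is large enough to hold the whole input.

```shell
# Hypothetical wrapper: collect every line in the hold space, swap it
# into the pattern space on the last line, drop the leading newline
# that the first H left behind, then substitute across line breaks.
join_all() {
    sed -e 'H;$!d;x;s/.//' \
        -e 's/one\ntwo\nthree/one, two, three/g'
}

# Each match ends on the line where the next one begins, so the whole
# input collapses into a single output line.
printf 'threeone\ntwo\nthreeone\ntwo\nthreeone\n' | join_all
# prints: threeone, two, threeone, two, threeone
```

Nothing can be written until the last line has been read, which is why
the pattern space has to hold the entire file.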

Unfortunately, at least on 4.2BSD, there is a limit of about 4K to the
pattern space; on longer files you will get "Output line too long"
diagnostics and sed may core dump.  One solution to the problem is to
ignore the final "g" in the problem statement.  In other words, we
drop the requirement to recognize a "one\ntwo\nthree" that begins on
the same line on which a previous match ends.  The following script
will solve the simpler problem, and will keep at most three lines of
the file in the pattern space at a time.

    /\n/!{$!N;}
    $!N
    /one\ntwo\nthree/s/\n/, /g
    P;D
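
A minimal sketch of the window script as a shell function may help in
following it.  The name join_window and the sample inputs are
assumptions for illustration, and a POSIX-style sed is assumed.  The
second call shows the relaxed requirement at work: the match that would
begin on the line where the first match ends is not recognized.

```shell
# Hypothetical wrapper around the three-line window script.
join_window() {
    sed -e '/\n/!{$!N;}' \
        -e '$!N' \
        -e '/one\ntwo\nthree/s/\n/, /g' \
        -e 'P;D'
}

# A false start ("one" followed by "one") does not hide the real match.
printf 'a\none\none\ntwo\nthree\nb\n' | join_window
# prints:
#   a
#   one
#   one, two, three
#   b

# Only the first of two matches that share a line is joined.
printf 'threeone\ntwo\nthreeone\ntwo\nthreeone\n' | join_window
# prints:
#   threeone, two, threeone
#   two
#   threeone
```

The window is refilled to three lines before each test, and P;D emits
only the first line of the window, so a line that does not begin a
match is printed unchanged while the two lines after it stay available
for the next test.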

If you are interested in simplifying these scripts, please be careful
to avoid the following common bugs:

1. Script always changes "\n" to ", ", rather than only in a
   one\ntwo\nthree context (Dornfield).
2. Script assumes the input does not contain a "#" character, or some
   other character (Dornfield).
3. Script fails to recognize pattern "one\none\ntwo\nthree" (Fratkin,
   Roberts.  Thanks to Chris Torek for pointing out the problem of
   partial matches).
4. Script searches for the pattern /^one\ntwo\nthree$/ rather than
   /one\ntwo\nthree/ (Fratkin, Stewart).
5. Script fails to output the last line (Fratkin, Roberts).  Note that
   an "N" command on the last line will exit without printing the
   pattern space.  Apparently "$!N" solves the problem, though I am
   unconvinced that the documentation guarantees this to be so.
6. Script does not check that the input matches "one\ntwo\nthree".
   (Bill Roberts's script will modify "twone\nthreeleven\nhike".)
7. Script fails to output the non-matching lines.  (Dick Stewart's
   scripts have this problem, as well as gratuitously changing the
   problem to "s/abc\n123\nxyz/abc, 123, xyz/g" without even mentioning
   that this has been done.)
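
The bugs listed above suggest a small set of test inputs for any
candidate script.  The following is only a sketch: the helper names
candidate and check are hypothetical, the inputs are my own choices
rather than anything taken from the posted scripts, and the candidate
shown is the three-line window script run under a POSIX-style sed.

```shell
# Script under test; replace the body to try a different candidate.
candidate() {
    sed -e '/\n/!{$!N;}' -e '$!N' \
        -e '/one\ntwo\nthree/s/\n/, /g' -e 'P;D'
}

# check INPUT EXPECTED: both arguments are printf format strings.
check() {
    got=$(printf "$1" | candidate)
    want=$(printf "$2")
    if [ "$got" = "$want" ]; then
        echo pass
    else
        printf 'FAIL: %s\n' "$1"
    fi
}

check 'a\nb\nc\n'                  'a\nb\nc\n'                  # bugs 1, 7
check 'x#one\ntwo\nthree#y\n'      'x#one, two, three#y\n'      # bug 2
check 'one\none\ntwo\nthree\n'     'one\none, two, three\n'     # bug 3
check 'xone\ntwo\nthreey\n'        'xone, two, threey\n'        # bug 4
check 'one\ntwo\nthree\nlast\n'    'one, two, three\nlast\n'    # bug 5
check 'twone\nthreeleven\nhike\n'  'twone\nthreeleven\nhike\n'  # bug 6
```

Each call prints "pass" when the candidate's output equals the expected
text, so a script that reintroduces one of the bugs shows up as a FAIL
line naming the offending input.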

My thanks to Allyn Fratkin, for being the first to suggest that the
problem admits of an elegant solution, and for providing some of the
ideas that have gone into the above scripts.

Dan Hoey



More information about the Comp.unix.questions mailing list