grep replacement

Leo de Wit leo at philmds.UUCP
Tue May 31 20:42:03 AEST 1988


In article <292 at ncar.ucar.edu> russ at groucho.UCAR.EDU (Russ Rew) writes:
>I also recently had a need for printing multi-line "records" in which a
>specified pattern appeared somewhere in the record.  The following
>short csh script uses the awk capability to treat whole lines as fields
>and empty lines as record separators to print all the records from
>standard input that contain a line matching a regular expression specified as an
>argument:
>
>#!/bin/csh -f
>awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
>
>

Awk is a nice solution, but sed is a much faster one. I've been following
the 'grep' discussion for some time now, and have seen much demand for
features that sed already provides. Here are some; I have left out the
discussion of what each sed command does: there is a sed article and
a man page...

Patrick Powell writes:
>The other facility is to find multiple line patterns, as in:
>find the pair of lines that have pattern1 in the first line
>pattern2 in the second, etc.

Try this one:

        sed -n -e '/PATTERN1/,/PATTERN2/p' file

It prints every line from a PATTERN1 match through the next PATTERN2 match,
inclusive; the two matches need not be on adjacent lines. Of course you can
have subcommands to do special things (with '{' I mean).
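
The range form also prints whatever lies between the two matches. If the pair
has to be strictly adjacent, as in the quoted request, one possible sketch is
a two-line sliding window built from N and D. The name adjacent_pair is
hypothetical, and the patterns are assumed to contain no '/' characters.

```shell
# adjacent_pair is a hypothetical helper, not a standard tool.
# Usage: adjacent_pair PATTERN1 PATTERN2 [file...]
# Prints each pair of adjacent lines in which the first line matches
# PATTERN1 and the second matches PATTERN2.
adjacent_pair() {
    first=$1
    second=$2
    shift 2
    # $!N  append the next line, so the pattern space holds two lines
    # p    print both lines if they match across the embedded newline
    # D    drop the first line and restart with the second one
    sed -n -e '$!N' \
           -e "/$first.*\\n.*$second/p" \
           -e 'D' "$@"
}

# Sample input, invented for this demonstration.
printf 'alpha\nx\nbeta\nalpha\nbeta\n' | adjacent_pair alpha beta
# prints:
#   alpha
#   beta
```

Only the last two lines are printed, because the first alpha and beta are
separated by another line. A line that ends one pair and starts the next is
printed twice.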


Alan (..!cit-vax!elroy!alan) writes:
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

There is. Try this one:

        sed -n -e '
/PATTERN/{
x
p
x
p
n
p
}
h' file

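It prints the line before, the line containing the PATTERN, and the line after.
(If the very first line matches, the line before comes out empty, because the
hold space has nothing in it yet.) Of course you can make the output fancier
and the number of lines printed larger.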
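
To see what the script does, here is a small self-contained run on made-up
input. The wrapper name context1 and the sample lines are assumptions for
illustration only, and the pattern is assumed to contain no '/' characters.

```shell
# context1 is a hypothetical wrapper around the script shown above.
# Usage: context1 PATTERN [file...]
context1() {
    pattern=$1
    shift
    # h (last command) saves every line in the hold space, so that on a
    # match x/p/x can print the previous line, p the matching line, and
    # n/p the following line.
    sed -n -e "/$pattern/{
x
p
x
p
n
p
}
h" "$@"
}

# Sample input, invented for this demonstration.
printf 'one\ntwo\nthree MATCH\nfour\nfive\n' | context1 MATCH
# prints:
#   two
#   three MATCH
#   four
```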


David Connet writes:
>>
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines.  I have wanted
>>to do this many times and there is no good way.
>Also, what line number it was found on.

Sed can also handle this one:

        sed -n -e '/PATTERN/=' file


Lloyd Zusman writes:
>Or another way to get this functionality would be for this new greplike
>thing to allow matches on the newline character.  For example:
>    ^.*foo\nbar.*$
>          ^^
>    	newline

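Sed can match embedded newline characters in the substitute command
(it is indeed \n here!), once a command such as N has joined lines in the
pattern space. The end of the pattern space, where the trailing newline was,
is matched by $.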
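
As a rough illustration of matching across a newline, the sketch below joins
a line ending in "foo" with a following line starting with "bar". The name
join_foo_bar and the sample input are made up for this example; the N, P and
D commands are used to keep two lines in the pattern space at a time.

```shell
# join_foo_bar is a hypothetical name; foo and bar are the strings from
# the quoted example.
# Usage: join_foo_bar [file...]
join_foo_bar() {
    # $!N  append the next input line (except on the last line)
    # s    replace the embedded newline between foo and bar by a space
    # P    print up to the first newline (or everything if none is left)
    # D    delete what P printed and restart with the remainder
    sed -e '$!N' \
        -e 's/foo\nbar/foo bar/' \
        -e 'P' \
        -e 'D' "$@"
}

# Sample input, invented for this demonstration.
printf 'x foo\nbar y\nz\n' | join_foo_bar
# prints:
#   x foo bar y
#   z
```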


Barry Shein writes [story about relative addressing]:
>I dunno, food for thought, like I said, maybe there's a generalization
>here somewhere. Or maybe grep should just emit line numbers in a form
>which could be post-processed by sed for fancier output (grep in
>backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think that most of the time you want to see the context, it is
in interactive use. In that case you can write a simple sed script that does
just what is needed, i.e. display the lines [/pattern/-N] through
[/pattern/+N], where N is a constant. The example I gave for N == 1 can be
extended for larger N, with fancy output etc.


Bill Wyatt writes: 
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

Much simpler, and faster:

        sed -n -e '/PATTERN/{
p
q
}' file

Sed quits immediately after finding the first match. You could even create an 
alias for something like that.
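
In a Bourne-style shell a function can play the role of such an alias. The
following is only a sketch: the name first_match is hypothetical, and the
pattern is assumed to contain no '/' characters.

```shell
# first_match is a hypothetical helper name, not a standard tool.
# Usage: first_match PATTERN [file...]
# Prints the first line matching PATTERN and stops reading input.
first_match() {
    pattern=$1
    shift
    sed -n -e "/$pattern/{
p
q
}" "$@"
}

# Sample input, invented for this demonstration.
printf 'a\nfoo 1\nfoo 2\n' | first_match foo
# prints:
#   foo 1
```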


Michael Morrell writes:
>>Also, what line number it was found on.
>grep -n does this, but I'd like to see an option which ONLY prints the line
>numbers where the pattern was found.

The sed trick does this:

        sed -n -e '/PATTERN/=' file

Or you could even:

        sed -n -e '/PATTERN/{
=
q
}' file

which prints the first matched line number and exits.
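
Both forms can be tried on a few made-up lines. The function names below are
hypothetical wrappers around the two commands, and the pattern is assumed to
contain no '/' characters.

```shell
# match_lines and first_match_line are hypothetical helper names.
# Usage: match_lines PATTERN [file...]
# Prints the line number of every line matching PATTERN.
match_lines() {
    pattern=$1
    shift
    sed -n -e "/$pattern/=" "$@"
}

# Usage: first_match_line PATTERN [file...]
# Prints the line number of the first match only, then quits.
first_match_line() {
    pattern=$1
    shift
    sed -n -e "/$pattern/{
=
q
}" "$@"
}

# Sample input, invented for this demonstration.
printf 'a\nfoo\nb\nfoo\n' | match_lines foo
# prints:
#   2
#   4
printf 'a\nfoo\nb\nfoo\n' | first_match_line foo
# prints:
#   2
```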


Roy Smith writes:
>wyatt at cfa.harvard.EDU (Bill Wyatt) writes:
>[as a way to get just the first occurrence of pattern]
>> grep '(your_pattern_here)' | head -1
>	Yes, it'll certainly work, but I think it bypasses the original
>intention; to save CPU time.  If I had a 1000 line file with pattern on
>line 7, I want grep to read the first 7 lines, print out line 7, and exit.
>grep|head, on the other hand, will read and search all 1000 lines of the
>file; it won't exit (with a EPIPE) until it writes another line to stdout
>and finds that head has already exited.  In fact, if grep block-buffers its
>output, it may never do more than a single write(2) and never notice that
>head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, 
who needs head:

        sed 10q

does the job, as you can find in the book by Kernighan and Pike; I believe the
title is 'The UNIX Programming Environment'.
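
The same idea works for any positive line count. A minimal sketch, with
head_n as a hypothetical name and N assumed to be 1 or more:

```shell
# head_n is a hypothetical helper name, not a standard tool.
# Usage: head_n N [file...]
# Prints the first N lines and quits; N is assumed to be at least 1.
head_n() {
    n=$1
    shift
    sed "${n}q" "$@"
}

# Sample input, invented for this demonstration.
printf '1\n2\n3\n4\n5\n' | head_n 3
# prints:
#   1
#   2
#   3
```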


Stan Brown writes:
>	Along this same general line it would be nice to be able to
>	look for patterns that span lines.  But perhaps this would be
>	too complete a change in the philosophy of grep ?

As I mentioned before, embedded newlines can be matched by sed in the
substitute command.


Another thing I often see is something like

        grep 'pattern' file | sed 'expression'

It is a pity that so many people don't know that sed can do the pattern
matching itself.
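
As a sketch of the single-process form, the pattern becomes the address of
the sed commands. The sample lines, the 'error' pattern, the substitution and
the function names are all invented for this comparison.

```shell
# Sample input, invented for this demonstration.
sample() {
    printf 'ok: start\nerror: disk full\nok: done\nerror: no route\n'
}

# Two processes: grep selects the lines, sed edits them.
two_step() {
    sample | grep 'error' | sed 's/^error: //'
}

# One process: the address /error/ selects the lines, s edits them, and
# p prints them (-n suppresses all other output).
one_step() {
    sample | sed -n -e '/error/{
s/^error: //
p
}'
}

two_step
one_step
# each prints:
#   disk full
#   no route
```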

S. E. D. (Sic Erat Demonstrandum)


As far as options for a new grep are concerned, I suggest using the options
proposed (and no more). Let other tools handle other problems - that's in the
Un*x spirit. What I would appreciate most in a new grep is:
no more grep, egrep, fgrep, just one tool that can be both fast (for fixed
strings) and elaborate (for pattern matching like egrep). The 'bm' tool that
was on the net (author Peter Bain) is very fast for fixed strings, using the
Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?


        Leo.



More information about the Comp.unix.wizards mailing list