easy for some

Larry Wall lwall at jpl-devvax.jpl.nasa.gov
Fri May 10 04:55:03 AEST 1991


In article <1991May9.153351.1754 at colorado.edu> lewis at tramp.Colorado.EDU (LEWIS WILLIAM M JR) writes:
: In article <574 at appserv.Eng.Sun.COM> lm at slovax.Eng.Sun.COM (Larry McVoy) writes:
: >matthew at gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
: >> problem: to extract text between start and end patterns in a file
: ... more problem description
: >/bin/sh, usage shellscript start_pat stop_pat [files...]
: >
: ... complex shell and perl programs to do
: 
: 	sed -n '/pattern1/,/pattern2/p' source_file > new_file

No, that's not what those programs were trying to do.  (Admittedly, the
original spec was unclear.)  The other programs were attempting to omit
the endpoints, taking "between" to mean exclusion of said endpoints.
Some of them were also trying to snag only the text between the first
pair of patterns.  Some were allowing for the patterns to be passed in
as arguments.

Here's the perl equivalent of what you said:

    perl -ne 'print if /pattern1/../pattern2/' source_file >new_file

When using Perl to do the other thing, I personally prefer a straightforward
approach:

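    #!/usr/bin/perl
    # Skip everything up to and including the first line matching pattern1.
    while (<>) {
	last if /pattern1/;
    }
    # Print the lines that follow, stopping (without printing) at pattern2.
    while (<>) {
	exit if /pattern2/;
	print;
    }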

For hardwired patterns this will generally beat sed.  (Especially if
sed is stupid enough to read the rest of the input file after pattern2,
which this script never does.)
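Parameterized patterns can get the same performance using eval, which
interpolates the patterns into the program text so that each regex is
compiled just once, as if it had been hardwired: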

    #!/usr/bin/perl
    $pattern1 = shift;
    $pattern2 = shift;
    eval <<"END";
	while (<>) {
	    last if /$pattern1/;
	}
	while (<>) {
	    exit if /$pattern2/;
	    print;
	}
    END
    die $@ if $@;	# report a syntax error from a malformed pattern

Larry Wall
lwall at netlabs.com



More information about the Comp.unix.wizards mailing list