easy for some

Tom Christiansen tchrist at convex.COM
Thu May 9 13:39:51 AEST 1991


From the keyboard of lm at slovax.Eng.Sun.COM (Larry McVoy):
:matthew at gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
:> problem: to extract text between start and end patterns in a file
:> eg:-
:> 
:> file:
:> 
:> pattern1---
:> 
:> stuff
:> stuff
:> stuff
:> 
:> pattern2---
:
:/bin/sh, usage shellscript start_pat stop_pat [files...]

ug.

A shell solution is obscene. :-) I don't know how to do it in sed.  An awk
solution would have made certain others happy, but wouldn't have been so
nifty.  

> /bin/perl, same usage (see the notes on the ".." operator, cool thingy).

But since we do happen to be on the perl topic...

> 	$START = shift;
> 	$STOP = shift;
> 	while (<>) {
> 		if (/^$START$/../^$STOP/) {
> 			next if /^$START$/;	# skip starting pattern
> 			last if /^$STOP/;	# done if last;
> 			print;
> 		}
> 	}

The following code should be faster because it does fewer regexp
compiles: the /o tells perl to compile each pattern only once.
It also uses the fact that .. returns the sequence number of the
current line within the range, and that the last number in the
sequence has "E0" appended to it.  For example, 144 comes back as
144E0, which is the same numerically but can be told apart by a
string or pattern test such as /E/.
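To see those return values directly, here is a small sketch in Perl 5
style (strict, lexicals), not the post's own code.  The helper name
flipflop_values and the sample lines are made up for illustration; it
just records what the scalar .. operator hands back for each line.

```perl
use strict;
use warnings;

# Hypothetical helper: record what the scalar .. (flip-flop) operator
# returns for each line, given whole-line START and STOP patterns.
sub flipflop_values {
    my ($start, $stop, @lines) = @_;
    my @seen;
    for (@lines) {
        # '' outside the range; 1, 2, ... inside; "NE0" on the stop line
        my $which = /^$start$/ .. /^$stop$/;
        push @seen, $which;
    }
    return @seen;
}

my @vals = flipflop_values('BEGIN', 'END', qw(x BEGIN a b END y));
print join(' ', map { $_ eq '' ? "''" : $_ } @vals), "\n";
# prints: '' 1 2 3 4E0 ''
```

Note that "4E0" == 4 numerically, so only a string or pattern test
like /E/ reveals that it was the last line of the range.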

    $START = shift;
    $STOP = shift;

    while (<>) {
	if ( $which = /^$START$/o .. /^$STOP$/o ) {
	    next if $which == 1;	# skip starting pattern
	    last if $which =~ /E/;	# "NE0" means this is the stop line
	    print;
	}
    }

Or, if they want every such block in the stream extracted rather than
stopping after the first, replace the next/last pair of lines with just

    next if $which =~ /^1$|E/;

which skips both delimiter lines but keeps on reading.
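Wrapped up as a function, the all-instances variant might look like
the sketch below.  The name extract_all, and taking a list of lines
instead of reading <>, are assumptions made so it can be tested on its
own; the .. and /^1$|E/ logic is the one above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines strictly between each
# START/STOP pair, for every pair in the input.
# Caveat: the .. state belongs to this one op, so input that ends
# inside a range leaves it "on" for the next call.
sub extract_all {
    my ($start, $stop, @lines) = @_;
    my @out;
    for (@lines) {
        if (my $which = /^$start$/ .. /^$stop$/) {
            next if $which =~ /^1$|E/;   # drop both delimiter lines
            push @out, $_;
        }
    }
    return @out;
}

# The example file from the original question.
my @sample = ('pattern1---', '', 'stuff', 'stuff', 'stuff', '', 'pattern2---');
print "$_\n" for extract_all('pattern1---', 'pattern2---', @sample);
```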


--tom
--
Tom Christiansen		tchrist at convex.com	convex!tchrist
		"So much mail, so little time." 



More information about the Comp.unix.wizards mailing list