Breaking large file into pieces

Larry Wall lwall at jpl-devvax.JPL.NASA.GOV
Wed Sep 12 05:38:42 AEST 1990


In article <1990Sep11.134238.20218 at dg-rtp.dg.com> monroe at dg-rtp.dg.com (Mark A Monroe) writes:
: I want to rip a large file into pieces, naming new files according
: to an ID string in the large file.  For example, the large file contains
: records that look like this:
: 
: xxx-00001239	data	data	data
: description
:        .
:        .
: (variable length)
:        .
: 						<---blank line
: xxx-00001489	data	data	data
: description
:        .
:        .
: (variable length)
:        .
: 						<---blank line
: xxx-00001326	data	data	data
: 
: When I find a line in the large data file that starts
: with "xxx-0000", I want to open a file named "xxx-0000<number>",
: like "xxx-00001489", and write every line, including
: the current one, into it.  When I see another "xxx-0000",
: I want to close the file, open a new file named for the new id 
: string, and continue writing.  At the end of the large data
: file, close all files and exit.
: 
: Any suggestions?  

In standard shell+awk+sed it's a bit hard, because you run out of file
descriptors: old awks have no close(), so every output file you print to
stays open until the program exits.  You could do something like run sed
over your file to turn it into a giant script of here-document ("here-is")
commands, but that'll be real slow.

You could do something like this:

while read line; do
    case "$line" in
    xxx-0000*)
        # Start of a new record: take the id field as the file name
        # and point stdout at it.  The previous file closes automatically.
        set $line
        exec >"$1";;
    esac
    echo "$line"
done
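Feed it the big file on standard input; if you put the loop in a file
called split.sh (pick your own names), that's

sh split.sh <largefile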

But how well that works depends on the vagaries of your echo command,
such as what it does with lines starting with '-', or containing '\c'.
You don't really want to do this on a machine where echo isn't a builtin.
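If your system has printf(1) (not a given on every machine yet), you can
sidestep echo altogether; the same loop, sketched with printf:

while read line; do
    case "$line" in
    xxx-0000*) set $line; exec >"$1";;
    esac
    printf '%s\n' "$line"
done

Since "$line" arrives as an argument rather than as part of the format
string, printf passes it through verbatim, leading '-' and backslashes
included.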

If you have Perl, your fastest solution will be to say something like

perl -pe 'open(STDOUT,">$&") if /^xxx-0000\d+/' filename

Change > to >> if the keys aren't unique in your input file.
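That is,

perl -pe 'open(STDOUT,">>$&") if /^xxx-0000\d+/' filename

so that a repeated key appends to its file instead of clobbering it.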

Larry Wall
lwall at jpl-devvax.jpl.nasa.gov


