Breaking large file into pieces
Larry Wall
lwall at jpl-devvax.JPL.NASA.GOV
Wed Sep 12 05:38:42 AEST 1990
In article <1990Sep11.134238.20218 at dg-rtp.dg.com> monroe at dg-rtp.dg.com (Mark A Monroe) writes:
: I want to rip a large file into pieces, naming new files according
: to an ID string in the large file. For example, the large file contains
: records that look like this:
:
: xxx-00001239 data data data
: description
: .
: .
: (variable length)
: .
: <---blank line
: xxx-00001489 data data data
: description
: .
: .
: (variable length)
: .
: <---blank line
: xxx-00001326 data data data
:
: When I find a line in the large data file that starts
: with "xxx-0000", I want to open a file named "xxx-0000<number>",
: like "xxx-00001489", and write every line, including
: the current one, into it. When I see another "xxx-0000",
: I want to close the file, open a new file named for the new id
: string, and continue writing. At the end of the large data
: file, close all files and exit.
:
: Any suggestions?
In standard shell+awk+sed it's a bit hard because you run out of file
descriptors. You could do something like running sed over your file
to turn it into a giant script of here-document commands, but that'll
be real slow.
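(If your awk is a newer one with a close() function, a sketch like the
following sidesteps the descriptor limit by closing each output file
before opening the next. That assumes nawk-style close(), which old awk
doesn't have, and the input filename "largefile" is made up here:)

```shell
# sample input in the record format from the original post
# (assumption: the data lives in a file called "largefile")
cat > largefile <<'EOF'
xxx-00001239 data data data
description for 1239

xxx-00001489 data data data
description for 1489
EOF

# remember the current output file in f; close the old one when a new
# header line appears, so we never hold more than one descriptor open
awk '/^xxx-0000/ { if (f) close(f); f = $1 }
     f           { print > f }' largefile
```

Every line from a header onward, blank lines included, lands in the
file named after that header's ID.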
You could do something like this:
    while read line; do
        case "$line" in
        xxx-0000*) set $line; exec >$1;;
        esac
        echo "$line"
    done < largefile
But how well that works depends on the vagaries of your echo command,
such as what it does with lines starting with '-', or containing '\c'.
You don't really want to do this on a machine where echo isn't a builtin.
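(A variant of that loop using printf(1) instead of echo avoids those
quirks entirely, since printf '%s\n' treats '-' and '\c' literally.
This assumes a shell that has printf along with IFS= and read -r,
which not every shell does; "largefile" is again a made-up name:)

```shell
# sample input (assumption: same record format as in the post)
cat > largefile <<'EOF'
xxx-00002001 data data data
first description

xxx-00002002 data data data
second description
EOF

# IFS= and -r keep read from eating leading whitespace and backslashes;
# printf '%s\n' prints the line verbatim, whatever it starts with
while IFS= read -r line; do
    case "$line" in
    xxx-0000*) set -- $line; exec > "$1";;
    esac
    printf '%s\n' "$line"
done < largefile
```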
If you have Perl, your fastest solution will be to say something like
perl -pe 'open(STDOUT,">$&") if /^xxx-0000\d+/' filename
Change > to >> if the keys aren't unique in your input file.
Larry Wall
lwall at jpl-devvax.jpl.nasa.gov
More information about the Comp.unix.shell mailing list