Breaking large file into pieces
Larry Wall
lwall at jpl-devvax.JPL.NASA.GOV
Thu Sep 13 10:54:35 AEST 1990
In article <26116 at boulder.Colorado.EDU> skwu at spot.Colorado.EDU.Colorado.EDU (WU SHI-KUEI) writes:
: The right tool for the job is NOT perl but 'csplit'.
"Those words fall too easily from your lips." --Gandalf
Let us attempt to distinguish fact from dogma.
1) As far as I can tell, csplit is AT&T proprietary. I certainly
don't have it on all my machines, and don't know offhand where
I'd find the source for it. The person we were advising may
well not have it on his machine. You should at least say "If
you have csplit..."
2) The man page for csplit (in the AT&T universe of a Pyramid, anyway)
indicates that you can have a maximum of 99 output files. The
application in question could easily have more than that, judging
by how it was specified. A general tool should not have
such limitations.
3) csplit won't name the files in the way specified--you'd have to
follow it up with a loopful of mv commands, one process per file.
And in the naive implementation, you'd have a sed or awk for each
file to extract out the filename to hand to mv.
4) csplit can't recognize patterns across newlines (not that this
job required that, but a general tool shouldn't have such
limitations).
5) csplit can get confused on lines longer than 255 chars. It can't
handle embedded nulls. A general tool should not have such
limitations.
6) Even if I did manage to find a freely available source for csplit,
I'd have to worry about recompiling it on all my different
architectures. That would be okay (after all, I have to do that
with Perl too), but I have to do it for 50 blue jillion other little
"must have" tools too. I'd much rather compile Perl once on
each architecture, rewrite csplit in Perl, throw it into my
/u/scripts directory that's mounted everywhere, and never worry about
recompiling csplit again.
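To make points 2, 3 and 5 concrete, here is a minimal sketch of the kind of general splitter argued for above. It is written in Python rather than Perl. The input format is a hypothetical assumption, because the original request's format is not reproduced in this post: a line of the form `#FILE: name` starts each new piece, and that name becomes the output file name. The function name `split_file` and its `out_dir` parameter are also illustrative inventions, not an existing tool's interface.

```python
import os
import re

# Hypothetical header format (an assumption, not from the original thread):
# a line "#FILE: <name>" starts a new piece that is written to <name>.
HEADER = re.compile(rb"#FILE: (\S+)\r?\n?\Z")


def split_file(src_path, out_dir="."):
    """Split src_path into pieces named by their header lines.

    The file is processed as bytes, so embedded NULs and very long lines pass
    through unchanged. There is no fixed cap on the number of output files, and
    naming happens in the same pass, so no follow-up mv loop is needed.
    Any lines before the first header are discarded.
    Returns the list of output paths in the order they were created.
    """
    written = []
    out = None
    try:
        with open(src_path, "rb") as src:
            for line in src:
                m = HEADER.match(line)
                if m:
                    if out is not None:
                        out.close()
                    # basename() keeps a header like "#FILE: ../x" inside out_dir
                    name = os.path.basename(m.group(1).decode("utf-8", "replace"))
                    path = os.path.join(out_dir, name or "unnamed")
                    out = open(path, "wb")
                    written.append(path)
                elif out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Because the whole job runs in one process, it avoids the csplit-plus-mv pipeline from point 3. It does not address point 4, since it still matches one line at a time.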
So it's not quite so simple as all that. You can chop down a tree with
a hatchet, but sometimes you want an industrial-strength Swiss Army Chainsaw.
And sometimes not. There's more than one way to do it.
Larry
More information about the Comp.unix.shell
mailing list