edit first line of long file

Chris Torek chris at mimsy.umd.edu
Thu Oct 25 00:02:40 AEST 1990


In various articles Dan Bernstein and Blair Houghton fight over whether

	(head -1 | sed "$cmd"; cat)

or

	sed "1$cmd"

is better for applying a change to line 1 of a long file (see Subject).
Blair shows results for a tiny file in which sed is faster; this is not
relevant unless `a few lines' is considered a long file.  Dan argues that
sed is `more than 12 times slower than cat'.

All of this overlooks a more basic problem.  Although Dan is right about
the efficiency argument on most machines (on some, the I/O is slow enough
that the difference between `sed' and `cat' is that `cat' simply spends
more time waiting for the disk), I have to side with Blair.  Use `sed'
and restrict the operation to line one.  The problem with `head -1;cat'
is that it does not work.

On many machines, head uses standard I/O: stdio reads in a whole block,
and head prints out only the first line.  Then head exits, leaving the
file seek pointer pointing at the second block.  Cat then reads and
prints blocks two through the end, so the rest of block one is lost.
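
The failure described above can be imitated without relying on any
particular head.  The following is only a sketch in Python, under stated
assumptions: BLOCK is a hypothetical, deliberately tiny stand-in for the
stdio buffer size, and head_then_cat is an illustrative name, not a real
utility.  It reads whole blocks from a shared descriptor until it has
seen a newline (as a stdio-based head would), keeps line one, and then
lets "cat" continue from wherever the offset was left.

```python
import os
import tempfile

BLOCK = 8  # hypothetical stdio buffer size, kept tiny so the loss is visible


def head_then_cat(path, block=BLOCK):
    """Imitate (head -1; cat) <path when head never rewinds the offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # "head": fill the buffer one block at a time until line one is complete.
        consumed = b""
        while b"\n" not in consumed:
            chunk = os.read(fd, block)
            if not chunk:
                break
            consumed += chunk
        end = consumed.find(b"\n")
        first = consumed if end < 0 else consumed[:end + 1]

        # "cat": shares the same offset, so it starts after the last block read.
        rest = b""
        while True:
            chunk = os.read(fd, block)
            if not chunk:
                break
            rest += chunk
        return first + rest
    finally:
        os.close(fd)


def demo_block_loss():
    """Run the imitation on a small temporary file and return its output."""
    data = b"one\ntwo\nthree\nfour\n"
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        return head_then_cat(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo_block_loss())  # b'one\nthree\nfour\n' ("two" was in block one)
```

With an 8-byte block, the first read swallows both "one" and "two";
only "one" is printed, and "two" is gone by the time the second reader
starts.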

At least one POSIX draft has attempted to mandate that stdio should
`reset' the input seek pointer so that (head -1; cat)<foo would produce
an exact copy of file foo.  (I say `attempted' because the wording I
saw was utterly unintelligible.  The problem is tricky since an
application can fork, and the child had better not reset the parent's
seek pointers, unless maybe the file was read further in the child,
in which case maybe it ought to reset them but only to where they were,
except if the parent was also reading the file, except that the child
has no way to tell, but if we throw in words like `active file pointer'
we can confuse everyone and keep them from realizing that none of this
works at all anyway if the input is a pipe; if we keep the sentence
running on long enough all the readers will have given up long before
the dead giveaway in the previous clause, so. . . . :-) )  This
approach is doomed to failure (see previous parenthetical remark),
and in any case there are plenty of systems on which (head -1;cat)<foo
produces a copy of file foo with all of block zero, except for the
first line, removed.
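
The remark about pipes can be checked directly: a pipe has no seek
pointer to put back.  This is a minimal sketch, assuming a POSIX-like
system where lseek on a pipe fails with ESPIPE; can_rewind and
demo_pipe are hypothetical names used only for illustration.

```python
import errno
import os
import tempfile


def can_rewind(fd):
    """Return True if the descriptor has a seekable offset, False for a pipe."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)  # ask for the current offset, moving nothing
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:  # "Illegal seek": pipes, FIFOs, sockets
            return False
        raise


def demo_pipe():
    """Compare a regular file with the read end of a pipe."""
    r, w = os.pipe()
    try:
        with tempfile.TemporaryFile() as f:
            return can_rewind(f.fileno()), can_rewind(r)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo_pipe())  # (True, False)
```

Whatever a standard says stdio should do on exit, there is nothing it
can do in the second case: the bytes already read from the pipe cannot
be handed back to the next reader.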

(Actually, the previous description is not entirely accurate if the
first line is longer than one block.  But in (n-1) out of n cases some
of the file is lost.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris


