Holey files, Batman!

Mark Rosenthal mbr at aoa.UUCP
Thu Mar 17 02:46:48 AEST 1988


How do you copy holey files?

You know, the kind of file which would be created by the following code.

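    #include <fcntl.h>	/* creat() */
    #include <unistd.h>	/* lseek(), write(), close(), SEEK_SET */

    int
    main(void)
    {
	static long zeroes[] = { 0, 0, 0, 0 };
	int fd;

	fd = creat("holey", 0666);
	if (fd < 0)
	    return 1;
	/* Seeking past EOF leaves the skipped 123456 bytes as a hole. */
	lseek(fd, 123456L, SEEK_SET);
	write(fd, zeroes, sizeof(zeroes));
	close(fd);
	return 0;
    }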

As far as I can tell, when reading such a file, there is no way to determine
whether bytes whose value is 0 are really stored on disk or lie in the middle
of a hole in the file.  Therefore, when such a file is copied by reading and
writing it, the holes get converted to real blocks which take up space on the
disk, so the copy occupies more disk space than the original.

I can think of two possible ways to copy such a file without expansion, neither
of which is entirely satisfactory:

    1. Read through the bytes sequentially, but don't write any whose value
	is 0; instead, lseek() past them before writing the next non-zero
	byte.  To guarantee that the copy has the same length, always write
	the last byte, even if it is 0.  Disadvantage: every byte of every
	hole must still be read (a hole reads back as zeroes), so copying a
	file with huge holes and little data would be very slow.

    2. Go outside the filesystem.  Read the raw device for the partition
	and interpret the structures that represent the filesystem to
	determine where the holes are.  Disadvantages: (a) it is system
	dependent, and must be rewritten for each different implementation
	of the filesystem; (b) a non-privileged user may well not have read
	access to the raw device.

Is there any direct method which will allow me to determine where the holes
in a file are?  Does anybody know of any other tricks besides the two I have
described above?  What, if any, are the differences among the following
utilities in terms of how they handle holey files: cp, tar, cpio, restor (V7
and successors), restore (4.2bsd and successors)?
-- 
	Mark of the Valley of Roses
	...!{harvard,ima}!bbn!aoa!mbr


