sparse files

Robert Cousins rec at dg.dg.com
Sat Dec 2 00:50:56 AEST 1989


In article <21581 at adm.BRL.MIL> JAZBO at brownvm.brown.edu (James H. Coombs) writes:
>"Sparse files" have been mentioned in several recent postings.  For example:
>
>>Kemp at DOCKMASTER.NCSC.MIL writes:
>>>Just for the record, is there *any* way to do a recursive copy
>>>correctly?  I.e.  one that doesn't:
>>> * turn symbolic links into actual files
>>> * turn link loops into a series of infinitely nested copies
>>> * alter the modify and change times
>>> * choke on block and character special files
>>> * turn holes in sparse files into real disk blocks
>>I think afio will do this. I am not sure about the symlink
>>stuff, though, as we're a SYS V only site.
>
>Can someone explain exactly what a sparse file is?  How does one get created?
>
>--Jim
>
>Dr. James H. Coombs
>Senior Software Engineer, Research
>Institute for Research in Information and Scholarship (IRIS)
>Brown University, Box 1946
>Providence, RI 02912
>jazbo at brownvm.bitnet
>Acknowledge-To: <JAZBO at BROWNVM>

A sparse file is one that has "holes" in it: regions that were never
written and so have no disk blocks allocated.  As a result, the amount
of space required to store the file on disk is less than the length of
the file (the offset just past its last byte).  A sparse file can be
created under UNIX by creating a file, seeking past some portions of it,
and writing only the rest.  The following program creates a sparse file:


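#include	<stdio.h>
#include	<stdlib.h>
#include	<fcntl.h>
#include	<sys/types.h>
#include	<unistd.h>

int main(void)
{
	int	fd;
	ssize_t	status;
	off_t	position;
	static char buffer[] = "This is a test of sparse files.";

	/* Create (or open) the file for reading and writing. */
	fd = open("test.file", O_RDWR | O_CREAT, 0666);
	if (fd < 0) { perror("Unable to open/create file"); exit(1); }
	/* Seek past the end; no blocks are allocated for the skipped bytes. */
	position = lseek(fd, 100000, SEEK_SET);
	if (position == (off_t)-1) { perror("Unable to seek"); exit(1); }
	printf("Moved the file to offset %ld\n", (long)position);
	/* Writing here leaves a hole covering offsets 0 through 99999. */
	status = write(fd, buffer, sizeof(buffer));
	printf("Result status of write is %ld\n", (long)status);
	close(fd);
	exit(0);
}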
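
One way to see the hole that a program like the one above leaves behind is
to compare the file's length with the space actually allocated to it.  The
following is only a sketch: the name report_usage is made up for this
example, and it assumes that stat() reports st_blocks in 512-byte units,
which is common but not guaranteed.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: report a file's length and its allocated space.
 * Assumes st_blocks counts 512-byte units (common, but not guaranteed).
 * Returns 0 on success, -1 if stat() fails. */
int report_usage(const char *path, long long *length, long long *allocated)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	*length = (long long)st.st_size;
	*allocated = (long long)st.st_blocks * 512;
	printf("%s: length %lld bytes, allocated %lld bytes\n",
	       path, *length, *allocated);
	return 0;
}
```

Called on "test.file", this should report a length of 100032 bytes (the
100000-byte hole plus the 32-byte buffer).  The allocated figure depends on
the file system; where holes are supported it will typically be much smaller
than the length.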

UNIX treats the "holes" as 0's when read.  Beyond that, UNIX has only
minimal support for sparse files.  Backing up sparse files often
involves copying large amounts of nulls.  Once an area of a file is
written, it cannot be returned to its previous sparse state.  One
cannot REALLY tell (without heroic effort) whether a given area of a
file is just 0's or is not there.  In arguments that UNIX is not
suitable for DP applications, sparse files usually come up if the
conversation goes on long enough between knowledgeable people.
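
Since holes read back as 0's, a copy program can avoid filling them in by
seeking over any block that is entirely zero instead of writing it.  The
sketch below illustrates that idea; the name sparse_copy and the 4096-byte
block size are assumptions made for this example, and it relies on
ftruncate() being able to extend a file.  Because it cannot tell a hole
from written 0's either, it turns both into holes in the copy.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BLOCK 4096	/* assumed block size; real file systems vary */

/* Return 1 if the first n bytes of buf are all zero, else 0. */
static int all_zero(const char *buf, ssize_t n)
{
	ssize_t i;

	for (i = 0; i < n; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

/* Hypothetical helper: copy src to dst, seeking over all-zero blocks
 * instead of writing them.  Returns 0 on success, -1 on error. */
int sparse_copy(const char *src, const char *dst)
{
	char buf[COPY_BLOCK];
	ssize_t n;
	off_t total = 0;
	int in, out, rc = -1;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (out < 0) {
		close(in);
		return -1;
	}
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (all_zero(buf, n)) {
			/* Skip ahead; the destination gets a hole here. */
			if (lseek(out, n, SEEK_CUR) == (off_t)-1)
				goto done;
		} else if (write(out, buf, n) != n) {
			goto done;
		}
		total += n;
	}
	/* A trailing hole is never written, so set the length explicitly. */
	if (n == 0 && ftruncate(out, total) == 0)
		rc = 0;
done:
	close(in);
	close(out);
	return rc;
}
```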

Some operating systems instead return an error on such a read, which
amounts to "you can't read that because there isn't anything there."
Sparse files are quite popular for a number of Data Processing
applications.  (Effectively, you can use them for hash buckets, among
other things.)  Furthermore, for some scientific applications, sparse
files can be used to store sparse matrices.  This, however, would
require finer granularity than is normally found in the sparse storage
system.  Most operating systems just check whether a block is allocated
that would contain the requested data and, if so, return its contents.
Hence, a "sparse" file in which every other byte was written would
appear to an application to be continuous.
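
The granularity point can be tried out with a small experiment: write
single bytes a fixed distance apart and ask how many blocks the file ends
up occupying.  This is a sketch only; the name strided_blocks is made up
for this example, the unit of st_blocks varies between systems, and some
file systems do not report their final allocation until the data has been
flushed to disk.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write `count` single bytes, `stride` bytes apart,
 * and return the number of blocks the file system reports as allocated
 * (st_blocks, in whatever unit the system uses), or -1 on error. */
long strided_blocks(const char *path, long count, off_t stride)
{
	struct stat st;
	long i, blocks = -1;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return -1;
	for (i = 0; i < count; i++) {
		/* Position at the next offset; bytes in between stay unwritten. */
		if (lseek(fd, (off_t)i * stride, SEEK_SET) == (off_t)-1)
			goto done;
		if (write(fd, "x", 1) != 1)
			goto done;
	}
	if (fstat(fd, &st) == 0)
		blocks = (long)st.st_blocks;
done:
	close(fd);
	return blocks;
}
```

With a stride of 2, every other byte is written, yet the file should occupy
about as many blocks as one written in full, because every block contains
some written data.  Only when the stride spans several blocks would one
expect whole blocks to be left unallocated.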

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.


