sparse files

Robert Cousins rec at dg.dg.com
Tue Dec 12 04:02:29 AEST 1989


In article <2700 at auspex.auspex.com> guy at auspex.auspex.com (Guy Harris) writes:
(I wrote)
>>UNIX treats the "holes" as 0's when read. In fact, UNIX has only
>>minimal support for sparse files.  Backing up sparse files often
>>involves copying large amounts of nulls.  Once an area of a file is
>>written, it cannot be returned to its previous sparse state.
>Not in general, anyway.  At least the first version of AIX for the RT PC
>claimed, in its documentation, that it had an "fclear()" call to punch
>holes in files; I think this may show up in future releases of other
>UNIXes as well.
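
The UNIX behaviour quoted above (holes read back as 0's) can be tried with a small sketch like the following.  It assumes a POSIX-style file system that actually leaves seeked-over regions unallocated; the helper names make_sparse_file and read_range are invented for this illustration only.

```python
import os
import tempfile


def make_sparse_file(path, hole_size=1 << 20, payload=b"end"):
    """Seek past the end of a new file and write; the skipped region is a hole."""
    with open(path, "wb") as f:
        f.seek(hole_size)   # nothing is written to the bytes skipped here
        f.write(payload)


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.dat")
        make_sparse_file(path)
        st = os.stat(path)
        print("logical size:", st.st_size)
        # st_blocks (512-byte units) is only present on UNIX-like platforms,
        # and whether the hole really stays unallocated depends on the file system.
        if hasattr(st, "st_blocks"):
            print("allocated bytes:", st.st_blocks * 512)
        print("hole reads as zeros:", read_range(path, 0, 16) == b"\0" * 16)
```

Note that a reader of the file gets zeros with no indication that a hole was there, which is why a naive backup ends up copying large runs of nulls.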

It is unclear whether support for sparse files is necessary.  My only
point is that at one time they were very popular amongst a particular
class of heavy DP applications.  Today we have the technology to more
effectively use system resources.  Don't forget that B-trees are relatively
recent inventions!

>>In arguments that UNIX is not suitable for DP applications, sparse
>>files usually come up if the conversation goes on long enough between
>>knowledgeable people.
>Umm, what other operating systems support sparse files *and* return a
>"there's a hole there" indication?  For instance, are there any OSes
>with extent-based file systems (VMS, OS/360 and successors as I
>remember, IRIX with SGI's Extent File System) that support sparse files?

There are a number of OS's which support sparse files.  An 
incomplete list of them includes:

	TurboDOS (1.3 and later)
	S1 (all revs if my memory is correct)
	RM/COS 
	IBM System 3 os (I think, it's been 10 years)
	VM
	VMS 
	CP/M (It's not really an os but . . . . it is extent based)
	Any operating system which supports honest-and-for-true ISAMs
	In fact, a number of OS's designed for COBOL or RPG support
		have these features.
	Anyone care to add to the list?
	
It is true, however, that newer operating systems don't support sparse
files, although add-ons such as VTAM still do.  One reason for the
demise of sparse files is the lack of support for the concept of
records in the more popular operating systems (UNIX, DOS, etc.).  It is
much more difficult to treat a file efficiently as a sparse collection
of bytes than as a collection of records.  Several of the
above-mentioned operating systems were plagued with handling sparse
files in some form of system-imposed record scheme.  Often this
system-imposed scheme did hide the "sparseness" from programmers under
certain circumstances.  For example, I have been told that VMS allows
programs to read a sparse file sequentially and skip over the gaps in
the file.  ISAM files were intrinsically sparse.  ("ISAM" is a term
which has recently been corrupted to mean "keyed indexed access system
of some form" instead of the traditional surface/track/sector indexing
scheme.)
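
To make the record-oriented view concrete, here is a toy in-memory sketch.  The class SparseRecordFile and its methods are hypothetical and do not correspond to the interface of any of the systems listed above; the sketch only shows how a record scheme can report "there's a hole there" on a direct read and skip the gaps on a sequential read.

```python
class SparseRecordFile:
    """Toy model of a record-oriented sparse file (hypothetical, not a real OS API)."""

    def __init__(self, record_size):
        self.record_size = record_size
        self._records = {}   # record number -> bytes; a missing key is a hole

    def write(self, recno, data):
        """Store one record, padded with nulls to the fixed record size."""
        if len(data) > self.record_size:
            raise ValueError("record too long")
        self._records[recno] = data.ljust(self.record_size, b"\0")

    def read(self, recno):
        """Return the record, or None to signal that this record is a hole."""
        return self._records.get(recno)

    def scan(self):
        """Sequential read: yield (recno, data) in order, skipping holes."""
        for recno in sorted(self._records):
            yield recno, self._records[recno]


if __name__ == "__main__":
    f = SparseRecordFile(record_size=8)
    f.write(0, b"first")
    f.write(1000, b"last")
    print(f.read(5))                      # a hole, not a record of nulls
    print([recno for recno, _ in f.scan()])
```

With records as the unit, only the records actually written need any storage or bookkeeping, whereas a byte-stream file has no natural unit at which to record what is missing.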

As an aside, TurboDOS used sparse files as the extension mechanism for
files.  To extend a file, one would lock the region beyond the end of the
file, write to it (implicitly extending the file) and then release the lock.
Since file locks were for system-imposed quantities, it was possible for
a program to create a sparse file by accident.  If one program wanted to
write 1k bytes but the lock quantity was set at 2k bytes, it would have to
lock the entire physical record (2k bytes).  Any program attempting to
extend the file at the same time would then skip beyond the locked region
(over the second half of the 2k bytes) and do the same thing.  Effectively
a sparse file was created in which the file ended in 1k of written data,
1k of "nothing", and 1k of written data.  Depending upon other
circumstances, the sparse area could either be shown as unwritten (and
return sparse file status) or, in certain obscure cases, appear to contain
the previous contents of some physical disk sectors.  This made porting
some business applications quite difficult, since business applications
tend to depend upon shared files extended in real time.  Properly written
applications could, however, use sparse files to their own advantage
without difficulty.
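
The accidental hole described above can be worked through with a small simulation.  This is only a model of the scenario as told here, not TurboDOS code: the function concurrent_extend is invented for the illustration, and it assumes the end of file is aligned to the lock quantity and that each later writer starts just past the previous writer's lock.

```python
LOCK_QUANTUM = 2048   # system-imposed lock size from the example (2k bytes)
WRITE_SIZE = 1024     # what each program actually writes (1k bytes)


def concurrent_extend(eof, writers, quantum=LOCK_QUANTUM, size=WRITE_SIZE):
    """Return (start, end, kind) extents after `writers` programs extend at once.

    Assumes `eof` is quantum-aligned, every writer locks one whole quantum,
    and every later writer skips past the regions already locked.
    """
    locked_until = eof
    writes = []
    for _ in range(writers):
        start = locked_until             # first byte not under someone's lock
        locked_until = start + quantum   # lock the whole physical record
        writes.append((start, start + size))

    layout = []
    pos = eof
    for start, end in writes:
        if start > pos:
            layout.append((pos, start, "hole"))   # locked but never written
        layout.append((start, end, "data"))
        pos = end
    return layout


if __name__ == "__main__":
    for extent in concurrent_extend(eof=0, writers=2):
        print(extent)
```

For two simultaneous writers starting at offset 0, the model yields 1k of data, a 1k hole, and another 1k of data, matching the layout in the text.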

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.



More information about the Comp.unix.questions mailing list