sparse files

Jonathan I. Kamens jik at athena.mit.edu
Fri Dec 1 01:48:52 AEST 1989


In article <21581 at adm.BRL.MIL> JAZBO at brownvm.brown.edu (James H. Coombs)
writes:
>Can someone explain exactly what a sparse file is?  How does one get created?

  A "sparse file" is a file with a lot more null bytes in it than
anything else (well, that's a loose definition, but it's basically
correct).

  Many (although not all -- the Andrew File System, for example, does
not) Unix filesystem types can greatly reduce the amount of space
taken up by a file that is mostly nulls by not really storing the
file blocks that are filled with nulls.

  Instead, the OS keeps track of how many blocks of nulls there are
between the blocks that have something other than nulls in them, and
feeds nulls to anybody that tries to read the file, even though
they're not really being read off the disk.
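
  One way to see this from user level is to compare a file's logical
length with the space actually allocated for it.  The following is
only a sketch in Python; the function name is made up for
illustration, and it assumes a Unix system where st_blocks is
reported in 512-byte units (true on the common ones, but not
guaranteed everywhere).

```python
import os

def logical_and_allocated(path):
    """Return (logical size in bytes, bytes allocated on disk) for path.

    Hypothetical helper: assumes a Unix stat() whose st_blocks field
    counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

  On a filesystem that supports holes, a sparse file should report an
allocated figure well below its logical size.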

  You can create a sparse file by fopen'ing a file, fseek'ing far
past the end of the file, and then writing something there -- the
fseek alone doesn't extend the file, but once you write, everything
between the old end and where you wrote reads back as nulls, and the
kernel (probably) won't save all of those nulls to disk.
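
  A minimal sketch of that recipe, in Python rather than C stdio (its
file objects sit on top of the same seek and write calls).  The
function name and its arguments are invented for the example, and
whether the result really ends up sparse on disk depends on the
filesystem.

```python
def make_sparse(path, size):
    """Create a file of `size` bytes (size >= 1) by writing only its last byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)  # move far past the end; the file is still empty here
        f.write(b"\0")    # this write is what extends the file to `size` bytes
```

  Something like make_sparse("big", 1 << 30) should give a file that
ls -l reports as a gigabyte while du reports only a few blocks, if
the filesystem supports holes.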

  Programs that use dbm often create sparse files, because dbm uses
file location as part of its hashing and tries to spread out entries
in the database file so there is lots of blank space between them.

  The reason sparse files are a problem when it comes to copying is
that the kernel isn't smart enough (or perhaps it won't do it because
it *is* smart :-) to figure out you're feeding it a sparse file if you
actually feed it the nulls.  Therefore, standard file copying programs
like cp that just read the file in and write it out in a different
location lose, because they end up creating a file that really does
take up as much space physically as there are nulls in the abstract
file object.
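
  A copy program can get around this by noticing blocks that are all
nulls and seeking over them in the output instead of writing them.
What follows is a rough sketch of that idea in Python, not how any
particular cp does it; the function name and the 4096-byte block size
are assumptions for the example.

```python
import os

def sparse_copy(src, dst, blocksize=4096):
    """Copy src to dst, seeking over all-null blocks instead of writing them."""
    zeros = bytes(blocksize)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(blocksize)
            if not block:
                break
            if block == zeros[:len(block)]:
                # Skip ahead without writing, which leaves a hole in dst.
                fout.seek(len(block), os.SEEK_CUR)
            else:
                fout.write(block)
        # If the file ends in a hole, no write follows the last seek,
        # so set the final length explicitly.
        fout.truncate(fout.tell())
```

  The copy reads back byte-for-byte the same as the original.  Note
that it may come out sparser than the source, since blocks of nulls
that really were stored on disk get turned into holes as well.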

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik at Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710


