holes in files

Anthony DeBoer adeboer at gjetor.geac.COM
Tue Dec 18 08:13:47 AEST 1990


In article <8432:Dec1622:40:0790 at kramden.acf.nyu.edu> brnstnd at kramden.acf.nyu.edu (Dan Bernstein) writes:
>If every hole on this system were allocated as a 0-filled block, we'd
>need twice as many disks. Another system has a huge page size and
>loosely padded executables; it would need three times as many disks.
>
>If every 0-filled block on this system were made into a hole, several
>well-written programs would crash miserably as soon as the disk is full.
[for example,]
>I have a file open. I want to make sure that blocks 17 through 22
>(expressed in byte sizes) will be guaranteed not to run out of space
>when I write to them. You're saying that I should have no way to make
>this guarantee.

We have another angle on the problem here:  The application software we run
will define a file with an index starting at location 0 and the actual data
starting a ways into the file, just past the point where the index will
eventually end when full (yes, this means you get an error when you try to
write the 1001st record into a file you defined for 1000 records, even with
lots of disk free, and you have to expand it by copying to a larger-defined
file).  It never writes to the tail part of the index space until it needs it,
so we wind up with a hole there.  Getting to the point: it has happened that a
client restoring a backup containing lots of these indexed files ran out of
disk space, because cpio or tar wrote out all the zeros and allocated real
blocks where the holes had been!

This is essentially your case #1, but I'm just bringing up the backup angle;
if you back up and restore (or compress and uncompress) a swiss cheese file
you may lose a lot of disk space to the phantom holes.

The previous poster [<2806 at cirrusl.UUCP> dhesi%cirrusl at oliveb.ATC.olivetti.com
(Rahul Dhesi)] was arguing about how the underlying operating system should
treat this, i.e. what happens if you create a file, seek 1 meg forward, and
start writing there?  Or what if you write a block of zeros; should it detect
that and deallocate the block?  We could argue the point, but we're stuck with
the way that real live Unix out in the field does it today.  Case #2 is valid
as well; if you've explicitly written zeros, it's quite reasonable for you to
be able to rely on their being there.  The system's existing behaviour of
allowing a seek past EOF, not allocating space that was never written, and
reading that space back as zeros is reasonable; it's just that any copying
operation can get caught by it and end up writing more than it read.

In article <BZS.90Dec10190615 at world.std.com> bzs at world.std.com (Barry Shein) writes:
> Actually, under BSD, you can write a fairly portable program to
> identify holes without getting intimate with the disk, tho I'm not
> entirely certain if there are any, um, holes in it, probably.
> The basic idea goes like this:
> 	1. Holes always read back as a block of zeros, so only
> 	blocks that appear to be filled with zeros are interesting.
> 	2. If you rewrite a real hole with all zeros (still
> 	with me?) the number of blocks in the file will change,
> 	a stat() will indicate this.
> Here's a basic program (which could be improved in various ways, but
> illustrates the idea) which prints out which blocks in the file are
> holes, have fun picking holes in it (at least grant me that I said it
                          ^^^^^ aaurgh, a punster in our midst!
> was BSD-only)!
(followed by program code; deleted)
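Since the code itself got snipped, here's a rough sketch of my own along the
lines Barry describes (untested, and definitely not his program; it assumes
st_blocks gets updated as soon as the write actually allocates the block,
which is how the filesystems I have in mind behave):

/* A block that reads back as all zeros *might* be a hole; rewrite it
 * with the same zeros and, if st_blocks grows, it was one.
 * Untested sketch, error checking mostly omitted. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int fd;
    struct stat st;
    char *buf;
    long blk = 0;
    ssize_t n, i;

    if (argc != 2 || (fd = open(argv[1], O_RDWR)) < 0) {
        fprintf(stderr, "usage: findholes file\n");
        return 1;
    }
    fstat(fd, &st);
    buf = malloc(st.st_blksize);

    while ((n = read(fd, buf, st.st_blksize)) > 0) {
        int allzero = 1;
        long before, after;

        for (i = 0; i < n; i++)
            if (buf[i] != 0) { allzero = 0; break; }

        if (allzero) {
            fstat(fd, &st);
            before = (long) st.st_blocks;
            lseek(fd, -(off_t) n, SEEK_CUR);   /* back up over the block */
            write(fd, buf, n);                 /* rewrite the zeros in place */
            fstat(fd, &st);
            after = (long) st.st_blocks;
            if (after > before)
                printf("block %ld was a hole (now filled)\n", blk);
        }
        blk++;
    }
    free(buf);
    close(fd);
    return 0;
}

Note that, as with Barry's version, merely running it fills in whatever holes
it finds, which is exactly the side effect mentioned below.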

Granted, that will take care of identifying whether a file has holes, and will
as a side effect act as a "hole-filling" program.  Somewhere in my "to-do"
queue is writing a quick-and-dirty C program to "dig out" these holes:
copy the file, seeking past any large block of zeros instead of writing them,
then replace the original file with the copy when done, so as to free up the
empty space.
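Something along these lines is roughly what I have in mind (a sketch only,
with made-up names, untested, and using lseek() rather than stdio):

/* Copy infile to outfile, skipping over all-zero blocks with lseek()
 * so the copy gets real holes; ftruncate() at the end keeps the length
 * right if the file happens to end in a run of zeros. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define BLK 8192

int main(int argc, char **argv)
{
    int in, out;
    char buf[BLK];
    ssize_t n, i;
    off_t size = 0;

    if (argc != 3 ||
        (in = open(argv[1], O_RDONLY)) < 0 ||
        (out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0) {
        fprintf(stderr, "usage: digholes infile outfile\n");
        return 1;
    }

    while ((n = read(in, buf, BLK)) > 0) {
        int allzero = 1;
        for (i = 0; i < n; i++)
            if (buf[i] != 0) { allzero = 0; break; }

        if (allzero)
            lseek(out, (off_t) n, SEEK_CUR);   /* leave a hole in the copy */
        else
            write(out, buf, n);                /* copy real data verbatim */
        size += n;
    }

    ftruncate(out, size);   /* fix the length for a trailing hole */
    close(in);
    close(out);
    return 0;
}

You'd run it as something like "digholes file file.new && mv file.new file",
after making sure nothing else has the file open at the time.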
-- 
Anthony DeBoer - NAUI #Z8800                           adeboer at gjetor.geac.com 
Programmer, GEAC J&E Systems Ltd.             uunet!jtsv16!geac!gjetor!adeboer
Toronto, Ontario, Canada             #include <std.random.opinions.disclaimer>


