How do you find the symbolic links to files.

Dan Bernstein brnstnd at kramden.acf.nyu.edu
Wed Dec 12 13:37:20 AEST 1990


In article <1990Dec10.191522.2757 at erg.sri.com> zwicky at erg.sri.com (Elizabeth Zwicky) writes:
> In article <2469:Dec1001:13:4390 at kramden.acf.nyu.edu> brnstnd at kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Elizabeth said that ``you have to get pretty intimate with the disk'' to
> >tell that a file has holes, or something like that. She concluded that
> >an archiver can with good conscience restore files with as many holes as
> >possible, hence saving as much space as possible.
> No, actually, Elizabeth didn't say either of those things.

Well, sorry, I thought it was Elizabeth who said ``you have to get
pretty intimate with the disk to tell that the 20 meg of nulls aren't
there'' in <1990Dec5.052124.28435 at erg.sri.com>. And who agreed in a
later article with Tom's conclusions. But this is beside the point.

Does anyone else understand the importance of restoring as much stat
information as possible? It's an archiver's duty to do as good a job as
it can.

Now Elizabeth's position has been that an archiver cannot do this
without going beyond the stat information and reading the raw disk.
Other people have agreed that you don't need raw access, but claim that
dumps become a lot slower. I'm more of an optimist:

  1. On a system without st_blocks, an archiver can lseek past every
     0-filled region. The system will automatically use holes wherever
     possible. (A) This doesn't require raw disk access. (B) Since stat
     doesn't care about holes, this doesn't destroy any information.
     (C) This wastes only restore time, not dump time.

  2. On a system with st_blocks, an archiver can lseek past the first
     N 0-filled regions, enough to restore st_blocks; and then it can
     write explicit zeros in the rest. Even if it doesn't know the block
     size, it can use trial and error to get the right st_blocks, as
     Barry illustrated in a previous article; since most files in
     practice do not have holes, this will rarely be necessary. (A) This
     does not require raw disk access. (B) st_blocks is restored as we
     want. (C) This wastes only restore time, not dump time; and it only
     wastes restore time on files that actually do have holes.

  3. On a system with full information about the locations of holes, an
     archiver can trivially record the locations and lseek appropriately
     on restore. (A) This does not require raw disk access. (B) All stat
     information is restored as we want. (C) This doesn't waste any
     time.

  4. On a system... well, I've never seen any systems that don't fall
     under #1 or #2, and hopefully future systems will be under #3.
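
As a rough illustration of #1, here is a minimal sketch in Python, which
exposes the same lseek/write calls. The names restore_sparse and write_all
and the 4096-byte block size are assumptions made for this example, not
part of any real archiver. The final ftruncate is also an assumption: a
seek by itself never extends a file, so something has to set the length
when the data ends in a zero-filled region.

```python
import os


def write_all(fd, buf):
    """Write the whole buffer, retrying after short writes."""
    view = memoryview(buf)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def restore_sparse(data, path, block_size=4096):
    """Restore data to path, seeking past every zero-filled block.

    block_size is a guess; the system decides which of the skipped
    regions actually become holes.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for offset in range(0, len(data), block_size):
            chunk = data[offset:offset + block_size]
            if chunk.count(0) == len(chunk):
                # Zero-filled block: write nothing, so the system is
                # free to leave a hole here.
                continue
            os.lseek(fd, offset, os.SEEK_SET)
            write_all(fd, chunk)
        # Set the final length explicitly in case the data ends in a
        # skipped block.
        os.ftruncate(fd, len(data))
    finally:
        os.close(fd)
```

Whether the skipped blocks really end up as holes depends on the
filesystem; the file contents read back the same either way, and nothing
here touches the raw disk.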

People talking about ``portability'' simply don't understand what's
going on here. An archiver ON SYSTEM X is responsible for restoring
stat information as returned BY SYSTEM X. It is incredibly asinine to
say ``#2 is wrong on an AT&T system''---#2 is not *meant* for an AT&T
system!

> What I did say is that you cannot tell the difference between a hole
> and an equivalent number of nulls without reading raw blocks.
> st_blocks at best tells you how many holes there are; it doesn't tell
> you *where*.

Right! So on a system with st_blocks, the archiver's responsibility is
to restore the right number of holes. It can do this by making the first
N zero-filled blocks into holes, with no regard to the original
positions. This does *not* require access to the raw disk blocks.
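
A sketch of that approach, again in Python and again under stated
assumptions: estimate_holes, plan_restore, and apply_plan are hypothetical
names, the block size is taken to be 4096 bytes, and st_blocks is taken to
count 512-byte units. The estimate ignores indirect blocks, so on a real
system it may be off by a little and need the trial-and-error adjustment
mentioned earlier.

```python
import os


def estimate_holes(size, st_blocks, block_size=4096):
    """Guess how many blocks of the original file were holes."""
    total = -(-size // block_size)               # blocks needed, rounded up
    allocated = (st_blocks * 512) // block_size  # blocks actually on disk
    return max(0, total - allocated)


def plan_restore(data, holes_wanted, block_size=4096):
    """Return (kind, offset, length) steps for each block of data.

    The first holes_wanted zero-filled blocks become "hole"; later
    zero-filled blocks become "zeros" and are written explicitly.
    """
    steps = []
    holes_left = holes_wanted
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        if chunk.count(0) != len(chunk):
            kind = "data"
        elif holes_left > 0:
            kind = "hole"
            holes_left -= 1
        else:
            kind = "zeros"
        steps.append((kind, offset, len(chunk)))
    return steps


def apply_plan(data, steps, path):
    """Carry out a plan: seek past holes, write everything else."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for kind, offset, length in steps:
            if kind == "hole":
                continue
            os.lseek(fd, offset, os.SEEK_SET)
            view = memoryview(data)[offset:offset + length]
            while len(view) > 0:
                view = view[os.write(fd, view):]
        # A seek alone does not extend the file, so fix the length.
        os.ftruncate(fd, len(data))
    finally:
        os.close(fd)
```

A restore would call estimate_holes on the dumped st_size and st_blocks,
build the plan, apply it, and then stat the result; if st_blocks still
disagrees, it can retry with a different holes_wanted.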

> Just as programs may, conceivably, care what st_blocks is
> (care to name one that does?), they may also care where the holes are
> (I have no examples of this one either, but it's equally imaginable).

Yes, it is conceivable that a vendor would have a system returning
different stat information. Here's the most important point I'm trying
to make: On *that* system it is the archiver's responsibility to restore
the stat information returned by *that* system. Do you understand this?

It is even conceivable that a vendor will provide stat information that
can't be restored properly without raw disk access. In your December 5
article you were trying to cast ``gloom'' on archivers for exactly this
reason. But that's simply not true for System V or for standard BSD.

> I conclude from this that good archivers are not portable. One can
> arguably conclude that if you want a portable program, you can in good
> conscience restore files with as many holes as possible, since you
> can't get it right.

No! This is what Tom said, and it is entirely wrong. On a BSD system the
right strategy is #2: do what's necessary to restore st_blocks. A
program can reasonably depend on that information, so an archiver that
doesn't restore st_blocks is buggy.

---Dan


