Making rm undoable

Dan Bernstein bernsten at phoenix.Princeton.EDU
Fri Mar 24 20:00:33 AEST 1989


I did promise a few weeks ago to summarize this in a week...

For those who missed the original posting, I was proposing that rather
than somehow convert every rm into an mv (unlink() into rename()), you
could prepare beforehand for losing files by making an extra link
somewhere safe with ln (link()). (That's what I meant to say, anyway.)
After all, link() followed by unlink() is almost the same as rename().
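
Concretely, the pair looks something like this (a minimal sketch; a
real tool would want better error handling):

    #include <stdio.h>
    #include <unistd.h>

    /* Move "from" to "to" the old way: make the new link first,
     * then remove the old one.  If the unlink() step is skipped,
     * the extra link at "to" survives as a backup.  Unlike
     * rename(), the pair is not atomic and cannot cross
     * filesystems. */
    int
    oldrename(const char *from, const char *to)
    {
        if (link(from, to) == -1) {
            perror("link");
            return -1;
        }
        if (unlink(from) == -1) {
            perror("unlink");
            return -1;
        }
        return 0;
    }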

I received several responses, each of which I summarize (in detail)
below. Skip to GENERAL THOUGHTS at the end if you don't want to read
eighty lines of discussion...

Stephen C. North (hector!north, ulysses!north, hector.homer.nh.att.com)
prefers the ``old alias "rm" trick'' for its simplicity. I'd say that
it gives the user less to think about, but the often-noted problems
include:

  1. If the file preservation is too invisible, you'll be too careless
     under a system or shell without it.
  2. How do you make sure that all unlink()s use the alias?
  3. How do you make sure that shell scripts that really should delete
     a file actually use the real rm?

Nevertheless, the ``old alias "rm" trick'' does prove quite useful in
practice.

James R. Drinkwater (jd at csd4.milw.wisc.edu) sees the problem that every
file in the file system would take up an extra inode. He said that when
he wanted a file back, it was because of accidental deletion (e.g., rm a*
without remembering attn.important) rather than later deciding he needed
a file. He made a general proposal that the trash directory contain not
only the deleted files but also soft links to their original positions;
this is an excellent idea that applies to all trash directory methods.
He also proposed that deleted files could be dumped to tape in the end
rather than really erased; I think this would require superuser support
and also that everybody use the same trash method---jd proposed a global
trash directory. He pointed out that files should remain for at least
a fixed (though user-definable) time; I'd say that if the trash is
emptied automatically, this had better be true, while if it is emptied
manually, it needn't be.
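
A sketch of the flavor of trash directory jd has in mind, with each
deleted file accompanied by a soft link recording where it came from
(the trash location and the .whence suffix are my inventions for
illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <limits.h>

    /* "Delete" path by moving it into trashdir, leaving beside it
     * a symlink that points back at the original position. */
    int
    entomb(const char *path, const char *trashdir)
    {
        char full[PATH_MAX], dest[PATH_MAX], whence[PATH_MAX + 8];
        const char *base;

        if (realpath(path, full) == NULL)   /* absolute original name */
            return -1;
        base = strrchr(full, '/') + 1;
        snprintf(dest, sizeof dest, "%s/%s", trashdir, base);
        snprintf(whence, sizeof whence, "%s.whence", dest);

        if (rename(full, dest) == -1)       /* fails across filesystems */
            return -1;
        symlink(full, whence);              /* record where it lived */
        return 0;
    }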

Christopher J. Calabrese (ulysses!cjc at research, cjc at ulysses.att.com)
also pointed out that ``you really want to emptytrash only files over
a certain age.'' He criticized my proposal, saying it would cost too
much time and overhead and leave too many ``huge and unnecessary
directories'' to maintain; this is basically correct. He brought
up the problem of distinguishing between deleting when you want a
copy preserved and deleting when you don't. He said that people most
often delete files that they just created, and that for this reason
changing rm's behavior is better than my proposal.
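
To make the age rule concrete: an emptytrash only has to scan the
trash directory and remove entries old enough (a sketch; the
week-long cutoff is just an example):

    #include <stdio.h>
    #include <time.h>
    #include <dirent.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define MAXAGE (7 * 24 * 60 * 60)   /* one week, in seconds */

    void
    emptytrash(const char *trashdir)
    {
        DIR *d = opendir(trashdir);
        struct dirent *e;
        struct stat st;
        char path[1024];
        time_t now = time(NULL);

        if (d == NULL)
            return;
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.')    /* skip "." and ".." */
                continue;
            snprintf(path, sizeof path, "%s/%s", trashdir, e->d_name);
            /* rename() into the trash updated the ctime, so ctime
             * approximates the deletion time */
            if (lstat(path, &st) == 0 && now - st.st_ctime > MAXAGE)
                unlink(path);           /* old enough: really gone */
        }
        closedir(d);
    }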

Paul English (ileaf!io!speed!pme at eddie.mit.edu, pme at speed.io.uucp)
also prefers the idea of changing rm's behavior. He proposed that
rather than doing mv, safe rm should make a hard link to the file
and then remove the original file. This is, like my method, more
restricted than mv, which (on newer systems) can transfer files
across filesystems; forcing a physical transfer of a potentially
gigantic file is dubious, so I agree that an rm alias should
understand the necessity of staying within a filesystem.
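
The kernel even makes that restriction easy to honor: rename() fails
with EXDEV rather than copying across filesystems, so a safe rm can
simply decline (a sketch):

    #include <stdio.h>
    #include <errno.h>

    /* Entomb path at trashpath only if no data must be copied;
     * rename() sets errno to EXDEV when the source and the
     * destination live on different filesystems. */
    int
    saferm(const char *path, const char *trashpath)
    {
        if (rename(path, trashpath) == 0)
            return 0;
        if (errno == EXDEV)
            fprintf(stderr, "%s: on another filesystem, not entombed\n",
                    path);
        return -1;
    }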

Eli ? (echarne at orion.cf.uci.edu) mentioned that on his system, file
names beginning with a comma are automatically removed after a few
days, and that thus a safe way of removing files is to rename them
to ,-files. I've observed this elsewhere (# files are also commonly
removed); renaming files that way seems to me a very good solution.
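
That convention needs nothing more than a rename (shown in C for
consistency with the other sketches, though a one-line mv alias does
the same job):

    #include <stdio.h>

    /* "Remove" a file by renaming it to its ,-form; the nightly
     * sweep that deletes old ,-files does the real work later.
     * Assumes a simple name with no slashes in it. */
    int
    softrm(const char *name)
    {
        char comma[1024];

        snprintf(comma, sizeof comma, ",%s", name);
        return rename(name, comma);
    }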

Barry Shein (bzs at xenna.encore.com) also observed that you usually
delete what you're currently working on. He pointed out again the
fundamental problem of convincing all programs to unlink()
safely---except those shell scripts that should really erase the
file (aargh)... He proposed that if UNIX supported real event
signals (wake me up when a process does X, and pause that process
in the meantime) one could easily trap all unlink()s, and noted
that one can effectively do this by using NFS. He mentioned that
some editors and other utilities unlink and then recreate the file,
which deserves some discussion: The more common action (shell >,
vi, most other programs) is to simply write over the file. This
means that trapping unlink() won't stop most changes, and brings
to light the fact that version numbering in UNIX is a very very
tricky subject. What do you do if a process keeps a file open?
Do you say the version number increases on each write() (very
inefficient) or on each close()? How do you distinguish between
files that should not be version numbered and files that should,
and what about disk space? I am tempted to say that because of
the unified UNIX philosophy for dealing with everything as just
some type of file, version numbering is impossible---but I
remember hearing someone mention it is possible, and if I do
make my claim, Murphy will ensure that I am publicly proven wrong.
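
To make the overwrite-versus-recreate distinction concrete, here are
both behaviors side by side; only the second would ever be caught by
an unlink() trap (a sketch, with error checking omitted):

    #include <fcntl.h>
    #include <unistd.h>

    /* What most programs (shell >, vi, ...) do: truncate and
     * rewrite in place.  The inode survives and unlink() is
     * never called, so a trap on unlink() sees nothing. */
    void
    overwrite(const char *path, const char *data, int len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);

        write(fd, data, len);
        close(fd);
    }

    /* What a few editors do instead: remove the file and make a
     * fresh one.  Only this case gives a trap a chance to save
     * the old contents. */
    void
    recreate(const char *path, const char *data, int len)
    {
        int fd;

        unlink(path);
        fd = open(path, O_WRONLY | O_CREAT, 0666);
        write(fd, data, len);
        close(fd);
    }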

Carl Witty (cwitty at csli.stanford.edu) wondered what mvdir is
(it's a general term covering whatever you have to do to move
a directory---on BSD, mv can do mvdir, within filesystems...).
He reminds us that ``the only cost for an extra hard link is
the space in the directory file, which is certainly manageable.''
Of course, this is the opposite of jd's view; jd worries about all
the extra inodes needed. I agree with cwitty; I've never seen more
than half the inodes used, on any filesystem.

Jerry Peek (jdpeek at rodan.acs.syr.edu) supports my idea and has been
looking forward to this summary. Well, now you have it.

Kevin Braunsdorf (ksb at j.cc.purdue.edu) said that at Purdue there are
three entombing schemes, of which the best one, maintained by
Matt Bradburn (mjb at staff.cc.purdue.edu), is a library redefining
unlink(), link(), and rename() to safer versions. ``It works.''
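
I haven't seen Bradburn's library, but the general shape is easy to
imagine: link in, ahead of the C library, an unlink() that entombs
instead of destroying (everything below, including the tomb's
location, is my guess at the flavor, not his code):

    #include <stdio.h>
    #include <string.h>

    #define TOMB "/usr/tomb"   /* hypothetical site-wide tomb */

    /* Linked in before the C library, this unlink() renames the
     * victim into the tomb rather than destroying it; a separate
     * sweep empties the tomb for real.  link() and rename() would
     * get the same treatment. */
    int
    unlink(const char *path)
    {
        char dest[1024];
        const char *base = strrchr(path, '/');

        base = base ? base + 1 : path;
        snprintf(dest, sizeof dest, "%s/%s", TOMB, base);
        return rename(path, dest);   /* still fails across filesystems */
    }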

Larry Wall (lwall at devvax.jpl.nasa.gov, lwall at jpl-devvax.jpl.nasa.gov)
criticized my scheme since it doesn't work across filesystems, and
thus doesn't work for his account. He would rather see a trashcan
in each subdirectory; this is an interesting idea.
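
A sketch of how that might look (the .trash name is my invention).
Since the trashcan lives in the same directory as the file, it is on
the same filesystem by construction, and rename() always suffices:

    #include <stdio.h>
    #include <sys/stat.h>

    /* "Delete" a file in the current directory by renaming it
     * into ./.trash, which is necessarily on the same filesystem
     * as the file itself. */
    int
    localrm(const char *name)
    {
        char dest[1024];

        mkdir(".trash", 0700);      /* no harm if it already exists */
        snprintf(dest, sizeof dest, ".trash/%s", name);
        return rename(name, dest);
    }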

GENERAL THOUGHTS

The first person who reads this far wins a ... :-)

If UNIX were the type of system where version numbering were possible
(oops, I mean common, really I do) then the problem of file deletion
would be trivial. But version numbering is not possible (oops, common)
in UNIX.

Changing the low-down behavior of at least unlink() and possibly
link() and rename(), by (slow) NFS trickery or by a safe-rm library,
would completely solve the problem of files being accidentally
deleted. Perhaps the kernel should support this. However, this
still leaves the problem of files that you really do want deleted,
the fact that this is not (yet?) the standard and thus programs
will keep being written for the old one, shell scripts that only
want a temporary file, or ... .

So it's not a simple problem. As for the idea of a longer-term
link() to make unlink() safer, the responses have convinced me
that without kernel support this is not an appropriate use of
resources for all files. However, it would be useful as a
``preserve'' program that you explicitly invoke upon files that
you do not want deleted at any cost. preserve would not stop any
changes, and it would have to list all those programs that unlink()
and recreate files as ``preserve will not work with these, sorry,''
but it would prevent accidental deletion of the named files. So
you would just preserve your most important files, as a last resort.
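
Such a preserve is nearly trivial: one extra link() per file, into a
safe directory on the same filesystem (the $HOME/.preserve location,
which must already exist, is my example):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* preserve file ...: make an extra hard link to each argument
     * under $HOME/.preserve, so that a later unlink() elsewhere
     * cannot destroy the last link to the data. */
    int
    main(int argc, char **argv)
    {
        char dest[1024];
        int i;

        for (i = 1; i < argc; i++) {
            snprintf(dest, sizeof dest, "%s/.preserve/%s",
                     getenv("HOME"), argv[i]);
            if (link(argv[i], dest) == -1)
                perror(argv[i]);    /* e.g. on another filesystem */
        }
        return 0;
    }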

There could be advantages to writing preserve as simply a process
that keeps the file open. This is a shorter-term solution, giving
Murphy a great excuse to crash the machine; but it would not require
an extra filesystem entry, and it would be trivial to include
automatic warnings every so often if the file is accidentally
removed. ``Mail from username... Subject: preserving "foo". To
recover "foo", type "unrm foo"...'' Or I suppose the file could
be re-instated in a trash directory by that process...
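
A sketch of that flavor of preserve: hold the file open, poll the
link count, and when the last name vanishes, complain and copy the
still-open data back out (stderr stands in for mail here):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    /* Keep the named file open forever; if every name for it is
     * removed, the kernel still keeps the data alive through our
     * descriptor, and we rescue it under the old name. */
    int
    main(int argc, char **argv)
    {
        struct stat st;
        char buf[8192];
        int fd, out, n;

        if (argc != 2) {
            fprintf(stderr, "usage: preserve file\n");
            return 1;
        }
        if ((fd = open(argv[1], O_RDONLY)) == -1) {
            perror(argv[1]);
            return 1;
        }
        for (;;) {
            sleep(60);
            if (fstat(fd, &st) == -1 || st.st_nlink > 0)
                continue;           /* still linked somewhere: fine */
            fprintf(stderr, "preserve: %s was removed, rescuing\n",
                    argv[1]);
            out = open(argv[1], O_WRONLY | O_CREAT | O_EXCL, 0600);
            lseek(fd, 0L, SEEK_SET);
            while ((n = read(fd, buf, sizeof buf)) > 0)
                write(out, buf, n);
            close(out);
            return 0;
        }
    }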

---Dan Bernstein, bernsten at phoenix.princeton.edu


