Non Destructive Version of rm

John 'tms' Navarra navarra at casbah.acns.nwu.edu
Mon May 6 17:24:47 AEST 1991


In article <JIK.91May6001507 at pit-manager.mit.edu> jik at athena.mit.edu (Jonathan I. Kamens) writes:



>
>  John Navarra suggests a non-destructive version of 'rm' that either
>moves the deleted file into a directory such as
>/var/preserve/username, which is periodically reaped by the system,
>and from which the user can retrieve accidentally deleted files, or
>uses a directory $HOME/tmp and does a similar thing.
>
>  He points out two drawbacks with the approach of putting the deleted
>file in the same directory as before it was deleted.  First of all,
>this requires that the entire directory tree be searched in order to
>reap deleted files, and this is slower than just having to search one
>directory.  Second, the files show up when the "-a" or "-A" flag to ls
>is used to list the files in a directory.
>
>  A design similar to his was considered when we set about designing
>the non-destructive rm currently in use (as "delete") at Project
>Athena and available in the comp.sources.misc archives.  There were
>several reasons why we chose the approach of leaving files in the same
>directory, rather than Navarra's approach.  They include:
>
>1. In a distributed computing environment, it is not practical to
>   assume that a world-writeable directory such as /var/preserve will
>   exist on all workstations, and be accessible identically from all
>   workstations (i.e. if I delete a file on one workstation, I must be
>   able to undelete it on any other workstation; one of the tenets of
>   Project Athena's services is that, as much as possible, they must
>   not differ when a user moves from one workstation to another).
>   Furthermore, the "delete" program cannot run setuid in order to
>   have access to the directory, both because setuid programs are a
>   bad idea in general, and because setuid has problems in remote
>   filesystem environments (such as Athena's).  Using $HOME/tmp
>   alleviates this problem, but there are others....


     	The fact that one of Athena's 'tenets' is similarity from
 workstation to workstation is both good and bad in my opinion. True, it
 is reasonable to expect Unix to behave the same on similar workstations,
 but one of the fundamental benefits of Unix is that the user gets to create
 his own environment. We can argue the advantages and disadvantages of
 using an undelete utility, but you seem to be of the opinion that non-
 standard changes are not beneficial. I argue that most users don't use
 a large number of different workstations, and that we shouldn't reject a
 better method just because it isn't standard.
	I don't understand your setuid argument. All you do is have a directory
 called /var/preserve/navarra and make each person's directory inaccessible to
 others (or possibly set the sticky bit on /var/preserve as well), so that only
 the owner of a file can undelete it.
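	Something like the following would set that up -- just a rough
 sketch, and the path names are only what I'd pick, nothing standard:

    # /var/preserve itself is world-writable with the sticky bit set,
    # like /tmp, so anyone can create entries but nobody can remove
    # someone else's.
    mkdir /var/preserve
    chmod 1777 /var/preserve

    # each user's own holding area is readable only by its owner
    mkdir /var/preserve/navarra
    chmod 700 /var/preserve/navarra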
>
>2. (This is a big one.) We wanted to insure that the interface for
>   delete would be as close as possible to that of rm, including
>   recursive deletion and other stuff like that.  Furthermore, we
>   wanted to insure that undelete's interface would be close to
>   delete's and as functional.  If I do "delete -r" on a directory
>   tree, then "undelete -r" on that same filename should restore it,
>   as it was, in its original location.
>
>   Navarra's scheme cannot do that -- his script stores no information
>   about where files lived originally, so users must undelete files by
>   hand.  If he were to attempt to modify it to store such
>   information, he would have to either (a) copy entire directory
>   trees to other locations in order to store their directory tree
>   state, or (b) munge the filenames in the deleted file directory in
>   order to indicate their original location, and search for
>   appropriate patterns in filenames when undeleting, or (c) keep a
>   record file in the deleted file directory of where all the files
>   came from.
 
    Ahh, we can improve that. I can write a program called undelete that
    looks at the filename argument and by default restores it to $HOME,
    but can also take a second argument -- a directory -- to move the
    undeleted material to. I am pretty sure I (or some better programmer
    than I) could get it to move more than one file at a time, or even
    handle something like: undelete *.c $HOME/src, which would move all
    files in /var/preserve/username with .c extensions to your src dir.
    And if you don't have an src dir, it will make one for you. Done
    right, this shouldn't take much longer than removing a directory
    structure, so rm *.c in a dir should be only a tiny bit faster than
    undelete *.c $HOME/src. I think the wait is worth it, though --
    especially if you consider the consequences of digging through a tape
    backup, or a total loss of your files!
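    Here is a rough sketch of what I mean by undelete -- assuming rm has
 parked everything flat in /var/preserve/$USER; the argument handling is
 only illustrative:

    #!/bin/sh
    # undelete file ... [destination-dir]   (sketch only)
    PRESERVE=/var/preserve/$USER
    dest=$HOME

    if [ $# -eq 0 ]; then
        echo "usage: undelete file ... [destination-dir]" 1>&2
        exit 1
    fi

    # if the last argument looks like a directory, use it as the
    # destination instead of $HOME, and skip it in the file list below
    for last
    do
        :
    done
    if [ -d "$last" ] || expr "$last" : '.*/' >/dev/null; then
        dest=$last
    fi

    [ -d "$dest" ] || mkdir "$dest" || exit 1    # makes the src dir for you

    for f
    do
        [ "$f" = "$dest" ] && continue
        if [ -f "$PRESERVE/$f" ]; then
            mv "$PRESERVE/$f" "$dest/$f"
        else
            echo "undelete: $f: not in $PRESERVE" 1>&2
        fi
    done

    One catch: the shell expands a pattern like *.c against the current
 directory, so a real undelete would want to do its own matching against
 the preserve dir instead.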
	As far as rm -r and undelete -r go, perhaps the best way to handle
    this is that when the -r option is given, the whole dir in which you
    are removing files is just moved to /preserve. Then "undelete -r dir
    dir2", where dir2 is a destination dir, would restore all those files.
    However, you would run into problems if /preserve is not mounted on
    the same tree as the dir you wanted to remove. This could be resolved
    by allowing undelete to run setuid, but I agree that is not wise. You
    wouldn't want users being able to mount and unmount filesystems they
    had remove privileges on -- perhaps there is another solution that I
    am overlooking, but there are limits to any program. Just because
    there might not be any information about where the files originally
    were is not good enough reason to axe its use.
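    The rm side would then look roughly like this -- again just a sketch
 using the same /var/preserve/$USER layout, not the way Athena's delete
 actually works:

    #!/bin/sh
    # non-destructive rm, sketch only
    PRESERVE=/var/preserve/$USER
    if [ ! -d "$PRESERVE" ]; then
        mkdir "$PRESERVE"; chmod 700 "$PRESERVE"
    fi

    recursive=no
    if [ "$1" = "-r" ]; then
        recursive=yes; shift
    fi

    for f
    do
        if [ -d "$f" ] && [ "$recursive" = no ]; then
            echo "rm: $f is a directory" 1>&2
            continue
        fi
        # mv is a cheap rename() when the preserve dir is on the same
        # filesystem; across filesystems it turns into a copy, which is
        # exactly the mount problem mentioned above.  Name collisions in
        # the flat preserve dir aren't handled here either.
        mv "$f" "$PRESERVE/" || echo "rm: could not preserve $f" 1>&2
    done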
>
>   Each of these approaches has problems.  (a) is slow, and can be
>   unreliable.  (b) might break in the case of funny filenames that
>   confuse the parser in undelete, and undelete is slow because it has
>   to do pattern matching on every filename when doing recursive
>   undeletes, rather than just opening and reading directories.  (c)
>   introduces all kinds of locking problems -- what if two processes
>   try to delete files at the same time?

       Assuming I can write a program which looks through this preserve
 dir and grabs the file(s) matching the argument, undelete would be slow
 only if there were a vast number of files in there. However, assuming you
 don't remove HUGE numbers of files over a two-day period (the period the
 files would be kept), I bet that would be faster than undeleting a file
 scattered across a number of directories as .# entries, because many
 directories would be bigger than the /preserve dir, in which case you
 would be digging through a bigger list of files.
 	Here are some more problems. Like rm, undelete would operate by looking
 through /preserve. But if rm did not store files in that dir, and instead
 stored them as .# files in the current directory, then undelete would
 likewise have to start looking in the current dir and work its way through
 the directory structure looking for .# files that matched a filename
 argument, UNLESS you gave it a starting directory as an argument, in which
 case it would start there. That seems like a lot of hassle to me.
	As far as funny filenames and such go -- that I am not sure about, but
 it seems like it could be worked out.

  	
>
>3. If all of the deleted files are kept in one directory, the
>   directory gets very large.  This makes searching it slower, and
>   wastes space (since the directory will not shrink when the files
>   are reaped from it or undeleted).

   You get a two-day grace period -- then they are GONE! This is still faster
 than searching through the current directory tree (in many cases) looking
 for .# files to undelete.
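	The reaping itself is one line in root's crontab -- the schedule
 and the two-day cutoff here are just what I would pick:

    # nightly reap of everything parked in /var/preserve more than 2 days
    0 3 * * * find /var/preserve -type f -mtime +2 -exec rm -f {} \;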
>
>4. My home directory is mounted automatically under /mit/jik, but
>   someone else may choose to mount it on /mnt, or I may choose to do
>   so.  The undeletion process must be independent of mount point, and
>   therefore storing original paths of filenames when deleting them
>   will fail if a different mount point is later used.  Using the
>   filesystem hierarchy itself is the only way to insure mount-point
>   independent operation of the system.
>
>5. It is not expensive to scan the entire tree for deleted files to
>   reap, since most systems already run such scans every night,
>   looking for core files, *~ files, etc.  In fact, many Unix systems
>   come bundled with a crontab that searches for # and .# files every
>   night by default.

     If that is the case -- fine -- you got me there. Do it from crontab
 and remove them every few days. I just think it is a waste to infest many
 directories with *~ and # and .# files when, 99% of the time, when someone
 does rm filename -- THEY WANT IT REMOVED AND NEVER WANT TO SEE IT AGAIN!
 So now when I do an ls -las -- guess what! There they are again! Well,
 you tell me "John, don't do an ls -las" -- but how about having to wait
 longer on various ls's because my directory size is bigger now? Say I
 deleted a whole mess of files; now I have all those files in my current
 dir, and I want to see all my .files as well. So I do an ls -las, and
 when I come back from lunch I might see them -- ever try to ls -las
 /dev!?
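	For comparison, the whole-tree scan you are describing looks
 something like this (the exact patterns and age cutoff vary from system
 to system -- this is only illustrative):

    # nightly cleanup over every filesystem, run from cron
    find / \( -name '.#*' -o -name '#*' -o -name '*~' -o -name core \) \
        -atime +3 -exec rm -f {} \;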
>
>6. If I delete a file in our source tree, why should the deleted
>   version take up space in my home directory, rather than in the
>   source tree?  Furthermore, if the source tree is on a different
>   filesystem, the file can't simply be rename()d to put it into my
>   deleted file directory, it has to be copied.  That's slow.  Again,
>   using the filesystem hierarchy avoids these problems, since
>   rename() within a directory always works (although I believe
>   renaming a non-empty directory might fail on some systems, they
>   deserve to have their vendors shot :-).
>
>7. Similarly, if I delete a file in a project source tree that many
>   people work on, then other people should be able to undelete the
>   file if necessary.  If it's been put into my home directory, in a
>   temporary location which presumably is not world-readable, they
>   can't.  They probably don't even know who deleted it.

    I admit you have pointed out some flaws, some of which can be corrected,
 others you just have to live with. I have made a few suggestions to improve
 the program. In the end, though, I think the one /preserve directory is
 much better. But here is another suggestion which you might like:

    Make a shell variable RMPATH which you can set to whatever path
 you want. The default will be /var/preserve, but you can set it to $HOME/tmp,
 or perhaps it could work like the PS1 variable and accept a $PWD value,
 in which case it is set to your current directory. Then whenever you
 rm something or undelete something, RMPATH will be checked.
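	The check itself is trivial; something along these lines, where the
 RMPATH name and the $PWD convention are just my suggestion:

    # where should rm park (and undelete look for) files?
    RMPATH=${RMPATH-/var/preserve}        # default when RMPATH is unset
    case $RMPATH in
    '$PWD') preserve=`pwd` ;;             # "use the current directory"
    *)      preserve=$RMPATH ;;
    esac
    for f
    do
        mv "$f" "$preserve/"              # rm side; undelete reverses it
    done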

>
>Jonathan Kamens			              USnail:
>MIT Project Athena				11 Ashford Terrace
>jik at Athena.MIT.EDU				Allston, MA  02134
>Office: 617-253-8085			      Home: 617-782-0710
>
>
-- 
From the Lab of the MaD ScIenTiST:
      
navarra at casbah.acns.nwu.edu


