Hard links to directories: why not?

Piercarlo Grandi pcg at cs.aber.ac.uk
Sat Jul 28 04:29:42 AEST 1990


Conrad Longmore <conrad at tharr.UUCP> writes:

   Multics got this right of course! :-) None of this tedious mucking
   about with links into where the files really were... the files in
   Multics are really in the directory that they seem to be in -
   links are just another file type, like directories. :-) <f/x> smug.

Well, that's right. But you can improve on Multics. Years ago (on
BIX) I discussed an old idea of mine, and got wide agreement:
directories stink. They are a poor indexing system (tall and thin,
with poor physical clustering, instead of squat and bushy, with good
physical clustering), and they are too navigational.

	Note: Multics actually has links and synonyms; a file (or
	directory) may have multiple names, i.e. synonyms. This covers
	nearly all the uses of hard links, without the hazards.

There are instead filesystem organizations where a path name is just a
string, and something like a B-tree maps that string into a file
pointer; instead of a current directory you have a current prefix.
"Links" are then just names (or prefixes) that are declared to be
synonyms: either a name (or prefix) resolves to another name
(symlinks), or two names resolve to the same file block pointer (hard
links).
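
To make the idea concrete, here is a minimal sketch in Python; the
names (FlatNamespace, resolve, scan, ...) are invented for
illustration, not any existing system's interface. A sorted key list
stands in for the B-tree, a "current prefix" replaces the current
directory, and both kinds of link fall out of the table for free.

    import bisect

    class FlatNamespace:
        """Path names are plain strings mapped to file block pointers;
        a sorted key list stands in for the on-disk B-tree."""
        def __init__(self):
            self.table = {}   # full name -> file pointer, or ('sym', other name)
            self.keys = []    # sorted names, to mimic B-tree prefix scans

        def register(self, name, fileptr):
            self.table[name] = fileptr
            bisect.insort(self.keys, name)

        def hardlink(self, newname, oldname):
            # hard link: two names resolve to the same file block pointer
            self.register(newname, self.table[oldname])

        def symlink(self, newname, oldname):
            # symlink: a name that resolves to another name
            self.register(newname, ('sym', oldname))

        def resolve(self, name, prefix=''):
            # a "current prefix" takes the place of the current directory
            full = name if name.startswith('/') else prefix + name
            target = self.table[full]
            while isinstance(target, tuple) and target[0] == 'sym':
                target = self.table[target[1]]
            return target

        def scan(self, prefix):
            # names sharing a prefix sit next to each other in the index
            i = bisect.bisect_left(self.keys, prefix)
            out = []
            while i < len(self.keys) and self.keys[i].startswith(prefix):
                out.append(self.keys[i])
                i += 1
            return out

    ns = FlatNamespace()
    ns.register('/user/pcg/sources/ipc.c', 1234)   # 1234: a made-up block pointer
    ns.hardlink('/user/pcg/backup/ipc.c', '/user/pcg/sources/ipc.c')
    ns.symlink('/user/pcg/current.c', '/user/pcg/sources/ipc.c')
    print(ns.resolve('sources/ipc.c', prefix='/user/pcg/'))   # -> 1234
    print(ns.scan('/user/pcg/'))   # everything under the prefix, in one scan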

My own idea is actually very different: use a file name resolution
service that is not hierarchical at all. File names are really sets of
keywords, as in, say, "user,pcg,sources,IPC,header,macros", and a file
is identified by any uniquely resolvable subset of the keywords under
which it is registered with the name service, given in any order. An
underspecified name resolution returns all the file pointers that were
selected. There is no concept of a current directory, but rather of a
default keyword set, which is merged with any keyword set that is not
marked as "absolute".
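
Again a rough, purely illustrative sketch in Python
(KeywordNameService and its methods are made-up names): lookup is by
subset, keyword order is irrelevant, and the default keyword set is
merged into any query not marked absolute.

    class KeywordNameService:
        """A file is registered under a set of keywords; any uniquely
        resolvable subset, given in any order, names it."""
        def __init__(self):
            self.files = []   # (frozenset of keywords, file pointer)

        def register(self, keywords, fileptr):
            self.files.append((frozenset(keywords), fileptr))

        def resolve(self, keywords, default=frozenset(), absolute=False):
            # the default keyword set plays the role of the current directory;
            # it is merged in unless the name is marked as absolute
            query = frozenset(keywords) if absolute else frozenset(keywords) | default
            # an underspecified name simply returns every file it selects
            return [ptr for kws, ptr in self.files if query <= kws]

    ns = KeywordNameService()
    ns.register({'user', 'pcg', 'sources', 'IPC', 'header', 'macros'}, 1)
    ns.register({'user', 'pcg', 'doc', 'ada'}, 2)
    ns.register({'user', 'pcg', 'doc', 'c++'}, 3)

    default = frozenset({'user', 'pcg'})
    print(ns.resolve({'ada', 'doc'}, default))     # [2]; {'doc', 'ada'} is the same
    print(ns.resolve({'doc'}, default))            # underspecified: [2, 3]
    print(ns.resolve({'IPC', 'header'}, default))  # [1]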

The advantages are obvious: gone is the need to decide whether, for
two compilers and their documentation, you organize the tree as
'ada/doc c++/doc' or as 'doc/ada doc/c++'. You can get rid of links
and directories and traversal programs and all sorts of funny
problems.

	Note: an interesting thought is that we probably need some sort
	of trademarking, i.e. the ability to reserve certain
	combinations of keywords (such as 'user,pcg,private') to avoid
	cross-pollution of name spaces, or a search for ever larger and
	more specific keyword sets.
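
One speculative way such trademarking could look in the same toy
model (TrademarkRegistry is an invented name): a reserved keyword
combination blocks registrations from anyone but its owner.

    class TrademarkRegistry:
        """Keyword combinations can be reserved by an owner; a
        registration whose keyword set contains a reserved combination
        is refused unless it comes from that owner."""
        def __init__(self):
            self.reserved = {}   # frozenset of keywords -> owner

        def reserve(self, keywords, owner):
            self.reserved[frozenset(keywords)] = owner

        def may_register(self, keywords, requester):
            kws = frozenset(keywords)
            return all(owner == requester
                       for combo, owner in self.reserved.items()
                       if combo <= kws)

    tm = TrademarkRegistry()
    tm.reserve({'user', 'pcg', 'private'}, owner='pcg')
    print(tm.may_register({'user', 'pcg', 'private', 'mail'}, 'pcg'))    # True
    print(tm.may_register({'user', 'pcg', 'private', 'mail'}, 'guest'))  # False
    print(tm.may_register({'user', 'pcg', 'sources'}, 'guest'))          # True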

If the implementor is clever, efficiency could be dramatic, and
probably not inferior to that of Multics-style directory name
resolution systems (not to speak of UNIX-style ones), thanks to the
ability to use high-density indexing methods with better physical
clustering to offset the greater overheads of the increased
sophistication and flexibility.

I think a convenient prototype could be done as an NFS server (using
slashes to separate keywords, why not, to give a UNIX-like flavour to
the thing), starting from the free one posted in comp.sources.unix,
and it could become a fine dissertation subject. I have wanted to
implement this scheme for many years, but it seems I will never have
the time... Any takers? (I reckon it would make an especially fine
system for Amoeba or other capability-based systems.)
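
The slash-to-keyword mapping itself is trivial; a sketch
(path_to_keywords is an invented name, and a real prototype would of
course sit behind the NFS protocol rather than in a few lines of
Python):

    def path_to_keywords(path):
        """Turn a UNIX-looking, slash-separated path into a keyword
        set; a leading slash marks the name as absolute, i.e. the
        default keyword set is not merged in."""
        absolute = path.startswith('/')
        keywords = frozenset(part for part in path.split('/') if part)
        return keywords, absolute

    # '/user/pcg/doc/ada' and '/user/pcg/ada/doc' denote the same keyword set
    print(path_to_keywords('/user/pcg/doc/ada'))
    print(path_to_keywords('doc/ada'))   # relative: the default set gets merged in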

	Note: a very interesting dissertation from Stanford describes a
	system not too unlike this, except that the author used
	keyword-based queries not for the name but for attributes of
	the file, e.g. owner, size, etc. The results were fairly
	encouraging, even in terms of performance, and even with a
	prototype implementation that was not fully tuned.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk at nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg at cs.aber.ac.uk


