unix file system

Chuck Hedrick hedrick at topaz.ARPA
Sun Jul 28 15:05:33 AEST 1985


Jon: I am very glad to see that DEC is interested in Fortran on Unix.
You would make many people very happy if you bring to Unix a Fortran
compiler of the quality of the DEC VMS (or TOPS-20) compiler.
However...

I think it is a bad idea to add attributes to the Unix file system.
You indicate that it would not cause any incompatibility.  There is a
sense in which this is true.  But you would have to change all the
utility programs that copy files, to copy the attributes.  You would
have to change the formats of backup tapes, and of archive tapes such
as tar, to include the attributes.  To the extent that the attributes
are used,
you would have to modify language runtime systems and utilities to
take attributes into account when reading files that have them.  One
of my staff members has just written a network spooler for VMS.  It is
amazing how complex it is to read VMS files in their full generality,
at least from Modula-2.  (Perhaps this is a defect in the runtime
system.)  This complexity has nothing to do with whether there is an
extra layer of RMS between you and the file system.  Indeed that layer
may make things more liveable.  It has to do simply with the
complexity of the file system.  I am recommending that our Computer
Science Dept use Unix, partly because I want an O.S. that is simple.
I would like our students to be able to do some system programming.  I
would not like to face them with the complexities of an RMS file.  If
you add attributes to your Unix, I would regretfully have to rule it
out as a candidate for our department.

However, the problem that you pose still remains.  I think you want to
distinguish between 2 kinds of files: those that are intended to be
human-readable, and binary files.  I believe you should do whatever
violence is necessary to keep human-readable files in a single, simple
format.  This is the clear difference between Unix/Tenex on the one
side and IBM/VMS on the other.  I believe Unix people have chosen
which side of the fence they want to be on, and you should respect
that decision.  Fortunately, I believe you do not have to do much
violence to Fortran to make this work.  The only structure you really
have to worry about in human-readable files is carriage control.  I
suggest that the runtime system should turn the carriage control into
carriage return, line feed, form feed, etc.  At first glance, this
appears to be a problem.  After all, you say, Fortran programs might
write a file using carriage control, and expect that when the file is
read back in, the carriage control is still there.  However, as I
understand it, Fortran 77 has deemphasized carriage control.  I
believe it is now used only in "print" files.  It seems reasonable to
believe that a print file is not normally going to be read back in as
data to another Fortran program.  Thus I believe you should do the
following:
   - by default, map carriage control into CR, LF, etc. when output
	is to a "print" file.  I suggest a convention that by default
	units 0 (stderr) and 6 (stdout) are print files.
   - supply an option to OPEN to override this.  
   - for programs that do not use these mechanisms properly (e.g. old
	Fortran 66 programs), the only damage is that the ANSI
	carriage control characters will show up in column 1.  There 
	can still be a filter to handle this explicitly for those
	exceptions.
I do not like the TOPS-20 idea of defaulting depending upon the actual
output device (/dev/tty and /dev/lpt being print, disk files
nonprint).  The program will not then know in advance whether the file
is a print file. That makes it unnecessarily hard to code.
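
As a rough illustration of the kind of filter mentioned in the last
item above, here is a minimal sketch in Python.  It assumes the usual
ANSI column-1 conventions (blank = single space, "0" = double space,
"1" = new page, "+" = overprint).  The function name asa_to_ascii and
the exact choice of output characters are assumptions made for this
example, not part of any existing Fortran runtime.

```python
def asa_to_ascii(lines):
    """Translate ANSI column-1 carriage control into CR, LF and FF.

    Each control character says what to do *before* the line is printed,
    so the line terminator is emitted as a prefix of the following line.
    """
    out = []
    first = True
    for line in lines:
        line = line.rstrip("\n")
        # An empty record is treated as a blank line with single spacing.
        ctl, text = (line[0], line[1:]) if line else (" ", "")
        if first:
            # Nothing has been printed yet, so there is no line to end.
            prefix = {"1": "\f", "0": "\n"}.get(ctl, "")
            first = False
        elif ctl == "+":
            prefix = "\r"        # overprint: return the carriage, no advance
        elif ctl == "0":
            prefix = "\n\n"      # double space: end the line, skip one more
        elif ctl == "1":
            prefix = "\n\f"      # new page: end the line, then form feed
        else:
            prefix = "\n"        # blank (or unrecognized): single space
        out.append(prefix + text)
    if out:
        out.append("\n")         # terminate the final line
    return "".join(out)


if __name__ == "__main__":
    import sys
    sys.stdout.write(asa_to_ascii(sys.stdin))
```

Run as a filter, this reads records with carriage control in column 1
on standard input and writes ordinary text on standard output.
Unrecognized control characters are treated as single spacing here,
which is only one possible policy.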

For binary files, I like the idea of a "magic number" that specifies
"This is a structured binary file".  In case you are not familiar with
the concept of magic number, all relocatable and executable binaries
have a certain number in their first 32 bits.  There is no danger of
confusing these files with text files, since the magic numbers are
small integers.  Thus 2 or 3 of the first 4 bytes are always 0 (which
ones depends on byte order), which is unlikely in a text file.  You
then need a way to specify the
attributes.  Experience with network protocols and other things
suggests a text format for this.  If you use bits, you will always run
out of bits.  There are several reasonable formats.  My favorite (you
are going to laugh, I'm sure) is Lisp format: a parenthesized list
with attribute-value pairs, e.g. ((RECORD-SIZE 200) (FORMAT VBA)).  This
is simple to parse using a higher-level language.  Xerox used it for
specifying file attributes in PUP FTP, and it is easier to handle than
the alternatives I have seen elsewhere.  A more "binary" format might
be pairs of null-terminated strings, ending with an extra null.  But I
think the Lisp format is better.  You would probably want a convention
that the actual data begins on the next 32-bit boundary after the end
of the attributes, since that might simplify processing for certain
situations.  (For paged files, such as B-trees, you would probably
want to skip to the next page boundary, but that would be an action
implied by certain attributes.)
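
To make the layout concrete, here is a minimal sketch in Python of a
file that starts with a magic number, followed by a Lisp-style
attribute list, with the data beginning on the next 32-bit boundary.
Everything specific in it is an assumption for illustration: the magic
value, the big-endian byte order, the NUL padding, and the function
names pack_file and unpack_file are all hypothetical.

```python
import struct

MAGIC = 0o0521  # hypothetical magic number; any small integer would do


def pack_file(attrs, payload):
    """Build magic number + attribute list + padding + data."""
    text = "(" + " ".join(f"({k} {v})" for k, v in attrs.items()) + ")"
    # Big-endian is assumed here so that the leading bytes are zero.
    header = struct.pack(">I", MAGIC) + text.encode("ascii")
    header += b"\0" * (-len(header) % 4)   # pad to the next 32-bit boundary
    return header + payload


def unpack_file(blob):
    """Return (attrs, data); attribute values are left as strings."""
    if len(blob) < 4 or struct.unpack(">I", blob[:4])[0] != MAGIC:
        raise ValueError("not a structured binary file")
    if blob[4:5] != b"(":
        raise ValueError("missing attribute list")
    # Find the end of the attribute list by counting parentheses.
    depth, end = 0, None
    for i in range(4, len(blob)):
        c = blob[i:i + 1]
        if c == b"(":
            depth += 1
        elif c == b")":
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    if end is None:
        raise ValueError("unterminated attribute list")
    inner = blob[5:end - 1].decode("ascii")   # drop the outer parentheses
    attrs = {}
    for pair in inner.replace("(", " ").split(")"):
        fields = pair.split()
        if len(fields) == 2:
            attrs[fields[0]] = fields[1]
    data_start = end + (-end % 4)             # skip padding to the boundary
    return attrs, blob[data_start:]
```

For example, pack_file({"FORMAT": "VBA"}, b"DATA") produces an 18-byte
header padded to 20 bytes, and unpack_file on the result recovers the
attribute and the data.  The parser handles only flat attribute-value
pairs; nested values would need a real S-expression reader.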

PS: in future messages, could you give a UUCP route?  I don't have
a routing to mrfort.DEC offhand.


Charles Hedrick
Rutgers University

uucp:   ...{harvard, seismo, ut-sally, sri-iu, ihnp4!packard}!topaz!hedrick
arpa:   HEDRICK at RUTGERS


