unix file structure (or lack of same)

Thad P Floryan thad at cup.portal.com
Tue Nov 6 00:37:12 AEST 1990


duncant at mbunix.mitre.org (Thomson) in <125379 at linus.mitre.org> writes:

	I understand that, on unix, the file system is designed so that a file
	always looks like a sequence of bytes, with no record structure at
	all.  Is this correct?

YES, thank goodness!  Contrast that UNIX view of a "file" to that on, say,
VAX/VMS where you find eleventy-seven RMS file types that complicate efficient
and portable I/O beyond belief.  I have a commercial product in that market,
and I'm now porting it to UNIX, so this is not idle speculation.

	If so, how does one implement an efficient database manager on unix in
	a standard, portable, way?  To be efficient, a database manager needs
	to have random access into files on a record-oriented basis.  It seems
	to me that fseek() wouldn't do the job.  (Am I wrong here?) If unix
	doesn't provide a record-oriented view of files, then any database
	implementation would have to go below unix, and access the mass
	storage devices directly.  Is this right?

One can impose any "view" on the file one desires.  Assuming fixed-length
'records' and no funny-stuff at the beginning of the file, a typical method
to calculate any record's relative address in the file could be:

	address = (record_number - 1) * sizeof(record_structure);

and that "address" would be used per "lseek(fd, (long)address, 0);".  See
the writeup of lseek(2) for the meaning of its 3rd parameter which provides
some interesting options.  Of course, a real DBMS could be "smarter" and
calculate a block address instead, (possibly) map that into memory, and
then calculate the record's in-core offset from the beginning of that buffer.
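
For the simple fixed-length case, here is a minimal sketch in C (the record
layout, its field names, and the error handling are made up purely for
illustration) of fetching record N this way:

	#include <sys/types.h>
	#include <unistd.h>

	struct record {			/* hypothetical fixed-length record */
		char	name[32];
		long	balance;
	};

	/* Read record 'recno' (numbered from 1) from descriptor 'fd' into *rp.
	   Returns 0 on success, -1 on failure. */
	int get_record(int fd, long recno, struct record *rp)
	{
		off_t address = (off_t)(recno - 1) * sizeof(struct record);

		if (lseek(fd, address, 0) == (off_t)-1)	/* 0 == from start */
			return -1;
		if (read(fd, (char *)rp, sizeof(*rp)) != (ssize_t)sizeof(*rp))
			return -1;
		return 0;
	}

A production DBMS would, as noted above, usually lseek() to a block boundary
and buffer the whole block rather than issue one system call per record.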

If you're going in for really big files whose 'records' might even be
variable-length, use a secondary index file(s) whose records are fixed length
and "point" to the address of their associated data records in the big file.
Common datafile index methods are B-tree and ISAM.
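
As a trivial sketch (the key size and field names are invented for
illustration), an index record can be nothing more than a key plus the byte
offset of its data record in the big file:

	struct index_entry {		/* fixed-length index record */
		char	key[16];	/* search key, e.g. a part number */
		off_t	data_offset;	/* lseek() address of the (possibly
					   variable-length) data record in the
					   main data file; off_t comes from
					   <sys/types.h> */
	};

Because these entries are fixed length, the same lseek() arithmetic shown
above works on the index file; the B-tree or ISAM machinery only has to order
and search the small index entries, not the bulky data records.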

And if you're REALLY concerned about efficiency and your OS version permits
it, go for either the FFS or a 4K or 8K filesystem, which could even be a
separate mount dedicated to DBMS applications.  Some "database" vendors have
claimed they've written their own filesystems due to perceived problems with
UNIX's filesystems, but I haven't seen the need for that even with some of
the humongous data files I operate on.  And a custom filesystem means you're
going to need a custom backup-and-restore facility and the attendant special
procedures.

Many standard filesystems use a 1K, 2K or 4K block size.  This means the
smallest space allocated to a given file (ignoring sparse files) is one block
of that size, so for small files you may end up with a lot of "wasted" space
at the end of each file.  A 1K block, for example, is a logical block
comprising two 512-byte physical sectors.

Stick with the "standard" software and tools for greater portability, and
switch to custom methods only if the specific case warrants it.  With today's
UNIX systems and fast I/O subsystems you may be pleasantly surprised.

One final comment: you used the word "portable" often.  If that is a concern,
you may wish to store your numeric data in ASCII form even though there is a
conversion penalty.  Moving binary data files amongst systems such as a
386/486 and 680x0 and SPARC and MIPS and VAX and ... is asking for trouble,
even for integer data, since byte order and word sizes differ from machine to
machine.
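
As a minimal sketch (the 12-character field width is an arbitrary choice for
this example), each numeric field can be kept on disk as a fixed-width ASCII
string, so the records stay fixed length and any machine can read them back:

	#include <stdio.h>
	#include <stdlib.h>

	#define NUMWIDTH 12	/* arbitrary fixed field width for this sketch */

	/* long -> fixed-width ASCII, the portable on-disk form;
	   'field' must have room for NUMWIDTH characters plus the NUL */
	void num_to_ascii(long value, char *field)
	{
		sprintf(field, "%*ld", NUMWIDTH, value);
	}

	/* fixed-width ASCII -> long, correct on any byte order or word size */
	long ascii_to_num(const char *field)
	{
		return atol(field);
	}

The conversion costs a little CPU on every read and write, but the resulting
files move cleanly amongst the architectures listed above.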

Thad Floryan [ thad at cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]
