Record-access libraries (with q

gwp at hcx3.SSD.HARRIS.COM
Thu Oct 20 03:54:00 AEST 1988


Written  4:21 pm  Oct 17, 1988 by jc at minya (John Chambers)
>> If you access the raw disk device do you disable that read-ahead and
>> write-behind aspect of the UNIX filesystem abstraction?

> Oh, wow!  A question with a simple answer: Yes.  According to several
> manuals, the main difference between /dev/dsk* and /dev/rdsk* is that
> there is no buffering for the latter.  Reads always delay for physical
> I/O, and writes always go immediately to disk (though with DMA, the
> write may not be complete when write() returns).  There's also a
> warning that the raw disks should be only accessed in multiples of
> a sector.  In fact, most programs use multiples of BUFSIZ, which 
> is invariably a multiple of a sector.
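
The "multiples of a sector" rule quoted above can be illustrated with a
small sketch that rounds a request length up to a sector boundary before
it is handed to a raw device.  The function names and the 512-byte
sector size in the usage note are assumptions made for illustration; a
real program should take the sector size from the driver or its manual.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of sector (hypothetical helper). */
size_t round_up_to_sector(size_t len, size_t sector)
{
    assert(sector > 0);
    return ((len + sector - 1) / sector) * sector;
}

/* Return nonzero if len is already a whole number of sectors. */
int is_sector_multiple(size_t len, size_t sector)
{
    assert(sector > 0);
    return len % sector == 0;
}
```

With an assumed 512-byte sector, a 1000-byte record would be padded to
1024 bytes, while an 8192-byte buffer needs no padding.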

Maybe this is obvious, but keep in mind that there is also no
"file system" on a raw disk device.  I mention this because I have
seen a number of database programs that read and write the raw disk
directly (for performance/safety reasons) and then, at some later
time, access the same information through a file system (for
convenience).  To do this, the database kernel had to perform all
sorts of system-specific manipulations to mesh with the "invisible"
file system before doing its raw I/O.  This all struck me as rather
stupid, because you can disable at least the write-behind portion of
the buffer cache by specifying O_SYNC when opening the file (at least
you can under System V).
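
A minimal sketch of the O_SYNC approach, assuming a system whose
<fcntl.h> defines O_SYNC.  The function name write_record_sync and the
file mode are made up for illustration.  Note that O_SYNC only affects
writes; reads may still be satisfied from the buffer cache.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one record through the file system with write-behind disabled.
 * With O_SYNC, write() should not return until the data has been
 * handed to the disk (or an error is reported).
 * Returns 0 on success, -1 on any failure.
 */
int write_record_sync(const char *path, const void *rec, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && rec != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    n = write(fd, rec, len);

    if (close(fd) < 0)
        return -1;

    return (n == (ssize_t)len) ? 0 : -1;
}
```

The file stays an ordinary file in a mounted file system, so it can
still be read back with the usual tools, which is the convenience the
raw-disk databases were working so hard to recover.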

> The exact wording in one of the manuals describes the "'raw' interface
> which provides for direct transmission between the disk and the user's
> read or write buffer.  A single read or write call results in exactly
> one I/O operation and therefore raw I/O is considerably more efficient
> when many words are transmitted."  Note the specific claim that the
> transfer is direct between the disk and the buffer in user space,
> without going through a kernel buffer.

Not to get into any wild plugging, but we've worked out a method for
doing the same thing with a mounted file system, i.e. transferring the
data directly from the user's address space to the disk without going
through the kernel buffer.  Interestingly enough, the main performance
gain with this method comes not from avoiding the buffer copying but
from the fact that you can do a single transfer of up to 256K
rather than 32 individual transfers of 8K (our block size).  Of course,
this assumes the ability to lay out 32 disk blocks contiguously.
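
To make the 256K-versus-32-by-8K arithmetic concrete, here is a rough
sketch that counts the write() calls needed to push the same buffer out
in pieces of a given size.  It goes through the ordinary write()
interface, so it does not bypass the kernel buffer and is not the method
described above; it only shows the difference in the number of
requests.  The names write_in_chunks, BLOCK_SIZE and BIG_XFER are
invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE (8 * 1024)          /* the 8K block size quoted above */
#define BIG_XFER   (32 * BLOCK_SIZE)   /* 32 blocks = 256K */

/*
 * Write total bytes from buf in pieces of at most chunk bytes.
 * Returns the number of write() calls issued, or -1 on error.
 */
long write_in_chunks(int fd, const char *buf, size_t total, size_t chunk)
{
    size_t done = 0;
    long calls = 0;

    assert(chunk > 0);

    while (done < total) {
        size_t want = total - done;
        ssize_t n;

        if (want > chunk)
            want = chunk;

        n = write(fd, buf + done, want);
        if (n <= 0)
            return -1;

        done += (size_t)n;
        calls++;
    }
    return calls;
}
```

Called with chunk set to BLOCK_SIZE, a 256K buffer takes 32 calls on an
ordinary file; with chunk set to BIG_XFER it normally takes one.
Whether that one request becomes a single disk transfer still depends
on the blocks being contiguous on disk.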

Gil Pilz   -=|*|=-   Harris Computer Systems   -=|*|=-   gwp at ssd.harris.com



More information about the Comp.unix.wizards mailing list