ESDI controller recommendations
clewis at eci386.UUCP
Wed Aug 30 09:00:48 AEST 1989
In article <4843 at looking.on.ca> brad at looking.on.ca (Brad Templeton) writes:
>While cylinder or track caching is an eminently sensible idea that I
>have been waiting to see for a long time, what is the point in the
>controller or drive storing more than that?
>
>Surely it makes more sense for the OS to do all other cache duties.
>Why put the 512K in your drive when you can put it in your system and
>bump your cache there? Other than the CPU overhead of maintaining the
>cache within the OS, I mean. I would assume the benefit from having
>the cache maintained by software that knows a bit about what's going
>on would outweigh this.
I've had quite a bit of exposure to the DPT caching disk controllers
so I'll outline some of the interesting points. Some of these apply to
caching controllers generally, some to DPT controllers specifically, and
some only to the models I was playing with (ESDI and ST506 disk interface
versions with a SCSI host interface).
1) Write-after caching: Most systems do their swapping and/or paging
raw. Thus they must *wait* for a write operation to complete
before reusing the memory. Eg: avg 28 ms with ST506 drive.
With write-after, you can reuse memory in .5 ms no matter how slow
your drive is (unless the cache really fills).
I installed one of these suckers on a Tower 600 with 4Mb running
Oracle. We were able to immediately double the number of users
running Oracle (from 4 to 8 simultaneous sessions), with considerably
better response for all 8. (Oracle 4.1.4 is a pig! So was the
host adapter at the time - 3-6 ms to transfer 512 bytes!)
A look at the controller statistics showed that the system was
swapping like mad, but virtually *no* physical disk I/O's actually
occurred. That is, blocks were being read back so quickly that the
controller never needed to write them out.
Of course, a similar effect can be had by adding physical memory
to the system; however, DPT memory is cheaper than Tower memory...
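To make the write-after idea concrete, here's a minimal C sketch of how a
write-back controller cache behaves - purely illustrative, with made-up names
(cached_write, flush_one), not DPT firmware: the controller copies the host's
data into cache RAM, marks it dirty, completes the request immediately, and
pushes it to the platters in the background.

    /* Illustrative sketch only, not DPT firmware; all names invented.
     * A write-after (write-back) cache copies the host's data into
     * controller RAM, acknowledges immediately, and flushes later.    */
    #include <string.h>

    #define BLKSIZE 512

    struct cache_blk {
        long blkno;            /* disk block number                    */
        char data[BLKSIZE];    /* copy of the host's data              */
        int  dirty;            /* 1 = still has to go out to disk      */
    };

    /* Completes as soon as the copy is made (sub-millisecond), not
     * after the seek + rotation + transfer (~28 ms on an ST506).      */
    int cached_write(struct cache_blk *slot, long blkno, const char *hostbuf)
    {
        slot->blkno = blkno;
        memcpy(slot->data, hostbuf, BLKSIZE);
        slot->dirty = 1;       /* background flusher handles it later  */
        return 0;              /* host may reuse hostbuf right away    */
    }

    /* Run when the drive is otherwise idle: push one dirty block out. */
    void flush_one(struct cache_blk *slot)
    {
        if (slot->dirty) {
            /* physical_write(slot->blkno, slot->data);  real disk I/O */
            slot->dirty = 0;
        }
    }

The host-visible latency is just the memcpy(), which is why a 28 ms ST506
write turns into a fraction of a millisecond.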
2) Host memory limitations: on an AT-style system, how does 16Mb of main
memory almost exclusively for use by programs, plus 12Mb of buffer cache
out on the controller, strike you? Without the controller cache there are
lots of tricky trade-offs in splitting host memory between the two.
On the other hand, when faced with lots of physical memory on the
host, it makes far more sense to use it for program memory than for a
RAM swap disk.
3) If your kernel panics, the controller gets a chance to flush
its buffers - particularly handy if you make the kernel buffers
small. It was sort of scary to see, for the first time, a Tower
400 woof its cookies (so I'm not a perfect device driver writer ;-)
and watch the disk stay active for another 30 seconds...
4) If you have a power failure, having the cache on the controller
is a bad idea unless it's power-protected, because the kernel does
make some assumptions about the order in which I/O occurs. With the
models I was using it made economic sense to place a UPS on only the
controller and disk subsystem. I don't know whether this is possible
on the AT versions, but for those it's probably cheaper to get a
whole-system UPS anyway.
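To illustrate the ordering assumption (a hypothetical example I'm making up,
not DPT or real kernel code): the kernel typically writes new data before the
metadata that points at it, and a cache that reorders the two and then loses
power can leave the inode pointing at a block that never reached the platters.

    /* Hypothetical illustration of the ordering assumption; the fake_*
     * routines stand in for the real buffer-cache and inode code.      */
    #define NADDR 13

    struct fake_inode {
        long i_addr[NADDR];    /* block pointers                        */
        int  i_nblocks;        /* how many of them are valid            */
    };

    static void fake_bwrite(long blkno, const char *data)
    {
        (void)blkno; (void)data;    /* pretend to queue a block write   */
    }

    static void fake_iupdate(struct fake_inode *ip)
    {
        (void)ip;                   /* pretend to queue the inode write */
    }

    void append_block(struct fake_inode *ip, long newblk, const char *data)
    {
        fake_bwrite(newblk, data);  /* step 1: the data block itself    */
        ip->i_addr[ip->i_nblocks++] = newblk;
        fake_iupdate(ip);           /* step 2: the inode pointing at it;
                                     * must not hit stable storage
                                     * before step 1 does.              */
    }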
5) DPT read-ahead can be cancelled by subsequent read requests.
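In other words, speculative reads are the first thing thrown away when real
work shows up. A tiny sketch of the idea (invented names, not the DPT's
actual logic):

    /* Invented names, not the DPT's actual logic: a queued speculative
     * read-ahead is dropped as soon as a demand read arrives for
     * something else, so a wrong guess never delays real work.         */
    struct io_req {
        long blkno;        /* block being read                          */
        int  speculative;  /* 1 = read-ahead, 0 = demand request        */
    };

    int should_cancel(const struct io_req *pending, const struct io_req *incoming)
    {
        return pending->speculative && incoming->blkno != pending->blkno;
    }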
6) The DPT's algorithms (eg: replacement policy, lock regions,
write-after delay times, dirty buffer high-water, cache
allocation amongst multiple drives, etc.) can be tuned. Most
kernels can't be tuned much in this respect.
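For a feel of what "can be tuned" means, the knobs are roughly of this shape.
The names below are invented for illustration and are not the real DPT
tuning interface:

    /* Invented names, illustration only - not the real DPT interface.
     * These are the sorts of per-controller knobs a caching controller
     * can expose and a stock kernel's buffer cache usually can't.      */
    struct cache_tunables {
        int  replacement_policy;   /* e.g. plain LRU vs. read-ahead bias */
        int  writeback_delay_ms;   /* max time a dirty block sits cached */
        int  dirty_highwater_pct;  /* force flushing above this %        */
        long lock_region_start;    /* block range pinned in the cache    */
        long lock_region_len;
        int  cache_share_pct[4];   /* cache split amongst four drives    */
    };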
7) Now we get into the hazy stuff - I'm convinced from the testing
that I did with the DPT lashups I built, plus experience
inside other kernels, that the DPT has far better caching than
most UNIX kernels.
Generally speaking, except for look-ahead (which the DPT supports
as well), kernels take no special advantage of knowledge of the disk
*other* than the inherent efficiency of the file system layout (eg: Fast
File System structures) and free-list sorting (dump/mkfs/restore anyone?).
For example, except for free-list sorting and other mkfs-style
tuning, fio.c and bio.c (file I/O and block I/O portions of
the kernel) don't know diddly squat about the real disk.
The DPT, on the other hand, knows it intimately - sectors per track,
rotational latency, etc.
The DPT uses the elevator algorithm and apparently a better
LRU (page replacement) algorithm, has sector and cylinder
skewing and so on.
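For anyone who hasn't run into it, the elevator algorithm just services
outstanding requests in cylinder order while the head sweeps one way, then
reverses. A minimal sketch follows (not the DPT's implementation, and a real
controller would also fold in rotational position):

    /* Minimal sketch of the elevator (SCAN) idea - not the DPT's code.
     * Requests are served in cylinder order on the upward sweep, then
     * the leftovers on the way back down, so the head never thrashes.  */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_cyl(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* Print the service order for n queued cylinders, starting from
     * 'head' and initially sweeping toward higher cylinder numbers.    */
    void elevator(int *cyl, int n, int head)
    {
        int i, first_up = n;

        qsort(cyl, n, sizeof cyl[0], cmp_cyl);
        for (i = 0; i < n; i++)
            if (cyl[i] >= head) { first_up = i; break; }

        for (i = first_up; i < n; i++)        /* sweep up...            */
            printf("%d ", cyl[i]);
        for (i = first_up - 1; i >= 0; i--)   /* ...then back down      */
            printf("%d ", cyl[i]);
        printf("\n");
    }

    int main(void)
    {
        int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        elevator(queue, 8, 53);   /* head at cylinder 53, moving up     */
        return 0;
    }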
Unfortunately, I no longer have a copy of the report. Further, most of the
measurements I was making were of reasonably representative technical
metrics, which don't give an overall feel for performance.
However, one that I remember may be of interest - kernel relinks on the
Tower usually took close to 3 minutes. With the DPT, that dropped to a little
over 2 minutes. Big hairy deal... But closer examination of "time" results
showed that the I/O component had *completely* disappeared. Like wow.
Some other simple benchmarks showed overall performance increases of up
to a factor of 15!
The only way to make the DPT system work better would be to make some major
changes to fio.c/bio.c and a couple of minor mods to the DPT. For example,
multiple lower-priority look-ahead threads based upon file block ordering.
Explicitly cancellable I/O's or look-aheads. More, but I forget now.
The DPT also has some other niceties: automatic bad-block sparing, single
command format/bad blocking, statistics retrieval, and in my case,
compatibility with dumb SCSI controllers except for the additional
features - the NCR Tower SCSI driver has this neat "issue
this chunk of memory as a SCSI command and give me the result" ioctl.
Neat stuff, the DPT.
[No, I don't work for, nor have I ever worked for DPT. Hi Tom!]
--
Chris Lewis, R.H. Lathwell & Associates: Elegant Communications Inc.
UUCP: {uunet!mnetor, utcsri!utzoo}!lsuc!eci386!clewis
Phone: (416)-595-5425