The performance implications of the ISA bus

Tue Dec 11 05:24:30 AEST 1990

On 5 Dec 90 14:44:45 GMT, jcburt at ipsun.larc.nasa.gov (John Burton) said:

In article <1990Dec5.144445.18632 at abcfd20.larc.nasa.gov>
jcburt at ipsun.larc.nasa.gov (John Burton) writes:

jcburt> In article <PCG.90Dec4160737 at odin.cs.aber.ac.uk>
jcburt> pcg at cs.aber.ac.uk (Piercarlo Grandi) writes:

pst> You're comparing CPU performance to I/O performance. [ ... ] Back
pst> when there were REAL(tm) computers like 780, a lot of time and
pst> energy went into designing efficient I/O from the CPU bus to the
pst> electrons going to the disk or tty. [ ... ] Sure OS's and apps have
pst> gotten bloated, but when you put a chip like the MIPS R3000 on a
pst> machine barely more advanced than an IBM-AT you end up with a toy
pst> that can think fast but can't do anything.

pcg> No, no, no, no, no, no, no. The IO bandwidth of a typical 386 is
pcg> equivalent or better than that of any UNIBUS based machine, and, in
pcg> practical terms, equivalent to that of MASSBUS based ones. You can get
pcg> observable raw disc data rates of 600-900KB/s and observable filesystem
pcg> bandwidths of 300-500KB/s under SVR3.2 (with suitable controllers and a
pcg> FFS of some sort). This is way better than a PDP-11.

jcburt> True, a typical 386 machine has good I/O bandwidth, but
jcburt> bandwidth isn't everything. The majority of 386 machines have an
jcburt> ISA bus which is a very simple bus controlled by the cpu. When
jcburt> performing I/O, the cpu blocks itself and turns control of the
jcburt> bus to the I/O device.

This is not quite true. Actually it is not true at all. You seem to be
describing synchronous programmed IO, which is not used in most ISA
peripherals.  Most ISA peripherals are interrupt driven, and even use
DMA, and the CPU can work between interrupts. Definitely.

jcburt> Machines that were originally designed as a multi-user platform
jcburt> usually were set up so that the I/O could be performed without
jcburt> the direct control (or blocking) of the cpu. The system bus was
jcburt> designed so that multiple operations could occur more or less
jcburt> independent of the cpu (multi-tasking hardware design).

This is entirely true of the ISA bus and any PC system around. Hey, they
even have DMA (well, read on).

However, I can easily see that your misconceptions have a root in three
problems with typical ISA machines, one that is particular to the design
of a PC clone, and two that are particular to the most common disk
controller design for such machines.

For a very ugly reason, the DMA chips that perform DMA under the CPU
control are nearly useless for high speed transfers, and on some designs
the braindamage is bad enough that the few slow DMA channels available
cannot even be shared. But there is no such restriction for DMA driven by
a peripheral board itself, not by the CPU, and some (rare) boards have
bus mastering ability and have their own DMA onboard.

Since DMA using the CPU controlled DMA channels is so bad, the standard
WD style AT controller does not use DMA. It is interrupt driven, so
while the controller is seeking the disk or transferring data the CPU is
free. When the controller is done seeking and transferring, the CPU gets
an interrupt, and then copies byte by byte, with a very fast block move,
the sector read from the controller's onboard cache to core. This is
indeed done using programmed IO, synchronously, and the CPU is busy while
doing it, but it takes relatively little time.
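
To put a rough number on "relatively little", here is a back of the
envelope sketch in Python. It assumes a 512 byte sector and a block move
that runs at the nominal ISA rate of 5 MB/sec; both are assumptions made
for illustration, not measurements, and the function names are made up.
The 800KB/sec figure is the per-disk raw rate used later in this post.

```python
# Rough estimate of CPU time spent copying sectors by programmed IO.
# SECTOR_BYTES and "the copy runs at bus speed" are assumptions.

SECTOR_BYTES = 512              # assumed sector size
BUS_BYTES_PER_SEC = 5_000_000   # nominal ISA bandwidth quoted in this post
DISK_BYTES_PER_SEC = 800_000    # per-disk raw rate quoted in this post


def copy_time_per_sector(sector_bytes=SECTOR_BYTES, bus_rate=BUS_BYTES_PER_SEC):
    """Seconds the CPU is busy moving one sector across the bus."""
    return sector_bytes / bus_rate


def cpu_busy_fraction(disk_rate=DISK_BYTES_PER_SEC, bus_rate=BUS_BYTES_PER_SEC):
    """Fraction of CPU time spent in the block move at a sustained disk rate."""
    return disk_rate / bus_rate


if __name__ == "__main__":
    print(f"copy time per sector: {copy_time_per_sector() * 1e6:.1f} us")
    print(f"CPU busy copying at full disk rate: {cpu_busy_fraction():.0%}")
```

Under these assumptions the copy is a small slice of the CPU even when a
disk streams flat out, and the slice shrinks in proportion at the lower
filesystem rates.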

Finally, the common type of ISA disk controller, for other relatively
ugly reasons, is single threaded. This means that it cannot overlap
seeks and transfers to/from multiple disks. It cannot overlap multiple
transfers because of the above mentioned sector buffer; there is only one
sector buffer... In theory it could overlap seeks on two drives, or
seeking on one with transfer on another, and indeed this can be done
with seek buffering (ST506) devices using a clever (and obscene) hack.

The really big problem for multiuser operation is the lack of overlap;
the authors of the UNIX disk driver sort routine report that with a
multithreaded controller on a PDP-11, three moving arm disks operating
in parallel give, under typical timesharing loads, the same performance
as if they were a single fixed arm one with the sum of their capacities.

This means that with a multithreaded disk controller, three disks, and
typical timesharing load, the ability to move three arms in parallel is
the same as having a single zero seek time arm. A big, big, big win.

Two disks on a multithreaded disk controller are already a very large
improvement over a single disk for timesharing, especially if you spread
the (instantaneous) load across them by careful positioning of your
partitions.
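
The effect of overlap can be illustrated with a toy model in Python.
This is only a sketch under loud assumptions: every request costs a
fixed seek plus a fixed transfer, requests are spread round robin over
the disks, arms seek independently, and transfers share one data path.
It does not try to reproduce the PDP-11 measurement, and the function
names and the 20 msec / 5 msec figures are invented for the example.

```python
# Toy model: single threaded controller vs. overlapped seeks.
# Fixed seek/transfer times and round robin placement are assumptions.


def single_threaded_time(n_requests, seek_ms, transfer_ms):
    """Every request waits for its own seek and its own transfer."""
    return n_requests * (seek_ms + transfer_ms)


def overlapped_time(n_requests, n_disks, seek_ms, transfer_ms):
    """Arms seek in parallel; transfers are serialised on one data path."""
    disk_free = [0.0] * n_disks   # time at which each arm becomes idle
    channel_free = 0.0            # time at which the data path becomes idle
    for i in range(n_requests):
        d = i % n_disks                      # round robin placement
        seek_done = disk_free[d] + seek_ms   # arm starts as soon as it is idle
        start = max(seek_done, channel_free) # then waits for the data path
        finish = start + transfer_ms
        channel_free = finish
        disk_free[d] = finish
    return channel_free


if __name__ == "__main__":
    n, seek, xfer = 300, 20, 5
    print("single threaded:", single_threaded_time(n, seek, xfer), "ms")
    for disks in (1, 2, 3):
        print(disks, "disk(s), overlapped:", overlapped_time(n, disks, seek, xfer), "ms")
```

With one disk the two functions agree, as they should; with two or three
disks most of the seek time disappears behind the other arms' work.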

Now back to the ISA bus. As somebody observes elsewhere, the IO
bottlenecks of a timesharing system are the terminal lines and the disk
controllers. If you use intelligent terminal controllers and intelligent
multithreaded disk controllers your timesharing performance will be
impressive, on a par with that of a VAX of the same class.

Just using FIFO based serial line controllers substantially reduces
terminal IO overhead; just using two ESDI controllers, one for each
disk, will give tremendous improvements, because the two controllers
will be able to seek and transfer in parallel.
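
As a rough illustration of why a FIFO helps, the sketch below counts
receive interrupts for eight busy lines at 19200 baud. It assumes 10
bits on the wire per character (8N1 framing) and a FIFO that interrupts
once every 14 characters, which is one of the trigger levels on common
16550A class UARTs; both numbers are assumptions for the example, and
the function names are made up.

```python
# Interrupt rate for serial input, with and without a FIFO.
# bits_per_char=10 (8N1) and the 14 character trigger level are assumptions.


def chars_per_second(baud, bits_per_char=10):
    """Characters per second on one fully loaded line."""
    return baud / bits_per_char


def interrupts_per_second(ports, baud, chars_per_interrupt, bits_per_char=10):
    """Interrupts per second if every port runs flat out."""
    return ports * chars_per_second(baud, bits_per_char) / chars_per_interrupt


if __name__ == "__main__":
    no_fifo = interrupts_per_second(ports=8, baud=19200, chars_per_interrupt=1)
    fifo = interrupts_per_second(ports=8, baud=19200, chars_per_interrupt=14)
    print(f"one interrupt per character: {no_fifo:.0f} interrupts/sec")
    print(f"one interrupt per 14 chars:  {fifo:.0f} interrupts/sec")
```

The data volume is the same in both cases; what changes is how often the
CPU has to take an interrupt to move it.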

If you want higher performance use a microprocessor based intelligent
serial line controller, and something like an AHA 154x disk controller,
that is multithreaded, bus mastering, and has its own fast DMA channels.

Ah, a final note: if you really want high performance from your
multiuser ISA machine, DO NOT use in any way the console. Access to
video RAM is so abysmally slow that it could consume a large portion of
your bus bandwidth. If you want to do fast graphics on an ISA machine,
buy an X terminal and a fast Ethernet board, don't use the console,
unless you get a really expensive super intelligent video board with
very fast truly 16 bit memory, but I think that for timesharing the X
terminal solution is still better, and not much more expensive, because
it allows further overlap in the generation of the graphics and in its
rendering on the screen.

In summary: to saturate an ISA bus (5 MB/sec) you need a pretty large
number of peripherals running continuously, such as more than three
disks (say 800KB/sec each) and a network board (say 600KB/sec), which
brings us to 2/3 of nominal. Things like a QIC tape (90KB/sec), 8 serial
ports (20KB/sec for eight ports simultaneously at 19200 baud), and so on
are irrelevant for bandwidth. You then have a problem with the typically
high interrupt processing overheads of 386 UNIX systems, with their
often badly written drivers, but if you use the right controllers even
these are not that important.
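
The budget above can be added up in a few lines of Python. The figures
are the ones quoted in this summary; taking 1 MB as 1000 KB and assuming
exactly three disks are simplifications for the example, and the names
are made up. With three disks and the network board the total comes to
about 60% of nominal, in the same region as the rough 2/3 estimate.

```python
# Bus bandwidth budget using the figures quoted in the summary.
# 1 MB = 1000 KB and "exactly three disks" are simplifying assumptions.

ISA_NOMINAL_KB = 5000  # 5 MB/sec nominal

LOADS_KB = {
    "disk 1": 800,
    "disk 2": 800,
    "disk 3": 800,
    "network board": 600,
    "QIC tape": 90,
    "8 serial ports": 20,
}


def total_load(names):
    """Sum of the sustained rates (KB/sec) of the named peripherals."""
    return sum(LOADS_KB[name] for name in names)


def utilisation(names, nominal=ISA_NOMINAL_KB):
    """Fraction of nominal bus bandwidth used by the named peripherals."""
    return total_load(names) / nominal


if __name__ == "__main__":
    heavy = ["disk 1", "disk 2", "disk 3", "network board"]
    print(f"disks + network: {total_load(heavy)} KB/sec, {utilisation(heavy):.0%}")
    print(f"everything:      {total_load(LOADS_KB)} KB/sec, {utilisation(LOADS_KB):.1%}")
```

Adding the tape and the serial ports moves the total by about two
percentage points, which is why they are irrelevant for bandwidth.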

Let's say that a machine with 8 FIFO based serial lines, two discs with
under 20 msec seek time attached to an AHA 154x, a 386/25 noncaching
motherboard (4 MIPS, let's say), and 16 MBytes of memory can comfortably
support 8 users doing fairly heavy development work, even using things
like G++ and GNU Emacs.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs at nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg at cs.aber.ac.uk