dougp@ico.isc.com
Tue May 8 07:06:56 AEST 1990
In article <261@bradf.UUCP> brad@bradf.UUCP (Bradley W. Fisher) writes:
>... things do revolve around the buffer size. As I understand it, this is
>the amount of data transferred from origin to ram before it is transferred
>to the final destination. Also, as I understand it, this has been declared
>by AT&T source license code to be 10k (or 20 512-byte blocks, hence the
>usual blocking factor of 20) for tar. This was probably quite adequate for
>older start/stop reel and cartridge systems.
I think you're confusing application code and kernel code here. If you
use the '-B' option on cpio, it gives you 5120-byte records, i.e. 10
Unix-standard 512-byte blocks. The '-C' option will let you use a bigger
block size, or you can run through 'dd' to do the buffering.

I've never used the SCO system, so I may be incorrect in the following
conjecture; take it with however much salt you wish... I suspect that the
difference in performance you are seeing between (fundamentally) unblocked
cpio/tar transfers on SCO and the other systems is that the SCO tape driver
is probably buying a large buffer and hiding the buffering operation from
the application program.

We (Interactive) rejected this approach for two reasons:

1) For dumb (single-address, single-count) DMA tape controllers, you need
PHYSICALLY-CONTIGUOUS memory for your buffer. Large chunks of this become
difficult to find after the system has been running for any length of time,
so you are usually forced to buy the pages at INIT time. This removes that
memory from user programs WHETHER OR NOT THE TAPE IS BEING USED!

2) The original philosophy for Unix was (and still should be, IMHO) that
things that can be done in user code SHOULD be done in user code, not in
the kernel. Since 'dd' existed for buffering (although it tends to hide
end-of-tape detection even more, sigh) and the latest cpio supports the -C
option, there is no real win in attaching a comparatively expensive
resource (memory) to an I/O device just so that programs not using large
buffers run fast.
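To make the user-level blocking concrete, here is a minimal sketch; the
/tmp paths, sizes, and the /dev/rmt0 device name are illustrative, and an
ordinary file stands in for the tape:

```shell
# cpio's -B option writes 5120-byte records; -C picks an arbitrary size:
#   find . -depth | cpio -oB > /dev/rmt0          # 5120-byte records
#   find . -depth | cpio -oC 65536 > /dev/rmt0    # 64K records
#
# 'dd' gives the same re-blocking, in user code, for any archiver:
mkdir -p /tmp/blkdemo && cd /tmp/blkdemo
dd if=/dev/zero of=payload bs=1024 count=100 2>/dev/null

# Re-block tar's output into 64K records before it reaches the "device":
tar cf - payload | dd obs=65536 of=archive.tar 2>/dev/null

tar tf archive.tar    # prints "payload": still a valid tar stream
```

The point is that the large buffer lives in the pipeline, attached to this
one transfer, rather than sitting in the kernel whether or not a tape is
in use.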
>Various companies that have licensed the source to *NIX either have or have
>not addressed this problem, and hacked the source for tar to increase the
>buffer size. It seems Interactive falls into the latter category. However,
>SCO falls into the former ... their "blocking factor" of 20 *I believe* is
>really a multiple of ten ... and about 100k is being transferred at a time.
>With less stops for transfer of data this results in an overall rate increase.
See above comment...
>Now for the clincher ... how do you keep it streaming? Well, going out to
>tape(the slowest device in the picture) you would ideally fill one ram buffer
>area with data from the disk and feed it to another ram buffer area (are we
>talking pipes here?) that is in control of feeding the tape drive. I think
>from what I've read, that to be able to do this involves the use of "shared
>memory", and in brief I've also been told "you can't do that with the Intel
>architecture".
There is nothing in the Intel architecture that prevents having shared
memory between two processes; if there were, 386-based Unix systems would
never pass the System V Verification Suite (SVVS, required if you're going
to call something Unix).

You could indeed write two cooperating processes, one of which fills memory
from disk while the other writes it to tape. This is known as
double-buffering, and is a pain to do under Unix (as it requires two
processes and shared memory instead of asynchronous I/O as God intended).
I guess most systems writers never considered it a big enough problem to
bother rewriting the back-up programs. Using large buffers will cause the
tape to stream for quite a time, stop a little, and then stream again, so
it saves BUNCHES of time over writing little teensy records. Just as a BTW,
AIX/PS-2 (at least version 1.2) DOES have a 2-task cpio.
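You can get much of the same overlap from two plain processes without
explicit shared memory, by letting a FIFO play the role of the shared
staging area — not true double-buffering, but the reader and writer do run
concurrently. A sketch (names and sizes are illustrative; tape.img stands
in for the tape device):

```shell
mkdir -p /tmp/dbldemo && cd /tmp/dbldemo
dd if=/dev/zero of=data bs=1024 count=200 2>/dev/null

rm -f staging tape.img
mkfifo staging

# Consumer: drains the staging area to the "tape" in large records.
dd if=staging of=tape.img obs=131072 2>/dev/null &
# Producer: fills the staging area as fast as the disk allows.
dd if=data of=staging bs=4096 2>/dev/null
wait

cmp data tape.img && echo "copies match"
```

The kernel's pipe buffer absorbs the speed mismatch between the two sides,
which is exactly the effect the two-process/shared-memory scheme is after.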
Besides, if backups didn't take forever, where would all the grave-shift
operations people find work? :-)