Asynchronous I/O on UNIX?

Glenn Adams glenn at ll-xn.ARPA
Thu Nov 21 07:44:04 AEST 1985


> 
> As more and more mainframe systems are moving to UNIX, I am
> very interested in finding out how asynch I/O is being implemented
> on these systems.
> 

This was one of the first complaints I had about UNIX after having used
operating systems which imposed fewer constraints on how user processes
synchronize on I/O completion.  There are two things worth contemplating
here: first, why there is no asynchronous I/O mechanism in UNIX, and second,
how such a mechanism might be implemented.

Before getting down to details, it should be pointed out that there
are other, conceptually cleaner methods for performing overlapped
I/O: namely, using multiple processes, each of which has no more
than one outstanding (blocking) I/O request.  This form of overlapped
I/O results in a conceptually straightforward implementation but is
costly in terms of efficiency.  This is especially true given the
hard boundaries maintained between process address spaces and the lack
of a shared memory mechanism (outside of System V).  In addition, the
overhead of context switching contributes to the overall inefficiency.
Moreover, the current mechanisms in UNIX for interprocess communication,
e.g., pipes, sockets, or files, all result in data being copied to and from
the kernel address space as it is transferred to the destination process,
which introduces still more inefficiency.
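
As a concrete illustration of the multi-process style, here is a minimal
sketch in which a child process holds the single blocking read() on a
(hypothetical) slow descriptor and relays the data to its parent through a
pipe.  Note that every byte crosses the kernel boundary twice:

	#include <unistd.h>

	/*
	 * One worker process per outstanding blocking request: the child
	 * blocks in read() on the slow descriptor and copies the data into
	 * a pipe, leaving the parent free to continue its own work and to
	 * read the pipe whenever it pleases.
	 */
	static int
	start_reader(int slowfd)
	{
		int p[2];
		char buf[512];
		int n;

		if (pipe(p) < 0)
			return (-1);
		if (fork() == 0) {		/* child */
			close(p[0]);
			while ((n = read(slowfd, buf, sizeof buf)) > 0)
				write(p[1], buf, n);
			_exit(0);
		}
		close(p[1]);
		return (p[0]);			/* parent reads results here */
	}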

There is, however, the select() system call in 4.[23]BSD UNIX, which allows
a timed, blocking poll of multiple potentially outstanding I/O activities.
This is many times more efficient than the earlier busy-wait polling method
which used the FIONREAD ioctl(), and that method was in any case usable
for only a limited set of I/O activities, e.g., read().  Given these various
mechanisms for performing multiple I/O activities, most applications have
chosen to make do with them rather than address the more difficult task
of implementing a more general kernel-based asynchronous I/O mechanism.
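
For reference, here is a minimal sketch of the select() approach, written
with the fd_set macros of 4.3BSD (4.2BSD used plain int masks); the two
descriptors and the one-second timeout are only illustrative:

	#include <sys/types.h>
	#include <sys/time.h>
	/* (POSIX systems declare select() in <sys/select.h>) */

	/*
	 * Timed, blocking poll on two descriptors: returns whichever is
	 * readable, or -1 on timeout or error.
	 */
	static int
	wait_for_input(int fd1, int fd2)
	{
		fd_set rfds;
		struct timeval tv;
		int maxfd, nfds;

		FD_ZERO(&rfds);
		FD_SET(fd1, &rfds);
		FD_SET(fd2, &rfds);
		tv.tv_sec = 1;
		tv.tv_usec = 0;

		maxfd = fd1 > fd2 ? fd1 : fd2;
		nfds = select(maxfd + 1, &rfds, (fd_set *)0, (fd_set *)0, &tv);
		if (nfds <= 0)
			return (-1);
		return (FD_ISSET(fd1, &rfds) ? fd1 : fd2);
	}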

My efforts originated while implementing an I/O intensive signal processing
application under RSX-11M/S using Whitesmith's C.  My first job was to
throw out the junk Whitesmith's called a standard I/O library and make it
more similar to UNIX V7.  I actually used the 4.2BSD stdio with the addition
of most V7 system calls, which were mapped to RSX Executive calls of one
sort or another.  Since, for this application, I had a strong need for the
efficiency of asynchronous I/O, I needed some UNIX-like mechanism for
implementing it.  What I ended up with is as follows: prior to performing
an I/O operation, e.g., read(), write(), or ioctl(), an fcntl() call is
performed with a command argument of F_ASYNC and an argument which points
to an Asynchronous Control Block.  This structure contains the address
of the asynchronous I/O handler and an optional argument to be passed to
the handler.  The optional argument is used to communicate application-specific
information to the handler about the subsequent I/O activity.
The handler is invoked upon I/O completion as follows:
	(*handler)(status, opt_argument);
Thus the status code indicating the success or failure of the I/O activity
is communicated along with the optional argument specified in the Asynchronous
Control Block.
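
To make the interface concrete, here is roughly what use of the mechanism
looks like.  The structure layout and member names are my own guesses from
the description above, and the value given to F_ASYNC is only a placeholder:

	#include <fcntl.h>
	#include <unistd.h>

	#ifndef F_ASYNC
	#define F_ASYNC	100	/* placeholder value; not a standard fcntl command */
	#endif

	/* Asynchronous Control Block (layout illustrative only) */
	struct acb {
		void	(*acb_handler)(int, void *);	/* completion handler */
		void	*acb_arg;		/* optional per-request argument */
	};

	static char buf[512];

	static void
	done(int status, void *arg)
	{
		/* status: success/failure of the I/O; arg: value from the ACB */
	}

	static void
	start_read(int fd)
	{
		static struct acb a;

		a.acb_handler = done;
		a.acb_arg = buf;		/* lets the handler identify the request */
		fcntl(fd, F_ASYNC, &a);		/* arming phase */
		read(fd, buf, sizeof buf);	/* execution phase: returns without blocking */
	}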

It may be argued that a cleaner mechanism could be implemented, especially
since this calls for two stages, i.e., an arming phase and an execution phase.
However, I felt that it was better to do it this way than to add another
argument to all I/O-related system calls, or, even worse, to add yet more
system calls.  From the application programmer's perspective, this mechanism
is quite simple to
use and builds upon existing system calls.  The semantics of handler invocation
are quite simple and result in a clean interface with minimal global data
communication.  Furthermore, since this mechanism enables asynchronous
notification on a per descriptor basis, it is possible to have outstanding
I/O on multiple descriptors.  Further still, since an optional argument is
specified on a per I/O request basis, i.e., the optional argument in the
Asynchronous Control Block, it is possible to have multiple outstanding I/O
requests on a single descriptor and use this optional argument to identify
the request.
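
For example, two requests queued on the same descriptor can be told apart in
the handler by the optional argument.  Using the ACB structure sketched above
(names again illustrative):

	/* A request tag carried through the ACB's optional argument. */
	struct request {
		int	r_id;			/* application-chosen identifier */
		char	r_buf[512];
	};

	static struct request r1 = { 1 }, r2 = { 2 };

	static void
	req_done(int status, void *arg)
	{
		struct request *r = (struct request *)arg;

		/* r->r_id identifies which outstanding request completed */
	}

	static void
	queue_two(int fd)
	{
		static struct acb a1, a2;

		a1.acb_handler = req_done;
		a1.acb_arg = &r1;
		fcntl(fd, F_ASYNC, &a1);
		read(fd, r1.r_buf, sizeof r1.r_buf);

		a2.acb_handler = req_done;
		a2.acb_arg = &r2;
		fcntl(fd, F_ASYNC, &a2);
		read(fd, r2.r_buf, sizeof r2.r_buf);
	}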

For the application for which I implemented this mechanism, it was necessary
to have overlapping I/O on multiple devices and to have multiple outstanding
requests enqueued to a single device.  The latter was necessary to reduce I/O
turnaround latency on devices with very small data overrun periods, e.g.,
an unbuffered A/D converter.  I haven't mentioned a few details here, such
as the obvious need to protect critical sections of code against asynchronous
entry.
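
One way to handle that, assuming the notification is ultimately delivered as
a signal (as suggested below for the BSD case), is to mask the signal around
the critical section with the 4.2BSD signal-mask calls:

	#include <signal.h>

	static volatile int pending;	/* example datum shared with the handler */

	static void
	update_shared(void)
	{
		int omask;

		omask = sigblock(sigmask(SIGIO));	/* defer asynchronous entry */
		pending++;				/* critical section */
		sigsetmask(omask);			/* restore the previous mask */
	}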

Now that I have had some success with this particular interface mechanism
for performing asynchronous I/O, I am considering how it might best be
implemented in the 4.[23]BSD environment.  I haven't scoped the problem
enough at this point to be able to state how difficult this will be.  One
problem that I see already is the fact that different device drivers use
different mechanisms to perform synchronization.  Some use iowait(), and others
call sleep() directly.  If a single mechanism were used, e.g.,
iowait(), then the task would be much easier.  Those drivers that use
iowait() could be converted fairly easily to support asynchronous notification,
since a hook could be placed in iowait() to allow the process to continue,
with the user process then being notified when the driver calls iodone().
However, the other drivers would be much more difficult, since they don't
necessarily follow this strict protocol, i.e., calling iowait() and then
iodone().
The actual notification could come via the psignal() mechanism with a special
signal (SIGIO ?) being used to get things going.
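
On the user side, the library's end of this could be a SIGIO handler that
dispatches to whichever ACB was armed, mapping the signal onto the
(*handler)(status, opt_argument) interface described above.  The sketch below
assumes a single armed request and glosses over how the completion status
would be recovered from the kernel (that is precisely what the iowait()/iodone()
hook would have to supply):

	#include <signal.h>
	#include <stddef.h>

	/*
	 * One armed ACB (as sketched earlier) for simplicity; a real library
	 * would keep a per-descriptor queue of outstanding requests.
	 */
	static struct acb *armed_acb;

	static void
	sigio_dispatch(int sig)
	{
		struct acb *a = armed_acb;
		int status = 0;		/* placeholder: would come from the kernel */

		if (a != NULL) {
			armed_acb = NULL;
			(*a->acb_handler)(status, a->acb_arg);
		}
	}

	/* Installed once, before any request is armed: */
	static void
	init_async(void)
	{
		signal(SIGIO, sigio_dispatch);
	}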

I'm not sure when or if I will get an opportunity to try implementing these
ideas in the UNIX kernel; however, I thought it might be interesting to set
them out here in case you or others are interested in actually doing the
implementation.
-- 

Glenn Adams
MIT Lincoln Laboratory

ARPA: 	glenn at LL-XN.ARPA
CSNET:	glenn%ll-xn.arpa at csnet-relay
UUCP:	...!seismo!ll-xn!glenn
	...!ihnp4!houem!ll-xn!glenn


