error handling techniques?

Mon Nov 12 10:15:05 AEST 1990

In article <1990Nov3.153643.26368 at clear.com> rmartin at clear.com (Bob Martin) writes:
>In article <1990Nov2.205831.23696 at elroy.jpl.nasa.gov> alan at cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>>I'm interested in what approaches people use for error handling, particularly
>>in general purpose function libraries and large software systems.  If someone
>
>Alan:
>
>In a large software system the number of places where the code can
>detect errors can range into the tens of thousands. ...
>
>What I have done in the past to cope with this is to create an
>ErrorLog function which will write single line error messages into
>an error log file. ...
>                  ...  Errors of similar types should _not_ use the
>same <loc>! ...
>
>Every hour I close the current error log file and open a new one.
>At the end of the day I compile them into a summary and eyeball
>them to see if anything horrible went wrong.  Software can be written
>to automatically scan these logs to see if there are critical errors.
>
>							Hope this helps.
>							I welcome discussion.
You got it.

I have some general comments about your scheme.

The <loc> numbers would have to either be stored in one central place or
you would need an allocation scheme that allocates blocks of numbers to
various subsystems. Either way, this seems like an awful lot of work to
set up.

The manual nature of evaluating whether any serious errors have occurred
bothers me ( unless you're the only one that runs your software ). It would
require a rather intimate knowledge of the entire system. It also
bothers me that the errors are "hidden" in a logfile (again assuming
other people run your software). Out of curiosity, how big do your logfiles
get?

>
>----------------------------------------------------------------------
>R. Martin
>rmartin at clear.com
>uunet!clrcom!rmartin
>----------------------------------------------------------------------

And, in answer to Alan's original posting:

I really like exceptions, but I don't use them. Exceptions in C require writing
an exception handling mechanism, which I have never had the time to write for
my own "small" programs. Other systems I use have used different error
handling mechanisms from Day One and are "too big" to change now.

All the code we write returns 0 on error. ( By never using '0' as an index,
I can usually get away with this. ) Usually failures are trickled up to
the function level where enough is known that they can be handled. The macro
we use to "fail out" of a routine is called "FAILIF" and takes a condition,
an error number and an error parm as parameters. If the condition is true,
the error number is assigned to the global variable errno, the error parm
is assigned to the global variable errparm, and the routine returns 0. In
addition, FAILIF's behaviour can be modified to do a little cleanup before
returning, which solves one of the problems of multiple returns, although it
is not that elegant.
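
For anyone who wants to see the shape of it, here is a minimal sketch of a
FAILIF-style macro. It is an illustration under assumptions, not our actual
code: the names err_no, err_parm, the E_* codes and find_char() are made up,
and err_no stands in for errno so the sketch does not clash with <errno.h>.
The cleanup hook is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the global errno and errparm. */
int  err_no;
long err_parm;

/* Made-up error numbers, for illustration only. */
enum { E_NONE = 0, E_BADARG = 1, E_NOTFOUND = 2 };

/* If cond is true, record the error and return 0 from the
   enclosing function. The do/while(0) wrapper makes the macro
   behave like a single statement after an if or else. */
#define FAILIF(cond, num, parm)          \
    do {                                 \
        if (cond) {                      \
            err_no   = (num);            \
            err_parm = (long)(parm);     \
            return 0;                    \
        }                                \
    } while (0)

/* Returns the 1-based position of ch in s, or 0 on error.
   Position 0 is never a valid result, so 0 is free to mean failure. */
size_t find_char(const char *s, int ch)
{
    const char *p;

    FAILIF(s == NULL, E_BADARG, 0);
    p = strchr(s, ch);
    FAILIF(p == NULL, E_NOTFOUND, ch);
    return (size_t)(p - s) + 1;
}
```

A caller tests the return value against 0 and, if it wants detail, looks at
err_no and err_parm. Note the macro only works inside functions whose return
type can hold 0.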

At higher levels, we will check for errors that we do not wish to handle
( like failures from malloc() ) by using fatal assertions. A fatal assertion
asserts that a condition is true (nonzero). If it is not, it prints the
string argument, hex dumps any areas of memory that the user wishes to dump,
and prints the errno, the name of the errno, the errparm, the line number, the
file name, and a function trace. ( This varies from the standard UNIX assert()
mechanism. ) It then terminates the program. A non-fatal assert is also
available for conditions that must be reported but need not be acted upon.
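
A rough sketch of the two kinds of assertion follows. It is a cut-down
illustration, not the real thing: the hex dump, the errno name and the
function trace are omitted, and NASSERT, assert_report(), nonfatal_count,
err_no, err_parm and make_table() are names invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the global errno and errparm. */
int  err_no;
long err_parm;

/* Count of non-fatal assertion failures reported so far. */
int nonfatal_count;

/* Print the context of a failed assertion to stderr. */
void assert_report(const char *kind, const char *msg,
                   const char *file, int line)
{
    fprintf(stderr, "%s: %s\n  errno=%d errparm=%ld\n  at %s:%d\n",
            kind, msg, err_no, err_parm, file, line);
}

/* Fatal assertion: report and terminate if cond is false (zero). */
#define FASSERT(cond, msg)                                        \
    do {                                                          \
        if (!(cond)) {                                            \
            assert_report("FATAL", (msg), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

/* Non-fatal assertion: report, count, and carry on. */
#define NASSERT(cond, msg)                                        \
    do {                                                          \
        if (!(cond)) {                                            \
            assert_report("WARNING", (msg), __FILE__, __LINE__);  \
            nonfatal_count++;                                     \
        }                                                         \
    } while (0)

/* Example: a malloc() failure we do not wish to handle. */
int *make_table(size_t n)
{
    int *t = malloc((n ? n : 1) * sizeof *t);

    FASSERT(t != NULL, "make_table: out of memory");
    return t;
}
```

After the FASSERT line the rest of make_table() can assume t is valid, which
is the whole point.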

Code using FAILIF and fatal assertions reads quite easily and is easy to
write. You generally check a condition once, FAILIF or FASSERT it, and
continue, secure that you are dealing with only valid values from here on
in. Assertions should actually be coded in the interface to the routine
because they can be valuable documentation, but we're not that sophisticated
yet.

To reduce code overhead, there are a number of functions whose failure
is almost never handled (fclose, malloc, write, ... ). These functions
are generally wrapped in envelopes that assert the success of the call.
The user can then use the secure call if he wishes to program safely,
or the lower level interface if he can handle the error himself or if
he doesn't care. ( Apathy is the only good reason for ignoring return
codes. )
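
As a sketch of what such an envelope might look like (safe_malloc(),
safe_fclose() and die() are hypothetical names, and die() stands in for
whatever fatal assertion you use):

```c
#include <stdio.h>
#include <stdlib.h>

/* Report the failed call and terminate the program. */
void die(const char *what)
{
    fprintf(stderr, "fatal: %s failed\n", what);
    exit(EXIT_FAILURE);
}

/* malloc() envelope: never returns NULL. */
void *safe_malloc(size_t n)
{
    /* malloc(0) may legally return NULL, so ask for at least 1 byte. */
    void *p = malloc(n ? n : 1);

    if (p == NULL)
        die("malloc");
    return p;
}

/* fclose() envelope: a failed close (for example, a failed flush
   of buffered output) is treated as fatal. */
void safe_fclose(FILE *fp)
{
    if (fclose(fp) == EOF)
        die("fclose");
}
```

Code that can recover from the failure still calls malloc() or fclose()
directly and checks the result itself.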

One of the problems with the trickle-up method of subroutine failure is
that you often do not wish to decide how fatal the error is at the
lower level, and so the error trickles up to a much higher level where
the severity is understood, but the exact condition which caused the error
is lost. There are also cases where no one level contains all the info
needed for a meaningful error message.

One solution to this is to use a stack of errnos and errparms instead of
single global variables. It also helps to have a user-definable error
string that is saved in this stack. As the error gets passed up the call
chain more information is added. If the main program chooses to abort,
the entire error stack can be dumped giving a complete description of the
error. Although this generates really nicely detailed error messages
with very little coding trouble, I have not used it on any programs that
have enough levels of function calling to make it really worthwhile.
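
A minimal sketch of such an error stack, assuming a small fixed-size array
is enough. Every name here (err_push, err_dump, err_stack, the sizes, the
error numbers and the two example routines) is invented for illustration.

```c
#include <stdio.h>

#define ERR_STACK_MAX 16   /* arbitrary depth limit */
#define ERR_MSG_MAX   80   /* arbitrary message length limit */

struct err_entry {
    int  num;                 /* error number */
    long parm;                /* error parm */
    char msg[ERR_MSG_MAX];    /* user-defined error string */
};

struct err_entry err_stack[ERR_STACK_MAX];
int err_depth;

/* Add one level of context. If the stack is full the entry is dropped. */
void err_push(int num, long parm, const char *msg)
{
    struct err_entry *e;

    if (err_depth >= ERR_STACK_MAX)
        return;
    e = &err_stack[err_depth++];
    e->num  = num;
    e->parm = parm;
    snprintf(e->msg, sizeof e->msg, "%s", msg);  /* truncates if too long */
}

void err_clear(void)
{
    err_depth = 0;
}

/* Dump the stack, highest-level entry first. */
void err_dump(FILE *out)
{
    int i;

    for (i = err_depth - 1; i >= 0; i--)
        fprintf(out, "  [%d] errno=%d errparm=%ld: %s\n",
                i, err_stack[i].num, err_stack[i].parm, err_stack[i].msg);
}

/* Example: the low level knows the exact cause ... */
int read_record(int id)
{
    if (id < 0) {
        err_push(1, id, "read_record: negative record id");
        return 0;
    }
    return 1;
}

/* ... and the higher level adds what it was trying to do. */
int load_config(int id)
{
    if (!read_record(id)) {
        err_push(2, id, "load_config: could not read record");
        return 0;
    }
    return 1;
}
```

If load_config() fails, the main program can call err_dump(stderr) before
aborting and get both the high-level and the low-level description.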

Anyway, those are my experiences. And my code is usually a great test suite
for error checking mechanisms!

rba iv