Error recovery (long)

Michael Meissner meissner at dg_rtp.UUCP
Fri Jun 6 14:49:16 AEST 1986


In article <312 at uw-nsr.UUCP> john at uw-nsr.UUCP (John Sambrook) writes:
>
>Regarding error recovery in C compilers, I like the error recovery
>provided by the Data General C compiler.  Here is an example of a 
>botched program:
>
>	main() {
>		int	a = 0	/* missing ";" */
>
>		printf("a: %s\n",  (a == 1) ? "1" : "?"; /* missing ")" */
>	}
>
>When compiled the following is written on stderr:
>
>	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
>		printf("a: %s\n",  (a == 1) ? "1" : "?";
>		^
>	Syntax Error.
>	A symbol of type ";" has been inserted before this symbol.
>	 
>	 
>	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
>		printf("a: %s\n",  (a == 1) ? "1" : "?";
>		                                       ^
>	Syntax Error.
>	A symbol of type ")" has been inserted before this symbol.
>
>In this example the compiler produced a program that executed correctly.
>
>To be fair, both errors are "errors of omission."  I believe, but do not
>assert, that these errors are easier to repair than other types of errors.
>In the event of serious errors the compiler will cease code generation and
>only check the remaining input.  I don't know the parsing method used in
>this compiler; it does not seem to suffer from poor error recovery as do 
>many recursive-descent parsers.

    It's a pleasant surprise when somebody says he likes something.  I am
the author of the Data General C compilers.  The parsing method that I use
is a standard LALR parse, based on an internal tool that constructs the tables
from a BNF input grammar.  In comparison to YACC, the tool is not as developer
friendly, i.e., it only creates the tables; I have to write the routine that
actually interprets the parse state machine and dispatches on the semantic
actions.  I must provide the error recovery routines as well.  YACC, on the
other hand, encapsulates the parser into the C program it generates.  It also
handles error recovery (badly, in my opinion), so that in general the user
doesn't have to mess with it.  It also means that the user does not really
have the control either.

    The algorithm that I use is the first part of Jerry Fisher's (from the
SIGPLAN compiler construction conference).  It first attempts to insert,
delete, or replace the token that is in error, using any of the tokens that
are in the follow set (i.e., would be possible, legal input), and then parses
ahead 3 tokens.  The first parse that succeeds for 3 tokens is selected (the
tokens are given a priority, and tried in priority order).  The second part of
Jerry Fisher's algorithm is a complicated secondary recovery, which I initially
attempted and then gave up on, because adapting his algorithm to my parser kept
coming up with errors in my translation, or areas where I did not really
understand what is going on deep within the LALR tables.  As near as I can
understand from looking at it, the YACC approach is to discard tokens until it
can reduce from an 'error' production.  It's been my experience that this
rarely does what the compiler writer wants.  As far as local replacement goes,
I am currently thinking of adding another pass that would attempt to glue two
tokens together (to make += out of + and = separated by whitespace).  The
priorities are the hardest thing to get a feeling for, and I still play with
them every so often.  As far as secondary recovery goes, my feeling still is
that if you ever need to go to more extreme methods, the program is hopelessly
damaged, and I question whether the programmer gets anything useful after the
first few error messages.
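
    To make the primary recovery step above concrete, here is a minimal
sketch in C.  It is not the Data General code.  A toy four-state recognizer
stands in for the LALR tables, and the one-character token set, the priority
string, the order in which insert/replace/delete are tried, and every name in
it (step, parses_ahead, recover, LOOKAHEAD) are assumptions made up for the
illustration.  Only the overall shape comes from the description: try
candidate tokens that are legal at the error point, in priority order, and
accept the first repair after which a trial parse gets through 3 tokens.

```c
#include <assert.h>
#include <stddef.h>

#define LOOKAHEAD 3   /* tokens that must parse cleanly after a repair */

/*
 * Toy stand-in for the parse tables (an assumption, not LALR):
 *   program := stmt* '$'      stmt := 'i' '=' 'i' ('+' 'i')* ';'
 * Returns the next state, or -1 if tok is not legal in this state.
 */
int step(int state, char tok)
{
    switch (state) {
    case 0:  return tok == 'i' ? 1 : tok == '$' ? 4 : -1;
    case 1:  return tok == '=' ? 2 : -1;
    case 2:  return tok == 'i' ? 3 : -1;
    case 3:  return tok == '+' ? 2 : tok == ';' ? 0 : -1;
    default: return -1;           /* state 4 = accepted, nothing follows */
    }
}

/* Trial parse: can `need` tokens be consumed from `state` without error? */
int parses_ahead(int state, const char *in, int need)
{
    while (need-- > 0) {
        if (*in == '\0')
            return 0;             /* ran out of input before accepting */
        state = step(state, *in++);
        if (state < 0)
            return 0;
        if (state == 4)
            return 1;             /* reached end of input cleanly */
    }
    return 1;
}

enum kind { NONE, INSERT, REPLACE, DELETE };
struct repair { enum kind kind; char tok; };

/* Hypothetical priority order: candidates are tried left to right. */
static const char priority[] = ";=+i";

/* `state` is the parser state at the error, `in` points at the bad token. */
struct repair recover(int state, const char *in)
{
    struct repair r = { NONE, 0 };
    const char *p;

    assert(state >= 0 && in != NULL && *in != '\0');
    for (p = priority; *p != '\0'; p++) {
        int next = step(state, *p);
        if (next < 0)
            continue;             /* candidate is not legal input here */
        r.tok = *p;
        if (parses_ahead(next, in, LOOKAHEAD)) {      /* insert before */
            r.kind = INSERT;
            return r;
        }
        if (parses_ahead(next, in + 1, LOOKAHEAD)) {  /* replace bad token */
            r.kind = REPLACE;
            return r;
        }
    }
    r.tok = 0;
    if (parses_ahead(state, in + 1, LOOKAHEAD))       /* drop bad token */
        r.kind = DELETE;
    return r;
}
```

    With these toy tables, the input "i=ii=i;$" fails at the second 'i' in
state 3, and recover(3, "i=i;$") picks the insertion of ';', which mirrors
the missing-semicolon repair in the quoted example.  Shuffling the priority
string changes which repair wins when several would parse ahead, which is
why the priorities take tuning.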

>While on the subject of compilers, I would like to share two other features
>of this compiler that I find useful.  I have not found these features in
>other C compilers that I have used, although I have heard that the VAX/VMS
>C compiler is very good.
>
>The first feature is the ability to generate a stack trace ("traceback") 
>in the event of a serious error.  There are two compiler switches that
>control the amount of information in a traceback.  The "-Clineid" switch
>causes the offending line number to be included while the "-Cprocid" switch 
>causes the procedure name to be included.

There have been a few responses saying dbx/adb gives you the information, if
you compile with -g and look at the core file.  The traceback feature (which
is standard on almost all 32-bit DG compilers) produces smallish tables, which
can be kept in the program file, even when it is shipped to users in production
mode.  We also support -g and dbx.

>The second feature is the ability to declare certain data structures as
>"read only." This is done via a compiler switch "-R" and applies to all 
>data structures that are initialized to a constant value within the 
>compilation unit.

This came from Berkeley 4.2 (and 4.3) and was added in an attempt to be as
compatible with both 4.2 and System V.2 as we could.  At some point in the
future, when the ANSI X3J11 draft stabilizes to the point of going for public
review, the `const' feature will also allow this without having to set the
option.  The private Data General keyword $shared allows this in the released
revisions.
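
For illustration, a small sketch of the kind of data this applies to: a
table that is initialized to constant values and never written afterwards.
The table and function names are hypothetical, and whether a `const' object
actually ends up in read-only memory is up to the compiler and linker; the
qualifier only guarantees that writes through that name are rejected at
compile time.

```c
#include <assert.h>

/* Initialized to constants and never modified: a candidate for read-only
   placement, either via a switch such as -R or via `const'. */
static const int days_in_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* month is 0..11; leap years are ignored in this sketch. */
int month_length(int month)
{
    assert(month >= 0 && month < 12);
    return days_in_month[month];
}
```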

>John Sambrook				Work: (206) 545-2018
>University of Washington WD-12		Home: (206) 487-0180
>Seattle, Washington  98195		UUCP: uw-beaver!uw-nsr!john

Michael Meissner
Data General Corporation
...{ decvax, ihnp4, ucbvax, ... }!mcnc!rti-sel!dg_rtp!meissner


