Lex bug
Charles Sandel
sandel at tuvalu.sw.mcc.com
Fri Jun 23 08:57:16 AEST 1989
One of our programmers found a bug in the 'lex' source.
The bug report follows.
We get various error messages from Andrew's class program. I finally tracked
down the problem: a bug in lex. class uses lex for its
lexical input, in particular to recognize the class keywords such as
InitializeObject and FinalizeObject, and InitializeObject was not being
recognized. Tracing through the state machine that lex generates to scan
the input, I found that its highest state number is *exactly* 255: during
construction, the states are numbered 0 to 255. In cmd/lex/sub2.c, line 797,
lex checks whether a byte (char) or a larger type is needed to store the states:
    fprintf(fout,"# define YYTYPE %s\n",stnum+1 > NCH ? "int" : "char");
where NCH (the number of characters) is 256. Notice that stnum+1 is
compared rather than stnum (the highest state number). This is because the
zero state is used as an error state, so the states 0..255 are shifted
up by 1 to 1..256. Here stnum is 255, so stnum+1 is 256, which equals NCH
and is therefore not greater than it; lex then uses a char for YYTYPE,
the type that holds the state numbers. As a result, lex generates tables
that store state 256 (old state 255, plus 1) in a char. With an 8-bit char
this wraps to zero, the error state, so the lexical token ending in that
state is not recognized. I will submit a bug report on this today.
In fact, this code is badly written in several respects. First, the
comparison should not be against NCH. The value of NCH varies according
to whether NLS (with an 8-bit character code) is used or not. However,
the largest number that can be stored in a char depends on the size of
the char, not on whether a 7-bit or 8-bit character code is used. Second,
we want the smallest storage unit that will hold all the state values.
Thus the code should be:
    fprintf(fout,"# define YYTYPE %s\n",
        (stnum+1 <= 0xFF) ? "unsigned char"
        : (stnum+1 <= 0xFFFF) ? "unsigned short"
        : "unsigned long");
More information about the Comp.bugs.4bsd.ucb-fixes
mailing list