LEX behaviour when given "large" automata.

Chris Torek chris at mimsy.UUCP
Sun Mar 20 15:30:17 AEST 1988


Several people suggested replacing lex description such as

	if	return (IF);
	then	return (THEN);
	else	return (ELSE);
	{id}	{ symenter(yytext); return (ID); }

with

	{id}	{ /* first try it as a keyword */
		  k = keyword_index(yytext); if (k >= 0) return (k);
		  /* not a keyword, must be an identifier */
		  symenter(yytext); return (ID); }

This may (probably does) help in lex, but it is clearly the wrong way
around in almost all cases.  The lexer has to traverse a DFA to decide
that something is an ID.  While it is at it it could easily tell
whether it is a keyword instead.  Obviously this makes the DFA table
larger, but the result should run faster.

It turns out (as Van Jacobson discovered) that lex uses a long
subroutine---on the order of 200 lines---to do what is essentially
described by the loop

	/* follow the DFA */
	for (state = begin; state[c].v == c; c = *in++)
		state += state[c].n;

(best case) or

	/*
	 * Pick the right start state depending on whether we are at
	 * the beginning of a line.
	 */
	state = newline ? hat_begin : begin;
	while (state[c].v == c) {
		/* follow the DFA */
		state += state[c].n;
		c = *in++;
		/* remember action state (for right context) */
		if (state->v) {
			a_s = state;
			a_c = in;
		}
	}
	in = a_c;
	if (a_s->n) {
		in -= a_s->n;	/* back up over right context */
		c = in[-1];
	}
	newline = in[-2]=='\n';	/* remember whether we are now at ^ */
	in[-1] = '\0';		/* kill residual */

(worst case).  This pretty much handles everything except YYREJECT.
I do not know whether the LBL `flex' (Fast LEX) handles YYREJECT now.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris
--
Send compilers articles to ima!compilers or, in a pinch, to Levine at YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request



More information about the Comp.lang.c mailing list