two (or more) lex's/yacc's in one executable

Martin Weitzel martin at mwtech.UUCP
Tue Dec 11 02:49:46 AEST 1990


In article <14674 at smoke.brl.mil> gwyn at smoke.brl.mil (Doug Gwyn) writes:
>In article <1990Dec6.200944.13037 at cs.columbia.edu>, leland at cs writes:
>- I've tried this kludge:  create a header file that re-#define's all the
>- names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
>- those in set #2 to be named set2yyfoo.  This has worked for me in the
>- past, but won't in this particular instance because the generated
>- code includes calls to yyless() and yywrap(), which are in the LEX
>- library (-ll), the contents of which I cannot rename.  So that doesn't
>- work.
>
>But it almost does -- Since "lex" produces C source, you can #define
>set1yyless yyless, etc. before the lex output to be compiled, thereby
>turning these selected reference back into calls to the shared library
>functions.  (I assume the lex library does not maintain internal state.)

Unfortunately things are more complicated. Here is an excerpt from
`nm /usr/lib/libl.a' (UNIX Sys V):
----------------------------------------------------------------------
Symbols from /usr/lib/libl.a[reject.o]:

Name                  Value   Class        Type         Size   Line  Section

reject.c            |        | file |                  |      |     |
yyreject            |       0|extern|            int( )|   270|     |.text
yyracc              |     272|extern|            int( )|   154|     |.text
yyinput             |       0|extern|                  |      |     |
yyleng              |       0|extern|                  |      |     |
yytext              |       0|extern|                  |      |     |
yylsp               |       0|extern|                  |      |     |
yyolsp              |       0|extern|                  |      |     |
yyfnd               |       0|extern|                  |      |     |
yyunput             |       0|extern|                  |      |     |
yylstate            |       0|extern|                  |      |     |
yyprevious          |       0|extern|                  |      |     |
yyoutput            |       0|extern|                  |      |     |
yyextra             |       0|extern|                  |      |     |
yyback              |       0|extern|                  |      |     |


Symbols from /usr/lib/libl.a[yyless.o]:

Name                  Value   Class        Type         Size   Line  Section

yyless.c            |        | file |                  |      |     |
yyless              |       0|extern|            int( )|   107|     |.text
yyleng              |       0|extern|                  |      |     |
yytext              |       0|extern|                  |      |     |
yyunput             |       0|extern|                  |      |     |
yyprevious          |       0|extern|                  |      |     |


Symbols from /usr/lib/libl.a[yywrap.o]:

Name                  Value   Class        Type         Size   Line  Section

yywrap.c            |        | file |                  |      |     |
yywrap              |       0|extern|            int( )|    16|     |.text
----------------------------------------------------------------------

The problem is not some internal state of these functions, but that they
expect a number of external `yyfoo'-symbols, and there is no way to make
them access the `right' ones without rewriting the functions.

So, how hard would it be to rewrite them?

The trivial case is `yywrap'. I hope AT&T doesn't sue me because of reverse
engineering :-), but this function is a one-liner.

	int yywrap() { return 1; }
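
With the renaming header from the quoted article in place, each scanner's
reference to `yywrap' turns into a prefixed name, so each scanner needs its own
copy instead of the one in -ll. A minimal sketch, assuming the `set1yyfoo' /
`set2yyfoo' naming from the quoted article (these names are illustrative, not
something lex generates by itself):

```c
/* Hypothetical per-scanner replacements for the library's yywrap().
   Returning 1 tells the scanner there is no further input at EOF. */
int set1yywrap(void) { return 1; }
int set2yywrap(void) { return 1; }
```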

The two other functions (`yyless' and `yyreject') may have complicated
interactions with a lot of globals, so the best solution is to avoid
them and do manually what is required. This is simple in the case of
`yyless', since it is usually used to push back parts of `yytext' to
the input stream. This can also be done with the `unput()' macro in a loop.
(The library version of `yyless' does this via the `yyunput()' function, but
this function simply calls `unput()', which may have been redefined -
have a look into `lex.yy.c' to understand how things work together.)
In addition the "original" `yyless' adjusts `yytext' and `yyleng'
accordingly. The part that still worries me is the reference to
`yyprevious' within `yyless'. To be sure, you should probably disassemble
the library version of `yyless' - it's not that large.
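
A minimal sketch of the `unput()' loop described above. It does not use the
real lex internals: the `set1...' globals, the toy pushback stack and the
helper `set1_match()' are hypothetical stand-ins so the fragment compiles on
its own. In a real scanner you would call the `unput()' macro from your
`lex.yy.c' instead. Whatever the library `yyless' does with `yyprevious' is
not modelled here.

```c
#include <string.h>

/* Hypothetical stand-ins for the renamed globals of scanner #1.
   In a real lex.yy.c these come from the generated code. */
#define SET1_BUFSIZE 200

char set1yytext[SET1_BUFSIZE];   /* text of the current token */
int  set1yyleng;                 /* its length */

/* Toy pushback stack standing in for lex's input pushback.
   No overflow check: this is an illustration only. */
static char set1_pushback[SET1_BUFSIZE];
static int  set1_npushed = 0;

/* Stand-in for the unput() macro: push one character back. */
#define set1_unput(c) (set1_pushback[set1_npushed++] = (char)(c))

/* Stand-in for input(): pushed-back characters are returned first,
   -1 when nothing has been pushed back. */
int set1_input(void)
{
    return set1_npushed > 0
        ? (unsigned char)set1_pushback[--set1_npushed]
        : -1;
}

/* Helper for experiments: pretend the scanner just matched `s'. */
void set1_match(const char *s)
{
    strncpy(set1yytext, s, SET1_BUFSIZE - 1);
    set1yytext[SET1_BUFSIZE - 1] = '\0';
    set1yyleng = (int)strlen(set1yytext);
}

/* Manual replacement for yyless(n): keep the first n characters of
   the token and push the rest back, last character first, so that
   they are read again in their original order. */
void set1_less(int n)
{
    if (n < 0 || n > set1yyleng)
        return;                  /* out of range: leave the token alone */
    while (set1yyleng > n)
        set1_unput(set1yytext[--set1yyleng]);
    set1yytext[set1yyleng] = '\0';
}
```

After `set1_match("foobar"); set1_less(3);' the token is "foo" and the next
three calls to `set1_input()' deliver 'b', 'a' and 'r' in that order.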

`yyreject' is best avoided completely because it plays with a lot
of external symbols (the poster of the original question is lucky here, but
others may take this as a hint to use REJECT - which in turn calls
yyreject() - only as a last resort).

BTW: Another option is to have a common lexer for both sets of input
symbols and use start conditions in lex to select the appropriate ones.
It's a pity that start conditions are insufficiently explained in the
common documentation of lex (if they are mentioned at all).
-- 
Martin Weitzel, email: martin at mwtech.UUCP, voice: 49-(0)6151-6 56 83