v19i088: wacco - A C++ LL parser generator, Part01/06

Sun May 19 04:00:53 AEST 1991

Submitted-by: Parag Patel <parag at hpsdeb.sde.hp.com>
Posting-number: Volume 19, Issue 88
Archive-name: wacco/part01

This is version 1.1 of Wacco, basically an LL(1) parser generator.
Wacco generates recursive-descent C++ code from an input file.  The
wacco file.w looks a lot like a yacc(1) input file, but with a lot more
syntactic sugar added.  Since the generated parser is recursive, you can
do attribute-driven parsing easily and even pass information into rules
which could alter the parse.

Wacco should port and run easily on most C++ systems.  It does need C++
2.0 of some flavor.  It's been successfully built on HP-UX s300 and s800
systems, Sparc, and 4.3BSD running on HP hardware.  

The code is somewhat commented.  Feel free to hack away, add new
features, or fix my screwups.  If you make mods you feel are useful, or
fix some bug, please send me the cdiffs so I can make them available to
others too.

Parag Patel <parag at sde.hp.com>

---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is wacco, a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "#!/bin/sh" line above, and type "sh file_name".
#
# made 05/18/1991 03:21 UTC by parag at hpsdeb
# Source directory /users/parag/tools/wacco
#
# existing files will NOT be overwritten unless -c is specified
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#   2899 -r--r--r-- README
#   1967 -r--r--r-- Makefile
#   4588 -r--r--r-- wacco.1
#  15987 -r--r--r-- wacco.doc
#  42897 -r--r--r-- wacco.doc.iw
#  98569 -r-xr-xr-x wacco.doc.ps
#   5585 -r--r--r-- wacco.w
#   3575 -r--r--r-- defs.h
#   1555 -r--r--r-- toks.h
#    188 -r--r--r-- boolean.h
#   2537 -r--r--r-- bitset.h
#   1624 -r--r--r-- darray.h
#   8290 -r--r--r-- table.h
#   6447 -r--r--r-- bitset.C
#   1739 -r--r--r-- tgram.w
#    229 -r--r--r-- tgram.good
#    261 -r--r--r-- tgram.bad
#   6362 -r--r--r-- main.C
#   3130 -r--r--r-- sym.C
#  18403 -r--r--r-- parse.C
#   3884 -r--r--r-- scan.C
#   5614 -r--r--r-- build.C
#   2302 -r--r--r-- check.C
#   3603 -r--r--r-- read.C
#  18803 -r--r--r-- gen.C
#   3832 -r--r--r-- io.C
#    120 -r--r--r-- version.C
#
# ============= README ==============
if test -f 'README' -a X"$1" != X"-c"; then
	echo 'x - skipping README (File already exists)'
else
echo 'x - extracting README (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'README' &&
$Header: README,v 1.7 91/05/17 16:29:53 hmgr Exp $
X
Copyright (c) 1991 by Parag Patel.  All Rights Reserved.
You can do what you wish with this as long as
X    (1) you do not claim it or any part of it as yours and
X    (2) you do not remove or alter my copyright in any file.
This software is provided "AS IS" without any implied or express warranty
as to its performance or to the results that may be obtained by using this
software.  It is completely unsupported.  You're on your own.
X
X
This is version 1.1 of Wacco, basically an LL(1) parser generator.
Why Another Compiler COmpiler?  Why not?!?
X
Wacco generates recursive-descent C++ code from an input file.  The
wacco file.w looks a lot like a yacc(1) input file, but with a lot more
syntactic sugar added.  Since the generated parser is recursive, you can
do attribute-driven parsing easily and even pass information into rules
which could alter the parse.
X
I wrote wacco to give me a platform for experimenting with various error
recovery schemes.  A fairly cheesy first/follow set scheme is currently
implemented.  Wacco turned out to be useful in its own right and I never
did get around to serious experimenting.
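X
For the curious, the general shape of a first/follow resync can be
sketched in C++ as below.  This is a simplified illustration, not the
code wacco generates.  After an error inside a non-terminal, input
tokens are skipped until one turns up in that non-terminal's FIRST set
(try the rule again) or in its FOLLOW set (abandon the rule and let the
caller carry on).  The resync() function, the Resync enum, and the use
of 0 as the end-of-input token are assumptions made for the sketch.
X
```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified panic-mode recovery driven by FIRST and FOLLOW sets
// (a sketch only, not wacco's generated code). Token 0 marks end of input.
enum class Resync { Retry, Continue, EndOfInput };

Resync resync(const std::vector<int> &toks, std::size_t &pos,
              const std::set<int> &first, const std::set<int> &follow,
              int &skipped) {
    skipped = 0;
    while (pos < toks.size() && toks[pos] != 0) {
        if (first.count(toks[pos])) return Resync::Retry;      // restart the rule
        if (follow.count(toks[pos])) return Resync::Continue;  // abandon the rule
        ++pos;                                                 // skip this token
        ++skipped;
    }
    return Resync::EndOfInput;
}
```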
X
Wacco is written in itself.  The file "wacco.w" describes its own format
and was used to manually generate "parse.C" and "toks.h".  (The original
bootstrap version no longer exists.  Wacco has evolved considerably from
a much simpler version to the current implementation, so the old code
would be useless anyway.)  The files "parse.C" and "toks.h" are always
shipped since there's no other way to build a working wacco.
X
The file "wacco.doc" describes the wacco file format in a tty-readable
form.  "Wacco.doc.iw" is the much prettier IslandWrite version of the
document.  "Wacco.doc.ps" is the PostScript output from IslandWrite.
Wacco.1 describes only the command-line options.
X
There are few comments and lots of ugly non-OO code throughout wacco.
It had evolved from a straight C implementation and I never got around
to cleaning it up.  Sorry.
X
Wacco should port and run easily on most C++ systems.  It does need C++
2.0 of some flavor.  It's been successfully built on HP-UX s300 and s800
systems, Sparc, and 4.3BSD running on HP hardware.  You may need to
tweak some -D defines in the Makefile.  If sizeof(long) is NOT 32 bits,
you may have to perform major surgery on bitset.h and bitset.C.
X
All you should need to do is modify CFLAGS in the Makefile, then type
"make".  The Makefile should come set up for HP-UX systems.  Type "make
tst" to build a simple test program using "tgram.w".  The files to
be installed wherever you prefer are "wacco" and "wacco.1".
X
The code is somewhat commented.  Feel free to hack away, add new
features, or fix my screwups.  If you make mods you feel are useful, or
fix some bug, please send me the cdiffs so I can make them available to
others too.
X
X
X
X	-- Parag Patel <parag at sde.hp.com>
SHAR_EOF
chmod 0444 README ||
echo 'restore of README failed'
Wc_c="`wc -c < 'README'`"
test 2899 -eq "$Wc_c" ||
	echo 'README: original size 2899, current size' "$Wc_c"
fi
# ============= Makefile ==============
if test -f 'Makefile' -a X"$1" != X"-c"; then
	echo 'x - skipping Makefile (File already exists)'
else
echo 'x - extracting Makefile (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'Makefile' &&
# Copyright (c) 1991 by Parag Patel.  All Rights Reserved.
# $Header: Makefile,v 1.27 91/05/17 16:29:50 hmgr Exp $
X
CXX = CC
.SUFFIXES: .C
.C.o:
X	$(CXX) $(CFLAGS) -c $<
X
# system-dependent options - use any appropriate -D<sys> macros
# -DBSD		 	for a BSD derivative (Sun)
# -Dpid_t=long		if your headers don't define a pid_t type
# -DFREE_TAKES_CHAR	if you have a free(char*) instead of free(void*) (Sun)
CFLAGS = -g
LIBS = libwacco.a
X
SRCS =	README Makefile wacco.1 wacco.doc wacco.doc.iw wacco.doc.ps wacco.w \
X	defs.h toks.h boolean.h bitset.h darray.h table.h \
X	bitset.C tgram.w tgram.good tgram.bad \
X	main.C sym.C parse.C scan.C build.C check.C read.C gen.C \
X	io.C version.C
X
OBJS =	main.o sym.o parse.o scan.o build.o check.o read.o gen.o bitset.o
X
wacco : $(OBJS) libwacco.a
X	$(CXX) $(CFLAGS) -o wacco $(OBJS) $(LIBS)
X
libwacco.a : io.o version.o
X	 ar ru libwacco.a $(?)
X	 -[ -x /usr/bin/ranlib ] && ranlib libwacco.a
X
tst: parser.o scanner.o
X	$(CXX) $(CFLAGS) -o tst parser.o scanner.o $(LIBS) $(LFLAGS) -ll
X	-./tst <tgram.bad
X	./tst <tgram.good
X
parser.C scanner.l: tgram.w wacco
X	./wacco tgram.w
X
tar: $(SRCS)
X	tar -cvf - $(SRCS) | compress >wacco.tar.Z 
X
shar: $(SRCS)
X	shar -ac -nwacco -l50 -owacco-shar $(SRCS)
X
clean:
X	rm -f wacco *.o libwacco.a wacco.tar.Z* wacco-shar* tst parser.C scanner.l
X
files:
X	@echo $(SRCS)
X
main.o : main.C toks.h defs.h boolean.h darray.h table.h bitset.h
sym.o : sym.C defs.h boolean.h darray.h table.h bitset.h
parse.o : parse.C toks.h defs.h boolean.h darray.h table.h bitset.h
scan.o : scan.C toks.h defs.h boolean.h darray.h table.h bitset.h
build.o : build.C defs.h boolean.h darray.h table.h bitset.h
check.o : check.C defs.h boolean.h darray.h table.h bitset.h
read.o : read.C toks.h defs.h boolean.h darray.h table.h bitset.h
gen.o : gen.C toks.h defs.h boolean.h darray.h table.h bitset.h
io.o : io.C toks.h defs.h boolean.h darray.h table.h bitset.h
version.o : version.C
bitset.o : bitset.C bitset.h boolean.h
SHAR_EOF
chmod 0444 Makefile ||
echo 'restore of Makefile failed'
Wc_c="`wc -c < 'Makefile'`"
test 1967 -eq "$Wc_c" ||
	echo 'Makefile: original size 1967, current size' "$Wc_c"
fi
# ============= wacco.1 ==============
if test -f 'wacco.1' -a X"$1" != X"-c"; then
	echo 'x - skipping wacco.1 (File already exists)'
else
echo 'x - extracting wacco.1 (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'wacco.1' &&
.\" Copyright (c) 1991 by Parag Patel.  All Rights Reserved.
.\" $Header: wacco.1,v 1.13 91/02/22 16:04:11 hmgr Exp $
.TH WACCO 1 unsupported
.ad b
.SH NAME
wacco \- why another compiler-compiler?
.SH SYNOPSIS
.B wacco
.RB [ -dciOCL ]
.RB [ -h
header]
.RB [ -p
parser]
.RB [ -s
scanner]
[file]
.SH DESCRIPTION
.I Wacco
is another compiler-compiler.
(Why another compiler-compiler you may ask?  Why not!)
It has some rather convenient features with a lot of
syntactic sugar tossed on top of what
.IR yacc (1)
provides.
.PP
Unlike
.IR yacc (1),
.I wacco
generates a top-down recursive-descent LL(1) parser instead
of a bottom-up LALR parser.
Although
.I wacco
generated parsers handle a smaller class of grammars than
.IR yacc (1),
in practice, there is rarely any need for a full LALR parser.
It is much easier to deal with error recovery in a top-down parser.
It is also possible to re-direct and even completely alter the
parse on the fly, as well as perform attribute-driven parsing.
.PP
.I Wacco
generates a parser that automatically attempts to resync on
errors based on some heuristics on the first
and follow sets of non-terminals.
Admittedly this is a far from optimal error-handling system,
but it is much better than what
.IR yacc (1)
provides (skip X tokens, then continue!).
Future versions of
.I wacco
may provide much more intelligent error-recovery systems.
.PP
.I Wacco
also allows using its parser in an attribute-driven manner.
Information may be passed down to the right-hand side of an
expression even though that expression hasn't yet been parsed.
Different rules may have different types associated with them.
The C++ compiler will perform the type-checking for you!
No more funny unions and hoping that you didn't make a mistake!
.PP
Token values do not have to be explicitly defined.
String and character tokens may be specified implicitly as well,
rather than creating a dummy symbol for them.
.I Wacco
will generate a header file containing definitions for
all the tokens.
.PP
There is support for a somewhat smarter scanner.
Errors will be (hopefully) printed out in a clear and simple
manner.
.PP
.I Wacco
currently generates only C++ code.
Some day it may optionally generate straight C code as well
(but don't hold your breath).
.PP
The grammar format is described in the
.I wacco
documentation since it is too lengthy to repeat here.
See the
.I wacco.doc
files for more information.
.SS Options
.I Wacco
expects a grammar on stdin if
.I file
is not specified on the command line.
It will generate the files
.I parser.C
and
.I tokens.h
by default.
If there is a scanner section in the input file,
then the file
.I scanner.l
will also be generated.
.TP
.B -d
Dump mode.
Only prints (somewhat) interesting information about what
.I wacco
thinks the grammar looks like, first and follow sets, and other
miscellaneous stuff.
.TP
.B -i
Do not generate code for scanning case-insensitive strings.
If the
.B "string"
construct is used in the grammar source,
.I wacco
will normally generate code like
.BR [Ss][Tt][Rr][Ii][Nn][Gg] .
This option inhibits such behavior to allow exact matches.
.TP
.B -c
Normally,
.I wacco
will generate temporary output files and then compare them with
the originals.
The originals are replaced only if the new files are different.
This is very handy for use inside makefiles, where doing things
like this gets ugly.
This option turns off this feature and always generates the
output files.
.TP
.B -O
Turns off optimization.
Normally,
.I wacco
expands non-terminals that are only used once in the code rather
than creating functions for them.
If you use the "return" operator, you must use this option for now.
If you use the "$?" construct, optimization will be automatically
turned off.
.TP
.B -C
Do not output the embedded user code within the grammar.
This generates a parser that either accepts or rejects
its input, only printing errors.
It is handy for verifying a grammar.
.TP
.B -L
Do not generate the "#line" entries for the original
.I wacco
source file within the parser.
.TP
.BI "-h " header
Create a file named
.I header
instead of the default "tokens.h".
.TP
.BI "-p " parser
Create a file named
.I parser
instead of the default "parser.C".
.TP
.BI "-s " scanner
Create a file named
.I scanner
instead of the default "scanner.l".
.SH FILES
wacco.doc wacco.doc.iw wacco.doc.ps
.br
tokens.h scanner.l parser.C ./.wacco.tmp
.SH NOTES
The scanner generated may be ``compiled'' by either
.IR lex (1)
or
.IR flex (1),
although
.I flex
is highly recommended.
.SH AUTHOR
Copyright (c) 1991 by Parag Patel.  All Rights Reserved.
SHAR_EOF
chmod 0444 wacco.1 ||
echo 'restore of wacco.1 failed'
Wc_c="`wc -c < 'wacco.1'`"
test 4588 -eq "$Wc_c" ||
	echo 'wacco.1: original size 4588, current size' "$Wc_c"
fi
# ============= wacco.doc ==============
if test -f 'wacco.doc' -a X"$1" != X"-c"; then
	echo 'x - skipping wacco.doc (File already exists)'
else
echo 'x - extracting wacco.doc (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'wacco.doc' &&
Copyright (c) 1991 by Parag Patel.  All Rights Reserved.
<< $Header: wacco.doc,v 1.25 91/02/22 16:04:23 hmgr Exp $ >>
X
<< Please see the wacco(1) man page for details on its usage. >>
<< Only the grammar format is described here.                 >>
X
X
The underlying philosophy in wacco is that the generated code should look
exactly like what someone would write if they were building a
recursive-descent compiler by hand.
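X
To make that concrete, here is roughly what such hand-written code looks
like for a toy grammar.  It is a modern C++ sketch, not wacco output.
The HandParser class and its members are invented for the illustration.
Each non-terminal becomes one function, and one character of lookahead
decides what to do next.
X
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written recursive descent for a toy grammar:
//     parenexpr : '(' expr ')' ;
//     expr      : DIGIT | parenexpr ;
// One member function per non-terminal, one character of lookahead.
struct HandParser {
    std::string in;
    std::size_t pos = 0;
    explicit HandParser(const std::string &s) : in(s) {}
    // Current lookahead character, or 0 at end of input.
    int peek() const {
        return pos < in.size() ? static_cast<unsigned char>(in[pos]) : 0;
    }
    // Consume tok if it is the next character.
    bool match(int tok) {
        if (peek() != tok) return false;
        ++pos;
        return true;
    }
    // expr : DIGIT | parenexpr ;
    bool expr(int &val) {
        if (std::isdigit(peek())) {
            val = peek() - '0';
            ++pos;
            return true;
        }
        return peek() == '(' && parenexpr(val);
    }
    // parenexpr : '(' expr ')' ;
    bool parenexpr(int &val) {
        return match('(') && expr(val) && match(')');
    }
};
```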
X
X
X
The basic grammar file format is:
X
X	/* C style comments */
X	%opt <directives>
X	{ <header> }
X	<rules>         // C++ style comments
X	$$
X	<scanner>
X
Wacco directives may be placed on the optional "%opt" line at the top of
the source grammar.  Only one such line is allowed in the grammar
source, and it MUST be first in the source.  The directives are actually
the command-line options for wacco!  Options may thus be set either on
the command line, or in the wacco source itself.  The entire "%opt" line
is parsed as if it were the command line.  Please see the man page for
descriptions of the command-line options.
X
The header section (which is optional) is a set of code in curly-braces
{} that is put at the top of the output parser.C file.  This is the place
to include files, define classes, or set up global variables.  Naturally,
there are no curlies {} if there is no need for a header section.
X
The scanner section (the "$$" line and everything after it) is entirely
optional.  It is included in the grammar file to make it easy to refer
to the actual values of tokens without explicitly defining those values.
X
Without any of the optional parts, a grammar consists only of rules.
X
X
X
The rules look much like those of yacc at first glance but there are
some interesting differences.  A rule looks like:
X
X	ID <TYPE> : stuff ;
X
The ID on the left-hand side is a non-terminal and so is eventually
turned into a function.  The TYPE is the type of the reference argument
through which this function both accepts and returns a value.  It is
optional.  If present, it must be in angle-brackets <>; if omitted, it
defaults to "int".  It can be used to pass information into a function
(rule) or to get information out of it.
X
X	ID : stuff ;
X	ID <TYPE> : stuff ;
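X
As a rough illustration (a hand-written C++ sketch, not real wacco
output), a rule with no <TYPE> might become a function that takes an
"int" by reference and may read it, set it, or both.  The char-pointer
input and the 1/0 success return value are assumptions made to keep the
sketch self-contained.
X
```cpp
#include <cassert>

// Hypothetical translation of
//     nest : '(' nest ')' { $$ = $nest + 1; } | 'x' { $$ = 0; } ;
// No <TYPE> is given, so "$$" is an int passed by reference.
// Returns 1 on success and 0 on failure (RETOK/RETERR).
int nest(const char *&p, int &val) {
    if (*p == 'x') {                  // 'x' { $$ = 0; }
        ++p;
        val = 0;
        return 1;
    }
    if (*p != '(') return 0;          // '('
    ++p;
    int nest_tmp = 0;                 // temporary for "$nest"
    if (!nest(p, nest_tmp) || *p != ')') return 0;
    ++p;                              // ')'
    val = nest_tmp + 1;               // $$ = $nest + 1;
    return 1;
}
```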
X
A vertical-bar "|" may be used to avoid duplicating the left-hand side:
X
X	ID : stuff1 ;
X	ID : stuff2 ;
X
is equivalent to
X
X	ID : stuff1 | stuff2 ;
X
X
<< For the rest of this document, the conventions are that terminals will be
X   in uppercase and non-terminals in lowercase. >>
X
X
The "stuff" on the right-hand side can get kind of interesting.  Like
yacc, this is basically a list of terminals or non-terminals that are
expected in sequence.
X
X	parenexpr : LPAREN expr RPAREN ;
X
X
Terminals can be described in several different ways.
X
Simple character tokens are straight-forward.  Their token value is
always that of the character they represent.  The null character '\0'
may not be used as a token, since its value (0) is used internally, e.g.
for the end-of-input token EOI.
X
X	parenexpr : '(' expr ')' ;
X
For more complicated strings, just use the strings themselves!
X
X	parenexpr : "<<" expr ">>" ;
X
The same string may be used in other rules to refer to that token.
X
Also, any identifier name may be used to define a terminal.  If that id
does not appear on the left side of a colon `:', then it is assumed to
be a terminal symbol in the grammar.
X
Token codes for terminals are automatically assigned and stored in the
"tokens.h" header file.  The token value of a string is pretty much
inaccessible.  A character constant will be its own token.  Any other
terminal name like LPAREN above will be in the header file as an enum
with the same name.
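X
For illustration only, a generated "tokens.h" for a grammar with the
named terminals LPAREN, RPAREN and ID might look something like the
sketch below.  The numbering wacco really uses is not described here.
Starting named tokens at 256 is an assumption that keeps them clear of
the character tokens.  EOI, RETOK and RETERR use the values documented
later in this file, and token_name() is a hypothetical stand-in for
w_tokenname().
X
```cpp
#include <cassert>
#include <string>

// Hypothetical "tokens.h" contents; named-token numbering is assumed.
enum {
    EOI = 0,        // end of input
    LPAREN = 256,   // named terminals start above the character range
    RPAREN,
    ID
};
enum { RETERR = 0, RETOK = 1 };

// A w_tokenname()-style lookup (hypothetical implementation).
std::string token_name(int tok) {
    switch (tok) {
    case EOI:    return "EOI";
    case LPAREN: return "LPAREN";
    case RPAREN: return "RPAREN";
    case ID:     return "ID";
    }
    if (tok > 0 && tok < 256)       // a character token is its own value
        return std::string("'") + static_cast<char>(tok) + "'";
    return "?";
}
```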
X
X
X
Actions (code) may be embedded anywhere on the right-hand side within pairs
of curly-braces {}.
X
X	parenexpr: '(' expr { $$ = $expr; } ')' ;
X
This introduces some other features that wacco has which yacc doesn't.
First though, the value that the non-terminal returns is always "$$".
X
The values of the right-hand side are referred to directly via their
symbolic names.  Thus we use "$expr" instead of "$2" in yacc!  Also,
"expr"s must return "int"s or the C++ compiler will complain!
X
Wacco generates an appropriate temporary variable if and only if it is
used by referring to a "$$" inside some code for that rule.  Thus
parenexpr above will have an in/out argument defined for it.  If there
were no code in {}, then parenexpr wouldn't be passed anything at all.
X
X
The TYPE specifier of a non-terminal may actually be a lot more
complicated than just a simple type:
X
X	example <double d; int i, j> : ... { $$.d = 0.0; $$.i = 34; }
X
In this case, wacco creates a struct for this non-terminal instead of a
simple variable.  The contents of the <> are put into this struct.  This
allows passing more info in and out of a non-term without having to
create a dummy struct by hand.  It is also passed to the non-terminal
function by reference rather than copying, and thus is very efficient.
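X
The idea in plain C++, as a sketch.  The struct name example_t is made
up; the name wacco actually generates is not documented here.
X
```cpp
#include <cassert>

// Sketch for:
//     example <double d; int i, j> : ... { $$.d = 0.0; $$.i = 34; }
// The contents of <> become the struct's members, and the rule's function
// takes the struct by reference, so nothing is copied.
struct example_t {
    double d;
    int i, j;
};
int example(example_t &val) {
    val.d = 0.0;    // $$.d = 0.0;
    val.i = 34;     // $$.i = 34;
    return 1;       // RETOK; val.j is left as the caller set it
}
```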
X
X	expr <int left, right> :  ...  ;
X	example : expr ';' { $$ = $expr.left + $expr.right; } ;
X
Note that all exported non-terms MUST have simple types, to avoid bogus
structure naming conventions.  If you must have a complicated type
returned from a start-symbol, you should create a specially named struct
or class and use it instead.
X
Also, simple types must not be named.  The following is illegal as
well as redundant, and kind of silly anyway:
X
X	expr <int var> : ... ;
X
X
X
If we have 2 "expr"s on the right, things get a little messier:
X
X	example: '(' expr ',' expr ')'
X		{ $$ = $expr1 + $expr2; };
or
X	example: '(' expr=front ',' expr=back ')'
X		{ $$ = $front + $back; };
X
The second form introduces the ability to name (alias) any of the
non-terminals on the right-hand side!  Here "expr1" is renamed "front"
and "expr2" is renamed "back", for this particular right-hand side only.
X
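A hand translation of the aliased form into C++ might look roughly like
this.  It is a sketch with expr cut down to a single digit, not wacco
output.  The point is that each occurrence of "expr" gets its own
temporary, and the alias only renames it.
X
```cpp
#include <cassert>
#include <cctype>

// Hypothetical translation of
//     example: '(' expr=front ',' expr=back ')' { $$ = $front + $back; };
// with expr cut down to a single digit for the sketch.
static bool expr(const char *&p, int &val) {
    if (!std::isdigit(static_cast<unsigned char>(*p))) return false;
    val = *p++ - '0';
    return true;
}
static bool match(const char *&p, char c) {     // consume c if it is next
    if (*p != c) return false;
    ++p;
    return true;
}
bool example(const char *&p, int &val) {
    int front = 0, back = 0;   // "$front" (was "$expr1") and "$back" (was "$expr2")
    if (!match(p, '(') || !expr(p, front) || !match(p, ',') ||
        !expr(p, back) || !match(p, ')'))
        return false;
    val = front + back;        // $$ = $front + $back;
    return true;
}
```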
X
X
Since wacco generates a C++ recursive-descent parser, we can do even more
interesting things on the right.  Wacco passes the local vars to store
return values by reference.  Thus we can pass information into a rule as
well as get stuff out of it.
X
X	example: { $expr = $$; } '(' expr ')' { $$ = $expr; };
X
This initializes the temp-var used to store the return value from "expr"
to whatever was passed in to "example", then passes it to "expr".  If a
non-terminal never uses "$$", then it is assumed to not return anything,
and no temp-var will be declared nor passed into it.
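X
The same pattern as a hand-written C++ sketch (not wacco output).  The
rule names, the <long> type and the char-pointer input are assumptions.
"number" seeds the temporary for "digits" with its own incoming value,
so "digits" keeps building on whatever prefix was passed down to it.
X
```cpp
#include <cassert>
#include <cctype>

// Hypothetical translation of
//     number<long> : { $digits = $$; } digits { $$ = $digits; } ;
// where digits folds each digit into the value it was handed.
static bool digits(const char *&p, long &val) {
    if (!std::isdigit(static_cast<unsigned char>(*p))) return false;
    while (std::isdigit(static_cast<unsigned char>(*p)))
        val = val * 10 + (*p++ - '0');
    return true;
}
bool number(const char *&p, long &val) {
    long digits_tmp = val;           // $digits = $$;  (seed with caller's value)
    if (!digits(p, digits_tmp)) return false;
    val = digits_tmp;                // $$ = $digits;
    return true;
}
```
X
So calling number() with 12 already in its argument and "34" as input
leaves 1234 behind.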
X
Other things that one can do:
X
X	example: '(' { int v = 2; } expr ')' { v = $expr; };
X
and create temp vars anywhere you want.  Wacco carefully avoids
putting out unnecessary sets of blocks in the output parser file.
X
To generate incomplete blocks, and allow a weird sort of free-form
grammar, the %{%} format may be used wherever a {} is normally used.
This allows creating incomplete blocks like so:
X
X	example: '(' %{ if (somevar) {  %} expr ')' %{ } %} ;
X
Curly-braces are not counted within %{%} blocks, and %{%} blocks
may be used wherever {} blocks are allowed.
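X
Since the %{%} text is copied out without brace counting, the parse of
expr ')' above ends up inside the "if".  What follows is a hand-written
C++ sketch of the result, not real wacco output.  Here somevar is just a
global flag, and expr is reduced to a single digit.
X
```cpp
#include <cassert>
#include <cctype>

// Hypothetical translation of
//     example: '(' %{ if (somevar) {  %} expr ')' %{ } %} ;
static bool somevar = true;          // assumed user flag from the header section
static bool expr(const char *&p, int &val) {
    if (!std::isdigit(static_cast<unsigned char>(*p))) return false;
    val = *p++ - '0';
    return true;
}
int example(const char *&p) {
    if (*p != '(') return 0;         // '('
    ++p;
    if (somevar) {                   // %{ if (somevar) {  %}
        int expr_tmp = 0;
        if (!expr(p, expr_tmp)) return 0;    // expr
        if (*p != ')') return 0;             // ')'
        ++p;
    }                                // %{ } %}
    return 1;
}
```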
X
X
The empty rule may not be implicitly specified as in yacc, but must
be defined with the special "[]" symbol:
X
X	null: [] ;
X	expr: '(' expr ')' | [] ;
X
An empty statement is an error in wacco to help protect against typos
and other mistakes.
X
X
X
Right-hand sides may have parentheses for grouping.  Basically, a
function must be generated for every parenthesized expression to
maintain the parsing semantics:
X
X	value: (ID | INT) | [];
X
is the equivalent of:
X
X	value: v1 | [];
X	v1: ID | INT;
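X
A hand-written C++ sketch of that expansion, not actual wacco output.
The token codes and the 0-terminated token array are assumptions.
X
```cpp
#include <cassert>

// Hypothetical translation of  value: (ID | INT) | [];
// The parenthesized group gets its own function, v1, as in the text.
enum { EOI = 0, ID = 256, INT };
static int v1(const int *&tok) {         // v1: ID | INT;
    if (*tok == ID || *tok == INT) {
        ++tok;
        return 1;
    }
    return 0;
}
int value(const int *&tok) {             // value: v1 | [];
    if (*tok == ID || *tok == INT)       // lookahead is in FIRST(v1)
        return v1(tok);
    return 1;                            // []: consume nothing
}
```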
X
Just like every other non-terminal, parenthesized expressions have
return values, types, aliases, and may be referred to in other parts of
the right-hand side.  The default type is the type of the enclosing
parens or left-hand side for the outer-most parens:
X
X	example: (<long> ID | INT) { $$ = $_; };
X
Multiple sets of parens on the right may be referred to as "$_1", "$_2",
and so on.  They may be named as well:
X
X	example<float>: (ID | FLOAT)=num { $$ = $num; };
X
Here the parens inherit the type "float" from "example".
X
Since the left-hand side may be used on the right for recursive
functions, so may parenthesized expressions.  The names just get a
little strange.
X
X	strange: (ID (OP # #1 #2 #3 #* | []) | []);
X
The inner "#" refers to the inner-most set of parens enclosing the
"OP...".  The strings "#" and "#1" are equivalent and refer to this inner
most set of parens.  "#2" refers to the next outer parens starting the
"ID...".  "#3" and "#*" refer to the name of the left-hand side, just
for completeness.  These can be viewed as the outermost "parens" in the
expression.  Ugly but sometimes necessary.
X
X
X
Other things defined in "tokens.h" include the end-of-input token EOI
which has value 0, and the constants RETOK and RETERR, for appropriate
return values.  These have the values of TRUE (1) and FALSE (0)
respectively.  These may be used in the right-hand side of rules if it
is determined that further parsing of rules is unnecessary.
X
X	parenexpr: LPAREN expr { if ($expr == BOGUS) return RETERR; } RPAREN;
X
The return-code from various rules is always available as the magic
string "$?" directly after that particular rule is called:
X
X	parenexpr: LPAREN expr { if ($? != RETOK) return RETERR; } RPAREN;
X
The return code is overwritten with each call to a non-terminal on the
right-hand side, so if a previous return value is needed, you must save it
in some variable yourself.
X
The generated parser code does not look at the actual return value of
non-terminals (functions), so other return values may be used if desired.
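X
The C++ sketch below is a hand-written version of the "$?" example
above, not wacco output.  Character tokens stand in for LPAREN and
RPAREN, and expr is reduced to a single digit.  Here "$?" is modelled
as a local holding the return code of the most recent non-terminal call.
X
```cpp
#include <cassert>
#include <cctype>

// Hypothetical translation of
//     parenexpr: LPAREN expr { if ($? != RETOK) return RETERR; } RPAREN;
enum { RETERR = 0, RETOK = 1 };
static int expr(const char *&p, int &val) {      // expr cut down to one digit
    if (!std::isdigit(static_cast<unsigned char>(*p))) return RETERR;
    val = *p++ - '0';
    return RETOK;
}
// The rule never uses "$$", so parenexpr gets no in/out argument.
int parenexpr(const char *&p) {
    if (*p != '(') return RETERR;                // LPAREN
    ++p;
    int expr_tmp = 0;                            // temporary for "$expr"
    int rc = expr(p, expr_tmp);                  // rc plays the role of "$?"
    if (rc != RETOK) return RETERR;              // the user's action
    if (*p != ')') return RETERR;                // RPAREN
    ++p;
    return RETOK;
}
```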
X
X
X
By default, the first rule in the grammar is considered to be the start
symbol.  Instead of calling "yyparse()" to initiate the parse, the
function to call is the name of the left-hand ID in the first rule.  It
is called with no arguments.  It returns either RETOK or RETERR
depending on whether the parse succeeded or not.
X
X	firstsymbol: . . . ;
X	. . .
X
X	int main()
X	{
X		if (firstsymbol() == RETOK)
X			return 0;
X		return 1;
X	}
X
But you don't have to have just one entry point!  Adding a "%export"
modifier after a non-terminal just before the ':' causes that symbol
to become callable from outside the grammar:
X
X	thing<mytype> %export :  . . .  ;
X
X	int func() { mytype var;  return thing(var); }
X
The first non-terminal in the grammar is automatically exported unless
"%export" is used somewhere in the grammar.  Also, notice that if a
"type" is defined and used for a non-terminal, that type must be passed
in by reference to that function.
X
The "%export" feature lets you call several non-terminals in the
grammar.  This can be used to export parts of a grammar, say
sub-expression parsing, or let you put several different parsers into
one grammar file.  All exported non-terminals are also listed as
"extern"s in the "tokens.h" header file.
X
X
X
The scanner section is optional.  If there is a "$$" at the end of the
file, the rest is considered to be almost straight lex(1) source.  If
there is a "$$", every terminal must have a lex value associated with
it.  Character and string constants are self-defining.  Other
terminals are defined in the lex section.
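X
As an illustration of "self-defining" strings, a helper like the one
below could turn a string token into a lex pattern.  It is a
hypothetical C++ sketch, not wacco's code.  It mimics the
case-insensitive classes that wacco generates by default (see the -i
option in wacco(1)).  As its own choice, it wraps everything else in
lex double quotes.
X
```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: build a lex pattern for a string token.
// With fold_case, letters become [Xx] classes; other characters are
// grouped into quoted runs, escaping '"' and '\\'.
std::string lex_pattern(const std::string &tok, bool fold_case) {
    std::string out;
    bool in_quotes = false;
    for (unsigned char c : tok) {
        if (fold_case && std::isalpha(c)) {
            if (in_quotes) { out += '"'; in_quotes = false; }
            out += '[';
            out += static_cast<char>(std::toupper(c));
            out += static_cast<char>(std::tolower(c));
            out += ']';
        } else {
            if (!in_quotes) { out += '"'; in_quotes = true; }
            if (c == '"' || c == '\\') out += '\\';
            out += static_cast<char>(c);
        }
    }
    if (in_quotes) out += '"';
    return out;
}
```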
X
An example:
X
X	expr: LPAREN expr RPAREN | "id" | [];
X
X	$$
X
X	%%
X
X	"."		{ return (int)EOI; }
X
X	$LPAREN		"("|"["
X	$RPAREN		")"|"]"
X
X	[ \t\v\n\f]	;
X	.  { w_scanerr("Illegal character %d (%c)", yytext[0], yytext[0]); }
X
The string "id" naturally stands for itself.  LPAREN and RPAREN are
described in the lex section in the reverse of the usual lex order: the
token name comes first, followed by its pattern.  Wacco will convert
those lines starting with a `$' into the appropriate lex output.
This is not only to make sure that all terminals are defined, but allows
defining a language without ever having to manually define token ids for
any terminal symbol!
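X
A hypothetical C++ sketch of that conversion, not wacco's code.  It
assumes the generated action simply returns the token code, in the same
style as the EOI rule in the example above.
X
```cpp
#include <cassert>
#include <cctype>
#include <string>

// Rewrite a "$NAME pattern" scanner line as an ordinary lex rule;
// any other line is passed through unchanged.
std::string convert_dollar_line(const std::string &line) {
    if (line.empty() || line[0] != '$') return line;
    std::size_t name_end = 1;
    while (name_end < line.size() &&
           (std::isalnum(static_cast<unsigned char>(line[name_end])) ||
            line[name_end] == '_'))
        ++name_end;
    std::string name = line.substr(1, name_end - 1);
    std::size_t pat = line.find_first_not_of(" \t", name_end);
    std::string pattern = pat == std::string::npos ? "" : line.substr(pat);
    return pattern + "\t{ return (int)" + name + "; }";
}
```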
X
The default scanner (located in -lwacco) maintains its own I/O file
pointer.  This is so that user code can implement the equivalent of
"#include" without too much work.  The functions in the scanner include:
X
X	int w_openfile(char *fname)	// open the file with the specified name
X
X	void w_closefile()		// close the last opened file
X
X	void w_setfile(FILE *f)		// set the current file to this
X
X	FILE *w_getfile()		// return the currently opened file
X
X	int w_currcol()			// the current column in the input
X
X	int w_currline()		// the current line in the input
X
X	char *w_getcurrline()		// the text of the current line
X
X	int w_input()			// basic I/O routines which are
X	int w_unput(int c)		// to be used by the scanner
X	void w_output(int c)
X
X
You should call either w_setfile() or w_openfile() before starting the
parse or the default scanner will probably dump core.
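X
The C++ sketch below shows how keeping a stack of open files makes
"#include" cheap.  It is not the -lwacco source.  The functions
push_file() and next_char() are hypothetical stand-ins for
w_openfile()/w_setfile() and w_input().  Popping an included file when
it hits end-of-file is also an assumption.
X
```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Input layer with its own stack of open files (a sketch only).
static std::vector<std::FILE *> files;

void push_file(std::FILE *f) { files.push_back(f); }   // cf. w_openfile()

int next_char() {                                       // cf. w_input()
    while (!files.empty()) {
        int c = std::fgetc(files.back());
        if (c != EOF) return c;
        if (files.size() == 1) return EOF;              // keep the outermost file
        std::fclose(files.back());                      // included file finished
        files.pop_back();
    }
    return EOF;
}
```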
X
X
The functions that the parser expects to have available are:
X
X	int w_gettoken()	// get the next token - usually calls yylex()
X				// - must return EOI on end-of-input
X
X	int w_scanerr()		// printf-type error printing routine
X				// - must always return RETERR
X				// - is called with a NULL argument
X				// when just skipping a token in the input
X
These are either in the wacco library -lwacco, or must be provided
by the user.
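X
A user-supplied w_scanerr() might look roughly like the C++ sketch
below.  The prototype is an assumption, since the listing above shows
no parameters.  The skipped_tokens counter and the last_error string are
illustrative additions.  RETERR is given its documented value of 0.
X
```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

enum { RETERR = 0, RETOK = 1 };

static int skipped_tokens = 0;     // hypothetical bookkeeping
static std::string last_error;     // hypothetical: last formatted message

// printf-style error reporter; a null format means "skipping a token".
int w_scanerr(const char *fmt, ...) {
    if (fmt == nullptr) {
        ++skipped_tokens;
        return RETERR;
    }
    char buf[256];
    std::va_list ap;
    va_start(ap, fmt);
    std::vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    last_error = buf;
    std::fprintf(stderr, "error: %s\n", buf);
    return RETERR;                 // always RETERR, as the parser expects
}
```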
X
The default w_scanerr() will try to print the line that had the error,
and underneath it print "^" where the error occurred and "*" where
tokens were skipped when re-syncing.  Because of some lex(1) funnies,
this doesn't always work as expected.  When I do away with the need for
lex, this won't be a problem anymore.
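X
The caret-and-asterisk display can be sketched as a small C++ helper
(hypothetical, not the library's code).  Given the text of the line, the
error column, and the columns of skipped tokens, it builds the marker
line to print underneath.
X
```cpp
#include <cassert>
#include <string>
#include <vector>

// '^' marks the error column, '*' each skipped token's column.
// Tabs are copied so the markers line up with the echoed source line.
std::string marker_line(const std::string &line, int err_col,
                        const std::vector<int> &skipped) {
    std::string marks(line.size(), ' ');
    for (std::size_t i = 0; i < line.size(); ++i)
        if (line[i] == '\t') marks[i] = '\t';
    for (int c : skipped)
        if (c >= 0 && c < static_cast<int>(marks.size())) marks[c] = '*';
    if (err_col >= 0 && err_col < static_cast<int>(marks.size()))
        marks[err_col] = '^';
    marks.erase(marks.find_last_not_of(" \t") + 1);   // drop trailing blanks
    return marks;
}
```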
X
X
Some other convenient functions defined in parser.C include:
X
X	int w_nexttoken() // return the value of the next token but don't
X			// scan it yet - calls w_gettoken() at most once
X			// - useful for token lookahead
X
X	void w_skiptoken()	// scan the current token - the next call to
X				// w_nexttoken() will actually read another token
X
X	char *w_tokenname(int tokid)	// return the string name of a token
X					// whose id is tokid
X
These are only really useful if you are writing your own scanner instead
of using lex.  The w_nexttoken() and w_skiptoken() functions can also be
used to direct the parse somewhat.  If you provide your own infinite
push-back stack of tokens, you can completely alter the parse at run-time!
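X
A C++ sketch of one-token lookahead backed by a push-back stack, in the
spirit of w_nexttoken() and w_skiptoken().  The Lookahead class and its
push_back() member are hypothetical.  The push-back stack is the part
the text leaves for you to provide.
X
```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// One-token lookahead plus an unbounded push-back stack.
class Lookahead {
public:
    explicit Lookahead(std::function<int()> gettoken)
        : gettoken_(std::move(gettoken)) {}
    // Like w_nexttoken(): peek at the next token, reading at most one new one.
    int next() {
        if (pending_.empty()) pending_.push_back(gettoken_());
        return pending_.back();
    }
    // Like w_skiptoken(): consume the current token.
    void skip() {
        next();
        pending_.pop_back();
    }
    // Re-inject a token; it becomes the next one returned.
    void push_back(int tok) { pending_.push_back(tok); }
private:
    std::function<int()> gettoken_;
    std::vector<int> pending_;   // top of the stack is the next token
};
```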
X
The program flex(1) may be used instead of lex(1) if desired, and is
highly recommended.
X
The extern for "yytext" is automatically declared in parser.C.
Unfortunately, it may be wrong for the scanner generator actually being
used.  To change the definition, the macro YYTEXT_DECL may be redefined
at the top of your wacco grammar if you wish to use flex:
X
X	{
X	#undef YYTEXT_DECL
X	#define YYTEXT_DECL char *yytext
X	}
X	...
X
X
X
My original plan was to write a scanner-generator directly into wacco,
but since flex(1) is now available, which is very fast and generates
excellent scanners, I now have no plans to do anything to the scanning
parts of wacco.
X
X
X
X		--  Parag Patel
X
X
X
================= E X A M P L E    G R A M M A R ==================
X
// This is the usual required calculator sample.  It can still use
// a LOT of work, but it illustrates the basics.  Note that the
// precedence of operators is all wrong.
X
{
#include <stdio.h>
#include <stdlib.h>
}
X
calc
X	:	%{
X			while (w_nexttoken() != EOI) {
X		%}
X	    expr ([] | '=' | ';' | ',')
X		%{
X			printf("%f\n", $expr);
X			}
X		%}
X	| []
X	;
X
expr<double>
X	:	term { $binop_expr = $term; } binop_expr { $$ = $binop_expr; }
X	;
X
binop_expr<double>
X	:	'+' expr { $$ += $expr; }
X	|	'-' expr { $$ -= $expr; }
X	|	'*' expr { $$ *= $expr; }
X	|	'/' expr { $$ /= $expr; }
X	|	'&' expr { $$ = (int)$$ & (int)$expr; }
X	|	'|' expr { $$ = (int)$$ | (int)$expr; }
X	|	'^' expr { $$ = (int)$$ ^ (int)$expr; }
X	|	"<<" expr { $$ = (int)$$ << (int)$expr; }
X	|	">>" expr { $$ = (int)$$ >> (int)$expr; }
X	|	"&&" expr { $$ = $$ && $expr; }
X	|	"||" expr { $$ = $$ || $expr; }
X	|	[]
X	;
X
term<double>
X	:	DOUBLE { $$ = atof((char *)yytext); }
X	|	'-' expr { $$ = -$expr; }
X	|	'~' expr { $$ = ~(int)$expr; }
X	|	'!' expr { $$ = !$expr; }
X	|	'(' expr ')' { $$ = $expr; }
X	;
X
{
X	int main()
X	{
X		w_setfile(stdin);
X		calc();
X	}
}
X
$$
X
D	[0-9]
L	[_A-Za-z]
X
%%
X
"."		{ return (int)EOI; }
X
$DOUBLE		({D}+)|({D}+\.{D}+)|({D}+[Ee]-?{D}+)|({D}+\.{D}+[Ee]-?{D}+)
X
"#".*$		;
X
[ \t\v\n\f]	;
.	{ w_scanerr("Illegal character %d (%c)", yytext[0], yytext[0]); }
SHAR_EOF
chmod 0444 wacco.doc ||
echo 'restore of wacco.doc failed'
Wc_c="`wc -c < 'wacco.doc'`"
test 15987 -eq "$Wc_c" ||
	echo 'wacco.doc: original size 15987, current size' "$Wc_c"
fi
true || echo 'restore of wacco.doc.iw failed'
echo End of part 1, continue with part 2
exit 0

exit 0 # Just in case...
-- 
Kent Landfield                   INTERNET: kent at sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent at uunet.uu.net.