TRC - expert system building tool (part 3 of 8)

sources-request at panda.UUCP sources-request at panda.UUCP
Sat Feb 8 22:22:12 AEST 1986


Mod.sources:  Volume 3, Issue 111
Submitted by: ihnp4!dicomed!ndsuvax!nckary (Daniel D. Kary)

This is NOT a shell archive.  Simply delete everything up to and including
the cut mark and save the result as reference.1.doc.

Dan Kary
ihnp4!dicomed!ndsuvax!nckary

-------------- cut here ---------------








                  The TRC Reference Manual


                       Daniel D. Kary

               North Dakota State University
                Computer Science Department
                      300 Minard Hall
                     Fargo, ND   58102


                          _A_B_S_T_R_A_C_T


          The syntax of TRC is formally  defined.   The
     output of TRC is elucidated.



            TABLE OF CONTENTS

     PART ONE - INPUT
           1. INTRODUCTION
           2. OVERVIEW
           3. LEXICAL ELEMENTS
           4. DEFINITIONS
           5. SHORT TERM MEMORY
           6. LONG TERM MEMORY
           7. OPTIMIZER

     PART TWO - OUTPUT
           8. OVERVIEW
           9. COMMON PROCEDURES
          10. DATA OBJECTS
          11. MANIPULATING THE DATA
          12. TRANSLATING RULES
          13. OPTIONS

     APPENDICES
           A. TRC GRAMMAR
           B. ERROR MESSAGES
           C. STYLE NOTES
           D. SAMPLE PROGRAM


_1.  _I_N_T_R_O_D_U_C_T_I_O_N

     TRC is a programming language that is useful for build-
ing expert systems.  It is presumed that the reader is fami-
liar with expert systems in general and has  used  at  least
one expert system building tool.  Some terms that are widely









                           - 2 -


used in describing expert  systems  have  specific  meanings
when used to describe TRC and will be defined now.

     The set  of  situation-action  rules  that  embody  the
knowledge  an expert uses to solve a problem are referred to
as Long Term Memory (LTM).  The information  that  may  vary
with  each  instance  of the problem is referred to as Short
Term Memory (STM).  The code which determines if the  situa-
tion part of a rule is true will be called a pattern matcher
or a matcher.  The  code  which  determines  which  rule  to
activate  will  be  called  an inference engine and includes
both the matcher and the LTM.  The input to the TRC compiler
is called a specification.

     The input to the TRC compiler is a rule based language.
The  output is a set of C language files.  The procedures in
the C language files output by the TRC compiler collectively
implement  the  inference engine.  An inference engine is to
an expert system as a parser is to a compiler: it is of cen-
tral  importance  but it does not comprise a complete imple-
mentation.  TRC does not provide code for  interaction  with
the  user, but does permit the programmer to easily add this
code.

     This document is divided into two parts and  a  set  of
appendices.  The  first part presents a formal definition of
the input language with examples of each  language  feature.
The second part describes the output of the TRC compiler and
includes some important insight on integrating TRC generated
code with other C language code.  The appendices include the
complete TRC grammar, a listing and explanation of  all  the
error  messages that TRC might produce and a sample specifi-
cation.


                      PART ONE - INPUT


_2.  _O_V_E_R_V_I_E_W

     Every specification file consists of five sections, the
header,  definitions,  short  term  memory (data base), long
term memory (rules) and the  trailer.   These  sections  are
separated  by double percent characters.  The form of a full
specification is illustrated below.   The  header,  STM  and
trailer are optional so the minimum specification would con-
tain only the definitions and LTM.  All of  the  "%%"  marks
must be present in each specification file.


     header
     %%
     definitions
     %%









                           - 3 -


     STM
     %%
     LTM
     %%
     trailer


     The purpose of the header and trailer  sections  is  to
permit  the  inclusion  of C language code in the specifica-
tion.  The header and trailer are each composed of a  single
lexical  element called a c-code which is defined in section
3.  Separate sections are devoted to each of  the  remaining
parts of a TRC specification.

_3.  _L_E_X_I_C_A_L _E_L_E_M_E_N_T_S

     A program consists of a  single  file.   A  file  is  a
sequence  of lexical elements composed of characters.  Char-
acters may be one of these classes; (1)  upper-case-letters,
(2)  lower-case-letters,  (3) digits, (4) special characters
(5) separators (6) embedded characters and (7) other charac-
ters.

     (1) upper-case-letters
          A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

     (2) lower-case-letters
          a b c d e f g h i j k l m n o p q r s t u v w x y z

     (3) digits
          0 1 2 3 4 5 6 7 8 9

     (4) special characters
          " ( ) : ; = < > ! ^ _ $ . { } / * %

     (5) separators
          space tab newline

     (6) embedded characters
          \n   embedded-newline
          \t   embedded-tab
          \b   embedded-backspace
          \r   embedded-carriage-return
          \f   embedded-form-feed
          \\   embedded-back-slash
          \"   embedded-quote

     (7) other characters
          @ # & + [ ] ` ~ ' | , ?

The following names are used when referring to special characters:

     Symbol    Name      Symbol    Name










                           - 4 -


     "    quote               :    colon
     ;    semicolon      =    equal
     !    exclamation         ^    hat
     _    underscore          $    dollar
     <    less-than      >    greater-than
     (    left-paren          )    right-paren
     {    left-bracket        }    right-bracket
     *    asterisk       %    percent
     -    minus


     A lexical element is a either a delimiter, an  identif-
ier,  an integer literal, a floating point literal, a string
literal, a comment or a c_code.  In some cases  a  separator
is  required  between  lexical  elements,  specifically when
adjacent lexical elements could be interpreted as  a  single
lexical  element.   A separator is any of space, tab or new-
line.  One or more separators are permitted between any  two
lexical  element, before the first lexical element and after
the last lexical element.

_3._1.  _D_E_L_I_M_I_T_E_R_S

     A delimiter may be one of the following special charac-
ters:

     : ; " . ^ - ( ) { } < >

     A delimiter may be one of the following compound delim-
iters.  Each  compound delimiter is composed of two adjacent
special characters.

     => %% != == >= <=

     The following names are used when referring to compound
delimiters:

     Delimiter Name

     =>        arrow
     %%        delim
     !=        not-equal
     ==        equality
     >=        greater-than-or-equal
     >=        less-than-or-equal

_3._2.  _I_D_E_N_T_I_F_I_E_R_S

     Identifiers are used as tokens and as  reserved  words.
Separators are not allowed in an identifier.  The underscore
character is the only special character that may be part  of
an identifier.

     identifier  ::=  letter { underscore | letter | digit}









                           - 5 -


     letter      ::=  upper-case-letter | lower-case-letter

Examples:

     PENNY          Get_Stuff x1
     ThisOne        WrZ_123        etc_

The identifiers that are reserved words are:

     ADD            MARK      PROFILE
     BACKTRACK NORECURS  RECURS
     DUMP      NOT            SAVE
     EMPTY          OPTIMIZE       STRING
     FLOAT          POINTER        TRACE
     INT            PREFIX         ZERO

_3._3.  _N_U_M_E_R_I_C _L_I_T_E_R_A_L_S

     There are two  classes  of  numeric  literals,  integer
literals  and  floating  point  literals.  The presence of a
decimal point distinguishes  floating  point  literals  from
integer literals.

     floating-point-literal  ::=  [ minus ] digits dot digits
     integer-literal             ::=  [ minus ] digits
     digits              ::=  digit { digit }

Example integer literals:

     1   -33   187

Example floating point literals:

     0.5   -3.14159    6.0

_3._4.  _S_T_R_I_N_G _L_I_T_E_R_A_L_S

     A string literal is formed by a sequence of  characters
(possibly  zero)  enclosed between quote characters.  Any of
the six classes of characters may be embedded in a string.

     string-literal  ::=  quote { [ character ] } quote

Examples:
     ""
     "\"recursion\""
     "these characters can be in a string $, ! => etc_\n"

_3._5.  _C_O_M_M_E_N_T_S

     Comments may be included anywhere  in  the  input  file
that  separators  or  delimiters may occur.  Comments follow
the style of comments in the C language.  Comments  may  not
be  nested  within  comments.   Any  of  the  six classes of









                           - 6 -


characters may be embedded in a comment.

comment  ::= slash asterisk { [ character ] } asterisk slash

Examples:
     /*  a simple comment */

     /*
       a multi-line
       comment
     */

     /*******************
      * A Fancy Comment *
      *******************/

_3._6.  _C__C_O_D_E

     A c_code is a fragment  of  C  language  code  that  is
embedded  in  the input file.  A c_code is recognized by the
scanner as a single lexical item.  The C language itself  is
not  parsed by TRC.  A c-code may not contain a procedure or
function.

c_code   ::=  left-bracket { [character] | [c_code] } right-bracket

Example:

     {
          if(condition){
               action(argument);
               another_action();
          }
     }

_4.  _D_E_F_I_N_I_T_I_O_N_S

     Each entity that can be referred to by a TRC rule  must
be  defined  in the definition section.  Each entity that is
defined is called an _o_b_j_e_c_t.  Objects may  have  numeric  or
string values associated with them.  These associated values
are called _e_l_e_m_e_n_t_s of the object.  There are two  forms  of
definitions.   There is a simple form for objects which have
no elements and an extended form for objects that have asso-
ciated elements.

     definition  ::=  identifier
     definition  ::=  identifier left-paren item-list right-paren
     item-list   ::=  { [ item ] }
     item        ::=  identifier colon type
     type     ::=  INT | FLOAT | STRING | POINTER

     For each object there  will  be  an  associated  object
count  in  the  output  code, which represents the number of









                           - 7 -


objects of that type that exist at any point in  time.   For
each  object  with  at  least  one element, there will be an
associated structure and list of objects of that type in the
output  code.  The  set  of  object  counts and object lists
defined in the definition section represent the STM that the
system of rules may refer to.

     Each element that is defined for a  given  object  must
have  a  data  type  specified.   Strong  type  checking  is
enforced throughout TRC.   Comparisons  and  assignments  of
elements must involve elements or literals of the same type.
There is no coercion of types.  The INT, FLOAT,  STRING  and
POINTER types are described in section 10.  The POINTER type
is a pointer to a structure of the type of the  object  that
contains it.

Examples:

     A
     b_b (This : INT)
     CAST ( THAT      : INT
            The_Other : FLOAT)

_5.  _S_H_O_R_T _T_E_R_M _M_E_M_O_R_Y

     The short term memory (stm) section  of  the  input  is
where  the initial state of the working memory is specified.
The intention of this  section  is  to  permit  the  working
memory  to be initialized to some state that may be required
for each invocation  of  the  expert  system.   It  is  also
intended  to  serve as a way of entering test data while the
expert system is being developed,  before  data  entry  pro-
cedures are developed.

     stm        ::=  { [ entry ] }
     entry      ::=  [ integer-literal ] identifier
     entry      ::=  [ integer-literal ] identifier
               left-paren { [ init-item ] } right-paren
     init-item  ::=  identifier arrow value
     value      ::=  integer-literal
     value      ::=  floating-point-literal
     value      ::=  string-literal

     The short term memory section is a list of objects that
are to be entered into the working memory.  If an object has
one or more elements, those elements can be  initialized  to
user  specified  values.  Numeric values that are not speci-
fied are initialized to zero and string values that are  not
specified  are initialized to an empty string.  The integer-
literal that may precede the name (identifier) of the object
specifies  how  many objects of that type are to be added to
working memory with the given element values, e.g.:

                      /* definition part */









                           - 8 -


     A (A1 : STRING
        A2 : FLOAT)
     B (B1 : STRING
        B2 : FLOAT)
     %%
                      /* short term memory */
       A ( A1 => "string")  /* It is not necessary to
                            initialize all the elements
                      in an object.
                      */
     2 A ( A2 => 3.34
           A1 => "thing")   /* Nor is it necessary to
                      initialize elements in the
                      order they were declared.
                      */
     3 B              /* It is not necessary to
                      initialize the elements
                      at all.
                      */
     %%




_6.  _L_O_N_G _T_E_R_M _M_E_M_O_R_Y

     Long   term   memory   is   the   section   where   the
situation/action  rules  are  enumerated.   This section may
begin with a listing of options that are to  be  turned  on.
All options in this section can also be specified by command
line flags.  Since the syntax for the long term memory  sec-
tion is more complex than for the other sections, it will be
presented in several parts.

_6._1.  _O_P_T_I_O_N_S

     The long term memory is composed of two  sections,  the
options and the rules.

     ltm     ::=  { [option] } { rule }
     option  ::=  ZERO  |  PROFILE  |  BACKTRACK
            |  DUMP  |  RECURS  |  NORECURS
            |  SAVE  |  TRACE  |  PREFIX identifier

_6._1._1.  _Z_E_R_O

     The ZERO option directs the compiler to generate a pro-
cedure  that  will free all the dynamic structures allocated
by TRC generated code.  This feature is useful when develop-
ing  inference  engines that will be entered more than once.
It is often necessary to remove the 'leftovers' from a  pre-
vious execution before beginning a new execution.











                           - 9 -


_6._1._2.  _P_R_O_F_I_L_E

     The PROFILE option directs  the  compiler  to  generate
code  to profile the execution of the inference engine and a
procedure to print a summary of that profile.  The profiling
code counts the number of times that each rule searches some
part of STM and how many times each rule is fired.

_6._1._3.  _B_A_C_K_T_R_A_C_K

     The BACKTRACK option directs the compiler  to  generate
an  inference  engine  that  will  backtrack when no rule is
true.  Backtracking is accomplished by undoing  the  actions
of  the last rule that fired and continuing to test rules as
if the undone rule had never fired.

_6._1._4.  _D_U_M_P

     The DUMP option directs the compiler to  generate  pro-
cedures  that will print the contents of STM on the standard
output.  The intention of this option  is  to  simplify  the
process  of  developing  and debugging rules.  By having the
DUMP  procedures  generated  automatically,  the   knowledge
engineer  is freed of the mundane task of writing procedures
to display the current state of  the  STM.   The  DUMP  pro-
cedures are not intended to serve as the output of an expert
system.   Appropriate  output  routines  will  have  to   be
developed  by  the  knowledge  engineer after the rules have
been written.

_6._1._5.  _R_E_C_U_R_S

     TRC will generate code that uses one of two  strategies
for  searching  STM.   These strategies (detailed in section
6.3.3) are called LINEAR and RECURSIVE.  The LINEAR strategy
is  the  default.   The  RECURS directive in the option part
directs the compiler to use the RECURSIVE  strategy  as  the
default.   It  is  possible to override the default on a per
rule basis.  Overriding the default is discussed in  section
6.3.3.

_6._1._6.  _N_O_R_E_C_U_R_S

     The NORECURS option directs the  compiler  to  use  the
LINEAR  search  strategy  in  all  rules,  unless  otherwise
directed.  Since this is the default condition,  it  is  not
necessary to use this option.

_6._1._7.  _S_A_V_E

     The SAVE option directs the compiler to  generate  pro-
cedures  to save all objects which are dynamically allocated
by TRC code on a file.  The compiler will also generate pro-
cedures   which   can   restore  the  dynamically  allocated









                           - 10 -


structures from the previously written files.  The intention
of this option is to simplify the development of expert sys-
tems with checkpointing and  restarting  capabilities.   The
procedures  generated  by  this  option and the use of those
procedures is described in section 13.6 and Appendix C.

_6._1._8.  _T_R_A_C_E

     The TRACE option directs the compiler to trace the exe-
cution  of the inference engine by maintaining a list of the
rules that have been fired in the  order  they  were  fired.
This  list  can  be  used  to produce an explaination of the
actions taken by the expert system.

_6._1._9.  _P_R_E_F_I_X

     The PREFIX option directs the compiler to use the iden-
tifier  that  follows the reserved word 'PREFIX' as a prefix
for all data objects and procedures generated by  TRC.   The
intention  of  this  option is to facilitate building expert
systems that have more than one inference engine.  Supplying
different  prefixes  for  each inference engine insures that
there will be no name conflicts between  separate  inference
engines, e.g.:

     PREFIX X_


_6._2.  _R_U_L_E_S

     The second section if LTM is the list of  rules.   Each
rule  has  a  label,  which can be supplied or automatically
generated by TRC.  The label is used whenever it  is  neces-
sary  or convenient to refer to the rule by name.  The label
is followed by the situation part  (described  in  the  next
section).   The  situation part is a list of statements fol-
lowed by the arrow delimiter.  The action part (described in
the section following the description of the situation part)
follows the arrow delimiter and is itself a list  of  state-
ments.   The  action  part  is followed by a semicolon which
terminates the rule.

     rule   ::=  label situation arrow action semicolon
     label  ::=  identifier colon  |  colon

_6._3.  _S_I_T_U_A_T_I_O_N

     The situation part specifies how STM is to be  searched
and what must be present in STM for the situation part to be
true.


     situation   ::=  { [ s-option ] } { [ match ] }
     s-option    ::=  EMPTY identifier identifier









                           - 11 -


     s-option    ::=  RECURS  |  NORECURS
     match       ::=  [ integer-literal ] identifier
     match       ::=  NOT identifier
     match       ::=  [ integer-literal ] left-paren name
                match-list right-paren
     match       ::=  c-code
     name     ::=  hat identifier identifier
     match-list  ::=  { match-item }
     match-item  ::=  identifier dot identifier relop literal
     match-item  ::=  identifier dot identifier relop
                      identifier dot identifier
     relop       ::=  equality  |  not-equal  |  less-than
     relop       ::=  greater-than  |  greater-than-or-equal
     relop       ::=  less-than-or-equal

_6._3._1.  _M_A_T_C_H_I_N_G

     It is necessary to understand how matching is specified
before  the  s-option part can be explained.  A match, which
will also be referred to as a test, is a statement  of  what
the  inference  engine  is to search for in STM.  Assume the
following objects were defined in the definition section:

     %%
     A (A1 : INT
        A3 : INT
        A2 : STRING)
     B (B1 : INT
        B2 : STRING)
     %%

     The simplest match specifies only the object that  must
be  present.   A  search  for  one  object of type A and one
object of type B can be specified as follows:

     A  B

     A search for two objects of type A and two  objects  of
type  B  can be specified in many ways, including these four
equivalent ways:

          A A B B

          2A 2 B

          A B A B

          A A 2B

     The objects can be listed in any order and may be  pre-
ceded  by  an integer literal.  The integer literal specifys
how many objects of the named type are to be search for.  In
one  of  the examples there is a space between the count and
the object name and in other  examples  there  is  no  space









                           - 12 -


between  the count and the object.  Spaces are required only
when there would be a conflict without a space.   Since  the
string  "2A"  (for  example)  begins  with  a  digit,  it is
presumed to be a numeric literal.  Since "A" is not a digit,
the  numeric  literal ends at that point.  Since the numeric
literal contained  no  decimal  point,  it  is  an  integer-
literal.  The string is therefore lexically equivalent to "2
A".

     The reserved word NOT is used to explicitly test for an
empty  list.   The following match statement will be true if
there are no objects of type A in STM:

     NOT A

     Any rule which contains a search for an  object  and  a
test  for that same list being empty can never be true.  TRC
generates an error message in this  situation  because  even
though  it  is syntactically correct, it is in fact meaning-
less, e.g.:

     A
     NOT A

     Very often it is necessary to search STM based not only
on  the  type of object, but also based on the values of the
elements of the object.  This is specified by placing a list
of the element values after the element name.

     (A.A1 == 2)
     (B.B1 != 3
      B.B2 <= "THIS")
     (A.A2 == "HERE"  A.A1 > 6)

     These three statements can be  translated  as  follows:
First  search  the  A list for an object whose element A1 is
equal to two, then search the B list  for  an  object  whose
element  B1  is  not  equal to three and whose element B2 is
less than or equal to "THIS", finally search the A list  for
an object whose element A2 is equal to "HERE" and whose ele-
ment A1 is greater than six.  This situation part  would  be
true  if  all  three objects were found in STM, otherwise it
would be false.  In the first match only the value of A1  is
specified.  Only the elements that are specified are tested,
the values of any other elements that the object may contain
are  not  considered.  Association of parameters is by name,
so it is not necessary to list elements in  the  order  they
were  declared.   The  third  match statement in the example
above lists the value of element A2 first, even though  ele-
ment A1 was declared first.

     The final case that must  be  considered  is  the  case
where it is necessary to search STM for an object whose ele-
ments are to be tested against the result of  some  previous









                           - 13 -


search.  To do this it is first necessary to name the object
that is being searched for so that it may later be  referred
to, e.g.:

     (^A FIRST
       A.A1 == 2)
     (B.B1 == FIRST.A3)
     (A.A1 != A.A3)

     The first statement begins with a hat character.   This
indicates  that this object is to be named.  The hat charac-
ter is followed by the object type and it's name.  The  name
is followed by a list of the elements to search for, in this
case a search for element A1  equal  to  two  is  specified.
This  statement  can  be translated as follows: Search the A
list for an object whose element A1 is equal to two and name
that  object "FIRST". A name that is applied to an object is
called a free variable.  The scope of a free variable is the
current  rule.   Free variable names can be reused in subse-
quent rules.

     The second statement specifys that the B list is to  be
searched for an element B1 whose value is equal to the value
of the element A3 found in the previous statement.  The free
variable name makes it possible to refer to previously found
elements.

     The third statement, while looking innocent enough,  is
radically  different from all previous examples.  In all the
previous examples the exact value that  was  being  searched
for  was  known  before  the  search  began.  That value was
expressed as either a literal, or the value of some  element
that  was found in a previous test.  In the third statement,
the A list is being searched for an object whose elements A1
and  A3 are not equal to one another.  The values these ele-
ments are to have are not specified, only their relationship
to one another.  This can be further complicated:

     (^A Second
       A.A1 == 3)
     (A.A1 < Second.A3
      A.A3 < A.A1)

     In the second match statement of  this  example  A1  is
being  compared  to  the  value  of  A3  in the object named
"Second" and it is being compared to the value of  the  ele-
ment  A3  in  the  object  that contains it.  An element may
appear on the left hand side of the relational operator only
once in a given match statement.  It is now possible to con-
sider the effects of the options.













                           - 14 -


_6._3._2.  _O_P_T_I_O_N_S

     The situation part begins with a (possibly empty)  list
of  options.   The  reserved  words  RECURS  or NORECURS may
appear in the option part of the situation.  The  appearance
of  one  of these words causes the named strategy to be used
rather than the current default strategy. It is not an error
to  explicitly  specify  the  default  strategy,  but  it is
unnecessary.  The option part  of  the  situation  may  also
include  EMPTY  statements.   An EMPTY statement is a static
object declaration.  The intention of the EMPTY statement is
to  provide  a means of passing data from STM to embedded c-
code and from embedded c-code to STM.  Examples  in  section
12 will illustrate these actions.

_6._3._3.  _S_E_A_R_C_H _S_T_R_A_T_E_G_I_E_S

     A small example provides an easy way to illustrate  the
two  search  strategys.   This  example  is  a  complete TRC
specification, though not useful for anything other than  as
an example.

     %%
     PENNY (MINT : STRING  DATE : INT)
     %%
     PENNY (MINT => "DENVER"
            DATE => 1964)
     PENNY (DATE => 1966)
     %%
     R1:
          (^PENNY First)
          (PENNY.DATE <= First.DATE)
          =>
          MARK PENNY
          ;
     %%

     STM will be initialized to contain two objects of  type
PENNY, the first minted in Denver in 1964, the second minted
in an unspecified location in 1966.  Since the reserved word
RECURS does not appear in either option section, the default
LINEAR search strategy will be used.

     In the LINEAR strategy, STM is  searched  in  a  linear
fashion  for  each  object  specified in the situation part.
Objects are searched for in the order they are  listed.   In
this  example,  the  object named "First" will be associated
with the first object in the list.  Since the values of  the
elements  are  not  specified, any object of type PENNY will
match.  This object is then temporarily marked as being  "in
use" and can not be used to match any subsequent tests.  The
list will then be searched for an object of type PENNY whose
DATE  element  is  less than or equal to the DATE element of
the "First" object.  The only other object in the list has a









                           - 15 -


DATE  element  of  1966  which  is not less than or equal to
1964, so the rule fails.  In the LINEAR strategy,  when  any
test  in  the  situation  part  fails, the entire rule fails
immediately, no further tests are made.

     It should be obvious that this  rule  would  have  been
true  if  "First" had been associated with the second object
in the PENNY list.  This is precisely  the  purpose  of  the
RECURSIVE  search  strategy.   In the RECURSIVE search stra-
tegy, when a test fails, the previous test  is  redone.   To
redo  a  test,  the  object  that  was marked as "in use" is
unmarked, and the list is searched from that  point  for  an
object that will match the test.  The RECURSIVE search fails
when a single test fails and it is  no  longer  possible  to
undo  the previous test (this occurs when there is no previ-
ous test).  The RECURSIVE search strategy is a powerful pat-
tern matching tool, but it can be expensive in terms of exe-
cution time.

_6._3._4.  _E_M_B_E_D_D_E_D _C_O_D_E

     Arbitrary C language code may be embedded in the situa-
tion  part anywhere a match may occur.  Recall that embedded
code (c-code) is recognized as a single lexical  element  by
the  scanner,  the  C  language itself is not parsed by TRC.
Errors in embedded code will not be detected  by  TRC.   The
intention  of permitting embedded code in the situation part
is to make it possible to include tests that may not fit the
context of a match against STM.

     In order to integrate an embedded code  test  with  the
existing  match statements, it is necessary to have a way to
refer to objects in embedded code.  In  order  for  embedded
code to have the same functionality as a match statement, it
is necessary to have a way to cause a rule to  fail  in  the
embedded code.  Each of these facilities are provided.

_6._3._5.  _E_M_P_T_Y _O_B_J_E_C_T_S

     The purpose of the EMPTY statement is to create a named
object  that  can  be  referred  to  by match statements and
embedded code, without having to exist in STM.  One  of  the
capabilities  that  results  is  the  ability  to  have  STM
searched on the basis of the result of  some  external  pro-
cedure, e.g.:

     R1:
     EMPTY PENNY SPARE   /* this creates an object of
                       type PENNY that is named
                       SPARE.  This object exists
                       separately from STM and it's
                       elements are not initialized. */
         {
          /* this embedded C code precedes









                           - 16 -


             any search of STM
          */
          if(($SPARE.DATE = external-procedure()) <= 1920){
              $FAIL.
          }
         }
         (PENNY.DATE == SPARE.DATE)
         =>
         MARK PENNY
         ;

     Several things are happening in this example.  First an
object  of  type  PENNY is created and given the name SPARE.
This object exists separately from STM and will  exist  only
during  the  current  rule.   It is useful only as something
that can be referred to in other statements.  A  section  of
embedded  code  precedes  the  only  match statement in this
rule.  When the code produced by TRC is  compiled  and  run,
that  embedded code will be executed before STM is searched,
by virtue of the fact that it precedes the match statement.

     The embedded code contains an "if" statement which con-
tains  an  embedded  assignment and function call as part of
it's condition.  The left-hand-side of the embedded  assign-
ment,  "$SPARE.DATE" is not syntactically correct C language
code.  The dollar character is a flag to TRC that  indicates
a  reference to a named object.  The identifier that follows
the dollar character will be translated by  TRC  during  the
output  phase.  This translation is described in section 12.
The statement "$FAIL." is translated by  TRC  into  whatever
statements are required to make this rule fail.  The defini-
tion of failure depends on  the  search  strategy.   If  the
LINEAR  strategy is being used, "$FAIL." will cause the rule
to stop searching STM and continue with the next  rule.   If
the  RECURSIVE  strategy  is being used, "$FAIL." will cause
the rule to undo and then redo the previous match statement.

     An object name preceded by  the  dollar  character  may
occur  in  the  embedded  code  anywhere a variable name may
occur, since that is what it will actually be translated to.
Embedded  code  may  also refer to objects that exist in STM
using the same dollar character translation technique:

     R1:
     RECURS
          (^PENNY NEW
            PENNY.MINT == "DENVER)
          {
               if(some-function($NEW.DATE))
                $FAIL.
               else
                $NEW.DATE = 0;
          }
          (PENNY.DATE == 1921)









                           - 17 -


          =>
          MARK PENNY
          ;

     The "else" part of the  embedded  code  illustrates  an
assignment  to  an  element  of  the  object named NEW.  The
object that is being called  NEW  exists  in  STM  and  this
assignment  to  it's  DATE element is permanent.  Since this
rule is recursive, it is possible that  this  embedded  code
will  set the DATE element of every object in the PENNY list
to zero.  These modifications are made  before  it  is  even
known that the situation part is true.  Modifying STM in the
situation part of a rule would be  a  major  departure  from
traditional expert system implementation techniques.  Furth-
ermore, the BACKTRACKing option is unaware of  changes  made
in  STM by embedded code.  The BACKTRACKing option is unable
to correctly undo this rule.

_6._4.  _A_C_T_I_O_N

     The ACTION part specifies what is to  be  done  if  the
situation  part is true.  The actions that can be taken pri-
marily involve adding objects to  STM  or  deleting  objects
from  STM.  Recall that the non-terminal 'entry' was defined
in section 4.

     action      ::=  statements c-code
     statements  ::=  { [statement] }
     statement   ::=  MARK mark-list
     statement   ::=  ADD  add-list
     statement   ::=  OPTIMIZE identifier
     mark-list   ::=  { [ mark-item ] }
     mark-item   ::=  [ integer-literal ] identifier
     add-list    ::=  { [ entry ] }

_6._4._1.  _M_A_R_K

     The MARK statement is used to delete objects from  STM.
Only  objects  that  were found in the situation part may be
deleted.  The reason for this constraint is  that  only  the
objects  found in the situation part are definitely known to
exist in STM.  STM is searched only in the  situation  part,
there  is  no  searching in the action part.  Objects may be
deleted by name or in the order they were found, e.g. (using
the definitions from section 6.3.1):

     R1:
          (A.A1 != A.A3)
          (^A FIRST
            A.A1 == 2)
          (B.B1 == FIRST.A3)
          =>
          MARK A
          ;









                           - 18 -


     This MARK statement will delete the  object  in  the  A
list that met the test (A.A1 != A.A3).  In some instances it
may be desirable to delete an object that was not the  first
object that was found, e.g.:

     R1:
          (A.A1 != A.A3)
          (^A FIRST
            A.A1 == 2)
          (B.B1 == FIRST.A3)
          =>
          MARK FIRST
          ;

     The A list object named 'FIRST' is the second object of
type A to be found.  It is specified as the object to delete
by using it's free variable  name.   A  MARK  statement  can
specify  a  count of how many objects of a given type are to
be deleted.  A MARK statement may list any number of objects
to delete, and each object to be deleted can have a separate
MARK statement if desired.  In no case can more  objects  be
deleted  than were found in the situation part.  Each of the
following examples is equivalent:

     R1:
          (A.A1 != A.A3)
          (^A FIRST
            A.A1 == 2)
          (B.B1 == FIRST.A3)
          =>
          MARK B FIRST A
          ;

     R1:
          (A.A1 != A.A3)
          (^A FIRST
            A.A1 == 2)
          (B.B1 == FIRST.A3)
          =>
          MARK 2A
          MARK B
          ;

     R1:
          (A.A1 != A.A3)
          (^A FIRST
            A.A1 == 2)
          (B.B1 == FIRST.A3)
          =>
          MARK FIRST
          MARK A
          MARK B
          ;










                           - 19 -


_6._4._2.  _A_D_D

     The ADD statement is used to add new  objects  to  STM.
As  in  the MARK statement, an ADD statement can specify one
or several objects to add to STM.  The value of each element
of each object can be specified as in the STM section of the
specification.  Each object is inserted at the head  of  the
appropriate  list.   The insertions are actually made in the
opposite order that they are listed, the net effect is  that
the objects appear at the head of the list in the order they
are specified.  ADD and MARK statements may be intermixed in
any order, e.g.:

     R1:
          (A.A1 != A.A3)
          (^A FIRST
            A.A1 == 2)
          (B.B1 == FIRST.A3)
          =>
          MARK FIRST
          ADD A (A.A1 => 6
                 A.A3 => FIRST.A3)
          ADD B (B.B2 => FIRST.A2
                 B.B1 => 9)
          MARK B
          ;

     All the ADD statements will be executed before any MARK
statements  are  executed  regardless  of  the  order of the
statements in the action part.  The statements  are  ordered
by  the  compiler  to  insure that an ADD statement does not
refer to an object that has already  been  MARKed.   In  the
example  above, the first ADD statement refers to the object
named 'FIRST'.  The object named 'FIRST' is  MARKed  in  the
previous statement.  If the code were executed in the speci-
fied order, the element 'FIRST.A3' would not exist when  the
ADD statement was executed.

_6._4._3.  _O_P_T_I_M_I_Z_E

     The OPTIMIZE statement is named for it's primary  func-
tion,  hand  optimization  of LTM.  There is also a built in
optimizer that can be invoked.  Optimization is discussed in
detail  in section 7.  The OPTIMIZE statement can be thought
of as an unconditional GOTO statement.  In normal execution,
after  a  rule fires the rules are tested from the beginning
of LTM for the next  rule  that  will  fire.   The  OPTIMIZE
statement can specify a point other than the start of LTM to
begin testing rules.  In addition to optimization, it can be
used  to impose a customized control structure on the set of
rules.

     One example of the use of the OPTIMIZE statement is  to
implement a search for the absence of some object(s) in STM,









                           - 20 -


which is not otherwise supported by the  TRC  language.   To
search  for  the  absence  of some object(s), use two rules.
The first rule searches for the presence of the object(s) in
question,  if  the  rule  is true then the object(s) are not
absent.  If the rule fails, the object(s) are absent, e.g.:

     R1:
          /* effectively search for the
             absence of an object A with
             element A1 == 2 */

          (A.A1 == 2)
          =>
          /* If this rule is true, branch
             around the next rule */
          OPTIMIZE R3
          ;

     R2:
          /* If R1 fails, then there is
             no object A with element
             A1 == 2.  An empty situation
             part such as this always
             evaluates to true */
          =>
          /* whatever you wish to do in response
             to the absence of A1 == 2 */
          ;

     R3:
          /* continue here if R1 is true */
          . . .


_6._4._4.  _C-_C_O_D_E

     A c-code may follow the MARK, ADD and  OPTIMIZE  state-
ments.  This is user code that is to be executed when a rule
fires.  C-code may not appear between MARK, ADD or  OPTIMIZE
statements.   If  it is necessary to refer to an object that
is being MARKed in c-code, it should be done in  the  situa-
tion  part.   A  c-code  may  precede  the arrow symbol that
separates the situation and action parts.   C-code  in  this
position  is  equivalent  to  c-code  in  the situation part
preceding the MARK, etc. statements, e.g.:

     R1:
          (^A FIRST)
          {
               /* this c-code follows all
                  situation tests.  It is
                  effectively in the action
                  part since it will execute
                  only if the situation is









                           - 21 -


                  true */

               some_procedure($FIRST.A1);
          }
          =>
          MARK FIRST
          ;


_7.  _O_P_T_I_M_I_Z_E_R

     The optimizer does not produce code that is optimum  in
any sense.  What it does is to perform a single, very useful
code modification that can have a very  positive  impact  on
execution time.

     Consider the execution of an  inference  engine.   Each
rule  is  tested  until  one who's situation part is true is
found.  This rule's action  part  is  then  executed.   When
rules  are being tested the problem space is being searched.
When an action part is executed a step is taken in the solu-
tion of the problem.  Searching the problem space is clearly
part of the solution, but the action part is where  the  the
results occur.

     The goal, which is  not  attained,  is  to  reduce  the
search time to zero.  To attain this goal it would be neces-
sary to know each time a rule fires  which  rule  will  fire
next.   This is generally not known.  In particular when the
inference engine begins execution, the contents of  STM  are
not  known,  any rule can be the first rule to fire.  Once a
rule has fired and each time any rule fires a great deal  of
implicit  knowledge  about  the contents of STM is obtained.
It is known that no rule previous to  the  current  rule  is
true  and  no  rule previous to the current rule can be true
after the execution of the current rule unless  the  current
rule  modifies  STM  in  such a way as to make some previous
rule true.  This simple fact is  the  entire  basis  of  the
optimizer, which attempts to reduce the number of rules that
are tested by deducing which rules can not possibly fire.

     Three tests must be performed to determine a  candidate
next  rule, which is the first rule in LTM that can possibly
fire after the current rule  fires.   The  three  tests  are
called the NOT test, the ADD test and the MARK test.

     The first case to be considered is the case of  a  rule
which contains a NOT statement in the situation part.  A NOT
test is an explicit test for an empty  list.   When  a  rule
that fires contains an ADD statement it will not be possible
for any previous rule with a NOT statement referring to that
list  to be the next rule to fire.  Likewise, if a rule that
fires contains a MARK statement and no ADD statement  refer-
ring  to  that  same list, it is possible that the list will









                           - 22 -


become empty making it possible for the rule  with  the  NOT
statement  that  previously failed to become true.  If it is
determined that it is possible for a rule to fire after  the
NOT  test,  that  rule  becomes  the  candidate  rule and no
further testing is done.

     Consider the case of a rule with no NOT statements that
recursively  searches  STM  for  a  situation.  If this rule
fails, it will continue to fail until something is added  to
STM  to make it true.  If all rules searched STM recursively
it would be known when a rule fires that of the  rules  that
precede  the  current rule, only those rules that search for
something added to STM by the current rule can possibly fire
in the next pass.

     If the current rule  adds  something  to  STM,  control
could  continue  with  the first rule that searches for that
something rather than the first rule in  LTM.   If  no  rule
prior to the current rule searches for those things added to
STM by the current rule or if the current rule adds  nothing
to  STM  then no prior rule can execute.  Control could con-
tinue with the current rule rather than at the beginning  of
LTM.   By causing control to continue with a rule later than
the first rule the amount of searching is reduced.

     The case of a rule that performs only a  linear  search
on  STM  must  also  be considered.  The previous conclusion
about items being added to STM is still true;  a  rule  that
adds  something  to  STM  can  cause a linear search rule to
become true.  With linear search it is also possible that  a
rule  will become true if something is removed from STM.  If
a linear rule searches for several similar items  which  are
present  but  not  correctly ordered it is possible for this
linear search to fail where a  recursive  search  would  not
have  failed.   If there were excess items,  removing one or
more may cause a different  linear  assignment  which  could
make  a  linear rule true.  This is the MARK test.  Examples
of this situation are non-trivial, but where correctness  is
an issue these cases can not be overlooked.

     The TRC optimizer selects a continuation point for each
rule  based  on  what  the  rule adds to or deletes from STM
rather than testing each rule from  the  beginning  of  LTM.
The  continuation  point  is  the first rule that could fire
based on the NOT and ADD tests for all rules, and  the  MARK
test  for linear rules.  The TRC optimizer is somewhat naive
in that it considers only items added or  deleted  with  the
ADD  and  MARK  statements.  The optimizer is unaware of any
changes that may have been made to STM by  user  code.   The
caveat  is if STM is modified in user code the optimizer may
produce incorrect code. The optimizer, which can be  invoked
with  a  command line option (-O), tests each rule individu-
ally and ignores those rules that were hand optimized in the
specification.



More information about the Mod.sources mailing list