command line options

John P. Rouillard rouilj at umb.umb.edu
Tue Apr 5 05:55:57 AEST 1988


The following structure allows a generic function to parse any
conceivable command line.  The structure would have the form:

struct command_entry
{
struct command_entry *next;  /* for a linked list of these babies */
char *NAME;            /* the full name of the option */
char *ABBREV;          /* the shortest abbreviation for the option */
char *ARG_TYPE;        /* the type of argument (string, char, int, float ...) */
char *format_type;     /* Keyword = value, +keyword, -keyword ... */

void *VARIABLE_addr;   /* the address of a variable to set */
enum v_type VAR_type;  /* the type of the variable above */

int *(*FUNCTION_addr)();  /* address of a function returning a pointer to int */
enum f_type FUNCT_type;   /* the type the function actually returns */

int (*Error_handler)();   /* your own personal error handler */
/* add your favorite options here */
};

A possible entry would be (from the command make, augmented for show):
{ NAME          "file",
  ABBREV        "f",
  ARG_TYPE      string,  /* i.e. char * */
  format_type   "-w",    /* "-" gives the form "-f"; "w" signifies
                            whitespace between keyword and value */
  VARIABLE_addr &makefile_name,
  VAR_type      String,
  FUNCTION_addr NULL,    /* no function needed */
  FUNCT_type    NULL     /* the type of a nonexistent function */
}
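
For anyone who wants to try this out, the entry above might be declared
statically in C roughly as follows.  This is only a sketch under stated
assumptions: the members of enum v_type and enum f_type, the parameter
lists of the two function pointers (the first follows feature e below),
and the default value of makefile_name are all made up for illustration,
and the designated initializers need a C99 compiler.

```c
#include <stddef.h>

/* Hypothetical members: the post names these enums but never lists them. */
enum v_type { V_NONE, V_STRING, V_INT, V_FLAG };
enum f_type { F_NONE, F_INT_PTR };

struct command_entry {
    struct command_entry *next;   /* linked list of entries */
    const char *NAME;             /* full name of the option */
    const char *ABBREV;           /* shortest abbreviation */
    const char *ARG_TYPE;         /* type of the argument, as text */
    const char *format_type;      /* "-w": "-" prefix, whitespace before value */
    void *VARIABLE_addr;          /* address of the variable to set */
    enum v_type VAR_type;         /* how to interpret VARIABLE_addr */
    /* Assumed signature: current argv position, argc, entry list. */
    int *(*FUNCTION_addr)(char **argv, int argc, struct command_entry *list);
    enum f_type FUNCT_type;       /* what the function really returns */
    int (*Error_handler)(const char *message);   /* assumed signature */
};

/* The variable the "file" option sets; the default is an assumption. */
static const char *makefile_name = "Makefile";

/* The "file" entry from the example, declared at compile time. */
static struct command_entry file_entry = {
    .next          = NULL,
    .NAME          = "file",
    .ABBREV        = "f",
    .ARG_TYPE      = "string",
    .format_type   = "-w",
    .VARIABLE_addr = &makefile_name,
    .VAR_type      = V_STRING,
    .FUNCTION_addr = NULL,        /* no function needed */
    .FUNCT_type    = F_NONE,
    .Error_handler = NULL
};
```

Using F_NONE rather than NULL for FUNCT_type keeps the enum field
type-correct when there is no function.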

This structure would allow:

    a: A long name that can be abbreviated to the value in ABBREV.

    b: Handling multi-character flags without values (e.g. "-las" in
       "ls -las"): simply loop over each character and set the
       appropriate flag.

    c: Whitespace elimination (e.g. -Kvalue) is easily done: the value
       up to the next whitespace character is scanned according to its
       type.

    d: The setting of a variable to an argument value or, if a function
       is specified, the setting of the variable to the pointer value
       returned by the function.  (The variable at the VARIABLE_addr
       is interpreted according to the value in VAR_type so
       appropriate casts can be made.)

    e: The ability to handle special parsing of the command line via
       calls to a function that takes 1) current argv location, 2) argc
       and 3) the address of the command_entry list
       as arguments.

    f: For those values that are multiples on the command line (i.e.
       multiple filenames), the function specified in the
       command_entry could create a list of the names (copying them if
       desired) and then have the variable in the command_entry point
       to the head of the list.

    g: As an alternative to setting other variables, the values could
       be returned in the command_entry structure itself (maybe via a
       union in the struct?).

    h: The ability to specify in the command entry an error routine
       specific for the particular option being parsed.

    i: Because a function can be called to deal with funky parts of
       the command line, the function that parses the command line
       returns only when it has parsed the whole command line.  This
       eliminates the problem of dealing with an unparsed remainder,
       since it is an error [probably fatal] not to parse the whole
       command line.

    j: The command_entries could be created dynamically during
       runtime, or declared statically at compile time.

    k: The driver for Options_please (the get_ops lookalike)
       would act similarly to an LR or LL parser driver with a parse
       table (the linked list of command_entries). The driver is easy to
       maintain, with all of the work actually done during the creation
       of the parse tables.
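
A table-driven driver along the lines of features a, b, d and k could
be sketched as below.  It is a deliberately cut-down illustration, not
a full Options_please: the struct is trimmed to the fields the loop
uses, the enum members and the helper names lookup and store are
hypothetical, only the "-keyword value" form is handled, and the
per-entry functions and error handlers are left out.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum v_type { V_FLAG, V_STRING, V_INT };   /* hypothetical members */

struct command_entry {                     /* trimmed for the sketch */
    struct command_entry *next;
    const char *NAME;
    const char *ABBREV;
    void *VARIABLE_addr;
    enum v_type VAR_type;
};

/* Find the entry whose NAME or ABBREV equals word; NULL if none. */
static struct command_entry *lookup(struct command_entry *list,
                                    const char *word)
{
    for (; list != NULL; list = list->next)
        if (strcmp(word, list->NAME) == 0 || strcmp(word, list->ABBREV) == 0)
            return list;
    return NULL;
}

/* Store through VARIABLE_addr, cast according to VAR_type (feature d). */
static void store(struct command_entry *e, const char *value)
{
    switch (e->VAR_type) {
    case V_FLAG:   *(int *)e->VARIABLE_addr = 1;                 break;
    case V_STRING: *(const char **)e->VARIABLE_addr = value;     break;
    case V_INT:    *(int *)e->VARIABLE_addr = atoi(value);       break;
    }
}

/* Returns 0 once the whole command line is parsed, -1 on the first
   word that cannot be parsed. */
int options_please(int argc, char **argv, struct command_entry *list)
{
    int i;

    for (i = 1; i < argc; i++) {
        const char *word = argv[i];
        const char *p;
        struct command_entry *e;

        if (word[0] != '-' || word[1] == '\0')
            return -1;

        /* Features a and d: full name or abbreviation, then the value. */
        e = lookup(list, word + 1);
        if (e != NULL) {
            if (e->VAR_type == V_FLAG) {
                store(e, NULL);
            } else {
                if (i + 1 >= argc)
                    return -1;             /* value is missing */
                store(e, argv[++i]);
            }
            continue;
        }

        /* Feature b: treat the word as a cluster of one-letter flags. */
        for (p = word + 1; *p != '\0'; p++) {
            char one[2];

            one[0] = *p;
            one[1] = '\0';
            e = lookup(list, one);
            if (e == NULL || e->VAR_type != V_FLAG)
                return -1;
            store(e, NULL);
        }
    }
    return 0;
}
```

Note that the driver itself knows nothing about any particular option;
everything option-specific lives in the list it is handed.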


BUGS:  
       a: The data for the command_entries could take up a lot of
          space and therefore may be troublesome.

       b: The second problem occurs because of the ambiguity in the
          command language.  Please follow my description below:

        Assume we have defined:
          A keyword Kval that can have an optional argument,
          and boolean keywords (flags either on or off) "u" and "e".

          How do we parse "-Kvalue"?

          Is it Kval with argument "ue", or is it Kval with no
          arguments and the boolean flags "u" and "e"?

             If we allow eliding of whitespace between flag and value
             it is impossible to tell which is meant.  By doing away
             with 'c' above we can then parse this as Kval with no
             arguments, followed by the flags "u" and "e".

          Another ambiguity arises if we decide on having an argument
          that can be abbreviated "K" (Kval needs all four letters)
          and other arguments "v", "a", and "l".  Now how does the
          above string parse?

            The boolean "K", the boolean "v"... no wait, those two
            letters are the prefix for Kval (ARRGH ;-[) (HELP LR GRAMMAR)

	  Richard Harter also touched on this ambiguity problem in his
	  article.

          This is a problem that is inherent in features
          a and b above.

	  One way around this is to make sure that you never use the
	  letters K,v,a, and l :-).  

          A second way around the problem is to make the order of the
          keywords in the list of command_entries significant and
          therefore impart a priority to the commands.  In the above
          example:

		   if Kval appeared before K (which it would have to
		   do in order to have Kval called at all) the
		   interpretation of the flag Kval would occur first.
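
The effect of list order can be seen in a small sketch.  It assumes
entries are tried front to back and that an entry matches when its
NAME is a prefix of the text after the "-"; the function name
first_match and the trimmed struct are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct command_entry {                 /* trimmed for the sketch */
    struct command_entry *next;
    const char *NAME;
};

/* Return the first entry whose NAME is a prefix of text, or NULL.
   Earlier entries win, so list order acts as a priority. */
struct command_entry *first_match(struct command_entry *list,
                                  const char *text)
{
    for (; list != NULL; list = list->next)
        if (strncmp(text, list->NAME, strlen(list->NAME)) == 0)
            return list;
    return NULL;
}
```

With Kval ahead of K in the list, "Kvalue" is claimed by Kval; with K
first, K matches every string starting with "K" and Kval is never
reached.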


       A third way around it is to write the table such that no two
       command_entries overlap, i.e. no option string can be read in
       more than one way.

       The fourth way is to write a function that will allow the
       handling of this via look-ahead or whatever mechanism you
       devise.  Basically you turn an NFA into a DFA by combining
       states.  E.G. if a K was found a function would be called that
       would try to determine if the value was Kval or if the value
       was K followed by random characters.
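
       Such a function might look something like the sketch below.  It
       is only one possible policy, assumed for illustration: look
       ahead for the full name "Kval" first, and otherwise accept K
       followed by known one-letter flags.  The names resolve_K and
       enum k_reading, and the idea of passing the flag letters as a
       string, are not from any existing library.

```c
#include <string.h>

enum k_reading { K_ERROR, K_IS_KVAL, K_IS_CLUSTER };   /* hypothetical */

/* Called once a leading 'K' has been seen.  text is the option text
   after the "-"; flags holds the valid one-letter boolean flags. */
enum k_reading resolve_K(const char *text, const char *flags)
{
    const char *p;

    /* Look ahead: the long name takes priority over a flag cluster. */
    if (strncmp(text, "Kval", 4) == 0)
        return K_IS_KVAL;

    if (text[0] != 'K')
        return K_ERROR;

    /* Otherwise every remaining character must be a known flag. */
    for (p = text + 1; *p != '\0'; p++)
        if (strchr(flags, *p) == NULL)
            return K_ERROR;

    return K_IS_CLUSTER;
}
```

       With flags "val", the text "Kvalue" is read as Kval, "Kva" as
       the flags K, v and a, and "Kx" is rejected.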

       If you think this stuff was handled in the Dragon Book, you are
       right on the money.  But note that the thing that causes all of
       the problems is allowing names and having possibly non-unique
       representations for every string that can be generated.
       However, this facility seems to be the only way to even attempt
       generality and allow a way of working around the problem.

PLEASE NOTE: this is only an idea, and I would like feedback on
             it.

Please feel free to steal the idea and modify it as necessary.
Sorry it is so long, but I was trying to reply to everybody's favorite
must-haves.

What do I know?  I am only a Physics major.

==========================================================================
The opinions expressed above are all mine and belong to nobody else.  To
U-Mass I am just a number.

E = M C**2  Not just an equation a way of life.

John Rouillard				U.S. Snail:	Physics Department 
U-Mass Boston						U-Mass Boston
Physics Major				        	Harbor Campus
							Boston, MA 02125
 					UUCP: 	harvard!umb 
						husc6!umb


