entry at other than main (was want to know)

Sat Aug 19 23:16:52 AEST 1989

In many articles many people write this, that, and the other argument
for or against `main()' as the program entry point.

Personally, I do not see this as much of an issue.  There must be
*some* way to label something as the program entry point.  The obvious
way to do this is with a `reserved word'.  Many languages use a special
syntax:

	PROGRAM FOO
	IMPLICIT UNDEFINED (A-Z)
	...
	END

or	program blivet(input, output);
	type goo = record ... end;
	var a, b, c : integer;
	begin ... end.

Others simply `enter from the top' (SNOBOL does this, making
subroutines exciting, since the subroutine must be defined before it is
used, yet usually cannot be run before the main program itself begins).
Still others (like C) reserve a particular function name.  In
languages with true reserved words, this has the trivial advantage
of not `using up' another word.

Only a very few languages---particularly interpreted or `symbolic'
languages---have historically allowed several program entry points.
These get away with it by preserving enough of the symbol table---often
this means `all of the symbol table'---to know the names of every
function, and the types of arguments, and so on.  Many compiled
languages discard the symbols at the end of compilation, at least
virtually (e.g., global symbols are retained for use with debuggers,
unless you use `strip'), and C has historically taken this approach.
Once the symbols are gone, there is no good way to bind names to
machine code locations, necessitating a simple convention like
`start at the first byte' or `start at offset <word at image+4>'.

Anyway, this gives us some background with which to consider the
options available.  We have four standard approaches available:

	a) program begins at procedure or function declared with
	   some special syntax;
	b) program begins at top;
	c) program begins at reserved name (`main');
	d) program begins at any function (Lisp, APL, etc).

Of these, only one allows programmers and users to `do lots more', and
that is the last approach.  It is certainly very useful during
debugging.  But it has drawbacks: it uses more resources (you have to
carry those symbols around, and provide a way to look them up).  A more
subtle drawback is that you may not *want* users to start your program
anywhere---a canned application is only meant to be started in some
particular way(s).  Compiler vendors are probably not interested in
their users' being able to invoke individual functions and perhaps
`steal compiler technology' that way.

At any rate, you can, right now, go out and *buy* approach (d) for C:
there are at least two C interpreters on the market.  If you want it,
go pay for it.

That leaves us with (a), (b), and (c).  Of these, I would personally
reject (b) out of hand, having had some experience with it, leaving
only (a) and (c).  So: what does (a), adding a special syntax, buy us?

Well, for one, we can name our programs.  Instead of

	/* calculate prime factors */
	int main(int argc, char **argv)
	{ ... }

we can write

	{ calculate prime factors }
	program primefactors(input, output);
	...

That this is good, I think most will agree.  That it is worth the
`cost' of a program keyword is a bit more debatable.  More intriguing
to me is the fact that many compilers actually discard the program name
almost immediately---the program name acts like a comment.  If it acts
like one, maybe it should just *be* one, as in C.  Either way, I think
this is ultimately unimportant.  One either learns `main is the
program, look near it to figure out what the program is about' or `the
program name is discarded, look away from it when the debugger prints
locations' or whatever.

But there is another advantage to the special syntax, if we design it
properly.  We could allow programs to declare each entry point with a
`program' or `entry' statement, and thus share subroutines and get the
effect of switching on argv[0] on Unix machines, as ex/vi/view/edit/e
and compress/uncompress do.  To do this we must have the compiler and
the linker cooperate: the compiler has to `leave behind' the names of
all the program entry points, and the linker must include code to
select the appropriate one at runtime.  If there is only one entry
point, the linker could skip the selection code.  The benefits we know;
the cost of this is some special syntax, some code in the compiler, and
some more code in the linker.

Is this an advantage?  Certainly, at least for programs like
ex/vi/view/edit/e and compress/uncompress; they could leave out the
`magic' used to decide how to operate, relying on the `magic' in the
runtime library instead.  Is it worth it?  Again, this is debatable.
For every application that has several entry points you can find many
that have only one.  (In fact, ex/vi/... has only one: it sets flags
based on argv[0], does some startup common to all variants, and only
then looks at the flags.  The same flags can be set or cleared under
program control [e.g., `set magic', `set readonly'], so ex/vi/... is
not such a great example.  Compress/uncompress is a much better example.)
Moreover, one of the philosophies underlying both C and Unix is (or
at least was) `there is no magic': the language and the programs are
(or at least, once were) generally simple and straightforward.

At any rate, C uses the `reserved procedure name' approach, with its
single merit of simplicity and its drawbacks as discussed above, and
arguments in this newsgroup are unlikely to change this.  If you really
want multiple entry points *and* debuggability in C, go buy a C
interpreter.  If you want something in between, go write it yourself.
Maybe, after demonstrating how wonderful it is, you can get it into
C00 (or whatever the next standard may be called).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris