Put your code... (was Re: gotos

Tue Apr 26 09:23:00 AEST 1988

[Part 1: Double-exit loops and double decisions.]

Before the advent of GO TO-less programming, many programmers
recognized the advantages of straightforward control flow, and in
fact, coded programs that created structured control structures using
GO TO statements.

With the advent of structured programming, these programs could be
converted to use the Boehm-Jacopini [Boeh66] control structures; the
conversion was usually straightforward.  There was a common case,
though, that defied conversion without either duplicating code or
introducing extraneous variables.  This case was where the result of
executing conditional code was in turn used in the control condition
of another conditional or looping context.

In a study of Fortran programs, which addressed data flow analysis
rather than control structures, Farrow, Kennedy and Zucconi [Farr76]
identified two new control structures that address most of these
cases.  I propose looking at their control structures in the context
of structuring `C' programs.

They define their control structures in terms of program flow graphs;
I will use GO TO programs to show the sort of analysis that can be
done when a program is viewed in this light.  Furthermore, to preserve
the graph-theoretic nature of the arguments, I will sometimes insert
needless GO TO statements to avoid considering the effects of
`fallthrough.'

If readers are unfamiliar with the fundamental techniques whereby flow
graphs that already conform to Bohm-Jacopini control structures can be
reduced to the corresponding control structures, I refer them to
Appendix A at the end of this article.

1.1 The double decision.

EXAMPLES 1-3 all use a control flow that cannot be reduced to the
Boehm-Jacopini control-flow primitives without either introducing
auxiliary variables or copying parts of the code.  This limitation to
the Boehm-Jacopini structures was first pointed out by Ashcroft and
Manna [Ashc71], several years before Knuth's paper.

In their analysis of programs, Farrow, Kennedy and Zucconi were able to
introduce an additional control structure into their graphs; the new
structure handles the Ashcroft-Manna counterexample, and in addition
can handle EXAMPLES 1-3.

The new structure is the _double decision_, corresponding to the
program fragment:

   if (COND1) {
	action1;
	goto A;
	}
   action2;
   if (COND2) {
	action3;
	goto A;
	}
   action4;
   goto B;

There is a trivial way to address this problem, introducing an
auxiliary variable.  We convert the program to the intermediate
representation:

   if (COND1) {
	action1;
	flag = TRUE;
	}
   else {
	action2;
	if (COND2) {
		action3;
		flag = TRUE;
		}
	else {
		action4;
		flag = FALSE;
		}
	}
   if (flag) goto A; else goto B;

and then reduce the concluding _if_ statement according to the
techniques in Appendix A.

I contend, without proof in this paper, that disciplined use of the
_break_ statement can eliminate the auxiliary variable in all cases
where (a) action1 and action3 are both null, or (b) action4 is null.
I have a set of translation rules that will convert programs
containing double decisions to conventional control structures; if
there is sufficient interest, I may consider writing a paper on this.

1.2 The double-exit loop.

A few programs (none of Knuth's fall into this category) cannot be
reduced successfully with the double decision, but can by introducing
another concept from Farrow, Kennedy, and Zucconi, the _double-exit
loop._  The double-exit loop looks like the following:

A: action1;
   if (COND1) {
	action2;
	goto B;
	}
   action3;
   if (COND2) {
	action4;
	goto C;
	}
   action5;
   goto A;

Once again, there is a trivial translation to conventional `C' if we
introduce an auxiliary Boolean variable:

A: for (;;) {
	action1;
	if (COND1) {
		action2;
		flag = TRUE;
		break;
		}
	action3;
	if (COND2) {
		action4;
		flag = FALSE;
		break;
		}
	action5;
	}
   if (flag) goto B; else goto C;

Once again, there are important special cases with null actions that
allow the auxiliary variable to be eliminated by disciplined use of
_break_ and _continue_.

1.3 Application to the examples.

All of Knuth's examples use the double decision in a way that can
avoid the use of auxiliary Booleans.

Translated to idiomatic C (i.e., using 0-based indexing and the _for_
statement), Knuth's Example 1, with goto's, looks like:

/* 1 */	for (i = 0; i < m; ++i)
		if (A[i] == x) goto found;
/* not found */
	i = m++;
	A[i] = x; B[i] = 0;
found:	++B[i];

Knuth proposes, in Example 1a, eliminating the _goto_ with something
like:

/* 1a */
	for (i = 0; i <= m && A[i] != x; ++i) /* empty loop body */ ;
	if (i > m) {
		i = m++;
		A [i] = x;
		B [i] = 1;
		}
	else ++B[i];

In this example, he points out that the test for (i <= m) in the inner
loop is superfluous with proper data structures, and comes up with:

/* 2 */	A[m] = x;
	for (i = 0; A[i] != x; ++i) /* empty loop body */ ;
	if (i > m) {
		i = m++;
		A [i] = x;
		B [i] = 1;
		}
	else ++B[i];

This last form is close to the way I'd code a linear search.  It also
has no _goto_s.

EXAMPLE 3 does not admit gracefully of such restructuring.  The
example, with GO TO's, is

/* 3 */	i = hash (x);
	while (A[i] != 0) {
		if (A[i] == x) goto found;
		if (--i < 0) i = m - 1;
		}
	A [i] = x; B [i] = 0;
found:	++B[i];

Following Knuth's lead in interchanging the checks on A[i] (EXAMPLE
3B), we arrive at:

/* 3b */
	i = hash (x);
	while (A[i] != x) {
		if (A[i] == 0) goto not_found;
		if (--i < 0) i = m - 1;
		}
	goto found;
not_found:
	A[i] = x; B[i] = 0;
found:	++B[i];

But here we note that the `not_found' case can be hoisted into the
lexical scope of the `while':

	i = hash (x);
	while (A[i] != x) {
		if (A[i] == 0) {
			A[i] = x; B[i] = 0;
			break;
			}
		if (--i < 0) i = m - 1;
		}
	++ B[i];

This is close to the way I'd express it in `C'.  I would, however,
make a _for_ statement out of the first two lines; the _for_ statement
would have an empty third part.

Testing of this version (on a VAX, using 4.3BSD, and compiling all
versions of the program with -O) shows that it's actually faster than
any of Knuth's examples with the _goto_ statements, so in this case,
at least, I contend that the arguments that structured programming
degrades performance are specious.
------------------------------------------------------------------------
Appendix A. Reduction of flow graphs of already-structured programs.

A.1 Sequencing.

A program that contains a GOTO A, where the GOTO is the only way to
reach label A, can have the GOTO eliminated in a trivial way, by
reordering the code so that the statements at A immediately follow
the GOTO.

A.2 Conditionals.

A program whose general structure is

if (COND) goto A; else goto B;
....
A: action1; goto C;
....
B: action2; goto C;

where there is no other way to reach the code contained in action1 and
action2, can be replaced with

if (COND) {
	action1;
	}
else {
	action2;
	}

A.3 Loops.

The sequence of code:

A: action1;
   if (COND) goto B; else goto C;
....
B: action2;
   goto A;
....
C:

is a generalized one-entry one-exit loop.  It can be converted
mechanically to:

A: for (;;) {
	action1;
	if (COND) break;
	action2;
	}

There are two important special cases of this construct.  If action1
is null, we can write

A: while (!(COND)) {
	action2;
	}

and if action2 is null, we can write

A: do {
	action1;
	while (!(COND));
	}

Moreover, if action2 can be expressed as a single statement in C, we
have the construct

A: for (; !(COND); action2) {
	action1;
	}

and, of course, straight-line code preceding A can be merged with the
_for_ statement if desired.

Which of these equivalent constructions is to be preferred depends on
the context.

These three reductions, applied successively, can be used to convert
any flowgraph of a program that is already structured to the
corresponding C program statements.  Note that these reductions handle
the common case of a `loop n and a half times,' where a loop exits
from the middle; they avoid both copying code and the use of the GO TO
by a disciplined use of the _break_ statement.