[4bsd-f77 #31] F77 sometimes can't handle mixed REAL and REAL*8

Fri Aug 24 08:59:48 AEST 1984

From: Donn Seeley <donn at utah-cs.arpa>

Subject: F77 sometimes can't handle mixed REAL and REAL*8 expressions
Index:	f77/src/f1/order.c 4.2BSD

Description:
	When generating code for certain complicated expressions which
	mix REAL and REAL*8 values, f77 will bomb out because it
	underestimates the number of registers that the expression will
	require.  I found this bug in a long-lost report from Kirk
	Thege at UC Berkeley.

Repeat-By:
	Clip out the following program (based on a program by Kirk
	Thege) and try to compile it.  It doesn't matter whether the
	optimizer is on or off, but the examples of code below assume
	that it is on:

	----------------------------------------------------------------
		program xmpl

		integer i
		real a, b, c, d
		real*8 x(3), y, z

		i = 1
		a = 1.5
		b = 2.0
		c = 2.5
		d = 3.0
		x(1) = 3.5
		y = 4.0

		z = (a + 1.0) * ((x(i) / b) + ((d / y) * c))

		print *, z

		stop
		end
	----------------------------------------------------------------

	When you try to compile, the compiler will print something like
	this:

	----------------------------------------------------------------
	xmpl.f:
	   MAIN xmpl:
	xmpl.f, line 15: compiler error: no table entry for op REG

	compiler error.
	----------------------------------------------------------------

Fix:
	The problem is that the code generator has painted itself into
	a corner.  If you run the code generator manually, you can see
	what it's trying to do (code has been prettied up):

	----------------------------------------------------------------
		addf3	{0x4080},{a},r0
		movl	{i},r1
		cvtfd	{b},r2
		divd3	r2,&{x}-8[r1],r2
		cvtfd	{d},r4
		divd2	{y},r4
		<bombs>
	----------------------------------------------------------------

	When it crashes, the code generator has just finished emitting
	the code for '(d / y)', and now needs to multiply the value of
	'c' into r4.  Unfortunately, 'c' is a REAL and the expression
	is of type REAL*8, so 'c' needs to be converted; but there are
	no more registers available for the conversion, since we have
	used up r0-r5 to hold temporary results of subexpressions.  The
	code generator frantically attempts to rewrite the expression
	tree, and when it can't find anything appropriate it finally
	runs over one of its many 'never happens' bugs and croaks.

	Why didn't the code generator arrange to compute the value of
	one or more subexpressions on the stack?  The problem is that
	the Sethi-Ullman algorithm which the code generator uses to
	estimate its needs for registers is not getting the right
	information; it is led to believe that it can get away with
	fewer registers than are really required.  Looking at the
	intermediate code, we see the following (more prettied up
	and ASCIIfied output):

	----------------------------------------------------------------
	oreg        (double)  {z}
	oreg        (float)  {a}
	name        (float) L21
	+           (float)
	reg         (double *) r11
	int         (long) &{x}-8
	+           (double *)
	oreg        (long)  {i}
	int         (long) 3
	<<          (long)
	+           (double *)
	* (indir)   (double)
	oreg        (float)  {b}
	/           (double)
	oreg        (float)  {d}
	oreg        (double)  {y}
	/           (double)
	oreg        (float)  {c}
	*           (double)
	+           (double)
	*           (double)
	=           (double)
	----------------------------------------------------------------

	The parser pass of the compiler does not generate conversion
	operations from REAL to REAL*8!  There are a few lines in
	fixexpr() in f77/src/f77pass1/expr.c which show that this is
	intentional.  Apparently the code generator can handle certain
	conversions of REAL to REAL*8 without recourse to explicit
	conversion operators, since there are entries in the code table
	for these contingencies.  However the Sethi-Ullman algorithm
	doesn't know about these implicit conversions and as a result
	it thinks that a REAL operand requires the same space
	regardless of whether it needs to be converted to REAL*8.

	There are two solutions -- either force conversion operators to
	be emitted or fix the Sethi-Ullman algorithm in the code
	generator so that it knows about implicit conversions.  I will
	present a fix for the code generator, since both sets of
	changes do the job, and the code generator fixes produce better
	code.

	There seems to be a bug in the code generator which the parser
	pass is taking advantage of -- if an implicit REAL to REAL*8
	conversion occurs to the left operand of an expression, then
	the Sethi-Ullman algorithm allocates fewer registers for this
	expression than for the corresponding expression with an
	explicit conversion operator forcing the left operand to be
	type REAL*8.  There are special code table entries for REAL*8
	expressions with REAL operands which can conserve registers.
	The problem crops up because registers are really only
	conserved if the REAL operand is on the left, where it can be
	converted into REAL*8 in situ.  For operands on the right an
	extra register pair is required for a conversion.  The obvious
	change to the code generator is to force the Sethi-Ullman
	algorithm to pretend that there is an explicit conversion
	operation if an implicit conversion must apply to the right
	operand but not the left.  (A somewhat cleaner but more
	extensive change would be to recognize explicit conversions and
	handle them this way anyway...)

	The changes are in the routine sucomp() in order.c:

	----------------------------------------------------------------
	*** /tmp/,RCSt1001501	Tue Jul 31 17:05:42 1984
	--- order.c	Tue Jul 31 15:44:47 1984
	***************
	*** 75,80
		   number, or local equivalent */
	  
		register o, ty, sul, sur, r;
	  
		o = p->in.op;
		ty = optype( o );

	--- 75,82 -----
		   number, or local equivalent */
	  
		register o, ty, sul, sur, r;
	+ 	register NODE *temp;
	+ 	int szr;
	  
		o = p->in.op;
		ty = optype( o );
	***************
	*** 117,122
	  
		sul = p->in.left->in.su;
		sur = p->in.right->in.su;
	  
		if( o == ASSIGN ){
			/* computed by doing right, then left (if not in mem), then doing it */

	--- 119,130 -----
	  
		sul = p->in.left->in.su;
		sur = p->in.right->in.su;
	+ 	szr = szty( p->in.right->in.type );
	+ 	if( szty( p->in.type ) > szr && szr >= 1 ) {
	+ 		/* implicit conversion in rhs */
	+ 		szr = szty( p->in.type );
	+ 		sur = max( szr, sur );
	+ 	}
	  
		if( o == ASSIGN ){
			/* computed by doing right, then left (if not in mem), then doing it */
	***************
	*** 154,159
			p->in.su = max( max(sul,sur), 1);
			return;
	  
		case PLUS:
		case OR:
		case ER:

	--- 162,168 -----
			p->in.su = max( max(sul,sur), 1);
			return;
	  
	+ 	case MUL:
		case PLUS:
		case OR:
		case ER:
	***************
	*** 159,165
		case ER:
			/* commutative ops; put harder on left */
			if( p->in.right->in.su > p->in.left->in.su && !istnode(p->in.left) ){
	- 			register NODE *temp;
				temp = p->in.left;
				p->in.left = p->in.right;
				p->in.right = temp;

	--- 168,173 -----
		case ER:
			/* commutative ops; put harder on left */
			if( p->in.right->in.su > p->in.left->in.su && !istnode(p->in.left) ){
				temp = p->in.left;
				p->in.left = p->in.right;
				p->in.right = temp;
	***************
	*** 163,168
				temp = p->in.left;
				p->in.left = p->in.right;
				p->in.right = temp;
				}
			break;
			}

	--- 171,184 -----
				temp = p->in.left;
				p->in.left = p->in.right;
				p->in.right = temp;
	+ 			sul = p->in.left->in.su;
	+ 			sur = p->in.right->in.su;
	+ 			szr = szty( p->in.right->in.type );
	+ 			if( szty( p->in.type ) > szr && szr >= 1 ) {
	+ 				/* implicit conversion in rhs */
	+ 				szr = szty( p->in.type );
	+ 				sur = max( szr, sur );
	+ 				}
				}
			break;
			}
	***************
	*** 168,174
			}
	  
		/* binary op, computed by left, then right, then do op */
	! 	p->in.su = max(sul,szty(p->in.right->in.type)+sur);
	  /*
		if( o==MUL||o==DIV||o==MOD) p->in.su = max(p->in.su,fregs);
	   */

	--- 184,190 -----
			}
	  
		/* binary op, computed by left, then right, then do op */
	! 	p->in.su = max(sul,szr+sur);
	  /*
		if( o==MUL||o==DIV||o==MOD) p->in.su = max(p->in.su,fregs);
	   */
	----------------------------------------------------------------

	After installing these fixes, the code generator succeeds in
	processing the output of the parser pass, producing this:

	----------------------------------------------------------------
		movl	{i},r0
		cvtfd	{b},r2
		divd3	r2,&{x}-8[r0],r2
		cvtfd	{d},r0
		divd2	{y},r0
		cvtfd	{c},r4
		muld2	r0,r4
		addd2	r4,r2
		addf3	{0x4080},{a},r0
		cvtfd	r0,r0
		muld3	r2,r0,{z}
	----------------------------------------------------------------

	Notice that the new code does not require stack temporaries to
	store subexpression results...  This is because the algorithm
	is now smart enough to postpone evaluating the subexpression
	(a + 1.0), which is a result of adding '*' to the list of
	commutative operators.  Without the latter change, correct code
	is still produced but a stack temporary is required (I won't
	present any more gory details).

	This is a pretty sensitive change, I hope it doesn't have any
	nasty side effects...

Donn Seeley    University of Utah CS Dept    donn at utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn