Of Standards and Inventions: A Cautionary Tale

Chris Torek chris at mimsy.UUCP
Thu Apr 7 00:15:44 AEST 1988


[Typography convention: /word/ represents /italics/; |word|
represents typewriter-text.]

By now most of you know my sentiments towards `noalias'.  Here,
however, is a sequence showing how even the most innocent-seeming
inventions can interact to produce surprising results.

First, a note about unsignedness:  In the C language, the unsigned
attribute on a type can be viewed as `sticky': operations on unsigned
numbers always yield an unsigned result.  (The only exception is the
ternary e1?e2:e3, whose result is independent of the type of e1.)
The condition can, of course, be cleared by a cast to a signed
type.
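
A minimal sketch of this stickiness in ordinary C follows.  The
function names are made up for illustration and are not from any
library; each function returns the value given in its comment.

```c
#include <limits.h>

/* int combined with unsigned int gives unsigned int, so this wraps. */
unsigned int one_minus_two(void)
{
	return 1u - 2;			/* UINT_MAX, not -1 */
}

/* -1 is converted to unsigned before the operands are compared. */
int minus_one_less_than_one(void)
{
	return -1 < 1u;			/* 0 */
}

/* e1 is unsigned, but only e2 and e3 decide the result type (int). */
int ternary_ignores_e1(void)
{
	return (1u ? -1 : 0) < 0;	/* 1: the result is the int -1 */
}

/* A cast to a signed type clears the condition. */
int cast_clears(void)
{
	return (int)(1u + 1) - 3 < 0;	/* 1: 2 - 3 is the int -1 */
}
```

Many compilers will warn about the mixed signed/unsigned comparison in
the second function; that warning is the same surprise in another form.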

Second, we have a long-standing clause in the draft standard on /integer
constants/, one that determines the type of a constant from its value
and that value's representation on your machine.  In itself this is
nothing new: even K&R say that whether |34567| is an |int| or a |long|
will depend on the number of bits in your |int|.  The dpANS further
says that a constant may become an |unsigned long|.  In particular, on
machines with 32 bit |long|s, values in 2147483648..4294967295 are
|unsigned long|.  This is certainly reasonable, or at least seems so.
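
The rule as described above can be sketched as a small classifier.
This is only an illustration: |constant_type|, |enum ctype| and the
|int_bits| parameter are hypothetical names, and the sketch assumes a
32 bit |long| and an |int| of either 16 or 32 bits.

```c
#include <stdint.h>

enum ctype { C_INT, C_LONG, C_ULONG, C_TOO_BIG };

/*
 * Hypothetical helper: the type of an unsuffixed decimal constant
 * under the rule described in the text, for a machine whose int is
 * int_bits wide (16 or 32) and whose long is 32 bits wide.
 */
enum ctype constant_type(uint64_t value, int int_bits)
{
	uint64_t int_max = (UINT64_C(1) << (int_bits - 1)) - 1;

	if (value <= int_max)
		return C_INT;
	if (value <= UINT64_C(2147483647))
		return C_LONG;
	if (value <= UINT64_C(4294967295))
		return C_ULONG;		/* 2147483648..4294967295 */
	return C_TOO_BIG;
}
```

With these assumptions, |constant_type(34567, 16)| is |C_LONG|,
|constant_type(34567, 32)| is |C_INT|, and 2147483648 comes out as
|C_ULONG| for either width of |int|.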

Next we have the introduction of explicitly-unsigned constants.  |12U|
is to be equivalent to |(unsigned)12|; |99LU| or |99UL| is equivalent
to |(unsigned long)99|.  This is quite a notational convenience, just
as is the existing L suffix, and adding it to compilers is simple:  It
took perhaps a dozen lines to add it to the 4.3BSD Vax and Tahoe
compilers.  Again, reasonable, if something of a frill.
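
A short sketch of the equivalence, with illustrative function names
that are not part of any library.  Comparing sizes only hints at the
type; the last function shows the unsignedness directly.

```c
/* 12U and (unsigned)12 have the same value and the same size. */
int u_suffix_matches_cast(void)
{
	return 12U == (unsigned)12 && sizeof(12U) == sizeof(unsigned);
}

/* 99LU and 99UL are two spellings of (unsigned long)99. */
int ul_suffix_matches_cast(void)
{
	return 99UL == (unsigned long)99 && 99LU == 99UL
	    && sizeof(99UL) == sizeof(unsigned long);
}

/* The constant is unsigned, so the subtraction wraps around. */
int suffix_is_unsigned(void)
{
	return 12U - 13 > 0;	/* 1 */
}
```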

But now that we have this U suffix, and various files that use it, I
find that the preprocessor must do something with it.  And indeed, the
draft tells us that the preprocessor now has the notion of unsigned
arithmetic.  Rather than do everything in |long|s, ignoring any U
suffixes, it must obey the compiler's rules for combining |long| and
|unsigned long|.  Is this such a burden?  Perhaps; perhaps not: a close
approximation in the Reiser preprocessor---making unsigned
`sticky'---took only a few changes (the approximation fails only for
e1?e2:e3 as noted above).  But having unsigned arithmetic available in
the preprocessor is clearly semantically desirable: it would be nice
to be able to tell whether the maximum unsigned short is greater than
65535U:

	#include <limits.h>

	/*
	 * Define a type to hold values in 0..65536.  We will
	 * have a large array of these numbers, so use as little
	 * space as possible.
	 */
	#if USHRT_MAX > 65535U
	typedef unsigned short bigunum;
	#else
	typedef unsigned long bigunum;	/* dpANS says u_long must suffice */
	#endif
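
What `sticky' unsigned evaluation might look like inside a preprocessor
can be sketched as follows.  This is not the Reiser code; |struct
ppval| and the |pp_| functions are hypothetical, and 32 bit |long|s
are assumed, modelled here with |uint32_t| bit patterns.

```c
#include <stdint.h>

/* Hypothetical value type for a preprocessor with 32-bit longs. */
struct ppval {
	uint32_t bits;		/* two's complement bit pattern */
	int is_unsigned;	/* the sticky attribute */
};

struct ppval pp_const(uint32_t bits, int is_unsigned)
{
	struct ppval v;

	v.bits = bits;
	v.is_unsigned = is_unsigned;
	return v;
}

/* a + b: the result is unsigned if either operand is. */
struct ppval pp_add(struct ppval a, struct ppval b)
{
	struct ppval r;

	r.bits = (uint32_t)(a.bits + b.bits);
	r.is_unsigned = a.is_unsigned || b.is_unsigned;
	return r;
}

/* a > b: compare as unsigned if either operand is, else as signed. */
int pp_greater(struct ppval a, struct ppval b)
{
	if (a.is_unsigned || b.is_unsigned)
		return a.bits > b.bits;
	/* Flipping the sign bit orders signed patterns correctly. */
	return (a.bits ^ UINT32_C(0x80000000)) >
	    (b.bits ^ UINT32_C(0x80000000));
}
```

With both operands signed, the pattern for -1 is not greater than 0;
mark either operand unsigned and the same pattern compares as
4294967295, which is.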

Each of these inventions (for inventions they are, at least as they
have been phrased) seems perfectly reasonable.  At least, each one
seems so to me.  But lo! what has happened when we combine them all?
The answer to that lies in the following question:

	On a machine with 32 bit |long|s and two's complement
	arithmetic, what is the type of -2147483648 in the preprocessor?

Since the preprocessor is required to follow the same rules as the
compiler, and is possessed of the notion of unsigned, we find that it is
first to compute 2147483648 and then to negate it, and when it does the
former it finds that the type is |unsigned long|.  The negation changes
nothing: /neither the type nor the value/.  As noted earlier, the only
way to remove the unsigned attribute is to use a cast.  But since the
preprocessor explicitly disallows casts, there is no way to get
-2147483648!  In particular, this means that

	#include <limits.h>
	#if LONG_MIN > 0

is guaranteed to be /true/ on any two's complement machine whose
|<limits.h>| spells |LONG_MIN| as a plain negated constant!
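
The arithmetic can be replayed in ordinary C.  In this sketch
|uint32_t| and |int32_t| stand in for a 32 bit |unsigned long| and
|long|, and the function names are illustrative only.  The last
function is not from the discussion above: it is an assumption about
how the value could be written using only constants that fit in the
signed type.

```c
#include <stdint.h>

/* Negating an unsigned 32-bit value reduces the result modulo 2^32. */
uint32_t negate_u32(uint32_t v)
{
	return (uint32_t)(0u - v);
}

/* What "-2147483648 > 0" yields under the rules described above. */
int long_min_test_as_described(void)
{
	uint32_t c = UINT32_C(2147483648);	/* too big to be signed */
	uint32_t n = negate_u32(c);		/* still 2147483648 */

	return n == c && n > 0;			/* 1 */
}

/* Built only from constants that fit, the value stays signed. */
int spelled_with_subtraction(void)
{
	int32_t min = -2147483647 - 1;

	return min < 0;				/* 1 */
}
```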

The moral, if you will, of this story is that even obvious and
well-behaved inventions may not always work together.  If something as
simple as putting unsigned arithmetic in the preprocessor has such a
surprising result, what can we expect of inventions like |noalias|?
Perhaps this will show why I am uneasy about /every/ invention in
this draft standard, even such obvious improvements as prototypes.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris


