ANSIfication: value preserving rules

Chris Torek chris at mimsy.UUCP
Sun Apr 10 19:20:34 AEST 1988


Several people have expressed confusion over the difference between
`sign preserving' rules and `value preserving' rules.  These rules
control the result when the compiler has to expand an unsigned char,
unsigned short, or unsigned int value to a larger type.  (From here on
the unsigned prefix will be abbreviated |u_|.)

The first kind of expansion happens whenever an object of type |u_char|
or |u_short| appears in an expression.  The object must be widened to
|int| or |u_int|.  The second occurs when |u_int| values (possibly
produced by the former expansion) are mixed with |long| or |u_long|
values in any arithmetic expression.

The `sign preserving' rules (the Rationale calls them `unsigned
preserving') can be stated in four words: the result is unsigned.  The
table below shows the result of each conversion (u_int:long means u_int
in long context):

	SIGN PRESERVING RULES
	input type	output type
	----------	-----------
	u_char		u_int
	u_short		u_int
	u_int		u_int
	u_int:long	u_long

The `value preserving' rule table looks like this:

	VALUE PRESERVING RULES
	input type	output type
	----------	-----------
	u_char		int or u_int
	u_short		int or u_int
	u_int		u_int
	u_int:long	long or u_long

Whether |int| or |u_int| (|long| or |u_long|) is chosen depends on
whether |int| (|long|) can hold all the values of the input type.
More specifically, on a machine with 16-bit |int|s and 32-bit
|long|s (e.g., IBM PC, PDP-11, some 68000 systems), the table
looks like this:

	VALUE PRESERVING RULES FOR PDP-11/IBM-PC
	input type	output type
	----------	-----------
	u_char		int
	u_short		u_int
	u_int		u_int
	u_int:long	long

whereas on a 32-bit |int| and |long| machine (e.g., VAX, IBM PS/2
in 386 mode, most 68000 systems), it appears instead as

	VALUE PRESERVING RULES FOR VAX/SUN/IBM PS/2
	input type	output type
	----------	-----------
	u_char		int
	u_short		int
	u_int		u_int
	u_int:long	u_long
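
One rough way to see which of these tables a given compiler follows is
sketched below.  This is my own illustration, not something from the
Rationale, and the macro and function names are made up.  It computes
0 - 1 in the widened type and checks whether the result is negative;
under the value preserving rules the answers should agree with what
<limits.h> says about the ranges.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper: (x) * 0 has the type that x widens to, so
 * subtracting 1 gives -1 if that type is signed and a very large
 * positive value if it is unsigned.
 */
#define WIDENS_TO_SIGNED(x)	(((x) * 0 - 1) < 0)

/* Each function returns 1 if the type widens to a signed type, else 0. */
int uchar_widens_signed(void)
{
	unsigned char uc = 1;

	return WIDENS_TO_SIGNED(uc);
}

int ushort_widens_signed(void)
{
	unsigned short us = 1;

	return WIDENS_TO_SIGNED(us);
}

int uint_widens_signed(void)
{
	unsigned int ui = 1;

	return WIDENS_TO_SIGNED(ui);
}

/* u_int in long context: mix in a long zero before subtracting. */
int uint_long_widens_signed(void)
{
	unsigned int ui = 1;

	return (ui * 0 + 0L - 1) < 0;
}

/* What the value preserving rule predicts, from the ranges alone. */
int ushort_fits_int(void)
{
	return USHRT_MAX <= INT_MAX;
}

int uint_fits_long(void)
{
	return UINT_MAX <= LONG_MAX;
}

/* Print one line per table row for the compiler at hand. */
void report_widening(void)
{
	printf("u_char\t\t%s\n", uchar_widens_signed() ? "int" : "u_int");
	printf("u_short\t\t%s\n", ushort_widens_signed() ? "int" : "u_int");
	printf("u_int\t\t%s\n", uint_widens_signed() ? "int" : "u_int");
	printf("u_int:long\t%s\n", uint_long_widens_signed() ? "long" : "u_long");
}
```

Calling report_widening() from main() prints the table for the machine
at hand.  A compiler that still follows the sign preserving rules would
report an unsigned type on every row.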

The Rationale provides the following, er, rationale:

    The unsigned preserving rules greatly increase the number of
    situations where |unsigned int| confronts |signed int| [in an
    expression] to yield a questionably signed result [where a negative
    number suddenly becomes a large positive number, a possibly
    unintended result], whereas the value preserving rules minimize
    such confrontations.  Thus, the value preserving rules were
    considered to be safer for the novice, or unwary, programmer.
    After much discussion, the Committee decided in favor of value
    preserving rules, despite the fact that the UNIX C compilers had
    evolved in the direction of unsigned preserving.

			QUIET CHANGE
	A program that depended upon unsigned preserving arithmetic
	conversions will behave differently, probably without
	complaint.  This is considered the most serious semantic
	change made by the Committee to a widespread current practice.

I claim that the value-preserving rules are no easier for novices,
particularly because the expansion of |u_short| is so terribly
context-dependent.  One might note that the following prints
"conformant" twice on every existing conformant implementation:

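	#include <stdio.h>

	int main(void)
	{
		unsigned char uc = -1;	/* UCHAR_MAX */
		unsigned int ui = -1;	/* UINT_MAX */

		if (-uc < 0)		/* uc widens to int */
			printf("conformant\n");
		if (-ui > 0)		/* ui stays unsigned; -ui is 1 */
			printf("conformant\n");
		return 0;
	}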

We are supposed to believe that this is somehow less confusing than the
alternative (-uc > 0, -ui > 0).  The Rationale notes that the behaviour
of expressions such as

	if (-(unsigned short)-1 < 0)

is machine-dependent, without going so far as to give examples like
those above.  It also notes that all the ambiguity (along with the
default rules) can be eliminated with judicious use of casts.  Why
not, then, ask novices always to write those casts, and/or to remember
the rule `unsigned widens to unsigned'?
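
To make the point about casts concrete, here is a minimal sketch (the
function names are invented for illustration and appear nowhere in the
Rationale).  The first function leaves the widening to the compiler, so
its answer depends on the machine; the other two say which type they
want and give the same answer everywhere, assuming |long| can hold
every |u_short| value.

```c
/*
 * Left to the default rules: the answer depends on whether int can
 * hold every unsigned short value.
 */
int neg_default_is_negative(unsigned short us)
{
	return -us < 0;
}

/* The cast spells out `unsigned widens to unsigned': never negative. */
int neg_unsigned_is_negative(unsigned short us)
{
	return -(unsigned int)us < 0;
}

/*
 * The cast asks for signed arithmetic.  Assumption: long can hold
 * every unsigned short value, as it does wherever short is narrower
 * than long.
 */
int neg_signed_is_negative(unsigned short us)
{
	return -(long)us < 0;
}
```

With the argument (unsigned short)-1, the second function returns 0 and
the third returns 1 regardless of the width of |int|; only the first
varies between the PDP-11 and VAX tables.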

I find it significant that the unsigned preserving rules can be stated
in four words, while the value preserving rules require a paragraph
full of conditional wording.  How can something that is that hard to
say be `safer'?  As for the argument that the value-preserving rules
minimise the presence of mixed signed and unsigned operations, I submit
that a majority of these will occur between |u_int| and |long| objects,
and I note that in this case, on most modern systems (counting the
80386 as modern, but not the 286), the value preserving rules help
not at all.
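
The |u_int| against |long| case fits in a one-line comparison.  This is
only a sketch, and the function names are made up.  Where |long| is no
wider than |int|, both operands become |u_long|, the -1 turns into a
large positive number, and the comparison fails just as it would under
the unsigned preserving rules; it comes out true only where |long| can
hold every |u_int| value.

```c
#include <limits.h>

/* Hypothetical helper: compare a long against an unsigned int, no casts. */
int long_less_than_uint(long a, unsigned int b)
{
	return a < b;
}

/* 1 if long can hold every unsigned int value, so u_int:long gives long. */
int long_can_hold_uint(void)
{
	return UINT_MAX <= LONG_MAX;
}
```

On any one machine, long_less_than_uint(-1L, 1u) returns the same value
as long_can_hold_uint().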
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris


