signed/unsigned char/short/int/long [was: #defines with parameters]

Piercarlo Grandi pcg at aber-cs.UUCP
Mon Dec 12 05:27:40 AEST 1988


I realize that in my crudeness and brutality there is no hope for me to
achieve the extremely rarified levels of wisdom and learning of certain
people endowed with a quick grasp of issues and gentlemanly manners of debate.

I therefore appeal (bowing my head, palms joined :->) to higher authority.

Let me quote and summarize from one such easily recognizable higher authority,
and repeat my own contentions (if it is boring for you, think how it is for me):

-----------------------------------------------------------------------------

#   4. What's in a name [ .... ]
#   Objects declared as characters ("char") are large enough to store any
#   member of the implementation's character set, and if a genuine character
#   from that character set is stored in a character variable, its value is
#   equivalent to the integer code of that character. Other quantities may be
#   stored in a character variable, but the implementation is machine
#   dependent.

character type == an integer type of sufficient length, whether "unsigned" or
"int" is up to the implementation.

#   Up to three sizes of integer, declared "short int", "int", and "long int"
#   are available.  [ .... ]

integer type == any one of the three lengths of "int", not just "int".

#   Unsigned integers, declared "unsigned", obey the laws of arithmetic
#   modulo "2^n", where "n" is the number of bits in the representation. (on the
#   PDP-11, unsigned long quantities are not supported).

unsigned integer type == "unsigned" integer of all lengths, except on the
PDP-11. Semantics are different from those of integer types, as they obey the
rules of modular, not algebraic, arithmetic.
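
A minimal sketch of the modular rule, written in present-day C rather than
the 1978 dialect; the function names are hypothetical and appear nowhere in
the Reference Manual:

```c
#include <limits.h>

/* Hypothetical helper: UINT_MAX + 1 is reduced modulo 2^n, giving 0. */
unsigned wrap_up(void)
{
    unsigned u = UINT_MAX;
    return u + 1u;
}

/* Hypothetical helper: 0 - 1 is reduced modulo 2^n, giving 2^n - 1. */
unsigned wrap_down(void)
{
    unsigned u = 0u;
    return u - 1u;
}
```

Both results are fully defined; neither is an overflow in the sense that
applies to the lengths of "int".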

#   [ .... ] Because objects of the foregoing types can be usefully interpreted 
#   as numbers, they will be referred to as "arithmetic" types. Types "char" and
#   "int" of all sizes will be collectively called "integral" types. [ .... ]

character type == "char", some large enough integer or unsigned integer type;
unsigned integer type == "unsigned" of all lengths;
integer type == "int" of all lengths (occasionally includes also "unsigned"s);
integral type == all three of them.
arithmetic type == integral types plus all lengths of "float".

#   6.1 Characters and integers
#   A character or a short integer may be used whenever an integer is used.
#   In all cases the value is converted to an integer.

There is no behavioural difference between char, short and other lengths of
"int", but for their range.

#   Conversion of a shorter integer to a longer always involves sign
#   extension; integers are signed quantities.

Integer types involve sign extension, by contrast with unsigned integer types.

#   Whether or not sign extension occurs for characters is machine dependent,
#   [ .... ].

Whether or not "char" is an integer or unsigned integer type is not prescribed.
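
As an illustration only, here is a sketch in present-day C of how widening
behaves for "short" and for plain "char". The helper names are hypothetical,
and the <limits.h> macros are a later convenience, not something the 1978
manual provides:

```c
#include <limits.h>

/* Hypothetical helper: 1 if plain char behaves as a signed type here. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widening a short to int preserves the value, sign included. */
int widen_short(short s)
{
    return s;
}

/* Widening a plain char: the result depends on the implementation.
   Storing -1 gives -1 back if char is signed, UCHAR_MAX if it is not. */
int widen_plain_char(void)
{
    char c = (char)-1;
    return c;
}
```

On a machine where plain "char" is signed, widen_plain_char() yields -1;
where it is unsigned, the same source yields UCHAR_MAX.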

#   [ .... ] When a longer integer is converted to a shorter or to a "char",
#   it is truncated on the left; excess bits are simply discarded.

There is no behavioural difference between "char" and "short", or other
lengths, except their size.
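
A small sketch of truncation on the left, in present-day C with a
hypothetical helper name. It uses an unsigned destination because that is
the case later standards define completely; for a signed destination the
result of an out-of-range conversion is left to the implementation:

```c
#include <limits.h>

/* Hypothetical helper: keep only the low-order bits that fit in a char. */
unsigned char low_bits(unsigned x)
{
    return (unsigned char)x;   /* value is x modulo (UCHAR_MAX + 1) */
}
```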

#   6.5 Unsigned
#   Whenever an unsigned integer and a plain integer are combined, the
#   plain integer is converted to unsigned and the result is unsigned.
#   The value is the least unsigned integer congruent to the signed
#   integer (modulo "2^wordsize"). [ .... ] When an unsigned integer is
#   converted to "long", the value of the result is the same numerically
#   as that of the unsigned integer. [ .... ]

The rules for conversions involving unsigned integers are different from
those for integers.
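
The effect of combining a plain integer with an unsigned one can be
sketched as follows (present-day C; the function names are hypothetical):

```c
#include <limits.h>

/* The int operand is converted to unsigned before the addition,
   so the sum is computed modulo 2^n and the result is unsigned. */
unsigned mixed_sum(int i, unsigned u)
{
    return i + u;
}

/* The cast spells out the conversion the compiler applies implicitly
   when an int is compared with an unsigned: -1 becomes UINT_MAX. */
int less_after_conversion(int i, unsigned u)
{
    return (unsigned)i < u;
}
```

With these definitions mixed_sum(-1, 0u) is UINT_MAX, and
less_after_conversion(-1, 1u) is 0, although -1 is algebraically less
than 1.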

#   7. Expressions [ .... ]
#   The handling of overflow and divide check in expression evaluation is
#   machine dependent. [ .... ]

Note that, insofar as overflow is concerned, this applies only to integer types, as
unsigned integer types cannot overflow by definition. In other words,
exceeding the range of a length of "int" is not well defined, while exceeding
the range of a length of "unsigned" is.

Another case where there are behavioural differences between unsigned integer
and integer types.
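
Since exceeding the range of an "int" is not well defined, a careful
program has to test before it adds. The following is one common way to do
so, sketched in present-day C under the assumption that <limits.h> is
available; the helper name is hypothetical:

```c
#include <limits.h>

/* Hypothetical helper: 1 if a + b would fall outside the range of int.
   The test itself never leaves that range. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return 0;
}
```

No such test is needed for the lengths of "unsigned", where the result is
simply reduced modulo "2^n".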

#   7.2 Unary operators
#   [ .... ] The result of the unary "-" operator is the negative of its
#   operand. The usual arithmetic conversions are performed. The negative of
#   an "unsigned" quantity is computed by subtracting its value from "2^n",
#   where "n" is the number of bits in an "int". [ .... ]

Another case where there are behavioural differences between unsigned integer
and integer types.
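
A sketch of the quoted rule for unary "-" in present-day C (hypothetical
function name; some compilers warn about negating an unsigned operand, but
the result is defined):

```c
#include <limits.h>

/* Hypothetical helper: the negative of an unsigned value u is 2^n - u,
   reduced modulo 2^n, so it is never a negative number. */
unsigned negate_unsigned(unsigned u)
{
    return -u;
}
```

So negate_unsigned(1u) is UINT_MAX, and adding a value to its own negation
always gives 0.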

#   7.5 Shift operators
#   [ .... ] The right shift is guaranteed to be logical (0 fill) if "E1"
#   is "unsigned"; otherwise it may be (and is, on the PDP-11), arithmetic
#   (fill by a copy of the sign bit).

Another case where there are behavioural differences between unsigned integer
and integer types.
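
The difference in right shifts can be sketched like this (present-day C,
hypothetical function names). Only the unsigned result is guaranteed; the
signed one is left to the implementation:

```c
#include <limits.h>

/* Guaranteed logical shift: vacated high-order bits are filled with 0. */
unsigned shift_right_unsigned(unsigned u, int n)
{
    return u >> n;
}

/* For a negative operand the fill is implementation dependent:
   a copy of the sign bit (arithmetic) or 0 (logical). */
int shift_right_signed(int i, int n)
{
    return i >> n;
}
```

shift_right_unsigned(UINT_MAX, 1) is always UINT_MAX / 2, whereas
shift_right_signed(-8, 1) is -4 only where the shift is arithmetic.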

#   8.2 Type specifiers
#   [ .... ] The words "long", "short" and "unsigned" may be thought of as
#   adjectives; the following combinations are acceptable: [ .... ]

Here lies the crux of the matter. Throughout it is repeatedly and explicitly
stated that unsigned integer types behave differently from integer types, and
that the character type does not behave differently from a sufficiently
long/short unsigned integer or integer type.

Given this and the quoted phrase, it is apparent in hindsight that syntax and
semantics are incomplete, as there is no way to ensure the signedness of a
"char" (a similar problem exists with bit fields), and that syntax does not
properly reflect semantics.

dpANS C addresses the first point only, adding the "signed" keyword that can
be thought of as another adjective and adding several cases to the table of
acceptable combinations.
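
For reference, a sketch of what the dpANS spelling looks like in use. The
helper names are hypothetical, and the code assumes a compiler that accepts
the "signed" keyword:

```c
#include <limits.h>

/* With "signed char" the widening is sign-extending on every machine. */
int widen_signed_char(void)
{
    signed char c = -1;
    return c;              /* -1 */
}

/* With "unsigned char" the widening never sign-extends. */
int widen_unsigned_char(void)
{
    unsigned char c = UCHAR_MAX;
    return c;              /* UCHAR_MAX */
}
```

Plain "char" remains free to behave like either of the two.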

My contentions (for the last time!) are that

    [1] this is not necessary, as it is more natural to drop the pretense
    that "char" is a type distinct from "int", and instead adopt the notion
    that "char" is like "short", an adjective that modifies the length of its
    base type;

    [2] it does not resolve the issue of making clear that "unsigned" is
    semantically different from "int", while the various lengths of either
    type are, but for the different ranges, semantically equivalent among
    themselves, and this distinction is important;

    [3] both points can be economically addressed by redefining as integral
    types the class of all integer and unsigned types, as integer types the
    various lengths of "int", as unsigned types the various lengths of
    "unsigned", and as length adjectives/modifiers the keywords "char",
    "short", "long"; when the adjective is omitted, the base type has the
    length of "short" or "long", depending on the implementation; when the
    base type is omitted, "int" is presumed, except for length "char",
    where the choice is implementation dependent.

    [4] the proposed rationalization, provided that "unsigned int" is made as
    a special case equivalent to "unsigned", is backward compatible;

    [5] because of an easily made "mistake", some compilers, in the past or
    now, did not/do complain when the rationalized syntax was/is used, and
    this could be easily blessed instead of eradicated;

    [6] if it is felt desirable to substantially modify the declaration of
    "int" or "unsigned" types, a new keyword could be introduced for range
    definition, or the syntax for bit fields could be allowed outside

-------------------------------------------------------------------------

Kind reader, having had the patience to reach this point, make a last effort,
and please circle what you believe to be the correct answers:

[1] The material quoted above:

    [A] Is excerpted in an accurate, substantial and non misleading way
	from "The C Programming Language - Reference Manual" (1978), the
	authoritative definition of Classic C.
    [B] I have made it up.

[2] The summaries I have made of the various passages quoted:

    [A] Accurately reflect the contents of said Reference Manual, or at
	least a consistent and historically defensible interpretation of
	those contents.
    [B] I have never read/understood the Reference Manual.

[3] The final contentions and suggestions are:

    [A] Supported by fair and reasonable technical arguments, based on the
	contents of said Reference Manual, as well as other more mundane points.
    [B] My advisor (if I had one) must be on drugs.
-- 
Piercarlo "Peter" Grandi			INET: pcg at cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)


