signed/unsigned char/short/int/long

Piercarlo Grandi pcg at aber-cs.UUCP
Thu Dec 8 23:18:58 AEST 1988


In article <9086 at smoke.BRL.MIL> gwyn at brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
    
    You're still inaccurate.  "far" and "near" were never at any time in
    any draft of the proposed ANSI C standard.
    
    "noalias" was not in any draft other than the one sent out for the
    second public review.  How could it be, when it had just been invented
    at the previous meeting and was retracted at the next?

Just a moment! I have just apologized, saying that my inaccuracy was to imply
that near and far eventually made it into the dpANS.  I acknowledge that they
did not make it there; still, at one time (fairly long ago) they were expected
to become part of ANSI C (yes, I know that ANSI C does not yet exist), and the
usual trade interests advertised them (they shouldn't have done it, I know) as
ANSI-conforming features of their compilers.  I am pleased that X3J11 had
enough sense to avoid them, along with noalias (which did eventually make it
into just one official dpANS issue).

    Are you just making this stuff up, or do you have drug-using advisors, or
    what?

Maybe I have seen too many Laurel & Hardy films ["... Look at what you made
me do!"] and I cannot keep them too well distinct from X3J11 work :-) :-).

More seriously, I have been using C for 8-9 years now, and following X3J11
for many years as well. The attention I have devoted to following X3J11 in
later years has not been great, as disappointment set in (volatile?  signed?
reserved word functions? structural equivalence only if different compilation
units? etc...).

I would also like to add that you are right to ask that X3J11 be held to
account only for what has been perpetrated in the latest official, published
version of the dpANS, but I am right too to raise again the specter of old
issues that have been discussed quite seriously, as they are part of the full
picture.  When you want to run for President, after all, you know that people
will look at whether you stole cookies when you were twelve :-)...

    >As to the last point, char has been so far just a short short; a char
    >value can be operated upon exactly as an integer.

    Except that whether it acts as signed or unsigned depends on the
    implementation.

Gee, I see you have indeed read the Classic edition of K&R.  Let me nitpick
in return: I said "integer", not "int", and for once I was accurate :-).
You know the meaning of "integer" and "integral".  What I was saying is that
I cannot really see a strong enough difference between the semantics of "char"
and "int/unsigned" to require an "integral" class distinct from "integer"; I
think that the Classic C book can be slightly reinterpreted or amended to make
char belong to "integer" (approximately, just as a modifier on int/unsigned).

The other problem with the Classic C book (that is, apart from distinguishing
between "integer" and "integral"), and you seem to have understood it
correctly :-), is that it only defined "char", whose signedness was
implementation dependent, and "unsigned char".

What I am asking is why X3J11 did not legalize the combination "int char"
(hitherto not legal, but accepted by some popular compilers because of an
easily explained, benign mistake) to mean "signed char", WITHOUT introducing
a new keyword and further complicating the rules for declarations.  I cannot
believe they did not think of it...
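
For readers who have not been following the drafts, here is a minimal sketch
of the situation the new keyword addresses; it assumes a dpANS-style compiler
that provides <limits.h> and the signed keyword:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        signed char   sc = -1;          /* the dpANS spelling, with the new keyword */
        unsigned char uc = (unsigned char)-1;

        /* Whether plain char behaves as signed or unsigned is left to
           the implementation; <limits.h> lets a program find out. */
        printf("plain char is %s here\n", CHAR_MIN < 0 ? "signed" : "unsigned");
        printf("signed char -1 = %d, unsigned char -1 = %u\n",
               sc, (unsigned int)uc);
        return 0;
    }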

A related but distinct issue, one that fits in nicely with the first, is why
it was not stipulated that there are two integral types, int and unsigned,
with different arithmetic properties, and three optional lengths for either of
them, char/short/long, instead of writing up tables of permitted
combinations, which are somewhat more complex, and less clear as to the
fundamental difference in semantics between unsigned and int.
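
To make the comparison concrete, here is roughly how the two schemes read
side by side; the first half uses the actual dpANS spellings, the second half
is hypothetical syntax, not valid in any C I know of:

    /* Actual dpANS spellings, drawn from the tables of permitted
       combinations: */
    signed char  sc;    unsigned char  uc;
    short int    s;     unsigned short us;
    int          i;     unsigned int   ui;
    long int     l;     unsigned long  ul;

    /* The orthogonal scheme suggested above (hypothetical syntax):
           char int x;          a char-length signed integer
           char unsigned y;     a char-length unsigned integer      */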

    >Historically char constants have been really the size of integer
    >constants...

    You mean "character constants"; in C they ARE integer constants
    specified in a certain character-oriented way.

Exactly, thank you for the nitpicking.  I used this point to show that
"philosophically" char in C is just a shorter type of integer.  This is not
surprising, considering that C is a descendant of BCPL (whose single most
annoying feature is having to use putbyte() and getbyte() for string
manipulation, as it has just one length of integer).
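
A one-line demonstration of this heritage, which any C compiler should
reproduce (the exact numbers are of course machine dependent):

    #include <stdio.h>

    int main(void)
    {
        /* In C a character constant is an integer constant, so it has
           the size of an int, not the size of a char. */
        printf("sizeof 'a' = %lu, sizeof(char) = %lu\n",
               (unsigned long)sizeof 'a', (unsigned long)sizeof(char));
        return 0;
    }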

In a sense C is a wonderfully balanced mix, BCPL with quite a good lot of
Algol68 thrown in, and this shows through in things like some semantics
(BCPL-ish) of the integer types, and their syntax (Algol68-ish).  I can say
this having studied in depth (several years ago) both Algol68 and BCPL; it is
a pity that so many C programmers know neither, and miss the pleasure of
contemplating some important threads of history (e.g. BCPL and Algol68 are
themselves related by way of CPL).

    >Now I reiterate the question: why was a new keyword introduced "signed"
    >when it just sufficed to sanction the existing practice of some
    >compilers (PCC had it, more recent BSD versions fixed this "bug") to
    >say "int char" or better "char int"?

    I have never seen a C compiler that accepted "int char";

Well, you have seen few, I surmise, or you never tried (more likely, I
admit). As I explained, the fact that some (or even several) compilers
do accept "int char" is the result of an easily made mistake in a
particular, but popular, parsing strategy for C declarations.
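
For illustration only, here is a sketch (not lifted from any real compiler)
of the kind of specifier-collecting loop that produces the mistake: each
basic-type keyword merely sets or overrides a field, and nothing checks
whether the resulting combination is legal, so "int char" slips through and
simply ends up meaning char.

    enum token    { TOK_CHAR, TOK_INT, TOK_UNSIGNED, TOK_LONG, TOK_SHORT };
    enum basetype { T_INT, T_CHAR };

    struct declspec {
        enum basetype base;                 /* last basic type seen wins */
        int is_unsigned, is_long, is_short;
    };

    /* called once for each keyword in a declaration's specifier list */
    static void add_specifier(struct declspec *d, enum token t)
    {
        switch (t) {
        case TOK_CHAR:     d->base = T_CHAR;   break;  /* quietly overrides int */
        case TOK_INT:      d->base = T_INT;    break;
        case TOK_UNSIGNED: d->is_unsigned = 1; break;
        case TOK_LONG:     d->is_long = 1;     break;
        case TOK_SHORT:    d->is_short = 1;    break;
        }
        /* no check here rejects "int char", "char int", and friends */
    }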

    certainly Ritchie didn't intend for it to be valid.  Also, char has
    never been guaranteed to be signed; read K&R 1st Edition.

I am pleased that we do agree on something: indeed Ritchie never
intended it to be valid, and he carefully did not specify the default
signedness of char; I am also pleased that you have actually read
Classic K&R, and not just the less delectable works from X3J11.

    It happened to be most efficient on the PDP-11 to make it signed, ...

	[ well known list of machines and defaults for char signedness omitted ]

    ... implementation dependence in his BSTJ article on C in 1978.

You even read the BSTJ! My, you must be quite a learned fellow. If so,
you will also know that char is by default unsigned in some 68K
compilers as well, while most Intel compilers have it signed. Incidentally, I
have even seen two compilers for the same architecture (68K) implement
a different default! Unfortunately, your precious information (which can
be found, by the way, in a table in any Classic K&R book) is beside
my point. Also, I am not entirely surprised/amused at your repeated
assumption that nobody has bothered to read the Classic C book.

[ By the way, for the benefit of our audience, I will add that many
Classic C and Unix articles from various BSTJs etc... have been
reprinted in a more easily obtained set of two volumes; if I remember
correctly, "Unix Papers" by Academic Press. ]

    >Amusingly it persists even today in other compilers, among them
    >g++ 1.27, where interestingly "sizeof (char int)" is 4 and "sizeof
    >(int char)" is 1 on a 68020...

    I don't know what C++ rules for basic types really are, but if as I
    suspect g++ is getting it wrong, you should report this bug to the
    GNU project.

Well, technically this IS a mistake. On the other hand I am not going
to complain, of course...  (except that I do not like the asymmetry
between "char int" and "int char").  Had you read the full
paragraph, you would have seen that I did call it an unintentional
"feature"; I even explained why and how this mistake is commonly made
by C compiler writers.


What I am still waiting for, instead of cheap innuendo and showing off
that one has read the Classic K&R (as though nobody else did), is for
somebody to make a good case for:

    introducing the signed keyword and related paraphernalia instead of
    allowing "int char" (an existing unintentional "feature" of some
    compilers, by the way) to do the trick;

    NOT stipulating that there are two fundamental types with very
    different semantics, which can come in four different lengths, and
    therefore having to deal with three-word-long type specifiers and
    fairly tedious tables of what is permitted, and not emphasizing
    the distinction between int and unsigned.

Note that both things are essentially issues of elegance and easier
comprehensibility, which are damn important in a language like C, and
both can be introduced into the language with essentially a slight
reinterpretation of, and/or the removal of restrictions from, existing rules.
-- 
Piercarlo "Peter" Grandi			INET: pcg at cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)


