Uses of "short"?

Sat Sep 14 11:49:58 AEST 1985

> But 'int' is a perfectly good abstraction; more abstract than 'short'
> or 'long.'  The restriction of certain values to certain ranges CAN
> be part of an abstraction, but it can also be an incidental factor
> that is only useful because some machines make a distinction that
> makes it useful.  That says to me that use of 'short' or 'long'
> instead of 'int' shows more attention to machine specificity.

The use of "long" instead of "int" shows more attention to machine
specificity?  OK, we have the object "Internet address".  This object can be
represented as, among other things, a 32-bit quantity.  (We neglect the
problem of non-binary machines for the nonce.) Implementing this object with
an "int" shows a hell of a lot of attention to machine specificity, since it
won't work worth a damn on a PDP-11, or any other machine with "int"s less
than 32 bits (like machines based on current 8086-family chips, or
68000/68010/68008 machines with 16-bit-"int" compilers).  Implementing it
with a "long" shows a lot less machine specificity, since (according to the
ANSI C standard) a "long" can hold numbers in the range -2147483647 to
2147483647.  (On a two's complement machine, or a one's complement machine,
or even a sign-magnitude machine, this requires at least 32 bits.)  The same
argument applies to "unsigned int" vs. "unsigned long".
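
A minimal sketch of the point, assuming the address arrives as four octets.
The names "ipaddr", "make_ipaddr" and "ipaddr_octet" are made up for
illustration and do not come from any real networking code:

```c
/* Hypothetical type: an Internet address held in a type that is
 * guaranteed to have at least 32 bits.  "unsigned int" would not do
 * on a machine with 16-bit "int"s. */
typedef unsigned long ipaddr;

/* Pack four octets, most significant first, into one address. */
ipaddr make_ipaddr(unsigned a, unsigned b, unsigned c, unsigned d)
{
	return ((ipaddr)(a & 0xFF) << 24) |
	       ((ipaddr)(b & 0xFF) << 16) |
	       ((ipaddr)(c & 0xFF) << 8)  |
	        (ipaddr)(d & 0xFF);
}

/* Extract octet n (0 = most significant, 3 = least) from an address. */
unsigned ipaddr_octet(ipaddr addr, int n)
{
	return (unsigned)((addr >> (8 * (3 - n))) & 0xFF);
}
```

The casts to "ipaddr" come before the shifts, so the shifting is done in
"unsigned long" arithmetic even where "unsigned int" is only 16 bits wide.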

If the 4.xBSD networking code had been written with less implicit knowledge
of the machines it would run on - i.e., if the type "long" had been used
where the C specification says it should be - it would have moved to 2.9BSD
more easily.  (What the ANSI C standard describes explicitly is here
considered part of the "implicit" specification of C; yes, it's folklore,
but UNIX is still dominated by folklore.)  I believe a certain popular
news-reading system had much the same problem; it stored the length of an
article in an "int" instead of a "long".  Earlier versions of Berkeley Mail
had the same problem (and "mailx" is based on one of those earlier versions,
alas).

"int" is to be used to implement "integer" objects whose value will *never*
be outside the range -32767 to 32767, and where the amount of space taken up
by the object is less important than the amount of time required to
manipulate it.  (Well, modulo machines with 16-bit data paths and large
address spaces, where "int"s are often 32 bits even though it takes more
time to manipulate them than it does to manipulate 16-bit quantities.)
"short" is to be used to implement "integer" objects whose value will never
be outside that range but where the amount of space taken up by the object
is more important than the amount of time required to manipulate it (either
because there's a limit on the address space or physical memory available,
or because the object's representation must conform to some
externally-imposed restrictions).  "long" is to be used to implement
"integer" objects whose value can be outside the aforementioned range, or
whose representation must conform to some externally-imposed restriction
that requires the use of "long".
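
As a rough illustration of these rules, here is a sketch built around a
hypothetical table of article records; "struct article" and "total_length"
are invented for the example, and the caller is assumed to keep the table
to at most 32767 entries:

```c
#include <limits.h>

/* The ranges quoted above are minimums; a conforming <limits.h>
 * cannot report less than this for "long". */
#if LONG_MAX < 2147483647L
#error "long is narrower than the minimum range"
#endif

/* Hypothetical record, one per article, kept in a large table. */
struct article {
	long  length;	/* bytes; may exceed 32767, so "long" */
	short year;	/* e.g. 1985; small range, big table, so "short" */
};

/* Sum the lengths of the first n articles in tab. */
long total_length(const struct article *tab, int n)
{
	long total = 0;	/* a sum of "long"s certainly needs "long" */
	int i;		/* index assumed <= 32767; speed matters, so "int" */

	for (i = 0; i < n; i++)
		total += tab[i].length;
	return total;
}
```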

Code that uses "int" to implement objects known to have values outside the
range -32767 to 32767 is incorrect C.  The ANSI standard explicitly
indicates this.  Even in the absence of such an explicit indication in an
official language specification document, this information should be
imparted to all people being taught C.
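
A small sketch of how this bites, using a made-up helper "area".  With
16-bit "int"s, multiplying 200 by 300 in "int" arithmetic overflows, since
60000 is outside the range -32767 to 32767; widening one operand first keeps
the arithmetic in "long":

```c
/* Hypothetical helper: area of a w-by-h rectangle.  Each side fits
 * in an "int", but the product may not, so it is computed and
 * returned as a "long". */
long area(int w, int h)
{
	return (long)w * h;	/* widen before multiplying, not after */
}
```

Writing "(long)(w * h)" instead would do the multiplication in "int" and
convert only the already-overflowed result.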

If you removed "long" from the C language, you would either

	1) have a language incapable of talking about numbers outside the
	   range -32767 to 32767

or

	2) have a language which requires at least an 18-bit machine and
	   probably at least a 32-bit machine.

"short" is less commonly used, since it provides no guarantees about the
range of integral values it can represent that "int" doesn't provide.
However, correct C code can, in most if not all cases, be moved from one
machine to another (assuming no operating system dependencies) simply by
recompiling it.  To anyone aware of that fact it should be obvious that
there *is* a reason to use "short" instead of "int", even if
sizeof(short) == sizeof(int) on the machine at hand and even if the data
doesn't have to conform to some external specification: the sizes may well
differ on the next machine.  Thinking of "short" as a compact form of
"int" and using it wherever space-efficiency is *or might be* of primary
importance will yield code that is more likely to run happily on a variety
of machines (and is less likely to piss off the guy who has to get the
program running efficiently on a computer other than one of the ones the
original programmer had in their shop).

> It may be the case that in a certain piece of code it is possible
> to prove that a variable's value must lie in a particular range.
> If the programmer specifies that range somehow, compilers for
> languages that support that distinction can produce code taking
> advantage of it.  From the programmer's point of view, however,
> that provable range is probably not significant to her view of
> the process.

In a lot of cases, I damn well hope that the provable range is significant
to the programmer's view of the process.  In the code

	int a[10];
	int i;

	i = <some value>;
	a[i]++;

if the programmer's view of the process does not include the (provable)
condition that "i" will never have a value outside the range 0 to 9, this
code is incorrect and, by Murphy's Law, will proceed to demonstrate that
fact at the worst possible moment.  Plenty of code demonstrates its
incorrectness in similar fashion; the code

	FILE *foo;
	char buf[SIZE];

	foo = fopen(<some_file_name>, "r");
	fgets(buf, SIZE, foo);

will so demonstrate on a Sun (or a CCI Power 5/20 or a lot of other
machines) simply by being run after ensuring that the file to be opened and
read does not exist.  Replacing it with

	foo = fopen(...);
	if (foo != NULL)
		fgets(...);

will probably render it provable that the "fgets" call won't screw up - or,
at least, won't screw up by reading from an unopened file.
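
Filled out into something compilable, the checked version might look like
the sketch below.  The function name "read_first_line" and its return
convention (0 on success, -1 on failure) are assumptions made for the
example:

```c
#include <stdio.h>

/* Read the first line of the file named path into buf (size bytes).
 * Returns 0 on success, -1 if the file can't be opened or is empty. */
int read_first_line(const char *path, char *buf, int size)
{
	FILE *foo;
	int ok;

	foo = fopen(path, "r");
	if (foo == NULL)
		return -1;	/* open failed; never hand NULL to fgets */
	ok = (fgets(buf, size, foo) != NULL);
	fclose(foo);
	return ok ? 0 : -1;
}
```

Past the "if", "foo" is known to be non-null on every path that reaches the
"fgets" call, which is exactly the provable condition the unchecked version
lacks.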

I don't know whether there are proof techniques which are powerful enough to
prove the correctness of code involving subrange types in all interesting
cases and which are practical.  If there are, I'd like to see them
incorporated into compilers and have the compiler refuse to generate code
unless 1) the necessary checks are put in or maybe 2) explicit directives
are inserted *into the code* to tell the compiler that you know what you're
doing and it should trust you.  (I don't want it to be a compiler option; I
want the code to explicitly indicate that it's being unsafe.)
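
Absent such a compiler, the "necessary checks" for the earlier array example
have to be written by hand.  A sketch, with a hypothetical function "bump"
that refuses an out-of-range index instead of trusting the caller:

```c
/* Increment a[i], where a has n elements.  Returns 0 if i was in
 * the range 0 to n-1 and the element was incremented, -1 otherwise
 * (in which case a is left untouched). */
int bump(int a[], int n, int i)
{
	if (i < 0 || i >= n)
		return -1;	/* index outside the array; do nothing */
	a[i]++;
	return 0;
}
```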

	Guy Harris