TRUE and FALSE

Richard A. O'Keefe ok@goanna.cs.rmit.oz.au
Wed Sep 26 15:52:53 AEST 1990


In article <12777@sdcc6.ucsd.edu>, mautner@odin.ucsd.edu (Craig Mautner) writes:
[1.  "Weak typing" was the phrase used, but actually
	no "bool" type, and
	char neither signed nor unsigned
 were the real complaints.]

In the beginning there was CPL, an Algolish language with lots of goodies
including types.  CPL begat BCPL, which was the nearest thing there was to
a portable "implementation language" for years.  However, BCPL had one type:
"machine word", which had to serve for integer, float, and (word) address.
BCPL begat B, and B begat C, which broke with years of BCPL/B tradition by
adding types (it took a couple of revisions, but they're there).

'char' was neither signed nor unsigned because the machines of the time
didn't support both well; on a PDP-11 unsigned char was much more costly
than signed char, while on an IBM/360 signed char was much more costly
than unsigned char.  The one thing you were guaranteed was that any character
in the machine's usual alphabet could be represented in a 'char'.  This is
not a case of weak typing:  this implementation freedom in the representation
of characters was excellent engineering.  That there was no _separate_ type
for "8 bit integer", _that_ is the weak point.  But there weren't originally
any unsigned integers either.  (In UNIX V6 it was still common to use
pointer arithmetic to get unsigned integer arithmetic.)
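Whether plain 'char' is signed remains implementation-defined to this day.  A
minimal sketch of how portable code copes (the function names here are my
invention, not anything from the original systems):

```c
#include <limits.h>

/* Implementation-defined: plain 'char' is signed on most x86 ABIs,
   unsigned on others (much as it effectively was on the IBM/360). */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Portable byte handling sidesteps the question by converting
   through 'unsigned char' explicitly. */
unsigned to_byte(char c)
{
    return (unsigned char)c;    /* always in 0..UCHAR_MAX */
}
```

This is exactly the freedom the paragraph above describes: code that never
compares a plain char against a negative value runs the same either way.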

It may sound as though I am nit-picking, but there's really no point in
raving on about the faults of a programming language unless you get it
quite clear what the faults really are.

Something which has long puzzled me is why so much of BCPL was retained
in C, yet MANIFEST was not.  In BCPL you could say
	MANIFEST $( I = 1, J = 2, K = 3 $)
which corresponds to ANSI C
	const int i = 1, j = 2, k = 3;
and as in ANSI C the "initial values" could be expressions.  Oh well,
what was that about repeating history?
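For completeness: enumeration constants come nearer to MANIFEST than 'const'
does, since unlike a 'const int' object they are true constant expressions,
usable in case labels and (in C89) array bounds.  A sketch:

```c
/* Nearest C analogues of   MANIFEST $( I = 1, J = 2, K = 3 $)   */
enum { I = 1, J = 2, K = 3 };   /* genuine compile-time constants */

#define N 3                     /* the traditional preprocessor route */

int table[K];                   /* legal: K is a constant expression;
                                   a 'const int' bound would not be, in C89 */
```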

[2.  NULL not a keyword]

> Beginning C programmers wonder why you have to "#include <stdio.h>"
> in a program that doesn't use standard I/O.

Any beginning C programmer who wonders any such thing has been
taught shockingly badly.  You *don't* have to include stdio.h.
In ANSI C, NULL is defined in at least three places:
	stddef.h	NULL, offsetof, ptrdiff_t, size_t, wchar_t
	stdlib.h	NULL, EXIT_SUCCESS, EXIT_FAILURE, lots of fns
	stdio.h		NULL, FILE, fopen, ...
In pre-ANSI C, there's nothing to stop you #define'ing NULL yourself.
I was in the habit of using explicit casts, (char*)0 and the like.

> Redemption from this sin is on its way.  Modern 
> compilers define "NULL" as "(void *) 0"

No.  ANSI-compliant compilers have *permission* to define NULL this
way, but all three of 0, 0L, and (void*)0 are allowed.  _Some_ compilers
will define NULL to be (void*)0, but other compiler writers will have
compassion on the idiots who wrote "char x = NULL;" and not break their
programs.
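The practical upshot: in any ordinary pointer context all three permitted
definitions behave identically; the one place the choice shows through is an
unprototyped or variadic call, where a bare 0 is passed as an int.  A sketch
(is_null is my name for illustration):

```c
#include <stddef.h>    /* one of several headers that define NULL */

/* In a pointer context, 0, 0L, and (void *)0 all convert to the
   null pointer, so this works whatever NULL expands to: */
int is_null(const char *p)
{
    return p == NULL;
}

/* In a variadic call the compiler cannot convert for you, so cast:
       execl("/bin/sh", "sh", (char *)0);
   A bare NULL there is unportable if NULL is plain 0 and pointers
   are wider than int. */
```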

[3.  static]

This could be argued several ways.  I have long had in my personal
header file
	#define public
	#define private static
"static" is not such a big deal.  Remember, the norm for good C style
is to have a very small number of functions in a file, which means that
'static' functions are expected to be rare.
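Used that way, the macros read like this (next_id is an invented example):

```c
#define public
#define private static

private int counter;        /* file scope, invisible to the linker */

public int next_id(void)    /* external linkage, as usual */
{
    return ++counter;
}
```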

If you knew how BCPL used to tie things together, you would agree that
the way C does it is a _big_ improvement.  (Would you believe an
explicit "global vector" with externals identified by number?)

[4.  break]

There are two separate and distinct issues here, and it really doesn't
help to confuse them.

    1.  "case xxx:" in C is just a label, and no more contains a jump
	than any other kind of label.  So you can "fall through" into
	the next case.

    2.  The way you say "get out of the switch () statement" is the
	same as the way you say "get out of a while ()", so it is
	pointlessly hard to have a switch case that exits a loop.

BCPL had property 1, but not property 2.  In BCPL, the way that you
said "get out of the switch ()" was ENDCASE, while the way that you
said "get out of the loop" was BREAK.  It is a mystery to me why B
or C ever changed this.  But it is important to be clear about the
fact that the two properties are quite independent.
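Both properties in one place, as a sketch (the function is mine): falling
through is free, but escaping the loop from a case takes a goto or a flag,
precisely because C reused 'break'.

```c
/* Counts "small" values (1 or 2) until a 0 is seen.
   count_until_zero is an invented example, not from the article. */
int count_until_zero(const int *v, int n)
{
    int count = 0;
    int i;

    for (i = 0; i < n; i++) {
        switch (v[i]) {
        case 0:
            goto done;      /* 'break' would only leave the switch  */
        case 1:             /* property 1: a case label is just a   */
        case 2:             /* label, so 1 falls through to 2       */
            count++;
            break;          /* property 2: this ends the switch only */
        default:
            break;
        }
    }
done:
    return count;
}
```

In BCPL the goto would simply have been BREAK, with ENDCASE doing what C's
break-in-switch does.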

[5.  function definitions]

> The Fifth Original Sin was the way functions are defined.
> The entire parameter list has to be written twice.

This meant that one could easily _add_ parameter type information
to existing V5 or V6 code.  It also imitated Algol 60 and Fortran.
It wasn't by any means an innovation.  I actually rather like the
Algol/Fortran/classic-C style of function header, because it is
very easy to indent consistently (unlike the Algol 68/Pascal/Ada
approach), and is a splendid opportunity to place the usual
explanatory comment:
	type foo(x, y, z)
	    xtype x;	/* say what x is all about */
	    ytype y;	/* say what y is all about */
	    ztype z;	/* say what z is all about */
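Concretely, with an invented function (the prototype form is what ANSI C
added; the classic form is kept in a comment since C23 finally removed it):

```c
/* Classic form, as in the sketch above:

   int clamp(x, lo, hi)
       int x;      -- value to clamp
       int lo;     -- lower bound
       int hi;     -- upper bound
   { ... }
*/

int clamp(int x, int lo, int hi)    /* ANSI prototype form */
{
    return x < lo ? lo : x > hi ? hi : x;
}
```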

> Most programmers have written something like 
> "strcmp(s,t)", forgetting the declaration "char 
> *s,*t;".  What you wind up with in most cases is, not a 
> function that fails, but something worse--a function 
> that works as long as pointers and integers are the 
> same size, and then fails when you try to port it.

I don't understand this.  If the body of the function refers to
*s or *t (and it is hard to imagine an implementation of strcmp
that didn't) the compiler will catch it.  Nothing about the
relative sizes of pointers and integers is involved.  Other parts
of the program (in classic C) don't know the argument types _anyway_,
so we're only concerned here with the effect on the function definition.
I can imagine problems if int and long aren't the same size, but how do
you get something where pointer/int is a likely confusion?
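To make the point concrete, a sketch of what the compiler actually sees
(my_strcmp is my example, not a library function):

```c
/* With the declarations forgotten, s and t default to int, and the
   very first dereference is a compile-time error -- no question of
   pointer and int sizes ever arises:

   int my_strcmp(s, t)
   {
       while (*s && *s == *t)   -- error: dereferencing an int
       ...
   }
*/

/* With the declarations present (prototype form here): */
int my_strcmp(const char *s, const char *t)
{
    while (*s && *s == *t) {
        s++;
        t++;
    }
    return (unsigned char)*s - (unsigned char)*t;
}
```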

[7.  eight-character names]

> The Seventh Original Sin was the eight-character limit 
> on distinguishable names, or even fewer than eight for 
> externally defined names.  Of course, some such 
> limitation was required for efficient implementation, ...
> for English about 20 [characters] should be sufficient.

This simply isn't true.  No limitation whatsoever is necessary for
efficient implementation.  20 is nowhere *near* enough for English,
and 31 is barely adequate.

Recall that from quite early days, C has been used on other platforms
than UNIX.  Name length restrictions are a regrettable fact of life
with other people's linkers.  Blame the linkers, not the language.
I can't say that I was impressed by the UNIX V7 linker's restrictions,
but do bear in mind that it had to work in an address space of 64kb
*total*.  It was a big improvement on the IBM 1130's 5-character limit!

The ANSI C standard imposes *no* limit on the number of characters that
a C system *may* consider significant.  A compiler _may_ look at only
the first 31 characters, but it _may_ regard them all as significant.
A portable program must ensure that external identifiers are unique in
the first six characters (after case folding, too), but that's not a
sin in C, it's a fact of life about other people's linkers.  On a modern
UNIX system, all the characters are significant.

> But we must abandon the false god of 100% upward compatibility.

C is a language with a history.  To adapt a metaphor from S.J.Gould,
think of it as a panda, but as a panda that's all thumbs (:-).  There
never has been a goal (let alone a god) of 100% upward compatibility.
But the goal of 95% upwards compatibility was vital:  if the standard
had been too radically different, it wouldn't have been usable as a
standard for *C*.

-- 
Fixed in the next release.


