TRUE and FALSE
Craig Mautner
mautner at odin.ucsd.edu
Wed Sep 26 01:18:18 AEST 1990
The author of this does not have access to the news groups.
He asked me to post this and see what comments it generates.
Any correspondence should be sent to him at the internet address
included in the header.
-Craig Mautner
//////////////////// Begin Included Message ////////////////////////
Seven Original Sins of K&R
by Philip J. Erdelsky
Compuserve: 75746,3411
Internet: 75746.3411 at compuserve.com
September 22, 1990
The creation of C approximately two decades ago was a
wondrous event, even if it did not seem so at the time.
Like all human creations, C was imperfect. I have
identified seven Original Sins--minor flaws in C for
which K&R will eventually have to answer, in this world
or the next. I call them original sins because they were
present when C originated, not because K&R were the
first to commit them. Some of these sins have been
purged from later versions of C, but others remain with
us.
I am not the first to decry these sins, nor will I be
the last. I am merely another in a long series of
prophets crying in the wilderness.
I
The First Original Sin was pitifully weak typing.
There is no Boolean type in C, so generations of
programmers have erroneously written something like "if
(x=5)" instead of "if (x==5)", only to wonder why x
always seems to be 5, regardless of what has gone
before. The "char" type was not specified as either
signed or unsigned. This sin has probably wasted more
CPU time than any other, as savvy programmers learn to
put a defensive "&0xFF" after every "char" expression
that needs to be unsigned. The default type for
functions should have been "void", not "int", but there
was originally no "void" type.
Modern compilers have provided partial redemption from
this sin, usually by issuing warning messages when the
program appears to be tainted. But these warnings are
often false alarms and go unheeded. There is still no
Boolean type, and "char" may be either signed or
unsigned. Even the new enumeration types are merely
integers in disguise, just as willing to be mixed as
matched.
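The two pitfalls named in this section can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the sketch assumes 8-bit chars.

```c
#include <limits.h>

/* Hypothetical helper: the typo "x = 5" assigns 5 and then tests it. */
int assign_in_condition(int x)
{
    if ((x = 5))      /* the extra parentheses only quiet the usual warning */
        return x;     /* always taken, because 5 is nonzero */
    return -1;
}

/* Hypothetical helper: the comparison that was actually intended. */
int compare_in_condition(int x)
{
    if (x == 5)
        return 1;
    return 0;
}

/* Widen a char to int without caring whether plain char is signed.
   On 8-bit chars this gives the same result as the defensive c & 0xFF. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Reports which choice this implementation made for plain char. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Calling assign_in_condition with any argument yields 5, which is the "x always seems to be 5" symptom. byte_value yields 0..255 whichever way plain char was defined.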
II
The Second Original Sin was the failure to make "NULL"
a keyword. Beginning C programmers wonder why you have
to "#include <stdio.h>" in a program that doesn't use
standard I/O. Some compilers don't even object when
you assign an integer constant to a pointer without a
typecast, especially when the constant happens to be
zero. Don't blame the compiler. The poor thing can't
tell the difference between a zero integer constant and
"NULL".
Redemption from this sin is on its way. Modern
compilers define "NULL" as "(void *) 0", so there's at
least some hope of distinguishing it from a plain old
zero.
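As a rough illustration of the point about zero and "NULL", the sketch below takes "NULL" from <stddef.h>, which ANSI C provides, so no standard I/O header is needed. The helper names are made up for this example.

```c
#include <stddef.h>   /* defines NULL without pulling in standard I/O */

/* Hypothetical helper: returns 1 when p is a null pointer. */
int is_null(const char *p)
{
    return p == NULL;
}

/* A bare 0 in pointer context is also a null pointer constant,
   which is why the compiler accepts it without a cast. */
const char *make_null_from_zero(void)
{
    const char *p = 0;
    return p;
}
```

Here is_null(make_null_from_zero()) is true: as far as the language is concerned, the plain zero and "NULL" produce the same pointer value.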
III
The Third Original Sin was the use of the keyword
"static" to mark a function or variable as local to a
particular source file. This is really a trinity of
sins. The word "static" doesn't mean local. It
conflicts with the other use of the word "static"--to
mark a variable inside a function as one that actually
is static, in an accepted meaning of the word.
Finally, even if the word "local" had been used
instead, it would have been marking the wrong thing.
The word "public", or some similar word, should have
been used to mark the few functions and variables that
must be made available to the code in other files.
Other functions and variables should have been local by
default. That's how it's done in assembly language and
other high-level languages, and the reason for it is
obvious.
From this sin, however, no redemption is in sight.
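The two meanings of "static", and the public-by-default rule, can be seen side by side in a small sketch. All names here are invented for illustration.

```c
/* File scope: "static" here means "private to this source file". */
static int call_count = 0;

static void note_call(void)      /* likewise invisible to other files */
{
    call_count++;
}

/* Block scope: "static" here means "keeps its value between calls". */
int next_id(void)
{
    static int id = 0;
    note_call();
    return ++id;
}

/* No keyword at all: visible to every other file in the program. */
int calls_so_far(void)
{
    return call_count;
}
```

Successive calls to next_id return 1, 2, 3 and so on. Only next_id and calls_so_far can be reached from another source file, even though neither was marked in any way.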
IV
The Fourth Original Sin is the mandatory use of the
"break" keyword to terminate a "case" clause in a
"switch" statement. Omitting it is natural for
beginning programmers, and sometimes even for
experienced programmers who have been dabbling in more
tightly structured languages. Of course, this causes
control to fall through to the next case, which is
occasionally useful but nearly always a mistake, like a
double exposure in photography. But the evil goes even
further. Often, the "switch" statement is enclosed in
a "for" or "while" loop. You want to finish up a
"case" clause by breaking out of the loop? You can't
do it in C, not without breaking out of the "switch"
statement first!
The solution, not likely to be adopted even in C+++,
would be to have the compiler put an implicit "break"
at the end of every "case" clause, and reserve the
"break" keyword for breaking out of loops, the way God
intended.
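Both halves of this complaint can be sketched briefly. The functions below are hypothetical. The first uses a flag to leave a loop from inside a "switch", which is one common workaround; the second shows the accidental fall-through.

```c
/* Counts vowels in s, stopping at the first '.'. */
int count_vowels_until_dot(const char *s)
{
    int vowels = 0;
    int done = 0;        /* needed because "break" only leaves the switch */

    while (*s != '\0' && !done) {
        switch (*s) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            vowels++;
            break;       /* forget this and control runs into case '.' */
        case '.':
            done = 1;    /* a "break" here cannot end the while loop */
            break;
        default:
            break;
        }
        s++;
    }
    return vowels;
}

/* The missing "break": 'a' is meant to give 1 but gives 2. */
int classify_buggy(int c)
{
    int kind = 0;

    switch (c) {
    case 'a':
        kind = 1;        /* no break, so control falls into case 'b' */
    case 'b':
        kind = 2;
        break;
    }
    return kind;
}
```

A "goto" or an early "return" would also get out of the loop, but each is a detour around the fact that "break" has already been claimed by the "switch".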
V
The Fifth Original Sin was the way functions are
defined. The entire parameter list has to be written
twice. That's something no programmer should have to
do unless it's absolutely necessary. And to compound
the evil, an untyped parameter defaults to type "int".
Most programmers have written something like
"strcmp(s,t)", forgetting the declaration "char
*s,*t;". What you wind up with in most cases is, not a
function that fails, but something worse--a function
that works as long as pointers and integers are the
same size, and then fails when you try to port it.
Fortunately, ANSI C permits prototype definitions, but
the old way is still permitted, at least during a
transitional period. Let's hope the transition is
brief.
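For comparison, here is a sketch of the two definition styles. The old style appears only inside a comment, since newer compilers may reject it. The name my_strcmp is chosen here so as not to clash with the library function.

```c
/* Old-style definition, for reference only:

       my_strcmp(s, t)
       char *s, *t;      <- omit this line and s, t quietly become int
       {
           ...
       }
*/

/* Prototype-style definition: each parameter is named and typed once. */
int my_strcmp(const char *s, const char *t)
{
    while (*s != '\0' && *s == *t) {
        s++;
        t++;
    }
    /* Compare as unsigned char so the sign of plain char does not matter. */
    return (unsigned char)*s - (unsigned char)*t;
}
```

With the prototype form, a call that passes the wrong type or the wrong number of arguments can be diagnosed at compile time rather than discovered during a port.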
VI
The Sixth Original Sin was the way conflicts among the
names of members of different structures were neither
forbidden nor resolved. The original K&R said that
different structures could have members with identical
names as long as they had identical offsets. The way
early compilers implemented this dictum varied. Some
compilers would check to see that the offsets were
indeed identical. Others simply generated erroneous
code when they weren't. Most programmers took the
safest course by including the structure name--usually
abbreviated--in every member name.
Modern compilers have atoned for this sin completely by
keeping a separate member list for each structure type.
This resolves the conflicts, but a reminder of past
iniquities persists in the awkward names of structure
members in UNIX source code and other old C scriptures.
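A short sketch of the modern behavior, with invented structure names: two structures share a member name at different offsets, and each use is resolved against its own structure. The third structure shows the older prefixing habit, of which st_mode and st_size in the UNIX "struct stat" are familiar survivors.

```c
#include <stddef.h>

struct point {
    int x;
    int y;
};

struct label {
    char text[16];
    int  x;           /* same name as in struct point, different offset */
};

/* The old defensive habit: a structure prefix on every member name. */
struct oldpoint {
    int op_x;
    int op_y;
};

int sum_x(const struct point *p, const struct label *l)
{
    return p->x + l->x;   /* each "x" is looked up in its own structure */
}

int offsets_differ(void)
{
    return offsetof(struct point, x) != offsetof(struct label, x);
}
```

offsets_differ returns 1 here, which is exactly the situation that would have produced either an error or bad code under the original rule.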
VII
The Seventh Original Sin was the eight-character limit
on distinguishable names, or even fewer than eight for
externally defined names. Of course, some such
limitation was required for efficient implementation,
but eight characters are not enough. C was much better
than Fortran, which allowed only six, but there are
many pairs of English words with distinct meanings
whose first eight letters are identical. The minimum
number depends on the language, but for English about
20 should be sufficient. German programmers need more.
Most modern compilers do have a reasonable limit, but
some compiler developers have apparently forgotten that
virtue lies in moderation. One compiler allows at
least several hundred characters, maybe more. That's
too long. Compilers are supposed to compile, not test
the limits of computability by allowing single labels
to occupy practically the entire computer memory (and
disk swap area). An unprintable name--one that won't
fit on a single line--should also be uncompilable.
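To make the eight-character problem concrete, the hypothetical helper below reports whether two different names would be indistinguishable to a translator that examines only the first "limit" characters. It is a sketch of the rule, not of any particular compiler.

```c
#include <string.h>

/* Returns 1 when a and b are different names that agree in their
   first `limit` characters, and 0 otherwise. */
int names_collide(const char *a, const char *b, size_t limit)
{
    return strcmp(a, b) != 0 && strncmp(a, b, limit) == 0;
}
```

With a limit of 8, "interrupt_enable" and "interrupt_disable" collide, since both begin with "interrup". With a limit of 20 they do not.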
Epilogue
None of these sins is inconsistent with the philosophy
of C. We needn't embrace heresies like Pascal, Modula-2
or Ada. But we must abandon the false god of 100%
upward compatibility. We must tear down the old temple
to build a new one. Then, and only then, will our
redemption be at hand.
Note
This jeremiad is not copyrighted. You are welcome to
copy it and pass it on. I only ask you to leave my
name and account number on it. Let me take the
credit--and the heat.
//////////////////// End Included Message ////////////////////////
--
--------------------------------------------------------------------
Craig D. Mautner UCSD
mautner at cs.ucsd.edu Dept of CSE, C-014
(619) 534-4526 La Jolla, Ca. 92093