Seven Original Sins of K&R (Long)

Craig Mautner mautner at odin.ucsd.edu
Wed Sep 26 01:44:40 AEST 1990


My apologies for the consumption of bandwidth but I wanted
to send this through properly.  The last time it had the wrong
Subject line.

The author of this does not have access to the news groups.
He asked me to post this and see what comments it generates.
Any correspondence should be sent to him at the internet address
included in the header.

-Craig Mautner

//////////////////// Begin Included Message ////////////////////////


              Seven Original Sins of K&R
                 by Philip J. Erdelsky
                Compuserve: 75746,3411
          Internet: 75746.3411 at compuserve.com
                  September 22, 1990

The creation of C approximately two decades ago was a 
wondrous event, even if it did not seem so at the time.  
Like all human creations, C was imperfect.  I have 
identified seven Original Sins--minor flaws in C for 
which K&R will eventually have to answer, in this world 
or the next.  I call them original sins because they were 
present when C originated, not because K&R were the 
first to commit them.  Some of these sins have been 
purged from later versions of C, but others remain with 
us.

I am not the first to decry these sins, nor will I be 
the last.  I am merely another in a long series of 
prophets crying in the wilderness.

                           I

The First Original Sin was pitifully weak typing.  
There is no Boolean type in C, so generations of 
programmers have erroneously written something like "if 
(x=5)" instead of "if (x==5)", only to wonder why x 
always seems to be 5, regardless of what has gone 
before.  The "char" type was not specified as either 
signed or unsigned.  This sin has probably wasted more 
CPU time than any other, as savvy programmers learn to 
put a defensive "&0xFF" after every "char" expression 
that needs to be unsigned.  The default type for 
functions should have been "void", not "int", but there 
was originally no "void" type.

Modern compilers have provided partial redemption from 
this sin, usually by issuing warning messages when the 
program appears to be tainted.  But these warnings are 
often false alarms and go unheeded.  There is still no 
Boolean type, and "char" may be either signed or 
unsigned.  Even the new enumeration types are merely 
integers in disguise, just as willing to be mixed as 
matched.
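
The defensive "&0xFF" mentioned above can be sketched roughly as 
follows.  The function names are hypothetical, and the sketch assumes 
8-bit chars; it is an illustration, not anyone's official idiom.

```c
#include <limits.h>

/* Hypothetical helper: read a char as a value in 0..255, whatever the
   signedness of plain "char" happens to be on this compiler. */
int byte_value(char c)
{
    return c & 0xFF;            /* the defensive mask described in the text */
}

/* The same result without the mask: convert through unsigned char. */
int byte_value_cast(char c)
{
    return (unsigned char)c;
}

/* Reports whether plain "char" is signed on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Either helper returns the same number on a signed-char machine and on 
an unsigned-char machine, which is the whole point of the mask.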

                          II

The Second Original Sin was the failure to make "NULL" 
a keyword.  Beginning C programmers wonder why you have 
to "#include <stdio.h>" in a program that doesn't use 
standard I/O.  Some compilers don't even object when 
you assign an integer constant to a pointer without a 
typecast, especially when the constant happens to be 
zero.  Don't blame the compiler.  The poor thing can't 
tell the difference between a zero integer constant and 
"NULL".

Redemption from this sin is on its way.  Modern 
compilers define "NULL" as "(void *) 0", so there's at 
least some hope of distinguishing it from a plain old 
zero.
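
A small sketch of the point, with hypothetical function names: "NULL" 
comes from a header (here <stddef.h>, so <stdio.h> is not strictly 
required), and a constant zero compared with a pointer is treated as a 
null pointer as well.

```c
#include <stddef.h>   /* defines NULL; <stdio.h> is not needed for it */

/* Hypothetical helper: first occurrence of ch in s, or NULL if absent. */
const char *find_char(const char *s, char ch)
{
    for (; *s != '\0'; s++)
        if (*s == ch)
            return s;
    return NULL;
}

/* A constant 0 in pointer context is also a null pointer, so this
   comparison compiles without a cast. */
int is_null(const char *p)
{
    return p == 0;
}
```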

                          III

The Third Original Sin was the use of the keyword 
"static" to mark a function or variable as local to 
a particular source file.  This is really a trinity of 
sins.  The word "static" doesn't mean local.  It 
conflicts with the other use of the word "static"--to 
mark a variable inside a function as one that actually 
is static, in an accepted meaning of the word.  
Finally, even if the word "local" had been used 
instead, it would have been marking the wrong thing.  
The word "public", or some similar word, should have 
been used to mark the few functions and variables that 
must be made available to the code in other files.  
Other functions and variables should have been local by 
default.  That's how it's done in assembly language and 
in other high-level languages, and the reason for it is 
obvious.
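
The two meanings of "static" can be seen side by side in a sketch like 
this one.  The function names are made up for illustration.

```c
/* File-scope "static": the function is local to this source file
   and cannot be called by name from any other file. */
static int square(int n)
{
    return n * n;
}

/* No "static" here, so this function is visible to other files. */
int next_square(void)
{
    /* Block-scope "static": the variable keeps its value between
       calls, which is the accepted meaning of the word. */
    static int n = 0;

    n++;
    return square(n);
}
```

Successive calls to next_square() yield 1, 4, 9 and so on, because n 
survives from one call to the next.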

From this sin, however, no redemption is in sight.

                          IV

The Fourth Original Sin is the mandatory use of the 
"break" keyword to terminate a "case" clause in a 
"switch" statement.  Omitting it is natural for 
beginning programmers, and sometimes even for 
experienced programmers who have been dabbling in more 
tightly structured languages.  Of course, this causes 
control to fall through to the next case, which is 
occasionally useful but nearly always a mistake, like a 
double exposure in photography.  But the evil goes even 
further.  Often, the "switch" statement is enclosed in 
a "for" or "while" loop.  You want to finish up a 
"case" clause by breaking out of the loop?  You can't 
do it in C, not without breaking out of the "switch" 
statement first!

The solution, not likely to be adopted even in C+++, 
would be to have the compiler put an implicit "break" 
at the end of every "case" clause, and reserve the 
"break" keyword for breaking out of loops, the way God 
intended.
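
The loop problem described above usually gets worked around with a 
flag (or a "goto").  Here is a minimal sketch with a hypothetical 
command counter; the function and the command letters are invented 
for the example.

```c
/* Hypothetical example: count command characters up to the first 'q',
   ignoring blanks. */
int count_until_quit(const char *cmds)
{
    int count = 0;
    int done = 0;       /* flag needed: "break" only leaves the switch */
    const char *p;

    for (p = cmds; *p != '\0' && !done; p++) {
        switch (*p) {
        case 'q':
            done = 1;   /* ask the loop to stop */
            break;      /* exits the switch, not the loop */
        case ' ':
            break;      /* leave this out and control falls into default */
        default:
            count++;
            break;
        }
    }
    return count;
}
```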

                           V

The Fifth Original Sin was the way functions are 
defined.  The entire parameter list has to be written 
twice.  That's something no programmer should have to 
do unless it's absolutely necessary.  And to compound 
the evil, an untyped parameter defaults to type "int".  
Most programmers have written something like 
"strcmp(s,t)", forgetting the declaration "char 
*s,*t;".  What you wind up with in most cases is not a 
function that fails but something worse--a function 
that works as long as pointers and integers are the 
same size, and then fails when you try to port it.

Fortunately, ANSI C permits prototype definitions, but 
the old way is still permitted, at least during a 
transitional period.  Let's hope the transition is 
brief.
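
For comparison, here is a sketch of the two styles.  The old form is 
shown only inside a comment, since recent revisions of the C standard 
no longer accept it; "compare" is a hypothetical stand-in for the 
"strcmp" of the text.

```c
/* Old style, parameter list written twice:
 *
 *     compare(s, t)
 *     char *s, *t;
 *     {
 *         ...
 *     }
 *
 * Leave out the second line and s and t quietly become "int".
 */

/* Prototype style: names and types are written once, together. */
int compare(const char *s, const char *t)
{
    while (*s != '\0' && *s == *t) {
        s++;
        t++;
    }
    return (unsigned char)*s - (unsigned char)*t;
}
```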

                          VI

The Sixth Original Sin was the way conflicts among the 
names of members of different structures were neither 
forbidden nor resolved.  The original K&R said that 
different structures could have members with identical 
names as long as they had identical offsets.  The way 
early compilers implemented this dictum varied.  Some 
compilers would check to see that the offsets were 
indeed identical.  Others simply generated erroneous 
code when they weren't.  Most programmers took the 
safest course by including the structure name--usually 
abbreviated--in every member name.

Modern compilers have atoned for this sin completely by 
keeping a separate member list for each structure type.  
This resolves the conflicts, but a reminder of past 
iniquities persists in the awkward names of structure 
members in UNIX source code and other old C scriptures.
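
A sketch of what the separate member lists allow.  The structure and 
function names are invented for the example; both structures have a 
member called "x", at different offsets, and a modern compiler picks 
the right one from the type of the structure.

```c
#include <stddef.h>

struct point  { int x; int y; };
struct record { char tag; int x; };   /* "x" is not the first member here */

/* Each "x" is looked up in the member list of its own structure type. */
int sum_x(const struct point *p, const struct record *r)
{
    return p->x + r->x;
}

/* Byte offsets of the two members named "x". */
size_t point_x_offset(void)  { return offsetof(struct point, x); }
size_t record_x_offset(void) { return offsetof(struct record, x); }
```

Under the old rule the second "x" would have needed a name such as 
"rec_x" to be safe.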

                          VII

The Seventh Original Sin was the eight-character limit 
on distinguishable names, or even fewer than eight for 
externally defined names.  Of course, some such 
limitation was required for efficient implementation, 
but eight characters are not enough.  C was much better 
than Fortran, which allowed only six, but there are 
many pairs of English words with distinct meanings 
whose first eight letters are identical.  The minimum 
number depends on the language, but for English about 
20 should be sufficient.  German programmers need more.

Most modern compilers do have a reasonable limit, but 
some compiler developers have apparently forgotten that 
virtue lies in moderation.  One compiler allows several 
hundred characters, maybe more.  That's 
too long.  Compilers are supposed to compile, not test 
the limits of computability by allowing single labels 
to occupy practically the entire computer memory (and 
disk swap area).  An unprintable name--one that won't 
fit on a single line--should also be uncompilable.

                       Epilogue

None of these sins is inconsistent with the philosophy 
of C.  We needn't embrace heresies like Pascal, Modula 
2 or Ada.  But we must abandon the false god of 100% 
upward compatibility.  We must tear down the old temple 
to build a new one.  Then, and only then, will our 
redemption be at hand.

                         Note

This jeremiad is not copyrighted.  You are welcome to 
copy it and pass it on.  I only ask you to leave my 
name and account number on it.  Let me take the 
credit--and the heat.

//////////////////// End Included Message ////////////////////////
-- 
--------------------------------------------------------------------
Craig D. Mautner		UCSD
mautner at cs.ucsd.edu		Dept of CSE, C-014
(619) 534-4526			La Jolla, Ca. 92093


