Long Chars

Bill Kennedy kennedy at tolerant.UUCP
Tue Mar 15 02:58:38 AEST 1988


In article <12341 at brl-adm.ARPA> TLIMONCE%DREW.BITNET at CUNYVM.CUNY.EDU writes:
>[ pun reference omitted ]
>
>The "short char vs char" problem can't be solved very easily.  Why not a
>"long char".  That wouldn't break much code now, would it?  Now I'm not
>demanding that it goes into v1.0 of the standard but maybe we can look at
>this for the next "congress".

There are already specifications for it; AT&T has one and I think I read
something from HP about it as well.  It can be solved rather easily and
it need not break much code if the code is well written.  The same old
dragon that breathed up the pointer/int thing just rears its ugly head
again for characters.

>For now, if you want to make some progress, try to get one of the biggies
>(like MS) to add it as an extension.  You can tell them that they'll hit
>on the "multi-nation/multi-language vendor market" with it.

I disagree.  I am using long characters for a specific purpose, and adding
the baggage to domestic computing wouldn't serve any useful purpose.  I don't
think that you will get a software vendor to weave it in if it costs
performance at compile or run time (which it does, both...).  The hardware
manufacturers will implement it themselves if they want to penetrate farther
into the overseas markets.  Remember it's not just a world of 7 or 15 bit
characters; variations on the Roman alphabet, e.g. the European ones, are
handled with the eighth bit (which has its own problems too, not pertinent
here).  I don't think that you will get any momentum at all from software
houses, but I have first hand knowledge :-) that the computer manufacturers
get pretty interested.

>Of course, in my programming I don't have a use for it, but if you do, try
>
>typedef short LONG_CHAR;
>or
>typedef char LONG_CHAR[2];
>(Hmmm... I like the former)

No offense intended but I wholeheartedly agree with "don't have a use..."
and I would suggest it reads "haven't had any experience with...".  I'm
also not scolding you; I work with the things every day and there are some
very real traps.  If you just make it a typedef you'll get your storage
sizes right (for the most part) but you can't manipulate either of your
examples very well.  I use lchar because it's easier to type than LONG_CHAR.
You need a further refinement so that you can look at each byte and the bits
within each byte; I use a structure and a union within that.
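
A minimal sketch of what such a structure-with-a-union might look like
follows.  It is an illustration only, not the declaration actually in use:
the member names and the helpers lc_make, lc_bit and lc_is_wide are
hypothetical, and it assumes a two-byte long character whose bytes are kept
in the order they arrive in the stream.

```c
#include <stddef.h>

/* Hypothetical 16-bit long character: a struct wrapping a union so the
   same storage can be read whole or one byte at a time. */
typedef struct {
    union {
        unsigned short whole;   /* assumes short holds at least 16 bits */
        unsigned char  byte[2]; /* byte[0] is the first byte in the stream */
    } u;
} lchar;

#define LC_MSBIT 0x80u          /* most significant bit of one byte */

/* Build a long character from two bytes given in stream order. */
static lchar lc_make(unsigned char first, unsigned char second)
{
    lchar c;
    c.u.whole = 0;              /* clear any storage beyond the two bytes */
    c.u.byte[0] = first;
    c.u.byte[1] = second;
    return c;
}

/* Return bit number `bit` (0 = least significant) of byte `i`. */
static int lc_bit(lchar c, int i, int bit)
{
    return (c.u.byte[i] >> bit) & 1;
}

/* Nonzero if the first byte has its MSbit set. */
static int lc_is_wide(lchar c)
{
    return (c.u.byte[0] & LC_MSBIT) != 0;
}
```

Going through byte[] rather than `whole` keeps the byte-level tests
independent of the machine's byte order, which a bare typedef short does
not give you.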

>and then you can implement a lstrcmp() and a lstrcpy() and an assortment
>of routines like that.  Then when you're done, those can be re-used in all
>your programs.

You also need routines to convert into and out of strings containing long
characters and some way to insulate yourself from cases and while(c)
things that make assumptions about character size and content.  To qualify
the long character structure/union approach, vi, the shell, and I'm sure
other programs use the MSbit of a character for their own purposes.  Many
Asian terminals set the MSbit of a byte as a flag that another byte is coming
with the rest of the character.  In some European countries it's quite normal
for the MSbit to be set for a special character native to their alphabet but
absent from ASCII.  So here you see but three uses of the MSbit that are
darned near mutually exclusive and require further inspection of the byte
stream.
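
To make the conversion step concrete, here is a rough sketch of turning a
byte string into long characters under just one of those conventions: a
byte with the MSbit set means one more byte follows.  The type layout and
the names to_lchars and lstrlen are hypothetical, and the sketch
deliberately ignores the other two uses of the MSbit.

```c
#include <stddef.h>

/* Hypothetical two-byte long character, bytes kept in stream order. */
typedef struct { unsigned char byte[2]; } lchar;

/* Convert a NUL-terminated byte string into long characters.
   Assumed convention: MSbit set on a byte means one more byte follows.
   Writes at most max-1 characters plus an all-zero terminator and
   returns the number of characters written (terminator not counted). */
static size_t to_lchars(const unsigned char *src, lchar *dst, size_t max)
{
    size_t n = 0;

    if (max == 0)
        return 0;
    while (*src != 0 && n + 1 < max) {
        dst[n].byte[0] = *src++;
        dst[n].byte[1] = 0;
        /* Take the trailing byte only if the flag is set and one exists. */
        if ((dst[n].byte[0] & 0x80u) && *src != 0)
            dst[n].byte[1] = *src++;
        n++;
    }
    dst[n].byte[0] = 0;
    dst[n].byte[1] = 0;
    return n;
}

/* Length of a long-character string; the end test must look at both
   bytes, since a plain while(c) no longer works on this type. */
static size_t lstrlen(const lchar *s)
{
    size_t n = 0;

    while (s[n].byte[0] != 0 || s[n].byte[1] != 0)
        n++;
    return n;
}
```

Feed this a European 8-bit text and it will wrongly glue each accented
letter to the byte after it, which is exactly why the byte stream needs
further inspection before you pick a rule.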

>When it get's suggested to ANSI C II (or whatever it'll be called) you'll
>be there to warn us about implementation difficulties and ideas.  And when
>it gets passed you can do a search-and-replace from "LONG_CHAR" to "long
>char"

I'm not convinced that it belongs in the language specification because it
is so implementation specific.  In fact I'm not sure that it even needs to
exist for hardware destined for a technical audience.  Those professionals
have learned to read ASCII like some of us did APL :-)  When you start to
bring in commercial applications where you want to drive down the level of
skill required to operate a program, that's where you need the additional
capability/overhead.  You made a good start and now I have overkilled it for
you...

These are my opinions and observations, Tolerant is nice enough to let me use
their equipment; so don't blame me on them.

Bill Kennedy {rutgers,cbosgd,killer}!ssbn!bill or bill at ssbn.WLK.COM


