Programming and international character sets.

Every system needs one terry at wsccs.UUCP
Thu Nov 10 13:49:14 AEST 1988


In article <621 at quintus.UUCP>, ok at quintus.uucp (Richard A. O'Keefe) writes:
> In article <207 at jhereg.Jhereg.MN.ORG> mark at jhereg.MN.ORG (Mark H. Colburn) writes:
> >In article <8804 at smoke.BRL.MIL> gwyn at brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
> >>In article <532 at krafla.rhi.hi.is> kjartan at rhi.hi.is (Kjartan R. Gudmundsson) writes:
> >>>How difficult is it to convert american/english programs so that they can 
> >>>be used to handle foreign text? [etc.]
> 
> Xerox have supported a 16-bit character set (XNS) for years.
> Some of the surprises mentioned by Mark Colburn have been no news
> to Interlisp-D programmers for a long time.
> 
> The kludges being proposed for C & UNIX just so that a sequence of
> "international" characters can be accessed as bytes rather than pay
> the penalty of switching over to 16 bits are unbelievable.

First of all, there are too many 8-bit character models available:

	All of the ISO models, DEC Multinational, 7-bit replacement sets,
	Wang-PC international sets, and IBM-PC International sets.

There is no way to consolidate them without mapping, and that's so device
dependent it isn't funny.  Consider your termcap growing by at least 128
characters per entry (128 times the number of entries in total)... and that
assumes there is no need for multiple GS/GE strings, which some terminals
would require because they need more than one additional character set.
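
To make the mapping problem concrete, here is a minimal sketch in C of
translating between two of the 8-bit sets named above (ISO 8859-1 and the
IBM-PC set known as code page 437).  The function names are made up for
illustration, and only five accented letters are filled in; anything else
above 0x7F falls back to '?'.  A real version would need all 128 upper-half
entries, and a separate mapping for every pair of sets and devices.

```c
#include <stddef.h>

/* Hypothetical helper: map one ISO 8859-1 byte to IBM PC code page 437.
 * Only a few accented letters are covered; this is an illustration,
 * not a complete mapping. */
unsigned char latin1_to_cp437(unsigned char c)
{
    switch (c) {
    case 0xE4: return 0x84; /* a with diaeresis */
    case 0xE9: return 0x82; /* e with acute */
    case 0xF1: return 0xA4; /* n with tilde */
    case 0xF6: return 0x94; /* o with diaeresis */
    case 0xFC: return 0x81; /* u with diaeresis */
    default:
        /* 7-bit ASCII is common to both sets; the rest is unmapped here. */
        return (c < 0x80) ? c : (unsigned char)'?';
    }
}

/* Translate a whole buffer in place, one byte at a time. */
void translate_buffer(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = latin1_to_cp437(buf[i]);
}
```

Every byte written to the terminal would have to pass through something
like translate_buffer(), with the mapping chosen per device.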

Second, vi in the US strips the 8th bit out, and is therefore not
usable for programming international (8-bit) characters using either model.
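
What stripping the 8th bit does to such a character can be shown in a few
lines of C.  This is only a sketch of the masking operation, not vi's
actual code, and strip_high_bit() is a name invented for the example.

```c
/* Clear the high bit of a byte, as a 7-bit-clean program would. */
unsigned char strip_high_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7F);
}
```

In ISO 8859-1, e with acute is 0xE9; after masking it becomes 0x69, which
is a plain ASCII 'i'.  The original character cannot be recovered.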


Problems with 16 bit characters:

O	The Xerox model is 16-bit and only valid for bitmapped displays,
	like the Mac, and we all know how slowly that scrolls.

O	All of the current software would break without extensive rewrite.

O	The internal overhead in a non-message-passing operating system
	(most of them) is so high that it's ridiculous.

O	Think of pipes and all file I/O going half as fast.

O	Think of your hard disks shrinking to half their size... source
	files, after all, are text.
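
The arithmetic behind the last two points, as a rough sketch in C.  It
assumes text that is currently stored one byte per character and a fixed
16-bit encoding for every character; the names wide_char, narrow_bytes()
and wide_bytes() are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t wide_char;     /* assumed fixed 16-bit character */

/* Bytes needed to store s at one byte per character. */
size_t narrow_bytes(const char *s)
{
    return strlen(s) * sizeof(char);
}

/* Bytes needed to store the same characters at 16 bits each. */
size_t wide_bytes(const char *s)
{
    return strlen(s) * sizeof(wide_char);
}
```

For any string, wide_bytes() is exactly twice narrow_bytes(), so the same
pipe or disk moves and holds half as many characters.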

			terry at wsccs


