Programming and international character sets.

Daniel Lawrence nwd at j.cc.purdue.edu
Tue Nov 1 00:54:43 AEST 1988


In article <532 at krafla.rhi.hi.is> kjartan at rhi.hi.is (Kjartan R. Gudmundsson) writes:
>
>How difficult is it convert american/english programs so that they can 
>be used to handle foreign text? The answer of course depends on the language
	[a description of some of the problems using 8 bit chars]
>
>Let's look at some code from MicroEMACS:
>
	[a code excerpt from MicroEMACS 3.9]
>Ugly isn't it?
>

	Ok, I am feeling a little picked on here... a lot of people like
to use uEMACS to point out things like this.  When I first started
working with it, it was just for me. But that is really no excuse...

>An other way of doing this is using "is.." functions that are
	[an alternative which is better]
>This code is better (most of the is.. things are macros that mask
	[More descriptions of 8 bit problems...]

	And someone finally proposes some solutions rather than just
blindly stabbing out and complaining.  During the last round of
complaints I sent out a request for information on this problem, and the
best I got back was: go to the library and do some research.  Well, for
a project I am doing in my spare time, and considering the poor library
system round here, I really wasn't happy to hear all the griping and
then get no help from anyone to fix the problems.  So I applaud
Mr. Gudmundsson for his mail.

>#	Kjartan R. Gudmundsson        #     
>#	Raudalaek 12                  #     
>#	105 Reykjavik                 #     Internet:  kjartan at rhi.hi.is      #


	However, after the last round, I thought carefully about the 8
bit problems, and resolved that the issue was too complex on a language
by language basis for me to ever attempt to get all the case mappings
correct.  So when you see the next version of MicroEMACS, it will have
a user changeable upper/lowercase mapping function (which is working
right now).  Note: This slows down the regular pattern matching code
considerably, so uEMACS can be compiled with the diacritical (un-American
in this case) support turned off, but both options now exist.
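
	To give a rough idea of what a user changeable case mapping can
look like, here is a minimal sketch.  It is NOT the actual MicroEMACS
code: the names DIACRIT, upmap, lowmap, init_case_maps, set_case_pair,
to_upper and to_lower are all made up for illustration, and it assumes
an ASCII-based 8 bit character set.

```c
/* Hypothetical build switch: 1 = table-driven mapping, 0 = plain ASCII. */
#define DIACRIT 1

#if DIACRIT
static unsigned char upmap[256];   /* upmap[c]: uppercase of c, or c itself */
static unsigned char lowmap[256];  /* lowmap[c]: lowercase of c, or c itself */

/* Reset both tables: identity everywhere, then the 26 ASCII letter pairs. */
static void init_case_maps(void)
{
    int c;

    for (c = 0; c < 256; c++)
        upmap[c] = lowmap[c] = (unsigned char)c;
    for (c = 'a'; c <= 'z'; c++) {      /* assumes an ASCII-based charset */
        upmap[c] = (unsigned char)(c - 'a' + 'A');
        lowmap[c - 'a' + 'A'] = (unsigned char)c;
    }
}

/* User hook: declare that `lower` and `upper` are one letter in two cases. */
static void set_case_pair(unsigned char lower, unsigned char upper)
{
    upmap[lower] = upper;
    lowmap[upper] = lower;
}

static int to_upper(int c) { return upmap[(unsigned char)c]; }
static int to_lower(int c) { return lowmap[(unsigned char)c]; }

#else  /* ASCII only: no tables, just a range test */

static void init_case_maps(void) { }
static void set_case_pair(unsigned char lower, unsigned char upper)
{
    (void)lower; (void)upper;           /* ignored in this build */
}
static int to_upper(int c) { return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c; }
static int to_lower(int c) { return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c; }

#endif
```

	With something like this, a user whose terminal speaks ISO 8859-1
could call init_case_maps() once and then set_case_pair(0xF0, 0xD0) to
tell the editor that the two eth characters are a case pair.  The search
code would then only ever go through to_upper/to_lower and never compare
against 'a' and 'z' directly.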

			Daniel Lawrence		(317) 742-5153
			UUCP:	{pur-ee!}j.cc.purdue.edu!nwd
			ARPA:	nwd at j.cc.purdue.edu
			FIDO:	1:201/10 The Programmer's Room (317) 742-5533


