sizeof(char)

Thu Nov 13 12:04:19 AEST 1986

In article <9181 at sun.uucp> guy at sun.uucp (Guy Harris) writes:
>If it is indeed the case that there is more than one way of sorting text in,
>say, Oriental languages, then either 1) "setlocale" is a poor name, because
>it takes into account more than just the locale, or 2) it is a poor routine,
>because it doesn't take into account more than just the locale.

The name is short for "set locale-specific information", which reflects the
main motivation for the function.  There were several suggestions for the
name, but we couldn't find one that we liked better, other than contractions
of "set environment", which had to be rejected for the obvious reason.
Actually, it WAS intended that setlocale() indeed mean "change or query the
program's entire LOCALE or portions thereof", where the term "locale" was
to be defined in section 1.5.  However, something appears to have gone awry
in the process of making this last-minute addition to the draft proposed
standard document, since there are two sentences in the description of
setlocale (section 4.4.1.1) that say almost the same thing using different
words, and section 1.5 defines "locale-specific behavior" but not "locale".
The general term "locale" is intended in the context of X3J11 to refer to
a complete, orthogonal set of selections of conventions for items that are
allowed to affect program operation based on nationality, culture, or
language.  Thus "locale" is not synonymous with "location".
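
To make "change or query the program's entire locale or portions thereof" concrete, here is a minimal sketch.  It assumes the interface as it was later standardized in <locale.h>, with category macros such as LC_ALL and LC_COLLATE; the draft under discussion may differ in detail.  The helper names current_locale and set_collation are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Query one category (or LC_ALL for the whole locale) without changing
   anything: a NULL locale name makes setlocale() a pure query. */
const char *current_locale(int category)
{
    return setlocale(category, NULL);
}

/* Change only the collation portion of the locale, leaving the other
   categories alone.  Returns 1 on success, 0 if the implementation
   does not recognize the requested name. */
int set_collation(const char *name)
{
    assert(name != NULL);   /* NULL would only query, not change */
    return setlocale(LC_COLLATE, name) != NULL;
}
```

The name "C" is the one locale every conforming implementation must accept, so set_collation("C") should always succeed; any other name is implementation-defined.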

By the way, one doesn't have to turn to Oriental languages to find more
than one way of sorting text.  Even English has several different collating
sequences, depending on the specific application.
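
As a rough illustration of two English orderings, the sketch below compares strings first by raw character code and then with case folded, roughly as a dictionary would.  It assumes an ASCII-like character set in which upper-case letters precede lower-case ones; the function names cmp_codes and cmp_folded are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Ordering by raw character codes.  Under ASCII, every upper-case
   letter sorts before every lower-case letter. */
int cmp_codes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}

/* Ordering with case folded, so that "Zebra" follows "apple". */
int cmp_folded(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;   /* both strings ended together */
    }
}
```

With these, cmp_codes("Zebra", "apple") is negative while cmp_folded("Zebra", "apple") is positive: the same two words, in the same language, in opposite orders.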

>The claim you made, that "strcoll() amounts to a declaration that there
>IS a natural multibyte collating sequence for any single environment", is a
>little hard to parse.  I assume you mean that "by specifying that there
>is such a routine, the proposers of strcoll() are declaring that there IS a
>natural multibyte collating sequence for any single environment."  Given
>that "setlocale" exists, I fail to see how it declares this, unless
>"environment" is defined so that an environment always specifies a single
>collating sequence.  In the latter case, the claim is true, but trivially so.

I used "environment" rather than "locale" since the technical X3J11 meaning
of the latter is not well known.  The existence of a natural collating
sequence for a locale is not at all obvious; one might question whether
it is really true for languages that use ideographs for their printed
representation, for example.
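
For readers who have not seen strcoll() in use, the following sketch sorts an array of strings by whatever collating sequence the selected locale supplies.  It assumes the strcoll() and setlocale() interfaces as later standardized; sort_in_locale and by_locale are hypothetical names, and which locale names exist beyond "C" is implementation-defined.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparison function: defer entirely to the locale's
   collating sequence, whatever it happens to be. */
static int by_locale(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Select the collation portion of the named locale, then sort.
   Returns 1 on success, 0 if the locale name is not recognized
   (in which case the array is left untouched). */
int sort_in_locale(const char **words, size_t count, const char *name)
{
    assert(words != NULL && name != NULL);
    if (setlocale(LC_COLLATE, name) == NULL)
        return 0;
    qsort(words, count, sizeof words[0], by_locale);
    return 1;
}
```

In the "C" locale strcoll() orders strings as strcmp() does; in any other locale the result is only as "natural" as the single sequence the implementation chose to provide, which is the point in question.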

>Fine.  Are you prepared to admit that there *is* a non-trivial trade-off
>involved in the "short char" proposal (i.e., that it is not a given that
>few, if any, lines of *existing* code need change so that it can work
>equally well in a one-storage-unit "char" and a two-storage-unit "char"
>environment), and that some people might rationally disagree with your value
>weighting of the changes needed to existing code to make it work in a
>two-storage-unit "char" environment and to make it work in a "long char"
>environment?

I have been maintaining that very little existing code is affected:
NONE on implementations that decide to make sizeof(char)==1, and almost
none for the vast majority of applications code on implementations that
decide to support multi-byte (char)s.  I even gave examples of the most
typical kinds of code dependence on sizeof(char)==1.  I can well believe that
AT&T's STREAMS code would be heavily dependent on the constraint (in
fact, I wonder whether it could even be made to work on a 20- or 36-bit
word architecture, if it depends so much on the size of a (char));
however, I don't mind nearly so much making more work for kernel workers,
network hackers, and other lower life forms as I do making more work for
application developers.  (As I said, different value weighting.)
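
The examples referred to above are not repeated in this article.  As a stand-in, here is a sketch of what is probably the commonest dependence, string allocation, next to a version that makes no assumption about the size of a (char).  Both function names are hypothetical.  Note that under the rules as eventually standardized sizeof(char) is 1 by definition, so the two are equivalent there; they differ only under the proposal being argued here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Depends on sizeof(char)==1: the character count from strlen() is
   handed to malloc() and memcpy() as though it were a storage-unit
   count. */
char *dup_assuming_one(const char *s)
{
    size_t n;
    char *p;

    assert(s != NULL);
    n = strlen(s) + 1;              /* characters, including the NUL */
    p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Makes no assumption: the character count is scaled by sizeof(char)
   wherever a size in storage units is wanted. */
char *dup_portable(const char *s)
{
    size_t n;
    char *p;

    assert(s != NULL);
    n = strlen(s) + 1;
    p = malloc(n * sizeof(char));
    if (p != NULL)
        memcpy(p, s, n * sizeof(char));
    return p;
}
```

The change is one multiplication per allocation or copy, and none at all where the code already uses strcpy() or indexes the array directly.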