[tei-council] CH and BCP 47 again
Martin Holmes
mholmes at uvic.ca
Tue Jun 26 12:17:45 EDT 2012
Re-examining the CH chapter and BCP 47, I have a couple of things I'd
like to run by you. The first is this bit:
"Languages are identified by a language subtag, which may be a two
letter code taken from ISO 639-1 or a three letter code taken from ISO
639-2."
This is in line with BCP 47, which does not AFAICS express a preference
for two- or three-letter codes where both exist. However, in our own
practice in the Guidelines, we decided to use two-letter codes from ISO
639-1 where these exist, and only use three-letter codes from 639-2
where there is no two-letter code. I believe this was because it
matches what the IANA Language Subtag Registry does:
<http://www.iana.org/assignments/language-subtag-registry>
You can see, for instance, that Modern Greek is "el" (639-1) rather than
the available "ell" and "gre" from 639-2. (Both sets are listed in
<http://www.loc.gov/standards/iso639-2/php/code_list.php>.)
BCP 47 also notes this: "When languages have both an ISO 639-1
two-character code and a three-character code (assigned by ISO 639-2,
ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is
defined in the IANA registry."
Therefore I would like to supplement our explanation above, like this:
"Languages are identified by a language subtag, which may be a two
letter code taken from ISO 639-1 or a three letter code taken from ISO
639-2. The practice in these Guidelines is to prefer a two-letter code
from ISO 639-1 where one exists, following the practice in the IANA
Language Subtag Registry, and we recommend that encoders also follow
this convention."
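For illustration, the proposed convention could be sketched in code as
follows. This is only a sketch: the lookup table is a tiny hand-written
excerpt covering the Modern Greek example above, not the IANA registry
itself, and the function name preferred_language_subtag is hypothetical.

```python
# Hand-written excerpt for illustration only (an assumption, not registry data):
# ISO 639-2 three-letter codes that have an ISO 639-1 two-letter equivalent.
ISO_639_2_TO_639_1 = {
    "ell": "el",  # Modern Greek, ISO 639-2 terminological code
    "gre": "el",  # Modern Greek, ISO 639-2 bibliographic code
}


def preferred_language_subtag(code: str) -> str:
    """Return the two-letter code where the table has one, else the code unchanged."""
    code = code.lower()
    return ISO_639_2_TO_639_1.get(code, code)


if __name__ == "__main__":
    for candidate in ("gre", "ell", "el", "nan"):
        print(candidate, "->", preferred_language_subtag(candidate))
```

A real implementation would presumably read the registry file rather than
a hard-coded table.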
Secondly, there are two specific examples of extended tags in CH which I
find it hard to parse according to the rules:
zh-s-nan (the Southern Min language of the macrolanguage Chinese)
zh-s-nan-Hans-CN (the Southern Min language of the macrolanguage Chinese
as spoken in China written in simplified Characters)
The use of s-nan seems incorrect to me, and I can find no examples of it
in the wild. If anyone remembers the source for these tags, could you
explain it to me?
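To make the parsing difficulty concrete, here is a rough sketch that
labels each subtag by its position and shape, loosely following the
langtag grammar in BCP 47 (RFC 5646). It is a simplified reading, not a
validator: it ignores grandfathered tags, does not consult the IANA
registry, and the function name classify_subtags is hypothetical. Under
this reading, a single-letter subtag such as "s" can only be an extension
singleton, so everything after it (including "Hans" and "CN") is consumed
as extension material rather than as script and region.

```python
import re


def classify_subtags(tag):
    """Label subtags by position and shape (simplified sketch of the RFC 5646 ABNF)."""
    parts = tag.split("-")
    labels = [(parts[0], "language")]
    i = 1

    # Up to three 3-letter extlang subtags may follow a 2-3 letter language.
    n_ext = 0
    while (i < len(parts) and n_ext < 3 and len(parts[0]) in (2, 3)
           and re.fullmatch(r"[A-Za-z]{3}", parts[i])):
        labels.append((parts[i], "extlang"))
        i += 1
        n_ext += 1

    # Optional script: four letters.
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{4}", parts[i]):
        labels.append((parts[i], "script"))
        i += 1

    # Optional region: two letters or three digits.
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", parts[i]):
        labels.append((parts[i], "region"))
        i += 1

    # Variants: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
    while i < len(parts) and re.fullmatch(
            r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", parts[i]):
        labels.append((parts[i], "variant"))
        i += 1

    # Extensions: a singleton (any single alphanumeric except x) followed by
    # subtags of 2-8 alphanumerics, all of which belong to that extension.
    while i < len(parts) and re.fullmatch(r"[0-9A-WY-Za-wy-z]", parts[i]):
        singleton = parts[i]
        labels.append((singleton, "singleton"))
        i += 1
        while i < len(parts) and re.fullmatch(r"[A-Za-z0-9]{2,8}", parts[i]):
            labels.append((parts[i], "extension(%s)" % singleton))
            i += 1

    # Private use: "x" followed by anything that remains.
    if i < len(parts) and parts[i].lower() == "x":
        labels.append((parts[i], "privateuse singleton"))
        i += 1
        while i < len(parts):
            labels.append((parts[i], "privateuse"))
            i += 1

    # Whatever is left did not fit the grammar as sketched here.
    labels.extend((p, "unparsed") for p in parts[i:])
    return labels


if __name__ == "__main__":
    for example in ("zh-s-nan", "zh-s-nan-Hans-CN", "zh-nan-Hans-CN"):
        print(example, classify_subtags(example))
```

The third tag, zh-nan-Hans-CN, is included only for comparison: without
the "s", "nan" falls in the extlang position and "Hans" and "CN" are read
as script and region. Whether that is what the original examples intended
is a guess on my part.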
Cheers,
Martin
--
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)