[tei-council] CH and BCP 47 again

Martin Holmes mholmes at uvic.ca
Tue Jun 26 12:17:45 EDT 2012


Re-examining the CH chapter and BCP 47, I have a couple of things I'd 
like to run by you. The first is this bit:

"Languages are identified by a language subtag, which may be a two 
letter code taken from ISO 639-1 or a three letter code taken from ISO 
639-2."

This is in line with BCP 47, which does not AFAICS express a preference 
for two- or three-letter codes where both exist. However, in our own 
practice in the Guidelines, we decided to use two-letter codes from ISO 
639-1 where these exist, and only use three-letter codes from 639-2 
where there is no two-letter code. I have a feeling this was because 
it appears to be what the IANA Language Subtag Registry does:

<http://www.iana.org/assignments/language-subtag-registry>

You can see, for instance, that Modern Greek is "el" (639-1) rather than 
the available "ell" and "gre" from 639-2. (Both sets are listed in 
<http://www.loc.gov/standards/iso639-2/php/code_list.php>.)

Also, BCP 47 notes this: "When languages have both an ISO 639-1 
two-character code and a three-character code (assigned by ISO 639-2, 
ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is 
defined in the IANA registry."

Therefore I would like to supplement our explanation above, like this:

"Languages are identified by a language subtag, which may be a two 
letter code taken from ISO 639-1 or a three letter code taken from ISO 
639-2. The practice in these Guidelines is to prefer a two-letter code 
from ISO 639-1 where one exists, following the practice in the IANA 
Language Subtag Registry, and we recommend that encoders also follow 
this convention."
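
For encoders who want to apply that convention mechanically, here is a 
minimal sketch in Python. The mapping table is a tiny hand-copied sample 
for illustration only, and the function name is hypothetical; a real 
tool would build the table from the ISO 639-2 code list or the IANA 
registry rather than hard-coding it.

```python
# Illustrative sample only: a few ISO 639-2 codes (both the /T and /B
# forms) paired with their ISO 639-1 two-letter equivalents.
ISO639_2_TO_1 = {
    "ell": "el",  # Modern Greek (639-2/T)
    "gre": "el",  # Modern Greek (639-2/B)
    "fra": "fr",  # French (639-2/T)
    "fre": "fr",  # French (639-2/B)
    "deu": "de",  # German (639-2/T)
    "ger": "de",  # German (639-2/B)
}


def preferred_language_subtag(code: str) -> str:
    """Return the two-letter code when the sample table has one,
    otherwise return the (lowercased) code unchanged."""
    normalized = code.strip().lower()
    return ISO639_2_TO_1.get(normalized, normalized)
```

With this table, "ell" and "gre" both become "el", while a code with no 
two-letter equivalent, such as "grc" (Ancient Greek), is passed through 
as-is.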

Secondly, there are two specific examples of extended tags in CH which I 
find hard to parse according to the rules:

zh-s-nan (the Southern Min language of the macrolanguage Chinese)

zh-s-nan-Hans-CN (the Southern Min language of the macrolanguage Chinese 
as spoken in China written in simplified Characters)

The use of s-nan seems incorrect to me, and I can find no examples of it 
in the wild. If anyone remembers the source for these tags, could you 
explain it to me?
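
To show why these are hard to parse, here is a rough sketch in Python 
that labels each subtag purely by its position and shape, following my 
reading of the syntax in BCP 47 (RFC 5646, section 2.1). It is a 
simplification under stated assumptions: it handles only ASCII tags, 
ignores grandfathered tags, does no lookup against the registry, and the 
function names are hypothetical.

```python
def _is_variant(subtag: str) -> bool:
    # Variant: 5 to 8 alphanumerics, or 4 characters starting with a digit.
    if not subtag.isalnum():
        return False
    return 5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit())


def classify_subtags(tag: str) -> list:
    """Label each subtag of a language tag by position and shape only."""
    parts = tag.split("-")
    labels = [(parts[0], "language")]
    i = 1
    # Up to three extended language subtags, each exactly three letters.
    extlangs = 0
    while (i < len(parts) and extlangs < 3
           and len(parts[i]) == 3 and parts[i].isalpha()):
        labels.append((parts[i], "extlang"))
        i += 1
        extlangs += 1
    # Optional script: four letters.
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        labels.append((parts[i], "script"))
        i += 1
    # Optional region: two letters or three digits.
    if i < len(parts) and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        labels.append((parts[i], "region"))
        i += 1
    # Any number of variants.
    while i < len(parts) and _is_variant(parts[i]):
        labels.append((parts[i], "variant"))
        i += 1
    # Whatever remains must be extensions or private use, each introduced
    # by a single-character subtag ("x" means private use).
    kind = None
    while i < len(parts):
        part = parts[i]
        if len(part) == 1 and kind != "private use":
            kind = "private use" if part.lower() == "x" else "extension"
            labels.append((part, "singleton"))
        elif kind is not None:
            labels.append((part, kind))
        else:
            labels.append((part, "unrecognized"))
        i += 1
    return labels
```

Under these assumptions, classify_subtags("zh-s-nan-Hans-CN") labels "s" 
as a singleton and then treats "nan", "Hans" and "CN" all as extension 
subtags, so none of them is read as a language, script or region. For 
comparison, classify_subtags("zh-nan-Hans-CN") yields language, extlang, 
script and region in that order.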

Cheers,
Martin

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)


