[tei-council] representing transliteration in @xml:lang (was Re: biblscope and imprint)

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Tue Nov 13 22:30:31 EST 2012


Let me attempt to summarize this discussion so far, especially for the 
benefit of the incoming Council members.

While Lou was implementing http://purl.org/TEI/FR/3555190 , he noticed 
this example given in passing in CO-CoreElements.xml:

<title xml:lang="ru" level="m">Ciklotronnye volny v plazme</title>

which is a TEI representation of part of a source document that includes 
a citation in which a title in Russian is transliterated into Roman 
letters.  Lou identified the transliteration scheme used as ISO/R 
9:1968, though I hereby note that it could also be a few other systems 
of transliteration which happen to coincide for the characters used in 
this title.

Lou asked whether "ru" is the correct value of @xml:lang for Russian 
written in Roman letters.  Lou thinks that for processing purposes it's 
wrong to have just "ru", since a processor would then expect to see 
Cyrillic letters, whereas I argued that the text's language is simply 
underspecified when only a language subtag is used.  (Note that 
according to the nomenclature of BCP 47, 
which defines how to create a value for @xml:lang, the whole value of 
this attribute is called a "language tag", whereas the components of 
this "tag", separated by hyphens, are called "subtags".)

Note also that we don't want to just detransliterate the Russian title 
since it comes from a source document that we are attempting to represent.

Lou thought that you can't use a script subtag with a language subtag 
unless the combination is enumerated in the IANA registry linked from 
BCP 47 ( http://www.iana.org/assignments/language-subtag-registry ), and 
while I at first agreed, as I read this document more closely, I see 
that none of the script subtags are paired with languages, implying that 
they could be used as needed (except where a language's 
"Suppress-Script" field names the script, per BCP 47; for "ru" that 
field names "Cyrl", not "Latn").  That is, it appears to me that 
"ru-Latn" is an acceptable combination.
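
As a sketch only, and assuming that reading of the registry is right, 
the example from CO-CoreElements.xml would then be tagged as follows, 
with nothing changed except the value of @xml:lang:

<title xml:lang="ru-Latn" level="m">Ciklotronnye volny v plazme</title>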

Still, we're left with the problem of whether "ru-Latn" is worth using 
in the Guidelines (or in any TEI document) since there's still more than 
one transliteration system you might use for Russian.  That is, it's 
barely more actionable than "ru".  The IANA registry lists various 
"variant subtags" for systems of transliteration, some of which are 
given with one or more prefixes with which they may be used.  For 
example, "pinyin" -- for Pinyin romanization -- is given with the 
prefixes "zh-Latn" and "bo-Latn", showing that "pinyin" is meant to be 
used with Chinese and Tibetan written in Latin script.  On the other 
hand, the variant subtag "alalc97" -- for 
Romanizations recommended by the American Library Association and the 
Library of Congress, in "ALA-LC Romanization Tables: Transliteration 
Schemes for Non-Roman Scripts" (1997), ISBN 978-0-8444-0940-5 -- is not 
given with any prefixes, implying that you could construct 
"ru-Latn-alalc97" if that particular transliteration system was used for 
Russian.  Unfortunately, the example in the Guidelines definitely 
doesn't use this system.
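
For comparison, a title that really was romanized under ALA-LC could 
carry the full tag.  The transliteration below is an illustration only, 
not text from the Guidelines, and it assumes the ALA-LC Russian table is 
being applied correctly (it renders the initial letter as "TS" joined by 
a ligature mark rather than as "C"):

<title xml:lang="ru-Latn-alalc97" level="m">T͡Siklotronnye volny v plazme</title>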

According to BCP 47 section 2.2.5, paragraph 4, you can only use 
registered variants, so we couldn't just make up "ru-Latn-isor91968" or 
something like that for what's transliterated above.  (It wouldn't be 
much of a standard anyway if we could!)  So we could seek IANA 
registration of another variant subtag for one of the systems of 
transliteration of Russian used in the example above.

Alternatively, we simply remove @xml:lang entirely from this example in 
the Guidelines!
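
That option would reduce the example to the sketch below.  Bear in mind 
that an element with no @xml:lang of its own takes its language from the 
nearest ancestor that has one, so this leaves the title's language 
unstated rather than asserting anything new about it:

<title level="m">Ciklotronnye volny v plazme</title>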

* * *

As an aside, Martin and I suggested that we link to the following:

* http://www.iana.org/assignments/language-subtag-registry
* http://www.w3.org/International/articles/language-tags/
* http://www.w3.org/International/questions/qa-choosing-language-tags

from:

http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html

as we've done at:

http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html#CHSH

on the same grounds that Martin and I often argue: that we and most 
users of the TEI, when seeking guidance on a specific element, consult 
the element and attribute specifications and don't always make it to the 
prose of the Guidelines.  Lou, on the other hand, said that the 
definition of the attribute class isn't the best place to look for this 
information, and that if you follow links starting at 
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html 
, you will reach this information.

--Kevin

