[tei-council] representing transliteration in @xml:lang (was Re: biblscope and imprint)
Kevin Hawkins
kevin.s.hawkins at ultraslavonic.info
Tue Nov 13 22:30:31 EST 2012
Let me attempt to summarize this discussion so far, especially for the
benefit of the incoming Council members.
While Lou was implementing http://purl.org/TEI/FR/3555190 , he noticed
this example given in passing in CO-CoreElements.xml:
<title xml:lang="ru" level="m">Ciklotronnye volny v plazme</title>
which is a TEI representation of part of a source document that includes
a citation in which a title in Russian is transliterated into Roman
letters. Lou identified the transliteration scheme used as ISO/R
9:1968, though I hereby note that it could also be a few other systems
of transliteration which happen to coincide for the characters used in
this title.
Lou asked whether "ru" is the correct value of @xml:lang for Russian
written in Roman letters. Lou thinks that for processing purposes it's
wrong to have just "ru" when you expect to see Cyrillic letters, whereas
I argued that the language is simply underspecified when using only
a language subtag. (Note that according to the nomenclature of BCP 47,
which defines how to create a value for @xml:lang, the whole value of
this attribute is called a "language tag", whereas the components of
this "tag", separated by hyphens, are called "subtags".)
Note also that we don't want to just detransliterate the Russian title
since it comes from a source document that we are attempting to represent.
Lou thought that you can't use a script subtag with a language subtag
unless the combination is enumerated in the IANA registry linked from
BCP 47 ( http://www.iana.org/assignments/language-subtag-registry ), and
while I at first agreed, on reading the registry more closely I see
that none of the script subtags are paired with languages, implying that
they can be used as needed (except where a language's "Suppress-Script"
field names that script as the default, per BCP 47). That is, it
appears to me that "ru-Latn" is an acceptable combination.
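
A minimal Python sketch of this reading, under stated assumptions: the
function names are hypothetical, the SUPPRESS_SCRIPT table is a hand-copied
two-entry excerpt rather than the full IANA registry, and only the simple
"language" or "language-Script" shapes are handled.

```python
import re

# Hand-copied excerpt (assumption: not the full registry). In the IANA
# registry, "ru" carries Suppress-Script: Cyrl and "en" carries Latn.
SUPPRESS_SCRIPT = {"ru": "Cyrl", "en": "Latn"}


def split_language_script(tag):
    """Return (language, script or None) for a simple BCP 47 tag.

    Only handles tags whose optional second subtag is a four-letter
    script subtag; extlang, region, and so on are ignored.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    if len(subtags) > 1 and re.fullmatch(r"[A-Za-z]{4}", subtags[1]):
        script = subtags[1].title()  # conventional casing, e.g. "Latn"
    return language, script


def script_is_suppressed(tag):
    """True when the script subtag merely repeats the language's default."""
    language, script = split_language_script(tag)
    return script is not None and SUPPRESS_SCRIPT.get(language) == script
```

On this reading, "ru-Cyrl" carries a redundant script subtag, whereas
"ru-Latn" does not, because Latn is not the suppressed script for "ru".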
Still, we're left with the problem of whether "ru-Latn" is worth using
in the Guidelines (or in any TEI document) since there's still more than
one transliteration system you might use for Russian. That is, it's
barely more actionable than "ru". The IANA registry lists various
"variant subtags" for systems of transliteration, some of which are
given with one or more prefixes with which they may be used. For example,
"pinyin" -- for Pinyin romanization -- is given with the prefixes
"zh-Latn" and "bo-Latn", showing that "pinyin" is meant to be used with
Chinese and Tibetan. On the other hand, the variant subtag "alalc97" -- for
Romanizations recommended by the American Library Association and the
Library of Congress, in "ALA-LC Romanization Tables: Transliteration
Schemes for Non-Roman Scripts" (1997), ISBN 978-0-8444-0940-5 -- is not
given with any prefixes, implying that you could construct
"ru-Latn-alalc97" if that particular transliteration system was used for
Russian. Unfortunately, the example in the Guidelines definitely
doesn't use this system.
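
The prefix logic described above can be sketched in Python. This is an
illustration only: the function name is hypothetical, and VARIANT_PREFIXES
holds just the two registry entries mentioned in this message, copied by
hand. An empty prefix list is taken to mean "no restriction", as inferred
above.

```python
# Hand-copied excerpt of the registry's Prefix fields (assumption: only
# the two variants discussed here; the real registry has many more).
VARIANT_PREFIXES = {
    "pinyin": ["zh-Latn", "bo-Latn"],
    "alalc97": [],  # registered without any Prefix field
}


def variant_fits_prefix(tag):
    """Check whether the final (variant) subtag of `tag` is used with one
    of its registered prefixes, or has no registered prefix at all."""
    prefix, _, variant = tag.rpartition("-")
    variant = variant.lower()
    if variant not in VARIANT_PREFIXES:
        raise KeyError(f"variant {variant!r} is not in the local excerpt")
    allowed = VARIANT_PREFIXES[variant]
    if not allowed:
        return True  # no Prefix field: usable with any language tag
    prefix = prefix.lower()
    return any(
        prefix == p.lower() or prefix.startswith(p.lower() + "-")
        for p in allowed
    )
```

With this excerpt, "ru-Latn-alalc97" and "zh-Latn-pinyin" pass, while
"ru-Latn-pinyin" does not.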
According to BCP 47 section 2.2.5, paragraph 4, you can only use
registered variants, so we couldn't just make up "ru-Latn-isor91968" or
something like that for what's transliterated above. (It wouldn't be
much of a standard anyway if we could!) So we could seek IANA
registration of another variant subtag for one of the systems of
transliteration of Russian used in the example above.
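
Any new variant subtag would also have to fit the syntax BCP 47 gives for
variants: five to eight alphanumeric characters, or four characters
beginning with a digit. The Python sketch below (the function name is
hypothetical) checks that shape only; it says nothing about whether a
subtag is actually registered.

```python
import re

# Shape of a variant subtag per the BCP 47 grammar:
#   5 to 8 alphanumerics, or a digit followed by 3 alphanumerics.
_VARIANT_SHAPE = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def is_wellformed_variant(subtag):
    """Syntactic check only; registration is a separate question."""
    return _VARIANT_SHAPE.fullmatch(subtag) is not None
```

By this check "alalc97" and "pinyin" have a valid shape, whereas the
made-up "isor91968", at nine characters, is too long, so a shorter form
would be needed in any registration request.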
Alternatively, we could simply remove @xml:lang entirely from this
example in the Guidelines!
* * *
As an aside, Martin and I suggested that we link to the following:
* http://www.iana.org/assignments/language-subtag-registry
* http://www.w3.org/International/articles/language-tags/
* http://www.w3.org/International/questions/qa-choosing-language-tags
from:
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html
as we've done at:
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html#CHSH
on the same grounds that Martin and I often argue: that we and most
users of the TEI, when seeking guidance on a specific element, consult
the element and attribute specifications and don't always make it to the
prose of the Guidelines. Lou, on the other hand, said that the
definition of the attribute class isn't the best place to look for this
information, and that if you follow links starting at
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html
, you will reach this information.
--Kevin