[tei-council] biblscope and imprint

Mon Nov 5 12:06:23 EST 2012

Hi Kevin,

On 12-11-05 06:29 AM, Kevin Hawkins wrote:
> That document is helpful.  What wasn't clear to me about BCP 47 is
> whether you could only use a script subtag in combination with a
> language subtag if they were listed in combination in the IANA registry
> (as some are).

Absolutely not. The idea of the Suppress-Script field, if I understand
it correctly, is that you _don't_ need to specify that script, because
it's the default for the language. When you use just "ru", the script
"Cyrl" is understood; if a different script is used, you should
specify it.
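
A rough sketch of that rule in Python, assuming a tiny hand-copied
excerpt of Suppress-Script values (the authoritative data is the IANA
Language Subtag Registry). The names SUPPRESS_SCRIPT and make_tag are
hypothetical, chosen only for illustration:

```python
# Hypothetical excerpt of Suppress-Script values; the authoritative data
# is the IANA Language Subtag Registry, which this sketch does not read.
SUPPRESS_SCRIPT = {"ru": "Cyrl", "en": "Latn"}


def make_tag(language: str, script: str) -> str:
    """Return a language tag, omitting the script when it is the default."""
    language = language.lower()
    script = script.capitalize()  # script subtags are conventionally title case
    if SUPPRESS_SCRIPT.get(language) == script:
        return language
    return f"{language}-{script}"


print(make_tag("ru", "Cyrl"))  # ru
print(make_tag("ru", "Latn"))  # ru-Latn
```

Nothing here consults the registry for allowed language/script
pairings; the only lookup is whether the script is the suppressed
default for the language.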

>  Whereas this W3C guide explicitly says you can only use
> extended language subtags with certain languages, I see that it
> explicitly says you can use a script subtag with any language when it's
> not written in the script given for "suppress script".  And, as we see,
> it even gives the examples of Russian "transcribed into the Latin script".
>
> So if you encounter "ru-Latn", you're stuck figuring out which
> transliteration scheme was used (or whether the author just invented one).

Yes, that's an interesting point. In some cases the registry provides a
variant subtag that specifies the transliteration scheme used, for
example:

Type: variant
Subtag: wadegile
Description: Wade-Giles romanization
Added: 2008-10-03
Prefix: zh-Latn

which specifies one particular romanization of Chinese. Use of variants 
is explained here:

<http://www.w3.org/International/questions/qa-choosing-language-tags#xxxshortcomings>

However, there are no such variants for Russian. If there are multiple
Latin transliteration schemes in use for Russian, it would be a good
idea to register subtags for them.
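
To illustrate how the Prefix field in the registry record quoted above
relates a variant to a language tag, here is a minimal Python sketch.
It assumes one value per field (real records may repeat Prefix), and
parse_record and matches_prefix are hypothetical helper names, not
part of any library:

```python
# The registry record quoted above, copied verbatim.
RECORD = """\
Type: variant
Subtag: wadegile
Description: Wade-Giles romanization
Added: 2008-10-03
Prefix: zh-Latn
"""


def parse_record(text: str) -> dict:
    """Parse 'Field: value' lines into a dict (one value per field assumed)."""
    fields = {}
    for line in text.splitlines():
        if ": " in line:
            name, value = line.split(": ", 1)
            fields[name] = value
    return fields


def matches_prefix(tag: str, record: dict) -> bool:
    """True if the tag starts with the record's Prefix and the variant
    subtag appears somewhere after that prefix."""
    subtags = tag.lower().split("-")
    prefix = record["Prefix"].lower().split("-")
    variant = record["Subtag"].lower()
    return subtags[: len(prefix)] == prefix and variant in subtags[len(prefix):]


wadegile = parse_record(RECORD)
print(matches_prefix("zh-Latn-wadegile", wadegile))  # True
print(matches_prefix("ru-Latn-wadegile", wadegile))  # False
```

A registered variant for a Russian romanization would presumably
carry "Prefix: ru-Latn" in the same way.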

To my mind, BCP 47 itself is actually quite hard to understand, but 
we've linked in the Guidelines to two W3C documents (including the one 
above) which really help to clarify the situation.

Cheers,
Martin

>
> --Kevin
>
> On 11/5/12 8:25 AM, Martin Holmes wrote:
>> The W3C guide to using language subtags actually uses ru-Latn as the
>> specific example for script subtags:
>>
>> "Script subtags should only be used as part of a language tag when the
>> script adds some useful distinguishing information to the tag. Usually
>> this is because a language is written in more than one script or because
>> the content has been transcribed into a script that is unusual to the
>> language (so one might tag Russian transcribed into the Latin script
>> with a tag such as ru-Latn)."
>>
>> <http://www.w3.org/International/questions/qa-choosing-language-tags>
>>
>> So I think Kevin might be misunderstanding BCP 47, and ru-Latn should be
>> used.
>>
>> Cheers,
>> Martin
>>
>> On 12-11-04 09:49 PM, Kevin Hawkins wrote:
>>> On 11/4/12 7:52 AM, Lou Burnard wrote:
>>>> Firstly the comment that using "ru" for Russian transliterated in Roman
>>>> characters is simply "underspecified" seems to me rather to miss the
>>>> point. If I see something in a Unicode document which says it has
>>>> xml:lang="ru" I expect to see proper Russian Unicode characters.
>>>
>>> Perhaps.  I meant that while you might think that, it wasn't clear to me
>>> that the semantics of @xml:lang license that inference.  However, once I
>>> looked at BCP 47 and the discussion of "suppress script" further, I
>>> think it might indeed license Lou's inference.
>>>
>>>> Secondly, even if I am prepared to accept Romanized versions of those
>>>> characters and figure out for myself what the Russian should have been,
>>>> this is not entirely easy. There are several different (Wikipedia lists
>>>> ten) possible Romanization schemes, which vary quite considerably. In
>>>> some, for example, the sequence "ye" stands for the Russian letter that
>>>> looks like a Roman "e"; in others this character is represented by "e",
>>>> unless it is iotated by a preceding soft sign. So generating a correct
>>>> Cyrillic version of this citation isn't easy, and neither is deciding
>>>> which scheme we're dealing with here!
>>>
>>> BCP 47 allows for registering of variant subtags for systems of
>>> transliteration, but it does not require this.  However, per the
>>> discussion of "suppress script", it seems you effectively need to for
>>> transliteration.
>>>
>>> This is puzzling.
>>>
>>>> Thirdly, this particular example is actually taken verbatim from a
>>>> rather elderly ISO standard on bibliographic reference (ISO 690, 1987).
>>>> Hence we probably should not mess with its representation at all.
>>>
>>> I fully agree that as long as we are citing a citation in a source
>>> document, we shouldn't go de-transliterating it!
>>>
>>>> (You
>>>> can see it cited as an example in the Wikipedia entry for ISO_690,
>>>> curiously enough).
>>>
>>> I imagine that someone writing or improving the Wikipedia article on ISO
>>> 690 googled around to see what they could find and stumbled upon the
>>> Guidelines ...
>>>
>>>> My guess, but I defer to the Russian expert in our midst, is that this
>>>> uses the now deprecated ISO/R 9:1968 but without access to the original,
>>>> it's hard to be sure, and without being sure I'd rather not try to
>>>> convert it into proper Russian.
>>>
>>> Well, it looks like Lou not only tried, but as your resident Russian
>>> expert I can say that he also succeeded.
>>>
>>>> All of which I suppose we can side-step cheerfully, by saying "ru-Latn",
>>>> even though this particular combination isn't actually proposed in
>>>> http://www.iana.org/assignments/language-subtag-registry, and even
>>>> though this won't help anyone who *does* want to see the original title
>>>> as it should have been presented!
>>>
>>> I, like Martin in a later message, used to think that BCP 47 allowed for
>>> the various types of tags to be combined as you see fit, meaning that
>>> "ru-Latn" would be allowed.  But a closer reading of BCP 47 now makes me
>>> think that you can only use things in the IANA registry unless you use a
>>> private use subtag.
>>>
>>> We could bring in Syd Bauman or Deborah Anderson to help us sort this
>>> out, or we could take a shortcut by simply removing the @xml:lang on
>>> this transliterated title.
>>>
>>> --Kevin
>>>
>>

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)