[tei-council] biblscope and imprint

Mon Nov 5 09:29:48 EST 2012

That document is helpful.  What wasn't clear to me about BCP 47 is 
whether you could only use a script subtag in combination with a 
language subtag if they were listed in combination in the IANA registry 
(as some are).  This W3C guide explicitly says you can use extended 
language subtags only with certain languages, but I see that it also 
explicitly says you can use a script subtag with any language when the 
text is not written in the script given for "suppress script".  And, as 
we see, it even gives the example of Russian "transcribed into the Latin 
script".
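
To make that reading concrete, here is a minimal sketch of the check as I 
understand it.  The SUPPRESS_SCRIPT table is a hypothetical, hand-copied 
excerpt (the real values live in the IANA registry), the pattern handles 
only language, script, and region subtags, and the function name is my 
own invention:

```python
import re

# Hypothetical, hand-copied excerpt of Suppress-Script values; the
# authoritative data is the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"ru": "Cyrl", "en": "Latn"}

# Simplified pattern (an assumption, not the full BCP 47 grammar):
# language, optional 4-letter script, optional region.
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def script_is_informative(tag):
    """Return True if the tag carries a script subtag that differs from
    the language's Suppress-Script value (or the language has none)."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError("not a simple language[-script][-region] tag: " + tag)
    script = match.group("script")
    if script is None:
        return False  # no script subtag at all
    language = match.group("language").lower()
    # Script subtags are conventionally title-cased, e.g. "Latn".
    return script.title() != SUPPRESS_SCRIPT.get(language)
```

On this reading, "ru-Latn" carries useful information, while "ru-Cyrl" 
merely repeats what "ru" already implies.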

So if you encounter "ru-Latn", you're stuck figuring out which 
transliteration scheme was used (or whether the author just invented one).
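
A registered variant subtag is one way a tag could name the scheme.  The 
sketch below is only an illustration: the lookup table is hypothetical 
and holds a single variant, "alalc97", which I believe is registered for 
ALA-LC romanization, and the function name is made up:

```python
# Hypothetical lookup table with one entry; a real implementation would
# read variant subtags from the IANA Language Subtag Registry.
TRANSLITERATION_VARIANTS = {
    "alalc97": "ALA-LC Romanization, 1997 edition",
}


def transliteration_scheme(tag):
    """Return the name of a known transliteration variant in the tag,
    or None if the tag does not name one."""
    subtags = tag.lower().split("-")
    # Skip the primary language subtag; check every later subtag.
    for subtag in subtags[1:]:
        if subtag in TRANSLITERATION_VARIANTS:
            return TRANSLITERATION_VARIANTS[subtag]
    return None
```

With this, "ru-Latn-alalc97" names its scheme, while plain "ru-Latn" 
returns None and leaves the reader guessing.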

--Kevin

On 11/5/12 8:25 AM, Martin Holmes wrote:
> The W3C guide to using language subtags actually uses ru-Latn as the
> specific example for script subtags:
>
> "Script subtags should only be used as part of a language tag when the
> script adds some useful distinguishing information to the tag. Usually
> this is because a language is written in more than one script or because
> the content has been transcribed into a script that is unusual to the
> language (so one might tag Russian transcribed into the Latin script
> with a tag such as ru-Latn)."
>
> <http://www.w3.org/International/questions/qa-choosing-language-tags>
>
> So I think Kevin might be misunderstanding BCP 47, and ru-Latn should be
> used.
>
> Cheers,
> Martin
>
> On 12-11-04 09:49 PM, Kevin Hawkins wrote:
>> On 11/4/12 7:52 AM, Lou Burnard wrote:
>>> Firstly the comment that using "ru" for Russian transliterated in Roman
>>> characters is simply "underspecified" seems to me rather to miss the
>>> point. If I see something in a Unicode document which says it has
>>> xml:lang="ru" I expect to see proper Russian Unicode characters.
>>
>> Perhaps.  I meant that while you might think that, it wasn't clear to me
>> that the semantics of @xml:lang license that inference.  However, once I
>> looked at BCP 47 and the discussion of "suppress script" further, I
>> think it might indeed license Lou's inference.
>>
>>> Secondly, even if I am prepared to accept Romanized versions of those
>>> characters and figure out for myself what the Russian should have been,
>>> this is not entirely easy. There are several different (Wikipedia lists
>>> ten) possible Romanization schemes, which vary quite considerably. In
>>> some, for example, the sequence "ye" stands for the Russian letter that
>>> looks like a Roman "e"; in others this character is represented by "e",
>>> unless it is iotated by a preceding soft sign. So generating a correct
>>> Cyrillic version of this citation isn't easy, and neither is deciding
>>> which scheme we're dealing with here!
>>
>> BCP 47 allows for the registration of variant subtags for systems of
>> transliteration, but it does not require one.  However, per the
>> discussion of "suppress script", it seems you effectively need one for
>> transliterated text.
>>
>> This is puzzling.
>>
>>> Thirdly, this particular example is actually taken verbatim from a
>>> rather elderly ISO standard on bibliographic references (ISO 690, 1987).
>>> Hence we probably should not mess with its representation at all.
>>
>> I fully agree that as long as we are citing a citation in a source
>> document, we shouldn't go de-transliterating it!
>>
>>> (You
>>> can see it cited as an example in the Wikipedia entry for ISO_690,
>>> curiously enough).
>>
>> I imagine that someone writing or improving the Wikipedia article on ISO
>> 690 googled around to see what they could find and stumbled upon the
>> Guidelines ...
>>
>>> My guess, but I defer to the Russian expert in our midst, is that this
>>> uses the now deprecated ISO/R 9:1968, but without access to the original
>>> it's hard to be sure, and without being sure I'd rather not try to
>>> convert it into proper Russian.
>>
>> Well, it looks like Lou not only tried, but as your resident Russian
>> expert I can say that he also succeeded.
>>
>>> All of which I suppose we can side-step cheerfully, by saying "ru-Latn",
>>> even though this particular combination isn't actually proposed in
>>> http://www.iana.org/assignments/language-subtag-registry, and even
>>> though this won't help anyone who *does* want to see the original title
>>> as it should have been presented!
>>
>> I, like Martin in a later message, used to think that BCP 47 allowed for
>> the various types of tags to be combined as you see fit, meaning that
>> "ru-Latn" would be allowed.  But a closer reading of BCP 47 now makes me
>> think that you can only use things in the IANA registry unless you use a
>> private use subtag.
>>
>> We could bring in Syd Bauman or Deborah Anderson to help us sort this
>> out, or we could take a shortcut by simply removing the @xml:lang on
>> this transliterated title.
>>
>> --Kevin
>>
>