[tei-council] Values of @xml:lang on <exemplum>

Laurent Romary laurent.romary at loria.fr
Thu Apr 30 16:03:08 EDT 2009


If we were in an ideal world, I would like to see the following
guidelines adopted:
- xml:lang is used rigorously to identify the working language (the
rule of thumb is: which spelling checker would I apply to this content?)
- we create an @objectLang attribute which is only used when we know
that a certain content is "about" a certain language. Typical
examples would be a dictionary entry, a gloss about a foreign word,
etc. @objectLang could come with an @scheme so that we are not tied to
the strict xml:lang constraints there (we may want to refer to
language families, as in ISO 639-5). Do we need an attribute class?
@objectLang is inherited just as xml:lang is.
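A minimal sketch of how such inheritance could be resolved, assuming @objectLang is a plain unprefixed attribute (it is only a proposal here, not part of TEI) and using Python's standard xml.etree.ElementTree. Both values are passed down the tree and overridden wherever an element sets them; an empty value such as xml:lang="" cancels the inherited language without naming a new one, which the XML specification allows. The function name resolve_languages is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Hypothetical attribute from the proposal above; not part of TEI P5.
OBJECT_LANG = "objectLang"


def resolve_languages(elem, working=None, obj=None, out=None):
    """Return (tag, working_lang, object_lang) for elem and every
    descendant, in document order.

    Both values are inherited from the nearest ancestor that sets them.
    An explicit empty value (e.g. xml:lang="") cancels the inherited
    value without naming a new one, and is reported as None.
    """
    if out is None:
        out = []
    if XML_LANG in elem.attrib:
        working = elem.attrib[XML_LANG] or None
    if OBJECT_LANG in elem.attrib:
        obj = elem.attrib[OBJECT_LANG] or None
    out.append((elem.tag, working, obj))
    for child in elem:
        resolve_languages(child, working, obj, out)
    return out


# A trimmed version of Lou's example (assumption: no TEI namespace).
doc = """<exemplum xml:lang="fr" objectLang="la">
  <p>Voici un exemple :
    <egXML xml:lang="la">Caesar adsum jam forte</egXML>
    <egXML xml:lang="en">Caesar I am here at last</egXML>
  </p>
</exemplum>"""

for tag, working, obj in resolve_languages(ET.fromstring(doc)):
    print(tag, working, obj)
# exemplum fr la
# p fr la
# egXML la la
# egXML en la
```

Note how the English egXML keeps "la" as its object language while its working language changes, which is exactly the distinction the two attributes are meant to carry.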

To take up Lou's example, we would have:

<exemplum xml:lang="fr" objectLang="la"> <!-- I assume that we are
trying to explain something about the Latin language, but use a more
practical language to do so -->
	<p>Voici un exemple d'intérêt profond: <!-- "Here is an example of
deep interest:" -->
	<egXML xml:lang="la">Caesar adsum jam forte</egXML> <!-- look here:
there are places where the working and object languages are identical.
That would also be the case for the <orth> of a dictionary entry -->
	qui pourrait s'exprimer en anglais comme: <!-- "which could be
expressed in English as:" -->
	<egXML xml:lang="en">Caesar I am here at last</egXML> <!-- it's
still about Latin, but we know of a few people who would only
understand English here, so we use it as the working language -->
	ou peut-être... <!-- "or perhaps..." -->
	</p>
</exemplum>


On 30 Apr 2009, at 20:08, David Sewell wrote:

> Laurent,
>
> How would you recommend that we encode the distinction in our
> <exemplum> sections?
>
> On Thu, 30 Apr 2009, Laurent Romary wrote:
>
>> What James is alluding to is an essential distinction in, e.g.,
>> terminology: that between working and object (which he calls target)
>> languages. I quote here the corresponding section of ISO 16642 (TMF,
>> Terminological Markup Framework), and I think we should seriously
>> consider making this distinction ours in the TEI.
>> Laurent
>>
>> Any terminological data collection conforming to this TMF should
>> clearly distinguish between the working language and the object
>> language, which are the two types of language information that can
>> be attached to any level of the collection.
>>
>> The working language is the language used to express any given
>> textual content in the data collection. This information shall be
>> represented using the xml:lang attribute as defined in the Extensible
>> Markup Language (XML) recommendation of the W3C and used accordingly.
>> In particular, the scope of the working language is the whole
>> sub-document starting from the element where the information has
>> been declared, unless it is superseded by another working language
>> declaration for some element in this sub-document.
>>
>> The object language is the language of the terminological information
>> which is being described at some level in the terminological data
>> collection (typically at the language section level). As such, it is
>> represented in the TMF as a data category (“language identifier” in
>> ISO12620) and may be represented in a given TML using any style among
>> those described in this International Standard. Its possible values
>> are those allowed by the reference data category in ISO12620 or a
>> reduced set defined for a given TML.
>>
>> The following example shows how the two types of language can be
>> used within a language section expressed in GMT:
>>
>> <struct type="LS" xml:lang="fr">
>>   <feat type="language identifier">en</feat>
>>   <feat type="definition">Une valeur entre 0 et 1 utilisée...</feat>
>>   <struct type="TS">
>>     <feat type="term" xml:lang="en">alpha smoothing factor</feat>
>>     <feat type="term type">fullForm</feat>
>>   </struct>
>> </struct>
>>
>>
>>
>> On 30 Apr 2009, at 15:57, James Cummings wrote:
>>
>>>
>>> Sebastian and I were just discussing this in relation to examples
>>> and my desire to decouple them. In implementation it is fine: where
>>> the Guidelines point to a particular example by @xml:id, you just
>>> include it. But sometimes the elementSpec might have an example in
>>> another language that you wish to appear in the English (or another
>>> language's) element reference page. So the logic, I think, is to
>>> include all exempla in the target language, and possibly all those
>>> with no language, and only if those don't exist provide the English
>>> example. The problem comes when the Chinese want to include a French
>>> example. The solution we thought of was basically as you have below,
>>> but we worried about abusing the semantics of exemplum/@xml:lang;
>>> having a @targetLang or something might solve this. So you might get:
>>>
>>> <exemplum targetLang="fr">
>>>   <!-- XInclude an example by @xml:id whose own @xml:lang is 'la' -->
>>> </exemplum>
>>> <exemplum targetLang="zh">
>>>   <!-- XInclude the same example -->
>>> </exemplum>
>>>
>>> Not sure... which is why I wanted to avoid talking about
>>> implementation. ;-)
>>>
>>> -James
>>>
>>> David Sewell wrote:
>>>> This is doable within the next month. I don't mind taking it on as
>>>> my project, as I've been working with similar issues with our UVa
>>>> Press material.
>>>>
>>>> Incidentally, I realized there is a perfectly straightforward (if
>>>> somewhat hack-ish) solution to handling exempla in Latin, etc.:
>>>>
>>>> <exemplum xml:lang="">
>>>>  <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="la">
>>>>    <p>GALLIA est omnis divisa in partes tres</p>
>>>>  </egXML>
>>>> </exemplum>
>>>>
>>>> which asserts "this exemplum is language-neutral, but the sample
>>>> code it contains is in Latin".
>>>>
>>>> David
>>>>
>>>> On Thu, 30 Apr 2009, Sebastian Rahtz wrote:
>>>>
>>>>> Excellent to get this sorted.
>>>>>> I would suggest, then, that we merge cases in #2 with #3 and
>>>>>> assign them all @xml:lang="". Those exempla are then considered
>>>>>> language-neutral for purposes of display. I assume that's more
>>>>>> important to us than enabling people to search for <exemplum>s
>>>>>> that happen to be in Old English or Arabic or whatever?
>>>>>>
>>>>> I am ambivalent about this. But on the whole I think your suggestion
>>>>> is the only practical one. We can change the schema for the
>>>>> Guidelines to make xml:lang compulsory on <exemplum>, by the way.
>>>>>
>>>>> The processing _will_ have to change. Can we get this done in the
>>>>> next month? How close are you to listing all the exempla which need
>>>>> an xml:lang=""? I assume we just assign "en" to all the others which
>>>>> have no xml:lang at present.
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Dr James Cummings, Research Technologies Service, University of Oxford
>>> James dot Cummings at oucs dot ox dot ac dot uk
>>> _______________________________________________
>>> tei-council mailing list
>>> tei-council at lists.village.Virginia.EDU
>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>
>
> -- 
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
> PO Box 801079, Charlottesville, VA 22904-4318 USA
> Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
> Email: dsewell at virginia.edu   Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
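The display rule James outlines in the quoted thread (show exempla in the page's language, possibly also the language-neutral ones, and only fall back to English if nothing else qualifies) could be sketched roughly as follows. This is a hypothetical illustration, not the Guidelines' actual processing: it assumes un-namespaced elements for brevity, treats James's @targetLang as an optional override, counts a missing xml:lang as "en" (the default Sebastian proposes), and reads xml:lang="" as language-neutral (David's convention). The names display_lang and select_exempla are invented here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def display_lang(exemplum):
    """Language an <exemplum> is filed under for display.

    Assumptions: the hypothetical @targetLang wins if present; otherwise
    @xml:lang is used; a missing @xml:lang counts as "en"; "" marks a
    language-neutral exemplum.
    """
    if "targetLang" in exemplum.attrib:
        return exemplum.attrib["targetLang"]
    return exemplum.attrib.get(XML_LANG, "en")


def select_exempla(spec, page_lang, include_neutral=True, fallback="en"):
    """Choose which <exemplum> children of spec to show on a page_lang page."""
    exempla = spec.findall("exemplum")  # assumes no TEI namespace
    chosen = [ex for ex in exempla
              if display_lang(ex) == page_lang
              or (include_neutral and display_lang(ex) == "")]
    if not chosen:
        # Nothing in the page language (or neutral): fall back to English.
        chosen = [ex for ex in exempla if display_lang(ex) == fallback]
    return chosen


spec = ET.fromstring("""<elementSpec ident="p">
  <exemplum xml:lang="en"><p>An English example</p></exemplum>
  <exemplum xml:lang="fr"><p>Un exemple</p></exemplum>
  <exemplum xml:lang=""><egXML xml:lang="la">GALLIA est omnis divisa in partes tres</egXML></exemplum>
  <exemplum targetLang="zh"><p>An example chosen for the Chinese page</p></exemplum>
</elementSpec>""")

for lang in ("fr", "zh", "de"):
    print(lang, [display_lang(ex) for ex in select_exempla(spec, lang)])
# fr ['fr', '']
# zh ['', 'zh']
# de ['']
print([display_lang(ex) for ex in select_exempla(spec, "de", include_neutral=False)])
# ['en']
```

Whether language-neutral exempla should suppress the English fallback is not settled in the thread; in this sketch they do when include_neutral is True, which is why the German page above shows only the Latin example.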


