[tei-council] DCR alignment inside ODD

Piotr Bański bansp at o2.pl
Fri Apr 27 07:07:24 EDT 2012


Hi Laurent,

Right, tagUsage is exactly what I used there. Will formulate a more
precise proposal when I've dealt with the most pressing things before
the end of the month.

Isn't this a bit like the <equiv> case, where you talk about the DCR
alignment indirectly? I need to give it some thought once things calm down.

Thanks,

  P.

On 27/04/12 09:03, Laurent Romary wrote:
> Hi Piotr,
> I also like your Zadar use case. We need to think of a way of having a global declaration ("all my <gen> mean X in the DCR") in encodingDesc, which would be independent of the <equiv> mechanism. I would suggest that the dcr: attributes go on tagUsage for this purpose.
> Cheers,
> Laurent
> 
> On 26 Apr 2012, at 20:10, Piotr Bański wrote:
> 
>> [keeping Laurent in the loop, as he requested]
>>
>> On 26/04/12 18:48, Lou Burnard wrote:
>>> On 26/04/12 17:12, Piotr Bański wrote:
>>>
>>> [... snip ... ]
>>>
>>>>> an <elementSpec> can contain a <valList>, whose <valItem> children
>>>>> can have <equiv> children
>>>>>
>>>>> Does that help?
>>>>
>>>> Some. Thanks. I looked at valItem but the description made me shy away
>>>> from it ("contains one or more valItem elements defining possible values
>>>> for an *attribute*") -- it made me think that using it for element
>>>> content is Bad.
>>>
>>>
>>> I'd say that description is erroneous and should be revised. Please put 
>>> in a ticket.
>>
>> Done.
>>
>> https://sourceforge.net/tracker/?func=detail&aid=3521714&group_id=106328&atid=644062
>>
>>>>> I suspect what you'd really like is to use a DTD which supplied default dcr:cat attributes to
>>>>> instances of <pos>.
>>>>
>>>> I'm not sure how to handle this in DTDs. A default dcr:datcat pointing at
>>>> a definition of the POS, sure. But I can't see how to use this approach
>>>> for the values (noun, verb, etc.); maybe I'm missing something again.
>>>>
>>>
>>> I am coming to this discussion under-prepared, but for what it's worth, 
>>> it seems to me that if what you want is to say "my <pos> elements all 
>>> have content/values defined by the ISO DCR", you certainly don't need to 
>>> say it on every <pos> occurrence. You could either say it in your ODD 
>>> using <equiv> (as previously noted), or you could also say it in the 
>>> <encodingDesc> somewhere. Similarly if you wanted to say that for your 
>>> @type attributes or anything else. But this seems different from saying 
>>> that your @type attribute or <pos> element itself is defined by the ISO 
>>> DCR.
>>
>> I want to say about <pos>noun</pos> that:
>>
>> 1) the concept expressed by <pos> is this-and-that Data Category kept at
>> PID X (that's the dcr:datcat pointing at the definition of
>> "part-of-speech"), and
>>
>> 2) the value of that POS is this-and-that Simple Data Category kept at
>> PID Y (that's the dcr:valueDatcat pointing at the definition of the
>> concept "noun").
>>
>> (note that I am restricting this to linguistic examples, but you can
>> just as well have Data Categories for the concept of "author" or "sex",
>> or "trochee", etc., with the same reference machinery -- this is why
>> Laurent wants them global)
>>
>> In particular, I would like to know that when dictionary A says that
>> something is "fem", dictionary B that it is "f", and C that it is
>> "feminine" (or "ż", "żeń.", or "weibl.", etc.), they all talk about the
>> same value of the category "Gender" (so I use dcr:datcat for the concept
>> "Gender", and valueDatcat for the concept "Feminine").
>>
>> Conversely, when one dictionary tells me that something is "n", and
>> another that something else is "n", I want to make sure to indicate that
>> the first one talks about the concept "noun", but the other about the
>> concept "neuter", so I don't want to combine them in my search, or in my
>> combined mega-dictionary.
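
The two cases above (different labels, same concept; same label, different concepts) can be sketched as a lookup keyed by dictionary, element and label. This is only an illustration under stated assumptions: the dictionary names A, B and C follow the example in the text, the PID strings are placeholders rather than real DCR identifiers, and the table layout and function names are hypothetical.

```python
# Placeholder identifiers: real PIDs would come from the ISO DCR.
PID_GENDER = "{PID of 'Gender'}"
PID_FEMININE = "{PID of 'Feminine'}"
PID_NEUTER = "{PID of 'Neuter'}"
PID_POS = "{PID of 'part-of-speech'}"
PID_NOUN = "{PID of 'noun'}"

# Hypothetical per-dictionary tables:
# (element name, label) -> (datcat PID, valueDatcat PID).
DICTIONARIES = {
    "A": {("gen", "fem"): (PID_GENDER, PID_FEMININE),
          ("pos", "n"): (PID_POS, PID_NOUN)},
    "B": {("gen", "f"): (PID_GENDER, PID_FEMININE),
          ("gen", "n"): (PID_GENDER, PID_NEUTER)},
    "C": {("gen", "feminine"): (PID_GENDER, PID_FEMININE)},
}


def resolve(dictionary, element, label):
    """Return (datcat, valueDatcat) for a label, or None if undeclared."""
    return DICTIONARIES[dictionary].get((element, label))


def same_concept(a, b):
    """True when two (dictionary, element, label) triples resolve to the same PIDs."""
    ra, rb = resolve(*a), resolve(*b)
    return ra is not None and ra == rb
```

With these tables, "fem" in A, "f" in B and "feminine" in C compare equal, while "n" in A (noun) and "n" in B (neuter) do not, so a combined search would keep them apart.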
>>
>> So it's not just about saying that "my pos elements have content defined
>> by the ISO DCR", but I need to be more granular, and actually identify
>> the concepts by their PIDs. I could indicate that to humans by e.g.
>> "neut" and to machines by the appropriate valueDatcat, at the same time
>> -- this is roughly the extension of the <f> example mentioned by
>> Laurent. And I guess this is the stage which can be encoded in the
>> Guidelines right now.
>>
>> <gen dcr:datcat="{PID of 'Gender'}"
>>     dcr:valueDatcat="{PID of 'Neuter'}">neut</gen>
>>
>> -------------------------------------------
>> What I talked about in Zadar was a way to state, just *once* per
>> dictionary, that "wherever I use "neut" below as the value of <gen>, I
>> mean this-and-that DC under this PID". So in the body of the dictionary,
>> one would only use "neut" (incidentally human-readable and short), but
>> the header would tie this string appropriately to the relevant PID. I
>> guess that this is a matter for at least one Council session, and I hope
>> that LingSIG will come up with a coherent proposal, hopefully around
>> College Station or Oxford, whichever comes first.
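
A rough sketch of what a processor could do with such a once-per-dictionary declaration: read the mapping and expand it onto each <gen> in the body. Everything here is an assumption for illustration. The HEADER structure is invented (no header syntax has been agreed), the PIDs are placeholders, the namespace URI bound to dcr: is assumed, and the sample entry omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dcr: prefix.
DCR_NS = "http://www.isocat.org/ns/dcr"
ET.register_namespace("dcr", DCR_NS)

# Hypothetical stand-in for a declaration made once in the header:
# element name -> its Data Category PID and the PIDs of its values.
HEADER = {
    "gen": {
        "datcat": "{PID of 'Gender'}",
        "values": {"neut": "{PID of 'Neuter'}"},
    },
}


def annotate(root, header):
    """Add dcr:datcat / dcr:valueDatcat to every element declared in header."""
    for elem in root.iter():
        decl = header.get(elem.tag)
        if decl is None:
            continue  # element not covered by the header declaration
        elem.set(f"{{{DCR_NS}}}datcat", decl["datcat"])
        value_pid = decl["values"].get((elem.text or "").strip())
        if value_pid is not None:
            elem.set(f"{{{DCR_NS}}}valueDatcat", value_pid)
    return root


entry = ET.fromstring("<entry><gen>neut</gen><gen>xyz</gen></entry>")
annotate(entry, HEADER)
```

The body keeps the short human-readable "neut", and the expanded attributes are derived from the header. A label the header does not declare ("xyz" here) gets the element-level datcat but no valueDatcat.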
>>
>> best,
>>
>>  P.
> 
> Laurent Romary
> INRIA & HUB-IDSL
> laurent.romary at inria.fr


