[tei-council] DCR alignment inside ODD

Piotr Bański bansp at o2.pl
Thu Apr 26 14:10:29 EDT 2012


[keeping Laurent in the loop, as he requested]

On 26/04/12 18:48, Lou Burnard wrote:
> On 26/04/12 17:12, Piotr Bański wrote:
> 
> [... snip ... ]
> 
>>> a <elementSpec> can contain a <valList>, whose <valItem> children
>>> can have <equiv> children
>>>
>>> Does that help?
>>
>> Some. Thanks. I looked at valItem but the description made me shy away
>> from it ("contains one or more valItem elements defining possible values
>> for an *attribute*") -- it made me think that using it for element
>> content is Bad.
> 
> 
> I'd say that description is erroneous and should be revised. Please put 
> in a ticket.

Done.

https://sourceforge.net/tracker/?func=detail&aid=3521714&group_id=106328&atid=644062

>>> I suspect what you'd really like is to use a DTD which supplied default dcr:cat attributes to
>>> instances of <pos>.
>>
>> I'm not sure how to handle this in DTDs. default dcr:datcat pointing at
>> a definition of the POS, sure. But I can't see how to use this approach
>> for the values (noun, verb, etc.), maybe I'm missing something again.
>>
> 
> I am coming to this discussion under-prepared, but for what it's worth, 
> it seems to me that if what you want is to say "my <pos> elements all 
> have content/values defined by the ISO DCR", you certainly don't need to 
> say it on every <pos> occurrence. You could either say it in your ODD 
> using <equiv> (as previously noted), or you could also say it in the 
> <encodingDesc> somewhere. Similarly if you wanted to say that for your 
> @type attributes or anything else. But this seems different from saying 
> that your @type attribute or <pos> element itself is defined by the ISO 
> DCR.

I want to say about <pos>noun</pos> that:

1) the concept expressed by <pos> is this-and-that Data Category kept at
PID X (that's the dcr:datcat pointing at the definition of
"part-of-speech"), and

2) the value of that POS is this-and-that Simple Data Category kept at
PID Y (that's the dcr:valueDatcat pointing at the definition of the
concept "noun").

(Note that I am restricting this to linguistic examples, but you can
just as well have Data Categories for the concepts of "author", "sex",
or "trochee", etc., with the same reference machinery -- this is why
Laurent wants them global.)

In particular, I would like to know that when dictionary A says that
something is "fem", dictionary B that it is "f", and C that it is
"feminine" (or "ż", "żeń.", or "weibl.", etc.), they all talk about the
same value of the category "Gender" (so I use dcr:datcat for the concept
"Gender", and valueDatcat for the concept "Feminine").

Conversely, when one dictionary tells me that something is "n", and
another that something else is "n", I want to make sure to indicate that
the first one talks about the concept "noun", but the other about the
concept "neuter", so I don't want to combine them in my search, or in my
combined mega-dictionary.

So it's not just about saying that "my pos elements have content defined
by the ISO DCR": I need to be more granular, and actually identify the
concepts by their PIDs. I could indicate the value to humans by e.g.
"neut" and, at the same time, to machines by the appropriate valueDatcat
-- this is roughly the extension of the <f> example mentioned by
Laurent. And I guess this is the stage which can be encoded in the
Guidelines right now.

<gen dcr:datcat="{PID of 'Gender'}"
     dcr:valueDatcat="{PID of 'Neuter'}">neut</gen>
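
Applied to the cases above, the same pattern might look as follows. This
is a sketch only: the PIDs are placeholders as before, the <entries>
wrapper and the A/B/C comments are made up for illustration, and the
namespace URI bound to dcr: is an assumption to be checked against the
Guidelines.

```xml
<entries xmlns:dcr="http://www.isocat.org/ns/dcr">
  <!-- Dictionaries A, B and C: three different strings, one value concept -->
  <gen dcr:datcat="{PID of 'Gender'}"
       dcr:valueDatcat="{PID of 'Feminine'}">fem</gen>
  <gen dcr:datcat="{PID of 'Gender'}"
       dcr:valueDatcat="{PID of 'Feminine'}">f</gen>
  <gen dcr:datcat="{PID of 'Gender'}"
       dcr:valueDatcat="{PID of 'Feminine'}">feminine</gen>

  <!-- Two dictionaries using the same string "n" for different concepts -->
  <pos dcr:datcat="{PID of 'part-of-speech'}"
       dcr:valueDatcat="{PID of 'Noun'}">n</pos>
  <gen dcr:datcat="{PID of 'Gender'}"
       dcr:valueDatcat="{PID of 'Neuter'}">n</gen>
</entries>
```

A search or a merge across dictionaries would then compare the
valueDatcat values rather than the strings, so the three "feminine"
labels match and the two "n" labels do not.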

-------------------------------------------
What I talked about in Zadar was a way to state, just *once* per
dictionary, that "wherever I use 'neut' below as the value of <gen>, I
mean this-and-that DC under this PID". So in the body of the dictionary,
one would only use "neut" (incidentally human-readable and short), but
the header would tie this string to the relevant PID. I guess that this
is a matter for at least one Council session, and I hope that LingSIG
will come up with a coherent proposal around the time of College Station
or Oxford, whichever comes first.
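
For illustration only, and not as the proposal itself, here is one shape
such a once-per-dictionary statement could take, following Lou's
<valList>/<equiv> suggestion quoted earlier and placing it in the ODD
instead of the header. The PIDs are placeholders, the choice of values is
hypothetical, and whether a <valList> here may describe element content
at all is exactly what the ticket above asks to clarify.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="gen" module="dictionaries" mode="change">
  <!-- The concept expressed by <gen> itself (cf. dcr:datcat) -->
  <equiv uri="{PID of 'Gender'}"/>
  <valList type="closed">
    <!-- Each string used in the dictionary body is tied to a PID once
         (cf. dcr:valueDatcat) -->
    <valItem ident="neut">
      <equiv uri="{PID of 'Neuter'}"/>
    </valItem>
    <valItem ident="fem">
      <equiv uri="{PID of 'Feminine'}"/>
    </valItem>
  </valList>
</elementSpec>
```

With something like this in place, the body would carry only
<gen>neut</gen>, and a processor could look up the PID from the
customization.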

best,

  P.

