[tei-council] DCR alignment inside ODD

Piotr Bański bansp at o2.pl
Thu Apr 26 12:14:36 EDT 2012


Ouch, I get the difference wrt <equiv> now, thanks! :-)

  P.

On 26/04/12 18:01, Laurent Romary wrote:
> I'll answer your two points quickly (baby bottle soon):
> 
> * I could live with a decision to just make http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-model.gramPart.html a member of att.datcat and to consider the integration of other objects step by step (though metadata elements in the header should perhaps be considered soon). Still, I see this as a general-purpose mechanism serving convergence across standardization bodies, and as such a strong move to make.
> * the semantics of dcr:datcat is "the element I qualify corresponds to the concept expressed in my value", and you don't want to express this for <equiv>. There is a level of indirection here: <equiv> qualifies the semantics of the element defined by the element spec, and that element itself does not carry the attribute. It would thus be conceptually wrong (attribute abuse of the worst kind ;-)) to adopt the construct you suggest...
> :-)
> Laurent
> 
> On 26 Apr 2012, at 17:51, Piotr Bański wrote:
> 
>> Thank you, Sebastian and Laurent,
>>
>> There are several issues here. (I'll concentrate on Laurent's post and
>> reply to Sebastian's separately.)
>>
>> Firstly, you (Laurent) want the most radical move, i.e., making the
>> datcat attributes available to *all* elements. Two comments:
>>
>> * I had the impression that this was not accepted by the Council, and
>> that the recommendation was to go from the bottom up, i.e. to select the
>> items that need the datcat stuff, and possibly increase the scope as
>> needed. I agree that the radical solution would simplify the issue,
>> given that ISOcat is able to provide alignment for practically any kind
>> of data category, not just purely linguistic ones. Perhaps we could
>> reopen the discussion on that; I'd be happy to see att.datcat where you
>> suggest.
>>
>> The title of the ticket is general, but the description may indeed
>> suggest that this is a proposal restricted to linguistic stuff:
>>
>> https://sourceforge.net/tracker/index.php?func=detail&aid=3432520&group_id=106328&atid=644065
>>
>>
>> * if you assume that the att.datcat attributes are global, why on earth
>> NOT use them on <equiv>? Where's the consistency? Sure, <equiv> has the
>> @uri attribute, which was, or could be, used for DCR alignment, since
>> there was no other tool for it. But if you postulate global datcat
>> attributes, I see it as inconsistent and counterintuitive to demand that
>> on <equiv> alone, DCR alignment be handled by @uri rather than by the
>> available datcat attributes.
>>
>> Secondly, yes, I know the example. It's nice until you imagine lots of
>> <fs> at the POS layer (take any serious corpus out there), at which
>> point it stops being nice, becomes seriously redundant, and makes
>> you think of shifting the DCR stuff at least to the level of the FSD.
>> Granted, the FSD is not quite there yet (sigh), so keeping all the stuff
>> within <f> is an unhappy temporary solution, good for presenting as one
>> of the examples in the spec, but maybe not necessarily in the
>> Dictionaries chapter.
>>
>> Sure thing, I can do the <equiv>alence for POS in the ODD, and then do:
>> <pos dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256">CN</pos>
>> (for "common noun"), except that it's a variant of the problem with <f>,
>> namely redundancy, very clear in the context of a dictionary. It is also
>> a problem of a split mechanism (<equiv> for containers vs. local
>> dcr:valueDatcat for values) instead of a unified mechanism.
>>
>> Let me make sure it's clear what I consider redundancy in this very
>> case: <pos> or @name="part of speech" have to be repeated many times,
>> that's OK. But if we add the dcr: stuff, then, together with the "local"
>> identifiers, we repeat the "global" identifiers, in every place
>> affected, instead of saying once, either in the ODD, schema, or header:
>> <pos> = "http://www.isocat.org/datcat/DC-1345", and then using <pos>,
>> with its meaning now clarified.
>>
>> Still, I'm grateful for the replies and discussion because it took away
>> my doubts concerning the here-and-now: it's better to have the DCR stuff
>> officially in the TEI than create roundabout solutions of the type I
>> talked about in Zadar and implemented in FreeDict. For the reasons that
>> I gave above, it feels to me like a half-way solution, but still, as we
>> all know, it's better to have it than not to have it, and I will now put
>> an example into the DI chapter (maybe even without mentioning <equiv>
>> for the time being), and will be happy to make a step forward. I think I
>> wanted too much too soon (and worried about how overwhelming it might
>> become, and that it goes beyond the brief Council discussion that
>> we've had).
>>
>> Best,
>>
>>  P.
>>
>>
>> On 26/04/12 09:44, Laurent Romary wrote:
>>> I guess you are currently working on 3432520
>>>
>>> There are two distinct mechanisms here:
>>> - the normal use of <equiv> within an ODD spec
>>> - the on-the-fly declaration of equivalence on an element instance ("I
>>> used <pos> here, meaning exactly the POS in ISOCat")
>>> For the latter purpose, ISO 12620 introduces two attributes in the dcr:
>>> namespace. For instance (example provided by Menzo, in CC), you can
>>> decorate an FS as follows:
>>> <tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:dcr="http://www.isocat.org/ns/dcr">
>>>    ...
>>>    <tei:fs>
>>>        ...
>>>        <tei:f
>>>            name="part of speech"
>>>            dcr:datcat="http://www.isocat.org/datcat/DC-1345"
>>>            fVal="common noun"
>>>            dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"
>>>        />
>>>        ...
>>>    </tei:fs>
>>>    ...
>>> </tei:TEI>
>>>
>>>
>>> So, looking again at the ticket, the situation is clear: you
>>> make att.global a member of att.datcat, but make clear in the Guidelines
>>> that this does not replace <equiv>.
>>>
>>>
>>> On 25 Apr 2012, at 22:59, Sebastian Rahtz wrote:
>>>
>>>>
>>>> On 25 Apr 2012, at 21:40, Piotr Bański wrote:
>>>>
>>>>> I'm working on the ISO DCR / ISOcat issues.[1] Got stuck at the point of
>>>>> adding the relevant pieces of text to the Guidelines.
>>>>>
>>>>> The enlightened way to align grammatical categories with the values of
>>>>> the DCR is to put the appropriate references into the ODD, and I guess
>>>>> <equiv> is the ideal place for that.
>>>>>
>>>>> I imagine, and please correct me if I am wrong, that for elements such
>>>>> as <pos>, this action may be trivial:
>>>>>
>>>>> <elementSpec ident="pos" mode="change">
>>>>> <equiv dcr:datcat="http://www.isocat.org/datcat/DC-1345"/>
>>>>> </elementSpec>
>>>>
>>>> <equiv uri="http://www.isocat.org/datcat/DC-1345"/> is the syntax, I
>>>> think.
>>>>
>>>>> The above makes it possible for us to happily realize that whenever we
>>>>> do e.g.
>>>>>
>>>>> <gramGrp><pos>...</pos></gramGrp>
>>>>>
>>>>> all the machines in the world may know that by <pos>, we mean
>>>>> http://www.isocat.org/datcat/DC-1345 .
>>>> Well, if they read the ODD, yes. I think there is a certain amount
>>>> of "simple matter of programming" involved here.
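>>>>
>>>> To give a rough idea of that programming (an untested sketch, not
>>>> anything from the TEI Stylesheets; it assumes the ODD carries <equiv>
>>>> with @uri directly inside <elementSpec>), an XSLT 1.0 stylesheet
>>>> could list each element name next to its declared equivalent:
>>>>
>>>> ```xml
>>>> <xsl:stylesheet version="1.0"
>>>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>>>     xmlns:tei="http://www.tei-c.org/ns/1.0">
>>>>   <xsl:output method="text"/>
>>>>   <!-- One line per elementSpec that declares an equivalence -->
>>>>   <xsl:template match="/">
>>>>     <xsl:for-each select="//tei:elementSpec[tei:equiv/@uri]">
>>>>       <xsl:value-of select="@ident"/>
>>>>       <xsl:text>&#9;</xsl:text>
>>>>       <!-- In XSLT 1.0 this takes the first <equiv> if there are several -->
>>>>       <xsl:value-of select="tei:equiv/@uri"/>
>>>>       <xsl:text>&#10;</xsl:text>
>>>>     </xsl:for-each>
>>>>   </xsl:template>
>>>> </xsl:stylesheet>
>>>> ```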
>>>>
>>>>>
>>>>> However, there is also the content of <pos> to be handled, and it is not
>>>>> so obvious to me how to represent this in the ODD. Intuitively, I'm
>>>>> thinking of
>>>>>
>>>>> <elementSpec>
>>>>> ...
>>>>> <content>
>>>>> {list of values with their DCR references}
>>>>> </content>
>>>>
>>>> An <elementSpec> can contain a <valList>, whose <valItem> children
>>>> can have <equiv> children
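>>>>
>>>> Roughly like this (an untested sketch: the "CN" identifier and the
>>>> type="semi" setting are only illustrative, the DCR references are
>>>> assumed to go in @uri on <equiv>, and the ISOcat URIs are the ones
>>>> quoted elsewhere in this thread):
>>>>
>>>> ```xml
>>>> <elementSpec ident="pos" mode="change">
>>>>   <!-- the container: what <pos> itself means -->
>>>>   <equiv uri="http://www.isocat.org/datcat/DC-1345"/>
>>>>   <valList type="semi">
>>>>     <!-- one permitted value, aligned with its own data category -->
>>>>     <valItem ident="CN">
>>>>       <equiv uri="http://www.isocat.org/datcat/DC-1256"/>
>>>>       <gloss>common noun</gloss>
>>>>     </valItem>
>>>>     <!-- further <valItem>s would follow the same pattern -->
>>>>   </valList>
>>>> </elementSpec>
>>>> ```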
>>>>
>>>> Does that help?
>>>>
>>>> I suspect what you'd really like is to use a DTD which supplies
>>>> default dcr:datcat attributes to instances of <pos>.
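>>>>
>>>> As an untested sketch of that idea (assuming the attribute is
>>>> dcr:datcat, the dcr: prefix is bound to http://www.isocat.org/ns/dcr,
>>>> and the parser reads the DTD and applies attribute defaults), the
>>>> declaration might look like:
>>>>
>>>> ```xml
>>>> <!ATTLIST pos
>>>>   xmlns:dcr  CDATA #FIXED "http://www.isocat.org/ns/dcr"
>>>>   dcr:datcat CDATA "http://www.isocat.org/datcat/DC-1345">
>>>> ```
>>>>
>>>> Every <pos> without an explicit dcr:datcat would then pick up the
>>>> default value, but only for processors that actually use the DTD.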
>>>>
>>>> Sebastian
>>>>
>>>
>>> Laurent Romary
>>> INRIA & HUB-IDSL
>>> laurent.romary at inria.fr
>>>
>>>
>>>
>>
> 
> Laurent Romary
> INRIA & HUB-IDSL
> laurent.romary at inria.fr
> 
> 
> 
> 


