[tei-council] DCR alignment inside ODD

Piotr Bański bansp at o2.pl
Wed May 2 10:04:18 EDT 2012


This is a quick reply to relay Menzo's reply to the Council and to thank
him for it :-)

And let me also add that by sighing over FSD not being quite there I
indeed meant the practical side, i.e. the lack of a way to validate FSRs
against an FSD with a single, publicly available tool.
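
To make that gap concrete, here is a minimal Python sketch of the kind of check meant above. It is an illustration under stated assumptions, not an existing tool: the function name `undeclared_features` and the trimmed sample document are hypothetical, the sample is not schema-valid TEI, and the check covers only feature names, not value ranges.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, heavily trimmed sample: one declaration, two feature structures.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <fsdDecl>
    <fsDecl type="morph">
      <fDecl name="pos">
        <vRange><vAlt><symbol value="noun"/><symbol value="verb"/></vAlt></vRange>
      </fDecl>
    </fsDecl>
  </fsdDecl>
  <fs type="morph"><f name="pos"><symbol value="noun"/></f></fs>
  <fs type="morph"><f name="case"><symbol value="dat"/></f></fs>
</TEI>
"""


def undeclared_features(root):
    """Return (fs type, feature name) pairs used in <fs> but absent from the matching <fsDecl>."""
    # Map each declared fs type to the set of feature names it declares.
    declared = {
        decl.get("type"): {fd.get("name") for fd in decl.findall(TEI + "fDecl")}
        for decl in root.iter(TEI + "fsDecl")
    }
    problems = []
    for fs in root.iter(TEI + "fs"):
        fs_type = fs.get("type")
        if fs_type not in declared:
            continue  # no declaration for this type, so nothing to check against
        for f in fs.findall(TEI + "f"):
            if f.get("name") not in declared[fs_type]:
                problems.append((fs_type, f.get("name")))
    return problems


if __name__ == "__main__":
    print(undeclared_features(ET.fromstring(SAMPLE)))
```

A real validator would also have to compare each value against the declared <vRange> and honour defaults, which is where most of the work lies.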

Possibly this may be another thing for the Council to bear in mind wrt
project packages that might be announced to the community. I also
wonder, on a somewhat related note, if the relatively recent
introduction of the option to use <rng:text/> inside <f> impacts on the
FSD section in any way.

Best,

  Piotr

On 02/05/12 15:11, Menzo Windhouwer wrote:
> Dear all,
> 
> Sorry for my late response, we had some national holidays in the Netherlands so I was a few days off :-)
> 
> Indeed, @dcr:datcat and the ODD <equiv/> can function on different levels. For example, in ODD one could say what <tei:f/> in a feature structure means, i.e., link it to a data category /feature/, while with @dcr:datcat/@dcr:valueDatcat one could link a specific feature instance to specific data categories, i.e., /partOfSpeech/ and /commonNoun/.
> 
> For the TEI header I have already, in the past, created a set of scripts to create data categories and link them using <equiv/>. However, this is suboptimal, as it would create new data categories for all elements/attributes while ISOcat might already contain useful ones. The results never made it to isocat.org, but they might still be an interesting starting point, and I would gladly run/update these scripts and showcase the result on the ISOcat test server ...
> 
> In a lexicon-related project (the RELISH project, also presented at the TEI lexicon workshop last year) I'm working on an LMF serialization that allows the use of TEI feature structures. There I encountered the same problem as you, Piotr, i.e., one needs to repeat all @dcr:datcat/@dcr:valueDatcat annotations for each instance. I was planning to minimize that burden by allowing a feature structure declaration to be annotated. So I wonder what you mean by "FSD is not quite there still (sigh)". The XML vocabulary is defined and can be used, can't it? Or do you mean that there is no actual impact, e.g., validation, of the declaration on the instances?
> 
> Best,
> 
> Menzo
> 
> --
> Menzo Windhouwer
> e-mail:Menzo.Windhouwer at mpi.nl
> Max-Planck-Institute for Psycholinguistics
> 
> 
>> -----Original Message-----
>> From: Laurent Romary [mailto:laurent.romary at inria.fr]
>> Sent: Thursday, April 26, 2012 18:01
>> To: Piotr Bański
>> Cc: Sebastian Rahtz; TEI Council; Menzo Windhouwer
>> Subject: Re: [tei-council] DCR alignment inside ODD
>>
>> I answer your two points quickly (baby bottle soon):
>>
>> * I could live with a decision of just making
>> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-model.gramPart.html
>> a member of att.datcat and considering step by step the integration of
>> other objects (but maybe metadata elements in the header should be
>> considered quickly). Still, I do see this as a kind of general-purpose
>> mechanism at the service of convergence across standardization bodies and
>> as such a strong move to make
>> * the semantics of dcr:datcat is "the element I qualify corresponds to the
>> concept expressed in my value", and you don't want to express this for
>> <equiv>: there is a level of indirection here, because <equiv> qualifies the
>> semantics of an element defined by the element spec; the element itself
>> does not contain the attribute. It would thus be conceptually wrong
>> (attribute abuse of the worst kind ;-)) to adopt the construct you suggest...
>> :-)
>> Laurent
>>
>> Le 26 avr. 2012 à 17:51, Piotr Bański a écrit :
>>
>>> Thank you, Sebastian and Laurent,
>>>
>>> There are several issues here. (I concentrate on Laurent's post and will
>>> reply to Sebastian's separately)
>>>
>>> Firstly, you (Laurent) want the most radical move, i.e., making the
>>> datcat attributes available to *all* elements. Two comments:
>>>
>>> * I had the impression that this was not accepted by the Council, and
>>> that the recommendation was to go from the bottom up, i.e. to select
>>> the items that need the datcat stuff, and possibly increase the scope
>>> as needed. I agree that the radical solution would simplify the issue,
>>> given that ISOcat is able to provide alignment for practically any
>>> kind of data category, not just those purely linguistic. Perhaps we
>>> could reopen the discussion on that, I'd be happy to see att.datcat
>>> where you suggest.
>>>
>>> The title of the ticket is general, but the description may indeed
>>> suggest that this is a proposal restricted to linguistic stuff:
>>>
>>>
>>> https://sourceforge.net/tracker/index.php?func=detail&aid=3432520&group_id=106328&atid=644065
>>>
>>>
>>> * if you assume that att.datcat are global, why on earth NOT use them
>>> on <equiv>, where's the consistency? Sure thing, equiv has the @uri
>>> attribute which was, or could be, used for DCR alignment, since there
>>> was no other tool to do it. But if you postulate global datcat
>>> attributes, I see it as inconsistent and counterintuitive to demand
>>> that on <equiv> alone, DCR alignment is to be handled by @uri rather
>>> than the available datcat attributes.
>>>
>>> Secondly, yes, I know the example, it's nice until you imagine lots of
>>> <fs> at the POS layer (take any serious corpus out there), at which
>>> point it stops being nice and becomes seriously overredundant, and
>>> makes you think of shifting the DCR stuff at least to the level of FSD.
>>> Granted, FSD is not quite there still (sigh), so keeping all the stuff
>>> within <f> is an unhappy temporary solution, good for presenting as
>>> one of the examples in the spec, but maybe not necessarily in the
>>> Dictionaries chapter.
>>>
>>> Sure thing, I can do the <equiv>alence for POS in the ODD, and then do:
>>> <pos dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256">CN</pos>
>>> (for "common noun"), except that it's a variant of the problem with
>>> <f>, namely redundancy, very clear in the context of a dictionary. It
>>> is also a problem of a split mechanism (<equiv> for containers vs.
>>> local dcr:valueDatcat for values) instead of a unified mechanism.
>>>
>>> Let me make sure it's clear what I consider redundancy in this very
>>> case: <pos> or @name="part of speech" have to be repeated many times,
>>> that's OK. But if we add the dcr: stuff, then, together with the "local"
>>> identifiers, we repeat the "global" identifiers, in every place
>>> affected, instead of saying once, either in the ODD, schema, or header:
>>> <pos> = "http://www.isocat.org/datcat/DC-1345", and then using <pos>,
>>> with its meaning now clarified.
>>>
>>> Still, I'm grateful for the replies and discussion because it took
>>> away my doubts concerning the here-and-now: it's better to have the
>>> DCR stuff officially in the TEI than create roundabout solutions of
>>> the type I talked about in Zadar and implemented in FreeDict. For the
>>> reasons that I gave above, it feels to me like a half-way solution,
>>> but still, as we all know, it's better to have it than not to have it,
>>> and I will now put some example into the DI chapter (maybe even
>>> without mentioning <equiv> for the time being), and will be happy to
>>> make a step forward. I think I wanted too much too soon (and feared
>>> about how overwhelming it might become, and that it goes beyond just a
>>> brief Council discussion that we've had).
>>>
>>> Best,
>>>
>>>  P.
>>>
>>>
>>> On 26/04/12 09:44, Laurent Romary wrote:
>>>> I guess you are currently working on 3432520
>>>>
>>>> There are two distinct mechanisms here:
>>>> - the normal use of <equiv> within an ODD spec
>>>> - the on-the-fly declaration of equivalence on an element instance
>>>> ("I used <pos> here, meaning exactly the POS in ISOcat")
>>>>
>>>> For the latter purpose, ISO 12620 introduces two attributes in the dcr:
>>>> namespace. For instance (example provided by Menzo, in CC), you can
>>>> decorate an FS as follows:
>>>>
>>>> <tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0"
>>>>          xmlns:dcr="http://www.isocat.org/ns/dcr">
>>>>    ...
>>>>    <tei:fs>
>>>>        ...
>>>>        <tei:f
>>>>            name="part of speech"
>>>>           dcr:datcat="http://www.isocat.org/datcat/DC-1345"
>>>>            fVal="common noun"
>>>>            dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"
>>>>        />
>>>>        ...
>>>>    </tei:fs>
>>>>    ...
>>>> </tei:TEI>
>>>>
>>>>
>>>> So, looking again at the ticket, the situation is clear: you make
>>>> att.global a member of att.datcat, but make clear in the guidelines
>>>> that this does not replace <equiv>
>>>>
>>>>
>>>> Le 25 avr. 2012 à 22:59, Sebastian Rahtz a écrit :
>>>>
>>>>>
>>>>> On 25 Apr 2012, at 21:40, Piotr Bański wrote:
>>>>>
>>>>>> I'm working on the ISO DCR / ISOcat issues.[1] Got stuck at the
>>>>>> point of adding the relevant pieces of text to the Guidelines.
>>>>>>
>>>>>> The enlightened way to align grammatical categories with the values
>>>>>> of the DCR is to put the appropriate references into the ODD, and I
>>>>>> guess <equiv> is the ideal place for that.
>>>>>>
>>>>>> I imagine, and please correct me if I am wrong, that for elements
>>>>>> such as <pos>, this action may be trivial:
>>>>>>
>>>>>> <elementSpec ident="pos" mode="change">
>>>>>>   <equiv dcr:datcat="http://www.isocat.org/datcat/DC-1345"/>
>>>>>> </elementSpec>
>>>>>
>>>>> <equiv uri="http://www.isocat.org/datcat/DC-1345"/> is the syntax, I
>>>>> think.
>>>>>
>>>>>> The above makes it possible for us to happily realize that whenever
>>>>>> we do e.g.
>>>>>>
>>>>>> <gramGrp><pos>...</pos></gramGrp>
>>>>>>
>>>>>> all the machines in the world may know that by <pos>, we mean
>>>>>> http://www.isocat.org/datcat/DC-1345 .
>>>>> Well, if they read the ODD, yes. I think there is a certain amount of
>>>>> "simple matter of programming" involved here.
>>>>>
>>>>>>
>>>>>> However, there is also the content of <pos> to be handled, and it
>>>>>> is not so obvious to me how to represent this in the ODD.
>>>>>> Intuitively, I'm thinking of
>>>>>>
>>>>>> <elementSpec>
>>>>>> ...
>>>>>> <content>
>>>>>> {list of values with their DCR references}
>>>>>> </content>
>>>>>
>>>>> An <elementSpec> can contain a <valList>, whose <valItem> children
>>>>> can have <equiv> children
>>>>>
>>>>> Does that help?
>>>>>
>>>>> I suspect what you'd really like is to use a DTD which supplied
>>>>> default dcr:datcat attributes to instances of <pos>.
>>>>>
>>>>> Sebastian
>>>>>
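
The "simple matter of programming" mentioned in the thread, i.e. reading the ODD once and then resolving <pos> and its values everywhere, can be sketched roughly as follows. This is a hedged illustration only: the function names `datcat_tables` and `resolve` are hypothetical, the ODD fragment is a made-up minimal customization, it uses the @uri attribute of <equiv>, and it assumes that the text content of <pos> matches a <valItem> identifier exactly.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical ODD fragment: <pos> and one of its values aligned with ISOcat.
ODD = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <elementSpec ident="pos" mode="change">
    <equiv uri="http://www.isocat.org/datcat/DC-1345"/>
    <valList type="closed">
      <valItem ident="CN">
        <equiv uri="http://www.isocat.org/datcat/DC-1256"/>
      </valItem>
    </valList>
  </elementSpec>
</schemaSpec>
"""


def datcat_tables(odd_root):
    """Collect element -> URI and (element, value) -> URI tables from <equiv uri="...">."""
    elements, values = {}, {}
    for spec in odd_root.iter(TEI + "elementSpec"):
        name = spec.get("ident")
        equiv = spec.find(TEI + "equiv")  # direct child: qualifies the element itself
        if equiv is not None and equiv.get("uri"):
            elements[name] = equiv.get("uri")
        for item in spec.findall(f"{TEI}valList/{TEI}valItem"):
            item_equiv = item.find(TEI + "equiv")
            if item_equiv is not None and item_equiv.get("uri"):
                values[(name, item.get("ident"))] = item_equiv.get("uri")
    return elements, values


def resolve(element, elements, values):
    """Return (data category of the element, data category of its text value)."""
    name = element.tag.replace(TEI, "")
    text = (element.text or "").strip()
    return elements.get(name), values.get((name, text))


if __name__ == "__main__":
    elements, values = datcat_tables(ET.fromstring(ODD))
    pos = ET.fromstring('<pos xmlns="http://www.tei-c.org/ns/1.0">CN</pos>')
    print(resolve(pos, elements, values))
```

With tables like these, the instance can stay as plain <pos>CN</pos>: the global identifiers are stated once in the ODD instead of being repeated on every element.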

