[tei-council] Re: sanskrit examples

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Fri Sep 17 12:48:31 EDT 2004


My apologies for not having done this already. As my earlier note 
suggested I did try to discuss the solution with John over the phone 
last month, but I should certainly have completed the action I 
volunteered for at the last Council call well before now: my bad (as my 
daughter likes to say).

Anyway, to recap on the issue....

John's document proposed two examples (I have not tried to include the 
accented characters in this email, since they don't really affect the issue):

A <sequence type="wordGroup">asidraja
     <segment type="word">asit</segment>
     <segment type="word">raja</segment>
</sequence>

B <sequence type="compound">dharamksetre
      <segment type="compoundMember">dharma</segment>
      <segment type="compoundMember">ksetre</segment>
   </sequence>

In each case, the intention is to show that the first string ("asidraja" 
or "dharamksetre") can be analysed into two constituents, and that those 
two constituents need not be simple substrings of the first one.

There was (unless I am making it up) a general consensus on the council 
that mixed content models of this type were less preferable than either 
of the following more explicit formulations:

<xxx>
    <sequence type="compound">dharamksetre
      <segment type="compoundMember">dharma</segment>
      <segment type="compoundMember">ksetre</segment>
   </sequence>
</xxx>

<sequence type="compound">
      <xxx>dharamksetre</xxx>
      <segment type="compoundMember">dharma</segment>
      <segment type="compoundMember">ksetre</segment>
   </sequence>

where I have used xxx for some element whose name is yet to be invented.

John proposed for example:

C <sequence>
     <sequenceText type="compound">dharamksetre</sequenceText>
     <sequenceAnalysis>
        <segment type="compoundMember">dharma</segment>
        <segment type="compoundMember">ksetre</segment>
     </sequenceAnalysis>
  </sequence>

Which seems reasonable enough to me. Using only the existing TEI 
segmentation and analysis tags, I would suggest

D

<seg type="compound" ana="C1">
     dharamksetre
</seg>

<seg type="analysis" id="C1">
     <w>dharma</w>
     <w>ksetre</w>
</seg>

This representation uses the ID/IDREF link supplied by the ana attribute 
to point to the analysis, which therefore need not be in the same place 
in the document. This is advantageous if there are many such cases to be 
marked up (I have used the arbitrary identifier "C1" but of course a 
more meaningful id would be preferable, e.g. "dharma-ksetre"): the 
analysis need only be given once in the document.
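
As a rough illustration of how a processor might follow that link, here 
is a minimal Python sketch. It assumes example D is wrapped in a 
hypothetical <text> root (so that it parses as a single document) and 
that ana holds a bare IDREF, as in the example above. The function name 
resolve_analyses is invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Example D, wrapped in a hypothetical <text> root so it parses as one document.
DOC = """<text>
  <seg type="compound" ana="C1">dharamksetre</seg>
  <seg type="analysis" id="C1">
    <w>dharma</w>
    <w>ksetre</w>
  </seg>
</text>"""


def resolve_analyses(xml_source):
    """Pair each <seg> that carries an ana attribute with the <w> forms it points to."""
    root = ET.fromstring(xml_source)
    # Index every element that has an id, so an ana value can be looked up directly.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    results = []
    for seg in root.iter("seg"):
        target = seg.get("ana")
        if target is None:
            continue  # this <seg> is not linked to an analysis
        analysis = by_id.get(target)
        words = []
        if analysis is not None:
            words = [(w.text or "").strip() for w in analysis.iter("w")]
        results.append(((seg.text or "").strip(), words))
    return results


print(resolve_analyses(DOC))
```

Because the lookup goes through the id index, any number of <seg> 
elements can point at the same analysis, wherever it sits in the document.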

If on the other hand the intention is always to give the two things 
together, then I see no reason why they should not be grouped using a 
<choice> element:

E

<choice>
<seg type="compound">
     dharamksetre
</seg>
<seg type="analysis">
     <w>dharma</w>
     <w>ksetre</w>
</seg>
</choice>

At present, the tei-choice group is still arguing about the details of 
the choice content model, but the above usage is not -- as yet -- in 
question. "choice" here means simply that there are two possible ways of 
representing the phenomenon in the text, that the first one given is 
physically attested in the source, and the other is an appropriate 
encoding of the same phenomenon.
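
To make that reading concrete, here is a minimal Python sketch that 
pulls the attested form and its analysis out of example E. It assumes 
the two <seg> children are distinguished by the type values used above 
("compound" and "analysis"); the function name read_choice is invented 
for this sketch and nothing here depends on the final content model for 
<choice>.

```python
import xml.etree.ElementTree as ET

# Example E as given above.
CHOICE = """<choice>
<seg type="compound">
     dharamksetre
</seg>
<seg type="analysis">
     <w>dharma</w>
     <w>ksetre</w>
</seg>
</choice>"""


def read_choice(xml_source):
    """Return (attested form, list of analysed words) for one <choice>."""
    choice = ET.fromstring(xml_source)
    # Select the alternatives by their type attribute rather than by position.
    attested = choice.find("seg[@type='compound']")
    analysis = choice.find("seg[@type='analysis']")
    words = [(w.text or "").strip() for w in analysis.findall("w")]
    return (attested.text or "").strip(), words


print(read_choice(CHOICE))
```

Selecting by type rather than by position is a choice made for this 
sketch only; a processor could equally rely on the attested form being 
given first.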

Lou


> Dear Lou,
> 
> In the last conference call, you agreed to reencode John Smith's
> sanskrit examples using <seg> and <ana> and send this to him.  If you
> have these, I would like to have a look.  (He is calling me today and
> I would like to see what has been going on..)
> 
> All the best,
> 
> Christian
> 
> 



