[tei-council] Re: sanskrit examples
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Fri Sep 17 12:48:31 EDT 2004
My apologies for not having done this already. As my earlier note
suggested I did try to discuss the solution with John over the phone
last month, but I should certainly have completed the action I
volunteered for at the last Council call well before now: my bad (as my
daughter likes to say).
Anyway, to recap on the issue....
John's document proposed two examples (I have not tried to include the
accented characters in this email, since they don't really affect the issue):
A <sequence type="wordGroup">asidraja
<segment type="word">asit</segment>
<segment type="word">raja</segment>
</sequence>
B <sequence type="compound">dharamksetre
<segment type="compoundMember">dharma</segment>
<segment type="compoundMember">ksetre</segment>
</sequence>
In each case, the intention is to show that the first string ("asidraja"
or "dharamksetre") can be analysed into two constituents, and that those
two constituents need not be simple substrings of the first one.
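
To make the "need not be simple substrings" point concrete, here is a
small sketch in Python. The function name non_substring_parts is my own
invention for illustration; it simply reports which constituents do not
occur literally inside the surface string.

```python
def non_substring_parts(surface, parts):
    """Return the constituents that do not occur literally in the surface form."""
    return [part for part in parts if part not in surface]


# Example A: "asit" does not occur in "asidraja" (the final t surfaces as d),
# while "raja" does.
missing = non_substring_parts("asidraja", ["asit", "raja"])
print(missing)
```

This is why the analysis cannot be expressed merely by wrapping tags
around pieces of the surface string.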
There was (unless I am making it up) a general consensus on the council
that mixed content models of this type were less preferable than either
of the following more explicit formulations:
<xxx>
<sequence type="compound">dharamksetre
<segment type="compoundMember">dharma</segment>
<segment type="compoundMember">ksetre</segment>
</sequence>
</xxx>
<sequence type="compound">
<xxx>dharamksetre</xxx>
<segment type="compoundMember">dharma</segment>
<segment type="compoundMember">ksetre</segment>
</sequence>
where I have used xxx for some element whose name is yet to be invented.
John proposed, for example:
C <sequence>
<sequenceText type="compound">dharamksetre</sequenceText>
<sequenceAnalysis>
<segment type="compoundMember">dharma</segment>
<segment type="compoundMember">ksetre</segment>
</sequenceAnalysis>
</sequence>
This seems reasonable enough to me. Using only the existing TEI
segmentation and analysis tags, I would suggest:
D
<seg type="compound" ana="C1">
dharamksetre
</seg>
<seg type="analysis" id="C1">
<w>dharma</w>
<w>ksetre</w>
</seg>
This representation uses the ID/IDREF link supplied by the ana attribute
to point to the analysis, which therefore need not be in the same place
in the document. This is advantageous if there are many such cases to be
marked up: the analysis need only be given once in the document. (I have
used the arbitrary identifier "C1", but of course a more meaningful id
would be preferable, e.g. "dharma-ksetre".)
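
As a rough illustration of how such a link might be followed by software,
here is a minimal Python sketch using the standard xml.etree.ElementTree
module. The wrapping <body> element, the repeated occurrence, and the
function name resolve_analyses are assumptions of mine for the sake of
the example, not part of John's document or of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Two occurrences of the compound share one analysis via ana="C1".
SAMPLE = """<body>
  <seg type="compound" ana="C1">dharamksetre</seg>
  <seg type="compound" ana="C1">dharamksetre</seg>
  <seg type="analysis" id="C1"><w>dharma</w><w>ksetre</w></seg>
</body>"""


def resolve_analyses(xml_text):
    """Pair each <seg> carrying an ana attribute with the words of its target."""
    root = ET.fromstring(xml_text)
    # Index every identified <seg> so a lookup by id is cheap.
    by_id = {el.get("id"): el for el in root.iter("seg") if el.get("id")}
    results = []
    for seg in root.iter("seg"):
        target = seg.get("ana")
        if target is None:
            continue  # not a pointer to an analysis
        analysis = by_id.get(target)
        words = [w.text for w in analysis.iter("w")] if analysis is not None else []
        results.append(((seg.text or "").strip(), words))
    return results


for surface, words in resolve_analyses(SAMPLE):
    print(surface, "->", words)
```

Both occurrences resolve to the same single analysis, wherever it
happens to sit in the document.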
If on the other hand the intention is always to give the two things
together, then I see no reason why they should not be grouped using a
<choice> element:
E
<choice>
<seg type="compound">
dharamksetre
</seg>
<seg type="analysis">
<w>dharma</w>
<w>ksetre</w>
</seg>
</choice>
At present, the tei-choice group is still arguing about the details of
the choice content model, but the above usage is not -- as yet -- in
question. "choice" here means simply that there are two possible ways of
representing the phenomenon in the text, that the first one given is
physically attested in the source, and the other is an appropriate
encoding of the same phenomenon.
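
For what it is worth, reading such a <choice> back out is straightforward.
The following Python sketch (again using xml.etree.ElementTree; the
function name read_choice is hypothetical) assumes exactly the structure
of example E: one <seg> holding the attested form and one <seg> of type
"analysis" holding <w> elements.

```python
import xml.etree.ElementTree as ET

CHOICE = """<choice>
  <seg type="compound">dharamksetre</seg>
  <seg type="analysis"><w>dharma</w><w>ksetre</w></seg>
</choice>"""


def read_choice(xml_text):
    """Return (attested form, list of analysed words) from a <choice>."""
    choice = ET.fromstring(xml_text)
    attested = None
    analysis = []
    for seg in choice.findall("seg"):
        if seg.get("type") == "analysis":
            analysis = [w.text for w in seg.findall("w")]
        elif attested is None:
            # The first non-analysis <seg> is taken as the attested form.
            attested = (seg.text or "").strip()
    return attested, analysis


attested, analysis = read_choice(CHOICE)
print(attested, "->", analysis)
```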
Lou
> Dear Lou,
>
> In the last conference call, you agreed to reencode John Smith's
> sanskrit examples using <seg> and <ana> and send this to him. If you
> have these, I would like to have a look. (He is calling me today and
> I would like to see what has been going on..)
>
> All the best,
>
> Christian
>
>