[tei-council] tei stemma model

Wed Jul 18 09:47:51 EDT 2007

Dear James (cc Council),

> However, I think the better solution has got to be the one which is more
> consistent with the Guidelines as a whole.  I'm also concerned that we are
> just expressing the existing tree structure of the tree/root/iNode/leaf
> elements in a different form.  Would it be better if this were revised to
> allow the nesting structure similar to that which David suggests? (But this
> would push it to P5 1.1 I'm assuming.)
>   

I may have misunderstood, but since <eTree> is already part of the Graph 
chapter and already has the nesting structure I propose, I think we can 
already allow that nesting structure if we use the existing <eTree> to 
do it. That is, a solution more consistent with the Guidelines as a 
whole than my original version could use <eTree> out of the box and make 
the changes in attribute use that you and Conal have advocated. The only 
substantial structural feature not currently present is a way of 
representing contamination, but adding another element should not 
inflict any damage on the existing graphing model (i.e., could be 
undertaken easily and without substantial delay).

> I think using <label> (which if memory serves eTree allows) to label groups
> of nested eTree nodes might be useful.
>   

This sounds like the best way to incorporate group labels, especially 
because 1) it's already in place and 2) it doesn't conflict with labels 
for individual nodes (retrievable from @n or by dereferencing @key, 
@corresp, or something similar). In my own work I tend to refer to 
groups by the label on their parent (so that, for example, I would refer 
to the "beta branch of the tradition" to identify the subtree rooted at 
beta), but if one wishes to assign a different name to the group as a 
whole than to its root node in the stemma, <label> seems like a sensible 
place to record that name.
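
As a minimal sketch of what that could look like (the sigla and the 
label text are invented for illustration, and I am assuming that 
<label> is optional on <eTree> and that leaves are encoded as empty 
<eTree> elements, as in your example):

```xml
<!-- Sketch only: sigla and label text are hypothetical. -->
<eTree n="alpha">
  <eTree n="beta">
    <!-- Optional name for the subtree as a whole, distinct from the
         root node's own label in @n -->
    <label>Northern group</label>
    <eTree n="A"/>
    <eTree n="B"/>
  </eTree>
  <!-- No <label> here: this node is identified by its @n alone -->
  <eTree n="C"/>
</eTree>
```

Here "the beta branch" and "Northern group" name the same subtree; the 
<label> is needed only when the two names differ.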

> I was curious about the decision to have contaminates only contain a single
> target.  I understand that it keeps things straightforward, but what other
> benefits are there, or conversely what drawbacks to having a single
> contaminates point to multiple @xml:id's in a @targets attribute?
>   

As far as I can tell, the two approaches are informationally equivalent, 
but:

1) As you note, allowing a single value keeps things straightforward, by 
which I mean easier to manage.

2) Making each instance of contamination a separate element records in 
an iconic way that each instance of contamination in the tradition is an 
independent event, separate from each other instance. There is no 
informational difference, but there is an intellectual one, to the 
extent that we would like our XML syntax to model our conceptualization 
of the reality we are encoding. (See my 
http://clover.slavic.pitt.edu/~djb/sgml/invalid.html for a discussion of 
schemas as models of reality.)
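
To make the comparison concrete, here is a sketch of the two notations 
for a node Y that contaminates two other nodes. The <contaminates> 
element is the proposed addition, not an existing TEI element, and the 
identifiers are invented for illustration:

```xml
<!-- (a) One element per instance of contamination:
         @target holds exactly one pointer -->
<eTree n="Y">
  <contaminates target="#nodeA"/>
  <contaminates target="#nodeB"/>
</eTree>

<!-- (b) One element for all instances:
         @targets holds a whitespace-separated list of pointers -->
<eTree n="Y">
  <contaminates targets="#nodeA #nodeB"/>
</eTree>
```

Both fragments assert the same two relationships; only (a) shows them 
as two separate events.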

> If one can use schematron to provide the necessary rules to validate that
> a contaminates/@n attribute points only to a node or a node/@n points only
> to something in msDesc, then couldn't one do the same thing with @xml:id
> and @targets, etc?
>   

I don't see why not.
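
A rough sketch of such a rule in ISO Schematron follows. It assumes an 
XPath 2.0 query binding (for tokenize() and the "every" expression), 
that the proposed <contaminates> element lives in the TEI namespace, 
and that @targets holds whitespace-separated pointers of the form 
"#id"; none of that is settled:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Assumption: <contaminates> is the proposed (not yet existing)
         element, placed in the TEI namespace -->
    <sch:rule context="tei:contaminates[@targets]">
      <!-- Strip the leading '#' from each pointer and require an
           <eTree> carrying that value as its @xml:id -->
      <sch:assert test="every $t in tokenize(normalize-space(@targets), ' ')
                        satisfies exists(//tei:eTree[@xml:id = substring-after($t, '#')])">
        Each pointer in @targets must match the @xml:id of an eTree node.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A second rule of the same shape, with tei:eTree[@corresp] as its 
context, could check that each @corresp resolves to a witness or 
manuscript description.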

> Following your advice (for now) to use eTree instead of node, and
> assuming that there is a witList providing a sigil to which all of these
> eTree elements point, but using @corresp/@xml:id/@targets, does the
> following do something similar to what you meant?
>
> <eTree corresp="#msabc123" type="hypothetical">
>   <eTree corresp="#msbeta" type="hypothetical">
>     <eTree corresp="#msDusty" type="hypothetical">
>       <label>Group Foo</label>
>       <eTree corresp="#msL" type="extant"/>
>       <eTree corresp="#mst" type="lost"/>
>     </eTree>
>     <eTree corresp="#mseps" type="hypothetical">
>       <label>Group Blort</label>
>       <eTree corresp="#msR" type="extant"/>
>       <eTree xml:id="nodeA" corresp="#msA" type="extant"/>
>     </eTree>
>   </eTree>
>   <eTree corresp="#msY" type="hypothetical">
>     <label>Group Wibble</label>
>     <contaminates targets="#nodeA"/>
>     <eTree corresp="#msI" type="extant"/>
>     <eTree corresp="#msX" type="extant"/>
>   </eTree>
> </eTree>
>
> Aside from changing the @n attributes to references to spurious @xml:id
> attributes, and my labelling of groups, is much intellectual content lost?
>   

In addition to the differences you note (changing @n to @xml:id, adding 
group labels), you've also changed @target to @targets (presumably so 
that it can contain more than one target, should a node contaminate more 
than one other node) and you've divided between @xml:id and @corresp the 
duties that @n alone performs in my original model. Thus, the node that serves 
as the target of pointing in your example has both of those attributes, 
@corresp to point to a presumed <msDescription> elsewhere and @xml:id so 
that @targets on another element can point to it. Other nodes have only 
@corresp; they do not need an @xml:id since they are not the targets of 
pointing from <contaminates> elements.

The only change to intellectual content is the addition of group labels, 
which may be useful and which my original model did not support at all. 
The change of @n to @xml:id and @corresp is, as you note, consistent 
with TEI practice elsewhere. One possible additional cost of your model 
(on top of giving up the general parsimony and reduced opportunity for 
error of my model, which I mentioned earlier) is that users may need to 
draw a stemma where they do not have corresponding <msDescription> 
elements elsewhere. Since your 
model doesn't record the sigla in the stemma itself, it requires that 
users record them elsewhere, even when an <msDescription> (or witness 
list or something similar) may otherwise not be needed, so that the 
indirection and additional markup may be required not by the 
informational goals of the author, but by the construction of the 
schema. I appreciate that this additional cost may also seem worthwhile 
if it buys greater consistency with general TEI practice. I don't 
understand, though, how specifying multiple targets of contamination all 
on one <contaminates> element is more TEI-like than specifying each 
target in a separate <contaminates> element. There is no informational 
difference between the two models, but, as noted above, using a separate 
<contaminates> element for each instance of contamination models 
iconically that each instance of contamination in the tradition is 
independent of each other instance (which is true, or we would be 
asserting that the contamination occurred at a different level in the 
stemma).
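
To illustrate the indirection cost, here is a sketch of the same small 
subtree encoded both ways. The identifiers and sigla are invented, 
<listWit> is used only as one possible place to record the sigla, and 
I am assuming that @corresp is available on <eTree>:

```xml
<!-- With @corresp: the sigla must be recorded somewhere outside the
     stemma, here in a witness list -->
<listWit>
  <witness xml:id="mseps">epsilon</witness>
  <witness xml:id="msR">R</witness>
  <witness xml:id="msA">A</witness>
</listWit>

<eTree corresp="#mseps">
  <eTree corresp="#msR"/>
  <eTree corresp="#msA"/>
</eTree>

<!-- With @n: the sigla sit in the stemma itself, and no other markup
     is required -->
<eTree n="epsilon">
  <eTree n="R"/>
  <eTree n="A"/>
</eTree>
```

The first form is the better fit when the witnesses are described 
elsewhere anyway; the second is enough for a user who wants only to 
draw the stemma.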

> In having skimmed through some more of the trees and graphs section of the
> Guidelines, I'd like to suggest that anything significantly more
> complicated than David's proposal be held back until P5 1.1.  Moreover,
> that when planning P5 1.1 that a significant re-examination of the current
> graph/tree provision be undertaken.
>   

I read the Graphs chapter of the Guidelines for the first time (and the 
second, and the third, etc.) when I was working on this proposal, and it 
left me with the impression that, as in many other places, the TEI may 
have decided to allow multiple ways of doing the same thing. I'm not 
enough of a mathematician to be able to call this "impression" a 
"conclusion" (in some cases the different models supported by the Graph 
chapter clearly have different intellectual properties, and in other 
cases they may have different properties that I do not understand), but 
if we're going to revisit this chapter down the road, I think it's good 
policy in general to provide multiple strategies only when they have 
different informational content (that is, not when they are merely 
notational variants of one another). Readers may find the following 
useful when considering this issue, which arises frequently in TEI-land:

http://www.idealliance.org/papers/extreme/proceedings/html/2002/Usdin01/EML2002Usdin01.html

Best,

David