[tei-council] tei stemma model

David J Birnbaum djbpitt+tei at pitt.edu
Wed Jul 18 13:26:20 EDT 2007


Dear Lou (cc James, Council),

I may have translated the assignment from "describe a stemma using only 
the limited set of tools you have in front of you" into "describe a 
stemma in the best way possible, and then we'll see how to integrate 
that into the TEI." In any case, I was inclined toward an <eTree>-like 
solution because contamination both is and isn't parentage, which led me 
to conclude not that a contaminated stemma isn't a tree (although that's 
a fair way to look at it), but that it's a tree with one other type of 
relationship tacked on.

The existing TEI <graph>, with <node> and <arc>, could describe a 
stemma, since a stemma is a special type of graph and the existing model 
is more than sufficiently powerful, but I think it's a clumsy tool for 
the job. If we consider what we might want to do with a stemma other 
than render it, one possibility is that we might want to use it for the 
semi-automated evaluation of variation. I think this type of possibility 
is the most exciting aspect of the whole enterprise, and the sort of 
thing that makes humanities computing interesting even to philologists 
who may not otherwise be interested in computing.

For example, suppose we have the stemma in my sample (you've all 
memorized it by now, right? :-) ) and the following variation in an edition:

    <app>
        <rdg wit="L t">Chocolate</rdg>
        <rdg wit="R A I X">Peanut butter</rdg>
    </app>

According to stemmatic principles, the reading in alpha was "Peanut 
butter," and "Chocolate" was introduced in delta. If, on the other hand, 
we have:

    <app>
        <rdg wit="L t R A">Chocolate</rdg>
        <rdg wit="I X">Peanut butter</rdg>
    </app>

the "vote" is still two-to-four, but here we have a crux, since one 
reading goes back to beta and the other to gamma, and the stemma doesn't 
help us determine which goes back, in turn, to alpha.

If we've taken an <eTree> approach, we can walk through the <rdg> 
elements of each <app> element and, for each <rdg> (whose text() node is 
the reading), find the youngest common ancestor (in the stemma) of the 
manuscripts cited in its @wit attribute. If those ancestors are at the 
same depth in the stemma, we have a crux, and the stemma cannot resolve 
which reading is primary. If, however, the youngest common ancestor of 
one reading is deeper in the tree than that of the other, the former 
reading is the error and the latter can be projected back to alpha.

This is difficult but manageable XSLT/XPath programming with an 
<eTree>-like model. With the <graph>/<node>/<arc> model, on the other 
hand, it becomes much more complicated. It isn't impossible (after all, 
the <graph>/<node>/<arc> model can be transformed into the <eTree> 
model), but if we were going to try to build something like this for 
production, I think we agree on which model has the greater engineering 
advantages (by far). I'd like to see the TEI Guidelines say something 
like "here's The Best Way to represent a stemma because in addition to 
describing the graph [which one could do in a variety of ways], it also 
lets one automate some of the analysis of variation, and that's part of 
what humanities computing is all about."
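The point that the <graph>/<node>/<arc> model can be transformed into the <eTree> model can be sketched in Python as an illustration only. The sketch assumes TEI-style @from/@to pointers on <arc> and @xml:id plus <label> on <node>, drops the TEI namespace for brevity, and uses a hypothetical stemma consistent with the <app> examples. It only handles a true tree (exactly one parent per node except the root), so a contamination arc would have to be dealt with separately.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, namespace-free stemma in the shape of a TEI <graph>.
GRAPH = """
<graph type="directed">
  <node xml:id="alpha"><label>alpha</label></node>
  <node xml:id="beta"><label>beta</label></node>
  <node xml:id="gamma"><label>gamma</label></node>
  <node xml:id="delta"><label>delta</label></node>
  <node xml:id="R"><label>R</label></node>
  <node xml:id="A"><label>A</label></node>
  <node xml:id="L"><label>L</label></node>
  <node xml:id="t"><label>t</label></node>
  <node xml:id="I"><label>I</label></node>
  <node xml:id="X"><label>X</label></node>
  <arc from="#alpha" to="#beta"/>
  <arc from="#alpha" to="#gamma"/>
  <arc from="#beta" to="#delta"/>
  <arc from="#beta" to="#R"/>
  <arc from="#beta" to="#A"/>
  <arc from="#delta" to="#L"/>
  <arc from="#delta" to="#t"/>
  <arc from="#gamma" to="#I"/>
  <arc from="#gamma" to="#X"/>
</graph>
"""


def graph_to_etree(graph):
    """Nest <node>s into <eTree>/<eLeaf> elements by following the <arc>s.

    Assumes a true tree: every node except the root has exactly one parent.
    """
    labels = {n.get(XML_ID): n.findtext("label") for n in graph.iter("node")}
    children = {node_id: [] for node_id in labels}
    has_parent = set()
    for arc in graph.iter("arc"):
        parent, child = arc.get("from").lstrip("#"), arc.get("to").lstrip("#")
        children[parent].append(child)
        has_parent.add(child)
    (root,) = [n for n in labels if n not in has_parent]  # fails unless one root

    def build(node_id):
        kids = children[node_id]
        elem = ET.Element("eTree" if kids else "eLeaf")
        ET.SubElement(elem, "label").text = labels[node_id]
        for kid in kids:
            elem.append(build(kid))
        return elem

    return build(root)


tree = graph_to_etree(ET.fromstring(GRAPH))
print(ET.tostring(tree, encoding="unicode"))
```

Most of the work here (an index of parents, finding the root) only recovers the nesting that an <eTree> encoding states directly.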

With this in mind, I see no reason not to enhance the <eTree> model with 
a <contaminates> feature. It doesn't get in the way for those who are 
modeling true trees, and it lets us use the <eTree> structure for 
modeling stemmata, which I think makes those models much more useful for 
textual analysis than would be the case under the <graph>/<node>/<arc> 
approach.

Cheers,

David

Lou Burnard wrote:
> Unless I'm mistaken, when this investigation of how to represent ms 
> stemma was first proposed, it was mostly as an exercise to see how 
> applicable the existing TEI model for trees and graphs is. I may be 
> inventing this, but my recollection is that the conversation went 
> something like
> x: we've got this lovely way of representing graphs and networks and 
> stuff
> y: why would anyone ever want to use such a thing
> x: well you could use it to represent, err, airplane networks, or family 
> trees, or um,
> y: or MANUSCRIPT TRADITIONS! wow that's really innerestink
>
> So the exercise intended for David was really to test the capabilities 
> of the existing TEI model, which I think his work does rather well.
>
> I don't have anything to add to what James has already said about
> labels and the use of existing TEI styles for identification and 
> linkage. I do however wish to confess to a feeling of unease about 
> contamination.
>
> As I understand it, this is a way of saying that a given node has one 
> or more parent-nodes *other* than the one it's actually attached to -- 
> which of course means that you're not looking at a tree any more, but 
> a directed acyclic graph. So it cannot be represented using <tree> or 
> <eTree> -- you need to use the more general <graph>.
>
> I would be interested to see how David's example would play as a <graph>
>
> Lou
>



