[tei-council] tei stemma model
David J Birnbaum
djbpitt+tei at pitt.edu
Wed Jul 18 13:26:20 EDT 2007
Dear Lou (cc James, Council),
I may have translated the assignment from "describe a stemma using only
the limited set of tools you have in front of you" into "describe a
stemma in the best way possible, and then we'll see how to integrate
that into the TEI." In any case, I was inclined toward an <eTree>-like
solution because contamination both is and isn't parentage, which led me
to conclude not that a contaminated stemma isn't a tree (although that's
a fair way to look at it), but that it's a tree with one other type of
relationship tacked on.
The existing TEI <graph>, with <node> and <arc>, could describe a
stemma, since a stemma is a special type of graph and the existing model
is more than sufficiently powerful, but I think it's a clumsy tool for
the job. If we consider what we might want to do with a stemma other
than render it, one possibility is that we might want to use it for the
semi-automated evaluation of variation. I think this type of possibility
is the most exciting aspect of the whole enterprise, and the sort of
thing that makes humanities computing interesting even to philologists
who may not otherwise be interested in computing.
For example, suppose we have the stemma in my sample (you've all
memorized it by now, right? :-) ) and the following variation in an edition:
<app>
<rdg wit="L t">Chocolate</rdg>
<rdg wit="R A I X">Peanut butter</rdg>
</app>
According to stemmatic principles, the reading in alpha was "Peanut
butter," and "Chocolate" was introduced in delta. If, on the other hand,
we have:
<app>
<rdg wit="L t R A">Chocolate</rdg>
<rdg wit="I X">Peanut butter</rdg>
</app>
the "vote" is still a two-to-four split (now reversed), but here we have
a crux, since one reading goes back to beta and the other to gamma, and
the stemma doesn't help us determine which goes back, in turn, to alpha.
If we've taken an <eTree> approach, we can examine the text() nodes of
the <rdg> elements for each <app> element and for each <rdg> element
find the youngest common parent (in the stemma) of the manuscripts cited
in the @wit attribute. If those common parents are at the same depth in
the stemma, we have a crux, and the stemma cannot resolve which is
primary. If, however, the youngest common parent of one is deeper in the
tree than the youngest common parent of the other, the former is the
error and the latter can be projected back to alpha.
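The comparison described above can be sketched roughly as follows. This
is a minimal Python illustration rather than the XSLT/XPath the message
has in mind. The stemma in PARENT is an assumption, reconstructed only
from the two <app> examples (alpha over beta and gamma; delta, R, and A
under beta; L and t under delta; I and X under gamma), and the names
PARENT, path_to_root, depth, common_ancestor, and evaluate are all
hypothetical.

```python
# Hypothetical stemma, reconstructed from the two <app> examples;
# the real sample stemma may differ. Each node maps to its parent.
PARENT = {
    "beta": "alpha", "gamma": "alpha",
    "delta": "beta", "R": "beta", "A": "beta",
    "L": "delta", "t": "delta",
    "I": "gamma", "X": "gamma",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Number of steps from the root (alpha has depth 0)."""
    return len(path_to_root(node)) - 1

def common_ancestor(witnesses):
    """Deepest node that lies at or above every listed witness."""
    paths = [path_to_root(w) for w in witnesses]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first witness, the first shared node is the deepest.
    return next(n for n in paths[0] if n in shared)

def evaluate(readings):
    """readings maps reading text -> list of sigla for a two-reading <app>.
    Returns the text projected back to alpha, or None for a crux."""
    (text_a, wits_a), (text_b, wits_b) = readings.items()
    depth_a = depth(common_ancestor(wits_a))
    depth_b = depth(common_ancestor(wits_b))
    if depth_a == depth_b:
        return None  # crux: the stemma cannot decide
    # The reading whose common ancestor is deeper is treated as the error.
    return text_a if depth_a < depth_b else text_b

first = {"Chocolate": ["L", "t"], "Peanut butter": ["R", "A", "I", "X"]}
second = {"Chocolate": ["L", "t", "R", "A"], "Peanut butter": ["I", "X"]}
print(evaluate(first))   # Peanut butter
print(evaluate(second))  # None
```

The sketch handles only an <app> with exactly two readings and ignores
contamination entirely; it is meant to show the shape of the test, not
a production rule.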
This is difficult but manageable XSLT/XPath programming with an
<eTree>-like model. With the <graph>/<node>/<arc> model, on the other
hand, it becomes much more complicated. It isn't impossible (after all,
the <graph>/<node>/<arc> model can be transformed into the <eTree>
model), but if we were going to try to build something like this for
production, I think we agree on which model has the greater engineering
advantages (by far). I'd like to see the TEI Guidelines say something
like "here's The Best Way to represent a stemma because in addition to
describing the graph [which one could do in a variety of ways], it also
lets one automate some of the analysis of variation, and that's part of
what humanities computing is all about."
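The parenthetical point that a <graph>/<node>/<arc> description can be
transformed into a tree might be sketched like this in Python. The
sketch assumes the arcs have already been read out of the XML as
(from, to) pairs and that the graph really is a tree; the arc list and
the name arcs_to_tree are hypothetical, not part of any TEI tooling.

```python
from collections import defaultdict

def arcs_to_tree(arcs):
    """Turn (parent, child) arcs into a nested (label, [subtrees]) tuple.
    Raises ValueError if a node has two parents or there is not exactly
    one root."""
    children = defaultdict(list)
    parent = {}
    for src, dst in arcs:
        if dst in parent:
            raise ValueError(f"{dst} has more than one parent; not a tree")
        parent[dst] = src
        children[src].append(dst)
    # A root has outgoing arcs but never appears as the target of one.
    roots = [n for n in children if n not in parent]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    def build(node):
        return (node, [build(c) for c in children.get(node, [])])

    return build(roots[0])

# Hypothetical arcs for a small, uncontaminated stemma.
arcs = [("alpha", "beta"), ("alpha", "gamma"), ("beta", "delta")]
print(arcs_to_tree(arcs))
```

An arc that gives a node a second parent, which is how contamination
would naturally appear in a plain graph, makes this conversion fail.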
With this in mind, I see no reason not to enhance the <eTree> model with
a <contaminates> feature. It doesn't get in the way for those who are
modeling true trees, and it lets us use the <eTree> structure for
modeling stemmata, which I think makes those models much more useful for
textual analysis than would be the case under the <graph>/<node>/<arc>
approach.
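As a rough data-structure sketch of this idea in Python: parentage
stays a tree, and contamination is recorded as a separate relation that
tree-walking code can simply ignore. The class StemmaNode, its
contaminates field, the helper functions, and the sample contamination
link (gamma into A) are all hypothetical illustrations, not a proposed
TEI encoding and not taken from the sample stemma.

```python
from dataclasses import dataclass, field

@dataclass
class StemmaNode:
    label: str
    children: list["StemmaNode"] = field(default_factory=list)
    # Labels of nodes this one contaminates; a hypothetical stand-in for
    # the proposed <contaminates> feature, kept apart from parentage.
    contaminates: list[str] = field(default_factory=list)

def descendants(node):
    """Labels below node by parentage only; contamination is ignored."""
    out = []
    for child in node.children:
        out.append(child.label)
        out.extend(descendants(child))
    return out

def contamination_links(node):
    """All (source, target) contamination pairs in the subtree."""
    links = [(node.label, target) for target in node.contaminates]
    for child in node.children:
        links.extend(contamination_links(child))
    return links

# Invented example: gamma contaminates A, which descends from beta.
stemma = StemmaNode("alpha", [
    StemmaNode("beta", [StemmaNode("R"), StemmaNode("A")]),
    StemmaNode("gamma", [StemmaNode("I"), StemmaNode("X")],
               contaminates=["A"]),
])
print(descendants(stemma))
print(contamination_links(stemma))
```

Code that only needs the tree never looks at contaminates, so true
trees are unaffected by the extra field.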
Cheers,
David
Lou Burnard wrote:
> Unless I'm mistaken, when this investigation of how to represent ms
> stemma was first proposed, it was mostly as an exercise to see how
> applicable the existing TEI model for trees and graphs is. I may be
> inventing this, but my recollection is that the conversation went
> something like
> x: we've got this lovely way of representing graphs and networks and
> stuff
> y: why would anyone ever want to use such a thing
> x: well you could use it to represent, err, airplane networks, or family
> trees, or um,
> y: or MANUSCRIPT TRADITIONS! wow that's really innerestink
>
> So the exercise intended for David was really to test the capabilities
> of the existing TEI model, which I think his work does rather well.
>
> I don't have anything to add to what James has already said about
> labels and the use of existing TEI styles for identification and
> linkage. I do however wish to confess to a feeling of unease about
> contamination.
>
> As I understand it, this is a way of saying that a given node has one
> or more parent-nodes *other* than the one it's actually attached to --
> which of course means that you're not looking at a tree any more, but
> a directed acyclic graph. So it cannot be represented using <tree> or
> <eTree> -- you need to use the more general <graph>.
>
> I would be interested to see how David's example would play as a <graph>
>
> Lou
>