[tei-council] tei stemma model
David J Birnbaum
djbpitt+tei at pitt.edu
Wed Jul 18 09:47:51 EDT 2007
Dear James (cc Council),
> However, I think the better solution has got to be the one which is more
> consistent with the Guidelines as a whole. I'm also concerned that we are
> just expressing the existing tree structure of the tree/root/iNode/leaf
> elements in a different form. Would it be better if this were revised to
> allow the nesting structure similar to that which David suggests? (But this
> would push it to P5 1.1 I'm assuming.)
>
I may have misunderstood, but since <eTree> is already part of the Graph
chapter and already has the nesting structure I propose, I think we can
get that nesting structure simply by using the existing <eTree>. That
is, a solution more consistent with the Guidelines as a
whole than my original version could use <eTree> out of the box and make
the changes in attribute use that you and Conal have advocated. The only
substantial structural feature not currently present is a way of
representing contamination, but adding another element should not
inflict any damage on the existing graphing model (i.e., could be
undertaken easily and without substantial delay).
> I think using <label> (which if memory serves eTree allows) to label groups
> of nested eTree nodes might be useful.
>
This sounds like the best way to incorporate group labels, especially
because 1) it's already in place and 2) it doesn't conflict with labels
for individual nodes (retrievable from @n or by dereferencing
@key, @corresp, or something similar). In my own work I tend to refer to
groups by the label on their parent (so that, for example, I would refer
to the "beta branch of the tradition" to identify the subtree rooted at
beta), but if one wishes to assign a different name to the group as a
whole than to its root node in the stemma, <label> seems like a sensible
place to record that name.
> I was curious about the decision to have contaminates only contain a single
> target. I understand that it keeps things straightforward, but what other
> benefits are there, or conversely what drawbacks to having a single
> contaminates point to multiple @xml:id's in a @targets attribute?
>
As far as I can tell, the two approaches are informationally equivalent,
but:
1) As you note, allowing a single value keeps things straightforward, by
which I mean easier to manage.
2) Making each instance of contamination a separate element records in
an iconic way that each instance of contamination in the tradition is an
independent event, separate from each other instance. There is no
informational difference, but there is an intellectual one, to the
extent that we would like our XML syntax to model our conceptualization
of the reality we are encoding. (See my
http://clover.slavic.pitt.edu/~djb/sgml/invalid.html for a discussion of
schemas as models of reality.)
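To make the comparison concrete, here is a rough sketch of the two encodings. It is only an illustration: <contaminates> is the element under discussion in this thread rather than an existing TEI element, and the pointer values (#msY, #nodeA, #nodeB) are invented for the example.

```xml
<!-- Option 1: one <contaminates> element per instance of contamination -->
<eTree corresp="#msY" type="hypothetical">
  <contaminates target="#nodeA"/>
  <contaminates target="#nodeB"/>
</eTree>

<!-- Option 2: a single <contaminates> element listing every target -->
<eTree corresp="#msY" type="hypothetical">
  <contaminates targets="#nodeA #nodeB"/>
</eTree>
```

Both record the same two facts; only the first gives each instance of contamination an element of its own.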
> If one can use schematron to provide the necessary rules to validate that
> a contaminates/@n attribute points only to a node or a node/@n points only
> to something in msDesc, then couldn't one do the same thing with @xml:id
> and @targets, etc?
>
I don't see why not.
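A rule along these lines might look like the following sketch. It assumes ISO Schematron with the XSLT 2.0 query binding, that the proposed <contaminates> element would live in the TEI namespace, and that @targets holds whitespace-separated local pointers of the form #id; none of that is settled in the proposal.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <pattern>
    <!-- Assumption: <contaminates> is the proposed element, in the TEI namespace -->
    <rule context="tei:contaminates[@targets]">
      <!-- Every pointer must be local (#id) and resolve to an eTree in this document -->
      <assert test="every $t in tokenize(normalize-space(@targets), '\s+')
                    satisfies (starts-with($t, '#')
                      and exists(//tei:eTree[@xml:id = substring-after($t, '#')]))">
        Each value in @targets must point to the xml:id of an eTree in this document.
      </assert>
    </rule>
  </pattern>
</schema>
```

An analogous rule with eTree/@corresp as the context could check that each node points at a witness or manuscript description.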
> Following your advice (for now) to use eTree instead of node, and
> assuming that there is a witList providing a sigil to which all of these
> eTree elements point, but using @corresp/@xml:id/@targets, does the
> following do something similar to what you meant?
>
> <eTree corresp="#msabc123" type="hypothetical">
>   <eTree corresp="#msbeta" type="hypothetical">
>     <eTree corresp="#msDusty" type="hypothetical">
>       <label>Group Foo</label>
>       <eTree corresp="#msL" type="extant"/>
>       <eTree corresp="#mst" type="lost"/>
>     </eTree>
>     <eTree corresp="#mseps" type="hypothetical">
>       <label>Group Blort</label>
>       <eTree corresp="#msR" type="extant"/>
>       <eTree xml:id="nodeA" corresp="#msA" type="extant"/>
>     </eTree>
>   </eTree>
>   <eTree corresp="#msY" type="hypothetical">
>     <label>Group Wibble</label>
>     <contaminates targets="#nodeA"/>
>     <eTree corresp="#msI" type="extant"/>
>     <eTree corresp="#msX" type="extant"/>
>   </eTree>
> </eTree>
>
> Aside from changing the @n attributes to references to spurious @xml:id
> attributes, and my labelling of groups, is much intellectual content lost?
>
In addition to the differences you note (changing @n to @xml:id, adding
group labels), you've also changed @target to @targets (presumably so
that it can contain more than one target, should a node contaminate more
than one other node) and you've divided between @xml:id and @corresp the
duties that are shared by @n in my original model. Thus, the node that serves
as the target of pointing in your example has both of those attributes,
@corresp to point to a presumed <msDescription> elsewhere and @xml:id so
that @targets on another element can point to it. Other nodes have only
@corresp; they do not need an @xml:id since they are not the targets of
pointing from <contaminates> elements.
The only change to intellectual content is the addition of group labels,
which may be useful and which my original model did not support at all.
The change of @n to @xml:id and @corresp is, as you note, consistent
with TEI practice elsewhere. One possible additional cost (on top of
general parsimony and reduced opportunity for error with my model, which
I mentioned earlier) is that users may need to draw a stemma where they
do not have corresponding <msDescription> elements elsewhere. Since your
model doesn't record the sigla in the stemma itself, it requires that
users record them elsewhere, even when an <msDescription> (or witness
list or something similar) may otherwise not be needed, so that the
indirection and additional markup may be required not by the
informational goals of the author, but by the construction of the
schema. I appreciate that this additional cost may also seem worthwhile
if it buys greater consistency with general TEI practice. I don't
understand, though, how specifying multiple targets of contamination all
on one <contaminates> element is more TEI-like than specifying each
target in a separate <contaminates> element. There is no informational
difference between the two models, but, as noted above, using a separate
<contaminates> element for each instance of contamination models
iconically that each instance of contamination in the tradition is
independent of each other instance (which is true, or we would be
asserting that the contamination occurred at a different level in the
stemma).
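The indirection cost mentioned above can be seen in a sketch like the following. It assumes the P5 element names <listWit> and <witness> for the witness list, and the sigla and xml:id values are invented for illustration; the point is only that the stemma nodes carry no sigla themselves, so the list has to exist for the pointers to resolve.

```xml
<!-- Assumed witness list; needed only so that @corresp has something to point at -->
<listWit>
  <witness xml:id="mseps">epsilon</witness>
  <witness xml:id="msR">R</witness>
  <witness xml:id="msA">A</witness>
</listWit>

<!-- Stemma nodes record no sigla of their own; each points into the list above -->
<eTree corresp="#mseps" type="hypothetical">
  <eTree corresp="#msR" type="extant"/>
  <eTree xml:id="nodeA" corresp="#msA" type="extant"/>
</eTree>
```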
> In having skimmed through some more of the trees and graphs section of the
> Guidelines, I'd like to suggest that anything significantly more
> complicated than David's proposal be held back until P5 1.1. Moreover,
> that when planning P5 1.1 that a significant re-examination of the current
> graph/tree provision be undertaken.
>
I read the Graphs chapter of the Guidelines for the first time (and the
second, and the third, etc.) when I was working on this proposal, and it
left me with the impression that, as in many other places, the TEI may
have decided to allow multiple ways of doing the same thing. I'm not
enough of a mathematician to be able to call this "impression" a
"conclusion" (in some cases the different models supported by the Graph
chapter clearly have different intellectual properties, and in other
cases they may have different properties that I do not understand), but
if we're going to revisit this chapter down the road, I think it's good
policy in general to provide multiple strategies only when they have
different informational content (that is, not when they are merely
notational variants of one another). Readers may find the following
useful when considering this issue, which arises frequently in TEI-land:
http://www.idealliance.org/papers/extreme/proceedings/html/2002/Usdin01/EML2002Usdin01.html
Best,
David