[tei-council] NH

Sat Sep 29 09:17:26 EDT 2007

> This is not conformant because it breaks the abstract model. A
> <seg> is supposed to have some content.

Indeed, that is one perspective, with which I am pretty sympathetic.
But I then read the actual description of <seg>, which says it
"contains any arbitrary phrase-level unit of text"; it seems to me
that nothing (i.e., no content) fits into the category of an
arbitrary unit.

Which is why I say using an empty <seg> follows the letter, but
probably not the spirit, of the Guidelines. (Although we are free to
redefine that spirit here, I gather you, Lou, think it's a bad idea.
I have to confess, I agree -- my lazy side may have been hopeful, but
my analytic side objects.)

> If you rewrote it with milestones it would be OK though.

Well, it might be valid, but it would still violate the TEI abstract
model. It would confound the difference between segment boundaries
and milestones, which (I believe) is an important, useful
distinction. Milestones are not arbitrary empty elements (like
<anchor>[1]), but "simply mark the points in a text at which some
category in a reference system changes". Milestone elements, like the
milestones along the road after which they are named, mark the
passage from one instance to the next *of the same feature*, e.g.
miles of the road, pages of a book, or the reels of a movie. Thus, in
the general case, a milestone element marks both the end of one
instance and the beginning of the next instance of *the same
feature*, just with a different reference. That is, with respect
to the feature being encoded, the stuff before and the stuff after
the milestone are fundamentally the same. As section HD54M puts it, a
milestone indicates where a state 'variable' changes its value.

One interesting feature of milestones is that the ranges into which
they divide their ancestor element of interest (typically <text>)
tessellate it. This permits software to find out something about the
current feature by looking at the preceding milestone (e.g., to find
the current page number, look for preceding::tei:pb[1]/@n) and to
find its edges by looking for the preceding and following milestones.
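
To make the lookup concrete, here is a rough Python sketch using only
the standard library. The sample document and the function name are
invented for illustration. ElementTree's XPath subset has no
preceding:: axis, so the sketch gets the same answer by walking the
tree in document order and remembering the last <pb> seen.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: three paragraphs spread over three pages.
PAGES_SAMPLE = f"""<text xmlns="{TEI}"><body>
<pb n="1"/><p>first <pb n="2"/>second</p><p>third</p><pb n="3"/><p>fourth</p>
</body></text>"""


def page_of_each_paragraph(root):
    """Return the page on which each <p> starts.

    Equivalent in spirit to evaluating preceding::tei:pb[1]/@n on each
    <p>: the nearest <pb> earlier in document order supplies the value.
    """
    current = None
    pages = []
    for el in root.iter():  # iter() visits elements in document order
        if el.tag == f"{{{TEI}}}pb":
            current = el.get("n")
        elif el.tag == f"{{{TEI}}}p":
            pages.append(current)
    return pages


if __name__ == "__main__":
    print(page_of_each_paragraph(ET.fromstring(PAGES_SAMPLE)))
    # → ['1', '2', '3']
```

Note that no start/end bookkeeping is needed: whatever <pb> came last
is the answer, which is exactly what tessellation buys you.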

All this is patently untrue of segment boundary delimiters, which
indicate a feature that does not tessellate its ancestor, by
explicitly marking the beginning and the end. In some sense, they are
not marking where a state 'variable' changes its value, but rather
are marking the boundaries of a completely different state of
affairs, a different variable, altogether. Unlike with milestones,
you can't just go to the preceding element to get information, as the
preceding element may mark the end of a previous occurrence, as
opposed to the beginning of the one you're in.
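
A small Python sketch of the difference, again standard library only.
It borrows the <anchor type="s" subtype="start|end"/> convention from
note [1]; the sample line, the <w> children, and the function name are
made up, and it assumes the segments do not nest.

```python
import xml.etree.ElementTree as ET

# Hypothetical line: words A and C sit inside segments, B sits between them.
SEG_SAMPLE = (
    '<l><anchor type="s" subtype="start"/><w>A</w>'
    '<anchor type="s" subtype="end"/><w>B</w>'
    '<anchor type="s" subtype="start"/><w>C</w>'
    '<anchor type="s" subtype="end"/></l>'
)


def words_inside_segments(root):
    """Map each <w> to True if it lies between a start and an end delimiter."""
    inside = False
    result = {}
    for el in root.iter():  # document order
        if el.tag == "anchor" and el.get("type") == "s":
            # The preceding delimiter alone is not enough; we must ask
            # whether it opened a segment or closed one.
            inside = el.get("subtype") == "start"
        elif el.tag == "w":
            result[el.text] = inside
    return result


if __name__ == "__main__":
    print(words_inside_segments(ET.fromstring(SEG_SAMPLE)))
    # → {'A': True, 'B': False, 'C': True}
```

Word B has a preceding delimiter, but it is an end, so B belongs to no
segment at all. With milestones that situation cannot arise.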

> (I am still wondering why you prefer "segment boundary" as a name
> to "milestone" which is what we have always called these things
> till now and still do call them elsewhere in the Guidelines)

I do not think the Guidelines ever make the mistake of referring to a
segment boundary as a milestone (or even an anchor-point as a
milestone). Others have committed this error, using "milestone
element" to mean "empty element". As described above, both milestone
elements and segment boundary delimiter elements are special cases
(or particular uses) of empty elements.

I am not sure, but I believe that the nomenclature, which is common
in the literature (although, as I say, occasionally used incorrectly),
was developed by the TEI. You, Lou, are a co-author of what I
consider to be one of the seminal papers on the subject, ML W 18[2].

> > As for HORSE, the whole point of the encoding methodology is that
> > the same element type is used both as a container and as empty
> > boundary markers. Thus whether we describe it as conformable (and
> > thus TEI Conformant) or that it uses a TEI Extension, the
> > elements need to stay in the TEI namespace.

> Same argument applies. An empty <p/> means something different in
> TEI proper.

Sad. Taking this stance means that the HORSE examples will make use
of a TEI Extension, and thus, in the current system, will have to
remain <eg>s instead of <egXML>s (right, Sebastian?).

Notes
-----
[1] Actually, in retrospect, why not use <anchor> for this purpose?
    It has been used for just this sort of thing before, when it was
    pressed into service as the end segment boundary delimiter for
    <addSpan> and <delSpan>. It has a type= which can be used for the
    GI, and a subtype=, which can be used for "start" or "end". So
    perhaps segment boundary delimiters are an example of
    "conformable" or "algorithmically conformant" markup.
    Here's the example:
    <lg type="stanza">
     <l><anchor type="s" subtype="start" xml:id="a"/>E l'orma dell'acqua &#xE8; l'alba</l>
     <l>sulla riva.<anchor type="s" subtype="end" xml:id="b"/>
       <anchor type="s" subtype="start" xml:id="c"/>Si esauriva in me</l>
     <l>il supplizio della sabbia,</l>
     <l>a batticuore, spaziando la notte.<anchor type="s" subtype="end" xml:id="d"/></l>
    </lg>
    The only disadvantage is that it's a bit silly to put in those
    xml:id= attributes (which are required) when they're never
    referred to. But that's *adding* information -- the constraint in
    CF is that no information can be *lost*.
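
    As a rough check that nothing is lost, here is a Python sketch
    (standard library only; the function names are invented) that
    recovers the text of each <s> from the anchor pairs in the stanza
    above. It assumes the segments are not nested.

```python
import xml.etree.ElementTree as ET

STANZA = """<lg type="stanza">
 <l><anchor type="s" subtype="start" xml:id="a"/>E l'orma dell'acqua &#xE8; l'alba</l>
 <l>sulla riva.<anchor type="s" subtype="end" xml:id="b"/>
   <anchor type="s" subtype="start" xml:id="c"/>Si esauriva in me</l>
 <l>il supplizio della sabbia,</l>
 <l>a batticuore, spaziando la notte.<anchor type="s" subtype="end" xml:id="d"/></l>
</lg>"""


def events(el):
    """Yield ('start'|'end', None) and ('text', str) in document order."""
    if el.tag == "anchor" and el.get("type") == "s":
        yield (el.get("subtype"), None)
    if el.text:
        yield ("text", el.text)
    for child in el:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def segments(root):
    """Collect the whitespace-normalized text between each start/end pair."""
    out, buf = [], None
    for kind, value in events(root):
        if kind == "start":
            buf = []
        elif kind == "end" and buf is not None:
            out.append(" ".join("".join(buf).split()))
            buf = None
        elif kind == "text" and buf is not None:
            buf.append(value)
    return out


if __name__ == "__main__":
    for s in segments(ET.fromstring(STANZA)):
        print(s)
```

    Both sentences come back whole even though each one crosses <l>
    boundaries, which is the point of using delimiters here.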

[2] http://www.tei-c.org/Vault/ML/mlw18.txt