[tei-council] divWrapping: problem description

Syd Bauman Syd_Bauman at Brown.edu
Tue Jul 17 14:19:09 EDT 2007


The major problem with the current content model of <div>[1] is that
anything that can go at the top of a division can also go at the
bottom. For a lot of elements this makes sense, e.g. <argument>,
<byline>, and <dateline>. Even for <epigraph> it may be OK. Most
sources define <epigraph> as something that occurs at the
beginning[2], but P4 goes both ways on the issue (defining <epigraph>
as "appearing at the start of a section or chapter, or on a title
page", but permitting it in %m.divbot;), and a few sources define it
as something either at the beginning or the end (the Comprehensive
TeX Archive Network, UK TeX FAQ). Moreover, one could stretch the
etymological roots of the word to include both before and after.

However, for some elements, most notably <head> and <opener>, it
makes no sense, and causes problems, to allow them at the bottom of a
division. 

First off, semantically <head> and <opener> just should not exist
after the main part of a division, much less after a <trailer> or
<closer>. The following is valid against the current content
model[1]: 

      <div>
        <head/>
        <p/>
        <p/>
        <trailer/>
        <head/>
      </div>

But what's worse is the confusion this causes for those new to TEI,
particularly those who are used to XHTML.

Imagine such a person encoding what you and I know should be encoded
as

      <div type="section">
        <div type="subsection">
          <head/>
          <p/>
          <p/>
        </div>
        <div type="subsection">
          <head/>
          <note/>
          <p/>
          <p/>
        </div>
        <div type="subsection">
          <head/>
          <p/>
          <p/>
        </div>
      </div>

Being familiar with XHTML, she tries the following:

01.      <div type="section">
02.        <head/>
03.        <p/>
04.        <p/>
05.        <head/>
06.        <note/>
07.        <p/>
08.        <p/>
09.        <head/>
10.        <p/>
11.        <p/>
12.      </div>

and gets quite confused when her validator tells her that the <p>
element on line 07 is invalid. (And then she asks "why isn't a <p>
allowed after a <note>?" :-)

This is just asking for trouble. The validator should instead
tell her that the <head> on line 05 is invalid, because <head>
shouldn't be allowed at the bottom of a division.
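
To make the validator's behavior concrete, here is a rough Python
sketch. It is not the real TEI schema: it assumes a much simplified
model of the form (top | global)*, (body, global*)+, (bottom, global*)*,
ignores nested divisions, and treats <trailer> and <closer> as
bottom-only. The names first_invalid, BOTTOM_CURRENT, and BOTTOM_STRICT
are made up for illustration. Under those assumptions it shows why the
error lands on the <p> at line 07 today, and where it would land if
<head> and <opener> were excluded from the bottom.

```python
# Simplified, hypothetical approximation of the <div> content model:
#   (top | global)*, (body, global*)+, (bottom, global*)*
# Nested divisions and most TEI elements are deliberately left out.
GLOBAL = {"note"}
BODY = {"p"}
TOP = {"head", "opener", "argument", "byline", "dateline", "epigraph"}
# Hypothetical stricter model: <head> and <opener> not allowed at the bottom.
BOTTOM_STRICT = {"trailer", "closer", "argument", "byline", "dateline", "epigraph"}
# Current model: anything allowed at the top is also allowed at the bottom.
BOTTOM_CURRENT = BOTTOM_STRICT | {"head", "opener"}


def first_invalid(children, bottom):
    """Return the 0-based index of the first rejected child, or None.

    Does not check that at least one body element is present.
    """
    state = "top"
    for i, name in enumerate(children):
        if name in GLOBAL:
            continue  # global elements are accepted anywhere
        if state == "top":
            if name in TOP:
                continue
            if name in BODY:
                state = "body"
                continue
            return i
        if state == "body":
            if name in BODY:
                continue
            if name in bottom:
                state = "bottom"
                continue
            return i
        # state == "bottom": only further bottom elements may follow
        if name not in bottom:
            return i
    return None


# Children of the XHTML-style <div>; child i sits on listing line i + 2.
xhtml_style = ["head", "p", "p", "head", "note", "p", "p", "head", "p", "p"]

for label, bottom in (("current", BOTTOM_CURRENT), ("strict", BOTTOM_STRICT)):
    i = first_invalid(xhtml_style, bottom)
    print(f"{label}: <{xhtml_style[i]}> on line {i + 2:02d} is rejected")
```

With the current-style bottom set, the second <head> is silently
accepted as the start of the bottom group, so the first complaint is
about the <p> that follows the <note>. With the stricter set, the
complaint is about the <head> itself.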


Note also that the content model for <body>, which should be
essentially the same as that for <div> (except that it has to
accommodate both the numbered and unnumbered flavors of division)
does not have this problem. It has the reverse problem, that none of
the divWrapper elements are allowed near the bottom.


Lastly, the content model of <div> and its ilk is quite complicated,
with what initially appear to be extra levels of parentheses.
However, these are needed for DTD (and perhaps XSD) determinism. I
believe it may be possible to make this model a bit simpler and
still deterministic, but I am not sure, and as Sebastian has
pointed out, if it can be done it is awfully hard.

My recommendation (which I never got to on the conference call) is
not to bother trying for this simplification for 1.0.


Wrapping one's mind around the content models for <back> and <front>
and how they should be utilizing divWrapping stuff is another story.


Notes
-----
[1] See http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-div.html.
[2] Including Wikipedia, Chicago Manual of Style, Random House
    Unabridged Dictionary, Merriam-Webster, The American Heritage
    Dictionary of the English Language, WordNet, DocBook, and the
    Oxford English Dictionary.



