[tei-council] conformance draft

Fri Mar 23 10:15:24 EST 2007

Wow, excellent discussion thread. I am sorry I don't have a chance to
really dive into it deeply right now: Conal's "political" questions are
worth pondering, and I am very interested in Dot's experiments. I
will take the time to address two arguments, though.

jc:I'm jc:not jc:entirely jc:convinced jc:that jc:it jc:truly
jc:impairs jc:human jc:readability. jc:Having jc:a jc:few
jc:namespace jc:prefixes jc:might jc:actually jc:clarify jc:things
jc:more.

sb:I sb:find sb:this sb:pretty sb:hard sb:to sb:swallow, sb:at
sb:least sb:keeping sb:in sb:mind sb:that sb:the sb:target
sb:audience sb:we're sb:talking sb:about sb:here sb:are sb:relative
sb:beginners.

CT> Now, do we want to avoid name collisions or don't we? 

Yes, of course, name collision is annoying, and we'd prefer to avoid
it. But on the scale of problems someone has merging TEI vocabularies
or getting one project's files to be interoperable with another
project's software, name collision is extremely low on the list of
difficulties. (And it's quite rare, to boot.)

Let's say you and I both have rich TEI vocabularies for encoding
early modern printed books, and we each have added <duck> elements
with different semantics. If you want to suck some of my files into
your system, you are going to have to deal with a *lot* worse than the
fact that <duck> is a name collision. 

* The fact that I record the idealized page number on n= of <pb>,
  whereas you store it as the content of
  fw[@type='pageNum']/choice/corr. 

* The fact that your software expects quotations in <quote>, but I've
  encoded both quotations and direct speech in <q>.

* The fact that our value lists for type= of <div> and <lg> are
  different.

* The fact that I record rhyme scheme on rhyme= of <lg>, whereas you
  indicate rhyme using <rhyme> inside <l>. 

* The fact that I handle overlapping speeches and metrical lines
  using part= of <l>, and you use next= & prev=.

* The fact that you encode the author in .../titleStmt/author
  regardless, whereas I put it in .../titleStmt/author/persName,
  .../titleStmt/author/orgName, or .../titleStmt/author, depending on
  whether (I think) the author was a person, an organization, or is
  neither (e.g., "unknown").

* The fact that you've used 2-letter ISO 639 language codes, but I've
  used 3-letter codes.

* The fact that your software expects an <lb> whenever there is a
  line-break, but my encoding presumes that certain elements (<head>,
  <p>, <l>) imply a line-break, so I haven't encoded an <lb>.

* The fact that I've recorded my sources in
  .../sourceDesc/biblStruct, but you've used .../sourceDesc/bibl.

* The fact that I make use of default renditions, but you don't, so
  your software doesn't know about them.

* Oh, and while we're on rendition, I use CSS2 in my rend=
  attributes, but you use a home-grown solution.

* We've used different classification schemes for our <catRef>s. 
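
To make the first bullet concrete, here is a minimal Python sketch of
the kind of glue code the importing project ends up writing just to
get page numbers out of both encodings. The two sample fragments, the
function name page_numbers, and the exact placement of <fw> are my
assumptions for illustration, not either project's real data or code.

```python
import xml.etree.ElementTree as ET

# Prefix map for the TEI P5 namespace.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment in "my" convention: idealized number on n= of <pb>.
MINE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <pb n="12"/><p>text</p>
</div>"""

# Hypothetical fragment in "your" convention: number as content of
# fw[@type='pageNum']/choice/corr.
YOURS = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <fw type="pageNum"><choice><sic>21</sic><corr>12</corr></choice></fw>
  <p>text</p>
</div>"""


def page_numbers(xml_text):
    """Return idealized page numbers, whichever convention was used."""
    root = ET.fromstring(xml_text)
    # Convention 1: look for n= on <pb>.
    from_pb = [pb.get("n") for pb in root.iterfind(".//tei:pb", TEI)
               if pb.get("n")]
    if from_pb:
        return from_pb
    # Convention 2: fall back to the corrected forme work.
    path = ".//tei:fw[@type='pageNum']/tei:choice/tei:corr"
    return [(corr.text or "").strip() for corr in root.iterfind(path, TEI)]


print(page_numbers(MINE))
print(page_numbers(YOURS))
```

Note that nothing in this sketch has anything to do with name
collision; both fragments use only vanilla TEI element names.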

You get the idea. The list goes on and on and on even when two
different projects are applying "vanilla" TEI to similar kinds of
documents. Thus the gain of having one fewer problem (our <duck>s
needing to be put in a row) when trying to do something that is a
rare activity, at the expense of making day-to-day activities a little
harder and (more importantly) raising the bar for entry into the land
of TEI, seems like a bad idea to me.

Remember the words of the song (sung to the tune of "Jesus Christ
Superstar"): 

  T-E-I, Wendell notes,
  Must remain easy for newer folks.

  [http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0007&L=TEI-L&P=R969,
   although you have to have read the entire thread to get some of
   the jokes, and I can't spend the time to track it down now ...
   besides, IIRC one or two of the exchanges were off-list.]

CT> If we don't care about name collisions, what is the TEI namespace
CT> for? Why have we added it?

For several reasons, but mostly so that one *can* import other
vocabularies into TEI, import TEI into other vocabularies, or avoid
name collisions with one's own extensions. Remember, I am not for a
moment saying we should force people to stick their added elements
into the TEI namespace.

(One of the main reasons namespaces exist and are useful, not spelled
out in the Spec, IIRC, is that even without name collisions, it makes
processing much easier when everything is clearly labeled such that a
software package can divide all the elements into "those I am
supposed to process" and "others" with almost no effort.)
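
As a rough illustration of how little effort that division takes,
here is a minimal Python sketch using the standard library's
ElementTree. The sample document, the "my" prefix with its
example.org namespace URI, and the function name split_by_namespace
are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical document: TEI elements plus one added element that the
# encoder has placed in a namespace of her own.
SAMPLE = """<lg xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:my="http://example.org/ns/my-project">
  <l>first line <my:duck>quack</my:duck></l>
  <l>second line</l>
</lg>"""


def split_by_namespace(xml_text, wanted_ns=TEI_NS):
    """Divide elements into those in wanted_ns and all the others."""
    to_process, others = [], []
    for el in ET.fromstring(xml_text).iter():
        # ElementTree spells a qualified name as "{namespace-uri}local".
        if el.tag.startswith("{" + wanted_ns + "}"):
            to_process.append(el.tag.split("}", 1)[1])
        else:
            others.append(el.tag)
    return to_process, others


to_process, others = split_by_namespace(SAMPLE)
print(to_process)
print(others)
```

The test is a single string comparison per element, with no need to
know anything about what the "other" vocabulary means.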

So I see these as reasons to permit a user to separate her additions
via namespace, perhaps even to encourage her to do so. But to insist
that the lone scholar studying Hispanic rhyming patterns in
17th-century manuscripts create his own namespace and deal with
multiple namespace issues for the one element he wants to add to
enhance his research (or lose the funding-helpful claim to "TEI
Conformance"); or worse yet, to risk having an administrator worry
that, like copyright infringement, it's illegal to add elements in
the TEI namespace ... all for a limited technical gain that may never
occur, seems like a bad idea.