[tei-council] Fwd: TITE again

James Cummings James.Cummings at oucs.ox.ac.uk
Sun Jul 15 07:38:12 EDT 2012


For your consideration.

-James


-------- Original Message --------
Subject: 	TITE again
Date: 	Sun, 15 Jul 2012 10:32:41 +0000
From: 	Martin Mueller <martinmueller at northwestern.edu>
To: 	James Cummings <James.Cummings at OUCS.OX.AC.UK>



Dear James,

As you know, some time ago I raised the question whether the TITE
convenience elements <b>, <i>, <u>, <sup>, and <sub> should
become part of P5 on the grounds that <i> relates to <hi
rend="italics"> in the same way that <lb/> relates to <milestone
unit="line"/>. There followed a lively discussion on the Council
list, which you accurately summarized as not very conclusive.
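To make the "convenience element" idea concrete: each TITE shortcut can be read as sugar that expands mechanically into a P5 <hi> element. The Python sketch below assumes that reading. The mapping table, the rend values, and the function name `desugar` are hypothetical (TEI does not fix a vocabulary for @rend), and the sketch assumes un-namespaced input, whereas real TEI documents would use the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TITE-style convenience elements to P5 <hi>.
# The rend values are illustrative only; TEI does not standardize them.
SUGAR_TO_REND = {
    "b": "bold",
    "i": "italic",
    "u": "underline",
    "sup": "sup",
    "sub": "sub",
}


def desugar(root: ET.Element) -> ET.Element:
    """Rewrite convenience elements in place as <hi rend="...">.

    Assumes element names carry no namespace prefix.
    """
    for elem in root.iter():
        rend = SUGAR_TO_REND.get(elem.tag)
        if rend is not None:
            elem.tag = "hi"          # same content, generic element
            elem.set("rend", rend)   # presentation moves into @rend
    return root


doc = ET.fromstring("<p>the <i>Majestie</i> of w<sup>ch</sup></p>")
print(ET.tostring(desugar(doc), encoding="unicode"))
# -> <p>the <hi rend="italic">Majestie</hi> of w<hi rend="sup">ch</hi></p>
```

Because the expansion is lossless in this direction, a project could encode with the short forms and still deliver <hi> markup, which is the sense in which <i> would stand to <hi> as <lb/> stands to <milestone/>.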

I'd like to come back to this discussion and argue that on
balance the case for 'yes' is a little stronger than the case for
'no'. Please take this letter to the Council. If you think it
would be helpful to post it to the TEI list, please do so.

I read through the thread again in the particular context of
wondering how many of the varied and common superscripts in the
TCP texts could be expressed through Unicode characters, a
possibility raised by Lou Burnard. Other comments came from
Piotr Banski, James Cummings, Martin Holmes, Kevin Hawkins,
Sebastian Rahtz, and Paul Schaffner.

The thread consisted of a mixture of theoretical and pragmatic
arguments. On the more theoretical side, James, Piotr, and Gabby
had reservations about mixing up semantic and presentational
markup, coming too close to HTML, or encouraging encoders to be
lazy. Piotr shared James's sense that <lb> and <pb> were somehow
different from <i> or <sup>. I don't see the difference, but I
respect such intuitions and recognize that they are hard to
resolve by argument.

On the pragmatic side, Kevin, Paul, and Sebastian argued in
favour of various options for inclusion. Paul said that elements
like <i> and <sup> have a "reassuring rootedness in actual page
phenomena." In that regard, they may be like line or page breaks:
you can't really argue about "the fact that." But most page or
line breaks have a compelling reason: there is no more space.
Italics and similar phenomena are never compulsory in that way.
(Perhaps that is why you and Piotr think they are different:
italics must have a reason, even if it is hard to figure out.)

If you admit things like <i> or <sup>, where do you stop? Paul
raised that question when he said that from the TCP perspective
the list of TITE elements was inadequate and <b> wasn't needed.
Implicit in Sebastian's comments, I think, is the argument that
Frequency is King and a good enough guide to a tightly limited
set of canned options. Sebastian suggested a kiss module (keep it
simple stupid?) of i/b/u/bl/larger/smaller/sup/sub and removing
the rend attribute.

Martin Holmes replied that you'd always need a rend option to
cover eventualities and wondered whether there would be a
continued stream of feature requests for more canned options.
Kevin, on the other hand, argued that a limited set of canned
options helps the cause of interoperability.

To return to Lou's suggestion about superscripts, it turns out
that you can represent a high percentage of superscripts in the
TCP texts (perhaps as many as 98% of tokens) with Unicode
characters. But there doesn't seem to be a superscripted 'c',
which rules out the common w<sup>ch</sup>, there is no lower-case
'i' (so much for the common superscripted forms of 'Majestie'),
and there is no superscripted period sign, which is also common.
But 98% is 98%, and the various Unicode characters, although
cobbled together from different lists, play well with each other
in browser displays.
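The test described above (can a superscripted string be written entirely with Unicode characters, and what share of tokens passes?) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run on the TCP texts: the function names are hypothetical, the character table is a deliberately small subset (a real one would be compiled from the Unicode code charts), and the sample tokens are invented.

```python
# Hypothetical, deliberately partial table of Unicode superscript/modifier
# letters; a real table would be compiled from the Unicode code charts.
SUPERSCRIPT_MAP = {
    "e": "\u1d49",  # MODIFIER LETTER SMALL E
    "r": "\u02b3",  # MODIFIER LETTER SMALL R
    "t": "\u1d57",  # MODIFIER LETTER SMALL T
    "s": "\u02e2",  # MODIFIER LETTER SMALL S
    "h": "\u02b0",  # MODIFIER LETTER SMALL H
    "m": "\u1d50",  # MODIFIER LETTER SMALL M
    "n": "\u207f",  # SUPERSCRIPT LATIN SMALL LETTER N
    "o": "\u1d52",  # MODIFIER LETTER SMALL O
    "d": "\u1d48",  # MODIFIER LETTER SMALL D
}


def covered(token: str) -> bool:
    """True if every character of the token has a Unicode superscript form."""
    return all(ch in SUPERSCRIPT_MAP for ch in token)


def render_superscript(token: str) -> str:
    """Use plain Unicode characters when possible, else keep <sup> markup."""
    if covered(token):
        return "".join(SUPERSCRIPT_MAP[ch] for ch in token)
    return f"<sup>{token}</sup>"


def unicode_coverage(tokens: list[str]) -> float:
    """Share of superscript tokens (counted with repetition) that are covered."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if covered(tok)) / len(tokens)


# Invented sample tokens; "e." fails because the table has no period.
sample = ["e", "e", "t", "r", "e."]
print([render_superscript(tok) for tok in sample])
print(unicode_coverage(sample))  # -> 0.8
```

Counting tokens rather than distinct forms is what makes a figure like 98% possible: a handful of very frequent forms dominates, and the uncovered residue still needs markup.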

Where does that leave us? In some ways, the question of
convenience or syntactic sugar elements is like that of
'favourites' on Windows or OS X. Commonly used directories are
freed from their status in the hierarchy and given a sort of VIP
treatment. This works as long as two conditions are met:

  1. The list must be short
  2. The candidates must be really obvious in terms of their
     frequency across a lot of different document types

I would say that <i> and <sup> clearly meet the second criterion.
If you're not bound by Systemzwang (the compulsion to complete a
system for its own sake), you would probably throw out <sub>,
because it is much less common than <sup>. The TITE
inclusion of <b> and <u> may have more to do with HTML than with
conventions of print culture. So I'd urge the Council to go with
the pragmatists and come up with a really short list that covers
a high percentage of cases and is worth the trade-off of giving
up a little consistency for a lot of convenience.

Perhaps there is no such list. But I think there is.
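The two conditions for a 'favourites' list (short, and made of obviously frequent candidates) can be read as a simple selection rule: rank candidate elements by frequency and stop as soon as the list is too long or already covers enough cases. The Python sketch below assumes that rule. The function name `shortlist`, the thresholds, and the counts are all hypothetical and invented for illustration; they are not measurements from the TCP texts or any other corpus.

```python
def shortlist(counts: dict[str, int], max_len: int, target: float) -> list[str]:
    """Greedily pick the most frequent candidates until the list reaches
    max_len or the picks cover at least `target` of all occurrences."""
    total = sum(counts.values())
    if total == 0:
        return []
    chosen: list[str] = []
    covered = 0
    # Sort by descending frequency, breaking ties alphabetically.
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(chosen) >= max_len or covered / total >= target:
            break
        chosen.append(name)
        covered += n
    return chosen


# Invented counts, for illustration only.
COUNTS = {"i": 900, "sup": 60, "b": 20, "u": 15, "sub": 5}
print(shortlist(COUNTS, max_len=3, target=0.95))  # -> ['i', 'sup']
```

The two parameters mirror the two conditions: `max_len` enforces brevity, and `target` expresses how much of the material the short list must handle before the remainder is left to a general rend mechanism.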



