[tei-council] More on TEI Lite: work in progress

Christian Wittern wittern at kanji.zinbun.kyoto-u.ac.jp
Mon Feb 20 00:34:03 EST 2006

James Cummings <James.Cummings at computing-services.oxford.ac.uk> writes:

> Lou Burnard wrote:
>> Sympathetic as I am to the suggestion that <g> should be usable in a TEI
>> Lite document, I am also feeling somewhat daunted at the extra baggage
>> this would entail. It isn't just a matter of adding <g>, you also have
>> to add  all the paraphernalia of <charDesc> -- some dozen extra elements.

This is true, but in practice most users will encounter only the <g>
that pops up as a possible element in all content models.

> I have to disagree with some other members of the council on this.  I think TEI
> Lite should expressly *not* contain <g>.  This does not mean I don't like people
> using non-standard character or glyph, simply that I strongly feel that if they
> are doing something unusual (i.e. using such characters) then they would benefit
> more greatly from using 'full' TEI and/or creating their own customisation. TEI
> Lite should be just that, 'Lite'.

I think there are two separate issues here. 

The <g> element has been introduced to provide a way for the text
encoder to introduce new characters or specify attributes to existing
characters.  The necessity of this arises from issues with the
underlying character encoding and is quite independent of the relative
complexity of the markup used.
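
As a rough illustration of the mechanism (a sketch only, not the
definitive TEI markup): a <g> in running text points at a description
held elsewhere in the document. In the snippet below, the wrapper
element <doc>, the <glyph> and <glyphName> children of <charDesc>, the
"ref" attribute and the identifier "example-glyph" are all assumptions
made for the example; only <g> and <charDesc> are named in this thread.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: element and attribute names other than <g> and
# <charDesc> are assumptions made for illustration.
SAMPLE = """<doc>
  <charDesc>
    <glyph xml:id="example-glyph">
      <glyphName>HYPOTHETICAL VARIANT CHARACTER</glyphName>
    </glyph>
  </charDesc>
  <p>A character not in Unicode: <g ref="#example-glyph"/> in the text.</p>
</doc>"""


def resolve_g_elements(xml_text):
    """Return (identifier, glyph name) for every <g> in the document."""
    root = ET.fromstring(xml_text)
    # Index the glyph descriptions by their xml:id.
    glyphs = {glyph.get(XML_ID): glyph for glyph in root.iter("glyph")}
    resolved = []
    for g in root.iter("g"):
        target = g.get("ref", "").lstrip("#")
        glyph = glyphs.get(target)
        name = glyph.findtext("glyphName") if glyph is not None else None
        resolved.append((target, name))
    return resolved


names = resolve_g_elements(SAMPLE)
print(names)
```

The encoder only ever types the <g>; the description it points to is
the extra baggage Lou mentions.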

Quite aside from this issue, one of the applications of TEI Lite is in
introductions to TEI (or mostly, text encoding, markup etc) that want
to provide a good overview of what text encoding is.  The benefit of
having TEI Lite here is that it comes with a very condensed tutorial
which is far less intimidating than the whole Guidelines (it is also
the only part of the TEI that has so far been translated into a range of
languages other than English) and that it comes with downloadable
schema files.  This latter fact is quite crucial for introductory
courses, because the concepts necessary for understanding and using
Roma to create a customization are beyond reach here.  Currently, I am
preparing customized TEI Lite versions that differ from the "standard"
TEI Lite only in that they have the gaiji-module added.  The
impression this makes on participants is without fail that "standard"
TEI is simply not up to the task of dealing with East Asian texts
(since that is what I have to deal with in my courses), which gives
them (in their eyes) a good excuse to avoid dealing with TEI at all.

> However, I do have the following comments:
> 1) Section 17: should be expanded a bit, perhaps mentioning that if one needs to
> use strange non-unicode characters that the full version of TEI has the
> capability to allow you to do this with <g>.

Yes, once <g> has been introduced to TEI Lite, it should probably be
discussed here.  I think it is also overly optimistic to say that

"For those working with standard forms of the European languages in
particular, almost no special action is needed: "

The action needed is either to convince the "any XML editor" to use
UTF-8 or, if that does not work, to declare the encoding of the file
as iso-8859-1.  "No action" is surely asking for disaster here.
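
To make the point concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree parser (my choice of tool for the
illustration, and the sample word is arbitrary). It feeds the same
paragraph to the parser as UTF-8, as undeclared ISO-8859-1, and as
declared ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# A paragraph containing one non-ASCII character (o with diaeresis).
body = "<p>K\u00f6nigsberg</p>"

# Case 1: the editor saves UTF-8. No declaration is needed, because
# UTF-8 is what an XML parser assumes by default.
utf8_bytes = body.encode("utf-8")
utf8_text = ET.fromstring(utf8_bytes).text

# Case 2: the editor silently saves ISO-8859-1 and nothing is declared.
# The parser still assumes UTF-8 and rejects the byte sequence.
latin1_bytes = body.encode("iso-8859-1")
try:
    ET.fromstring(latin1_bytes)
    undeclared_latin1_parses = True
except ET.ParseError:
    undeclared_latin1_parses = False

# Case 3: the same bytes, but with the encoding declared.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_bytes
declared_text = ET.fromstring(declared).text

print(utf8_text, undeclared_latin1_parses, declared_text)
```

Only the second case fails, and that is exactly the case "no special
action" leads to when the editor does not default to UTF-8.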

> 2) Section 17:  It may be misleading to say "Unicode as the required character
> set for all documents", as XML allows you to specify other encodings, isn't it
> the parsers which have to worry about changing them to unicode.

No.  XML uses only Unicode.  XML uses only Unicode.  XML uses only
Unicode.  If you specify an encoding in the XML declaration, what you
do is specify a *subset* of Unicode.  You cannot specify an encoding
that cannot be mapped to Unicode. (Such things do exist, and a friend
of mine sticks to SGML for this very reason.)
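
A small sketch of what "subset" means in practice, again using
Python's standard library as an illustrative tool (the sample
characters are arbitrary choices of mine): every ISO-8859-1 byte maps
to a Unicode code point, a Han character cannot be written directly in
that encoding, and yet a document declared as ISO-8859-1 can still
refer to it, because the document's characters are Unicode whatever
the bytes on disk are.

```python
import xml.etree.ElementTree as ET

# Each of the 256 ISO-8859-1 byte values decodes to the Unicode code
# point with the same number: the encoding covers a subset of Unicode.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
latin1_is_subset = [ord(ch) for ch in decoded] == list(range(256))

# A character outside that subset cannot be written directly.
try:
    "\u6f22".encode("iso-8859-1")
    han_encodable = True
except UnicodeEncodeError:
    han_encodable = False

# The declared encoding restricts only the raw bytes. A numeric
# character reference still names a Unicode code point, so the parsed
# text contains the Han character anyway.
doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>\xe9 &#x6F22;</p>'
parsed = ET.fromstring(doc).text

print(latin1_is_subset, han_encodable, parsed == "\u00e9 \u6f22")
```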

> 3) Section 17: Perhaps providing some examples of unicode character entity usage
> might make the last sentence clearer.

Talking about entity references opens a very ugly can of worms here,
especially since mainstream P5 is not using DTDs any more. If you mean
numeric character references like &#160; you should say so.
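
The difference is easy to show with a DTD-less document. The sketch
below uses Python's standard xml.etree.ElementTree (an illustrative
choice): numeric character references resolve without any
declarations, while a named entity such as &nbsp; is undefined unless
a DTD declares it.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal numeric character references need no DTD.
decimal_text = ET.fromstring("<p>a&#160;b</p>").text
hex_text = ET.fromstring("<p>a&#xA0;b</p>").text

# A named entity other than the five predefined ones (&lt; &gt; &amp;
# &apos; &quot;) must be declared somewhere; with no DTD it is an error.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    named_entity_parses = True
except ET.ParseError:
    named_entity_parses = False

print(decimal_text == hex_text == "a\u00a0b", named_entity_parses)
```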

> 4) Somewhere there should be a short discussion that TEI-Lite is expressed as an
> ODD and either a copy of that ODD or a link to it.  This discussion should
> mention Roma and that TEI users have the ability to make their own
> customisations if they don't like TEI Lite.

I would say that it should be enough to point to somewhere in the

All the best,


 Christian Wittern 
 Institute for Research in Humanities, Kyoto University
 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
