[tei-council] More on TEI Lite: work in progress

James Cummings James.Cummings at computing-services.oxford.ac.uk
Mon Feb 20 16:28:37 EST 2006


Christian Wittern wrote:
> James Cummings <James.Cummings at computing-services.oxford.ac.uk> writes:
>>I have to disagree with some other members of the council on this.  I think TEI
>>Lite should expressly *not* contain <g>.  This does not mean I don't like people
>>using non-standard characters or glyphs, simply that I strongly feel that if they
>>are doing something unusual (i.e. using such characters) then they would benefit
>>more greatly from using 'full' TEI and/or creating their own customisation. TEI
>>Lite should be just that, 'Lite'.
> 
> 
> I think there are two separate issues here. 
> 
> The <g> element has been introduced to provide a way for the text
> encoder to introduce new characters or specify attributes to existing
> characters.  The necessity of this arises from issues with the
> underlying character encoding and is quite independent of the relative
> complexity of the markup used.

True, I agree with this.
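
To make the role of <g> concrete, here is a minimal sketch in Python of how a processor might resolve a <g> reference against a glyph declaration. The sample document, the "ct-lig" identifier, the glyph name and the helper function names are all invented for illustration; only the general shape (a <g ref="#..."> pointing at a <glyph> inside <charDecl>) is meant to follow the gaiji module, and a real document would carry a full header.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily trimmed sample: one declared glyph and one <g> that uses it.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <charDecl>
    <glyph xml:id="ct-lig">
      <glyphName>HYPOTHETICAL CT LIGATURE</glyphName>
      <mapping type="standard">ct</mapping>
    </glyph>
  </charDecl>
  <p>A perfe<g ref="#ct-lig">ct</g> example.</p>
</TEI>"""


def glyph_names(root):
    """Map each declared glyph's xml:id to the text of its glyphName."""
    names = {}
    for glyph in root.iter(f"{{{TEI_NS}}}glyph"):
        name = glyph.find(f"{{{TEI_NS}}}glyphName")
        names[glyph.get(XML_ID)] = name.text if name is not None else None
    return names


def resolve_g(root):
    """Return (fallback text, glyph name) for every <g> in the document."""
    names = glyph_names(root)
    return [
        (g.text, names.get(g.get("ref", "").lstrip("#")))
        for g in root.iter(f"{{{TEI_NS}}}g")
    ]


root = ET.fromstring(SAMPLE)
print(resolve_g(root))
```

The point of the sketch is that the <g> mechanism sits at the character level: the surrounding markup can be as simple or as rich as you like.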

> Quite aside from this issue, one of the applications of TEI Lite is in
> introductions to TEI (or mostly, text encoding, markup etc) that want
> to provide a good overview of what text encoding is.  The benefit of
> having TEI Lite here is that it comes with a very condensed tutorial
> which is far less intimidating than the whole Guidelines (it is also
> the only part of the TEI that has so far been translated in a range of
> languages other than English) and that it comes with downloadable
> schema files.  This latter fact is quite crucial for introductory
> courses, because the concepts necessary for understanding and using
> Roma to create a customization are beyond reach here.  

I also agree with this.  Lou, Sebastian and I are using the TEI Lite 
documentation in the course material for a two-day course we are in the 
middle of teaching.

> Currently, I am
> preparing customized TEI Lite versions that differ from the "standard"
> TEI Lite only in that they have the gaiji-module added.  The
> impression this makes on participants is without fail that "standard"
> TEI is simply not up to the task of dealing with East Asian texts
> (since that is what I have to deal with in my courses), which gives
> them (in their eyes) a good excuse to avoid dealing with TEI at all.

That is a shame, and your customisation is an understandable way around 
it.  But is this an argument for adding <g> to TEILite, rather than for 
making your customisation available as well?  (As a theoretical 
tangent, I would treat this as a separate TEI customisation 
(TEIGaijiLite?!?) rather than as TEILite+gaiji, i.e. people should be 
customising TEI, not customising TEILite.)

Since I know absolutely nothing about East Asian texts and the extent 
of Unicode support in that area, I had always assumed that a fair number 
of the necessary glyphs had already made it into Unicode.  (I know this 
will expose my ignorance both of East Asian languages and of Unicode's 
coverage of them.)

In teaching today Lou and Sebastian did mention that we were using a 
subset of TEI called TEI-Lite, but proceeded also to describe the TEI 
modules, class system, etc. before having students use Roma to 
customise the TEI themselves.  Using TEILite is not incompatible with 
also showing how to customise the TEI.  Rather than concentrate on 
adding in gaiji, I would have thought it more of a benefit to show 
them how to start with 'full' TEI and remove all the bits they don't 
want for their exercises.  If this happens to produce TEILite+gaiji, 
then so be it.  In fact, rather than concentrating on adding gaiji, I 
would have thought it should be highlighted as a module providing good 
functionality for East Asian languages and special glyphs that other 
encoding schemes might not provide.

>>2) Section 17:  It may be misleading to say "Unicode as the required character
>>set for all documents", as XML allows you to specify other encodings; isn't it
>>the parsers which have to worry about changing them to Unicode?
> 
> No.  XML uses only Unicode.  XML uses only Unicode.  XML uses only
> Unicode.  If you specify an encoding in the XML declaration, what you
> do is specify a *subset* of XML.  You can not specify an encoding that
> can not be mapped to XML. (Such things do exist and a friend of mine
> sticks to SGML for this very reason)

Yes, you are right of course: when I declare my document as ISO-8859-1, 
for example, I am using a subset of Unicode.  Since the first 256 code 
points in Unicode were made identical to ISO-8859-1, my subset is easily 
mapped.  What was in my mind was that the way it was expressed seemed 
to suggest that only one encoding was allowed.  While this is true in 
that the character set is Unicode, it might frighten off people who are 
used to using other character encodings which are now part of Unicode.  
So I guess I wanted some examples saying that various ISO encodings are 
allowed.
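
As a rough illustration of the kind of example I mean, the following Python sketch parses the same text stored once as ISO-8859-1 and once as UTF-8, and checks that the parser hands back identical Unicode strings either way. It uses the standard library's xml.etree.ElementTree; the sample document and variable names are made up for the purpose.

```python
import xml.etree.ElementTree as ET

# 'café' stored as ISO-8859-1: é is the single byte 0xE9.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'
# The same text stored as UTF-8: é becomes the two bytes 0xC3 0xA9.
utf8_doc = '<?xml version="1.0" encoding="utf-8"?><p>caf\u00e9</p>'.encode("utf-8")

# The parser reads the declared encoding and delivers Unicode in both cases.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text
print(latin1_text == utf8_text)

# Every ISO-8859-1 byte value maps to the Unicode code point with the same number.
all_bytes = bytes(range(256))
same_code_points = all(
    ord(ch) == b for ch, b in zip(all_bytes.decode("iso-8859-1"), all_bytes)
)
print(same_code_points)
```

The encoding declaration only describes how the characters are stored as bytes; what the application sees after parsing is Unicode in both cases.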

>>3) Section 17: Perhaps providing some examples of unicode character entity usage
>>might make the last sentence clearer.
> Talking about entity references opens a very ugly can of worms here,
> especially since mainstream P5 is not using DTDs any more. If you mean
> numeric character references like &#160; you should say so.

You are right, I meant character references, so I reiterate: 
'providing some examples of Unicode character reference usage might 
make the last sentence clearer'.
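
For instance, something along these lines would do. This is only a sketch using Python's standard xml.etree.ElementTree, with a made-up sample paragraph: it shows decimal and hexadecimal numeric character references being resolved to the corresponding Unicode characters, with no DTD or entity declarations involved.

```python
import xml.etree.ElementTree as ET

# &#160; is decimal (no-break space); &#xE9; and &#x4E00; are hexadecimal.
doc = "<p>no&#160;break, &#xE9;, &#x4E00;</p>"

# After parsing, the references have been replaced by the characters themselves.
text = ET.fromstring(doc).text

# List the code points of the non-ASCII characters that resulted.
code_points = [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127]
print(code_points)
```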

>>4) Somewhere there should be a short discussion that TEI-Lite is expressed as an
>>ODD and either a copy of that ODD or a link to it.  This discussion should
>>mention Roma and that TEI users have the ability to make their own
>>customisations if they don't like TEI Lite.
> 
> I would say that it should be enough to point to somewhere in the
> Guidelines. 

What I'm trying to avoid by suggesting this is exactly the problem that 
you were encountering above: people think TEI-Lite is 'TEI' and, when 
it can't cope with something, conclude that 'TEI isn't up to it'.  I 
think that TEI Lite should be proclaiming loud and proud that it is a 
customisation and that 'full' TEI has much more in it.

-James


