[tei-council] More on TEI Lite: work in progress
James Cummings
James.Cummings at computing-services.oxford.ac.uk
Mon Feb 20 16:28:37 EST 2006
Christian Wittern wrote:
> James Cummings <James.Cummings at computing-services.oxford.ac.uk> writes:
>>I have to disagree with some other members of the council on this. I think TEI
>>Lite should expressly *not* contain <g>. This does not mean I don't like people
>>using non-standard characters or glyphs, simply that I strongly feel that if they
>>are doing something unusual (i.e. using such characters) then they would benefit
>>more greatly from using 'full' TEI and/or creating their own customisation. TEI
>>Lite should be just that, 'Lite'.
>
>
> I think there are two separate issues here.
>
> The <g> element has been introduced to provide a way for the text
> encoder to introduce new characters or specify attributes to existing
> characters. The necessity of this arises from issues with the
> underlying character encoding and is quite independent of the relative
> complexity of the markup used.
True, I agree with this.
> Quite aside from this issue, one of the applications of TEI Lite is in
> introductions to TEI (or mostly, text encoding, markup etc) that want
> to provide a good overview of what text encoding is. The benefit of
> having TEI Lite here is that it comes with a very condensed tutorial
> which is far less intimidating than the whole Guidelines (it is also
> the only part of the TEI that has so far been translated into a range of
> languages other than English) and that it comes with downloadable
> schema files. This latter fact is quite crucial for introductory
> courses, because the concepts necessary for understanding and using
> Roma to create a customization are beyond reach here.
I also agree with this. Lou, Sebastian and I are using TEI Lite
documentation in the course material for a two-day course we are in the
middle of teaching.
> Currently, I am
> preparing customized TEI Lite versions that differ from the "standard"
> TEI Lite only in that they have the gaiji module added. The
> impression this makes on participants is without fail that "standard"
> TEI is simply not up to the task of dealing with East Asian texts
> (since that is what I have to deal with in my courses), which gives
> them (in their eyes) a good excuse to avoid dealing with TEI at all.
That is a shame, and your customisation is an understandable way around
it. But is this an argument for adding <g> to TEI Lite, rather than for
making your customisation available as well? (A theoretical tangent: I
would regard this as a separate TEI customisation (TEIGaijiLite?!?)
rather than as TEILite+gaiji, i.e. people should be customising TEI,
not customising TEILite.)
Knowing absolutely nothing about East Asian texts or the extent of
Unicode support for them, I had always assumed that a fair number of
the necessary characters had already made it into Unicode. (I know this
will expose my ignorance both of East Asian languages and of Unicode's
coverage of them.)
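For what it's worth, checking whether a particular character is already encoded is easy to do. Here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe()` is my own invention, and the answer depends on the Unicode version bundled with whichever Python you run it under:

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and Unicode name of ch, if it has one."""
    # unicodedata.name() raises ValueError when the character has no name,
    # e.g. unassigned code points or Private Use Area characters.
    try:
        return f"U+{ord(ch):04X} {unicodedata.name(ch)}"
    except ValueError:
        return f"U+{ord(ch):04X} (no name in this Unicode version)"

# Two CJK ideographs, then a Private Use Area code point.
for ch in ["\u4e00", "\u6f22", "\ue000"]:
    print(describe(ch))
```

A code point that comes back without a name, like the Private Use Area one here, is the kind of non-standard character that the gaiji module exists to document.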
In teaching today, Lou and Sebastian did mention that we were using a
subset of the TEI called TEI Lite, but they then went on to describe the
TEI modules, the class system, etc., before having students use Roma to
customise the TEI themselves. Using TEI Lite is not incompatible with
also showing how to customise the TEI. Rather than concentrating on
adding in gaiji, I would have thought it more beneficial to show them
how to start with 'full' TEI and remove all the bits they don't need
for their exercises. If this happens to produce TEILite+gaiji, then so
be it. Indeed, rather than presenting gaiji as an add-on, I would
highlight it as a module that provides good support for East Asian
languages and special glyphs, something other encoding schemes might
not offer.
>>2) Section 17: It may be misleading to say "Unicode as the required character
>>set for all documents", as XML allows you to specify other encodings; isn't it
>>the parsers that have to worry about converting them to Unicode?
>
> No. XML uses only Unicode. XML uses only Unicode. XML uses only
> Unicode. If you specify an encoding in the XML declaration, what you
> do is specify a *subset* of Unicode. You cannot specify an encoding that
> cannot be mapped to Unicode. (Such things do exist, and a friend of mine
> sticks to SGML for this very reason.)
Yes, you are right of course. When I declare my document as ISO-8859-1,
for example, I am using a subset of Unicode, and since the first 256
code points of Unicode were made identical to ISO-8859-1, that subset
maps easily. What I had in mind was that the wording seemed to suggest
only one encoding was allowed. That is true in the sense that the
character set is always Unicode, but it might frighten off people who
are used to other character encodings whose repertoires are now part of
Unicode. So I suppose I wanted some examples saying that various ISO
encodings are allowed.
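Something like the following could serve as an illustration. It is a small Python sketch of my own, using the standard library's ElementTree (which sits on the expat parser). The sample text is made up. It shows that a document declared as ISO-8859-1 still comes out of the parser as Unicode characters, and that every ISO-8859-1 byte decodes to the Unicode code point with the same number:

```python
import xml.etree.ElementTree as ET

# A tiny document declared as ISO-8859-1 and actually stored as Latin-1
# bytes (the content is made up for illustration).
source = '<?xml version="1.0" encoding="iso-8859-1"?><p>café</p>'
raw = source.encode("iso-8859-1")

# The parser reads the encoding declaration and decodes the bytes; what
# comes back is a sequence of Unicode characters, not Latin-1 bytes.
root = ET.fromstring(raw)
print(root.text, [f"U+{ord(c):04X}" for c in root.text])

# Every byte value 0-255 decodes to the Unicode code point with the same
# number, which is why an ISO-8859-1 document maps so easily.
assert all(ord(bytes([b]).decode("iso-8859-1")) == b for b in range(256))
```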
>>3) Section 17: Perhaps providing some examples of unicode character entity usage
>>might make the last sentence clearer.
> Talking about entity references opens a very ugly can of worms here,
> especially since mainstream P5 is not using DTDs any more. If you mean
> numeric character references like &#160;, you should say so.
You are right, I meant character references, so let me restate it:
'providing some examples of Unicode character reference usage might
make the last sentence clearer'.
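Examples along these lines, perhaps. This is a minimal Python sketch using the standard library's ElementTree, not proposed TEI Lite text. It shows that decimal and hexadecimal character references both resolve to Unicode code points with no DTD involved, while a named entity such as `&nbsp;` is rejected by a parser that has no DTD declaring it:

```python
import xml.etree.ElementTree as ET

# Decimal (&#160;) and hexadecimal (&#x4E00;) numeric character references
# each name a Unicode code point directly; no DTD is needed.
root = ET.fromstring("<p>a&#160;b &#x4E00;</p>")
print([f"U+{ord(c):04X}" for c in root.text])

# A named entity such as &nbsp; only works if a DTD declares it, so a
# parser working without one reports it as undefined.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    named_ok = True
except ET.ParseError as err:
    named_ok = False
    print("named entity rejected:", err)
```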
>>4) Somewhere there should be a short discussion that TEI-Lite is expressed as an
>>ODD, with either a copy of that ODD or a link to it. This discussion should
>>mention Roma and that TEI users have the ability to make their own
>>customisations if they don't like TEI Lite.
>
> I would say that it should be enough to point to somewhere in the
> Guidelines.
What I'm trying to avoid by suggesting this is exactly the problem you
were encountering above: people think TEI Lite *is* 'TEI' and, when it
can't cope with something, conclude that 'TEI isn't up to it'. I think
TEI Lite should proclaim loud and proud that it is a customisation and
that 'full' TEI has much more in it.
-James