[tei-council] More on TEI Lite: work in progress

Mon Feb 20 18:55:35 EST 2006

Sebastian Rahtz <Sebastian.Rahtz at oucs.ox.ac.uk> writes:

>> The benefit of
>> having TEI Lite here is that it comes with a very condensed tutorial
>> which is far less intimidating than the whole Guidelines (it is also
>> the only part of the TEI that has so far been translated in a range of
>> languages other than English) and that it comes with downloadable
>> schema files.  This latter fact is quite crucial for introductory
>> courses, because the concepts necessary for understanding and using
>> Roma to create a customization are beyond reach here. 
>
> we can add gaiji to Lite, but not spend much time documenting it?
>
For example

>> Currently, I am
>> preparing customized TEI Lite versions that differ from the "standard"
>> TEI Lite only in that they have the gaiji-module added.
> you use the new Lite, I hope ?

This *currently* was in fact what I did last year.  Now I would use
the new version.

>>  The
>> impression this makes on participants is without fail that "standard"
>> TEI is simply not up to the task of dealing with East Asian texts
>> (since that is what I have to deal with in my courses), which gives
>> them (in their eyes) a good excuse to avoid dealing with TEI at all.
>
> you're saying most East Asian texts need gaiji?
>

Not necessarily new characters, but associating codepoints with
specific glyphs is a frequent requirement.  In my specific audience -
academics dealing with premodern East Asian texts - gaiji is a
frequent requirement as well.  At this point in time this is also
necessary in many cases where a character exists in the latest version
of Unicode, but the editors, operating systems etc. do not yet support
that character.

>
>
>> "For those working with standard forms of the European languages in
>> particular, almost no special action is needed: "
>> The action needed is either to convince the "any XML editor" to use
>> UTF-8 or if that does not work, declare the encoding of the file to
>> use iso-8859-1.  "No action" is surely asking for disaster here.
>
> tell me which editors don't do the right thing?
>
Emacs (out of the box), a whole bunch of plain text editors. It might
be the case that most XML editors now do support it, but my impression
from various mailing lists is that the problem still persists.  I
think it would be enough to simply add a phrase similar to what I said
in the quote above.
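
To illustrate the point, here is a minimal sketch using Python's
standard xml.etree.ElementTree parser (chosen only for illustration;
the helper name parses and the sample content are hypothetical).  It
assumes an editor that silently saved the file as ISO-8859-1:

```python
import xml.etree.ElementTree as ET

# "café" as written by an editor that saves ISO-8859-1 bytes
# (é becomes the single byte 0xE9).
LATIN1_BODY = "<p>café</p>".encode("iso-8859-1")


def parses(data: bytes) -> bool:
    """Return True if the byte string is accepted as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No declaration: the parser assumes UTF-8 and rejects the lone 0xE9 byte.
undeclared = LATIN1_BODY

# Declaration matching the bytes actually on disk: the file is accepted.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + LATIN1_BODY

print(parses(undeclared))
print(parses(declared))
```

So "no action" only works if the editor really writes UTF-8 (or
UTF-16); otherwise the declaration has to say what the bytes are.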

>> No.  XML uses only Unicode.  XML uses only Unicode.  XML uses only
>> Unicode. 
> *internally*, yes.
>
>> If you specify an encoding in the XML declaration, what you
>> do is specify a *subset* of XML.
>
> I dont agree. you specify the encoding that your document uses.

But you are always free to use any codepoint with numeric references.
These NCRs are to *Unicode* not to the declared encoding.  So what you
declare is a subset.
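
A small sketch of this, again with Python's xml.etree.ElementTree as
an arbitrary example parser (the function name text_of and the sample
document are made up for illustration).  The file declares
iso-8859-1, which has no Chinese characters, yet a numeric reference
still reaches U+4E2D:

```python
import xml.etree.ElementTree as ET

# Declared as ISO-8859-1: the raw byte 0xE9 is "é" in that encoding,
# while &#x4E2D; names a Unicode codepoint that ISO-8859-1 cannot
# represent as a byte at all.
DOC = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b"<p>caf\xe9 &#x4E2D;</p>"
)


def text_of(data: bytes) -> str:
    """Parse the document and return the root element's text content."""
    return ET.fromstring(data).text


text = text_of(DOC)
# Print the codepoints the application sees after parsing.
print([f"U+{ord(ch):04X}" for ch in text])
```

The declared encoding only governs how the literal bytes are read;
the reference is resolved against Unicode, and the parser hands back
a single Unicode string either way.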

> Most parsers will transcode it, but do not have to.
>
Show me one that does not transcode.  Life would be pretty hard then,
I assume.

>> You can not specify an encoding that
>> can not be mapped to XML.
>
> which encodings cannot be mapped?
>
My friend uses CCCII, the Chinese Character Code for Information
Interchange, defined in Taiwan in 1985.  More recently there is also
the TRON character set in Japan, which is used in PC-like operating
systems and cell phones.

All the best,

Christian

-- 
 Christian Wittern 
 Institute for Research in Humanities, Kyoto University
 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN