[tei-council] clarifications for WD chapter

Christian Wittern cwittern at gmail.com
Fri Dec 14 02:41:47 EST 2007


Council members,

Those of you who attended the meeting in Maryland might have noted Marcus
Bingenheimer's poster about the gaiji module.  He raised some concerns that
were largely based on misunderstandings.  I have now, with his help, drafted
a slight rewrite of an existing paragraph and added some new material with
examples as included below.   While this might prove difficult, we both feel
it desirable to have this included in P5 1.01 to be released soon.
Please give me your thoughts.

----------

<!-- replace the existing para under 5.2 after the list giving the contents
of charDecl -->
<p>The <gi>char</gi> and <gi>glyph</gi> elements have similar contents
and are used in similar ways, but their functions are different.  The
<gi>char</gi> element is provided to define a character which is not
available in the current document character set, for whatever reason,
as stated above.  The <gi>glyph</gi> element is used to annotate a
character that has already been defined somewhere (either in the
document character set, or through a <gi>char</gi> element) by
providing a specific glyph that shows how a character appeared in the
original document.  This is necessary, since Unicode codepoints refer
not to a single, specific glyph shape of a character, but rather to a
set of glyphs, which can all be used to render the codepoint in
question; in some cases they can differ considerably.  The
<gi>glyph</gi> element is provided for cases where the encoder wants
to indicate a specific glyph (or family of glyphs) out of all possible
glyphs.  Unfortunately, due to the way Unicode has been defined, there
are cases where several glyphs that should belong to the same class
have been given separate codepoints, especially in the blocks defining
East-Asian characters.  In such cases, <gi>glyph</gi> elements can be
used to express the view that they are all instances of the same
character (see below <ptr target="#D25-30"/>).</p>


<!-- add after the text under D25-30 --> <p>Since the need to use
these constructs to annotate or define characters occurs frequently in
Chinese, Korean or Japanese documents, here are some issues that are
specific to these documents.  There are two slightly different
versions of the problem.  In the first case, due to the way Unicode is
defined, there are occasions when more than one glyph of a character
has been given its own codepoint.  In such cases, one might want to
retain the character as used, but add information that a normalizer
(for search or indexing operations) could take advantage of.  To
achieve this, we simply define within a
<gi>charDecl</gi> element a <gi>glyph</gi> that has two <gi>mapping</gi>
elements, as shown here:
<egXML>
  <charDecl>
    <glyph xml:id="u8aaa">
      <mapping type="Unicode">&#x8AAA;</mapping>
      <mapping type="Standard">&#x8AAC;</mapping>
    </glyph>
  </charDecl>
</egXML>

The first of these <gi>mapping</gi>s, of type "Unicode", simply maps
our glyph to the codepoint where Unicode defined it.  The other one,
of type
"Standard", encodes the fact that in our view, this glyph is a
variation of the standard character given in the content of the
element. We could then use the xml:id "u8aaa" to refer to this glyph
element in our texts as
<egXML>
<g ref="#u8aaa">&#x8AAA;</g>
</egXML>
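A normalizer of the kind mentioned above could, for instance, be
sketched in XSLT 1.0 as follows.  This is only an illustration under
stated assumptions, not a prescribed implementation: it assumes that
the TEI namespace is bound to the prefix <code>tei</code>, that every
<att>ref</att> on a <gi>g</gi> is a same-document reference of the
form <code>#id</code>, and that the key name
<code>glyph-by-id</code>, which is invented for this example, is
acceptable.  Where a <gi>glyph</gi> has a non-empty
<gi>mapping</gi> of type "Standard", its content replaces the
<gi>g</gi> element; otherwise the original content of the
<gi>g</gi> is kept.
<egXML>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- hypothetical key name: look up glyph declarations by xml:id -->
  <xsl:key name="glyph-by-id" match="tei:glyph" use="@xml:id"/>

  <!-- copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- replace a g element by the standard character, if one is mapped -->
  <xsl:template match="tei:g[@ref]">
    <xsl:variable name="standard"
        select="key('glyph-by-id', substring-after(@ref, '#'))
                /tei:mapping[@type='Standard']"/>
    <xsl:choose>
      <xsl:when test="string($standard)">
        <xsl:value-of select="$standard"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
</egXML>
Applied to the example above, such a stylesheet would emit the
character &#x8AAC; in place of the <gi>g</gi> element, so that a
search index sees only the standard form.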
</p>
<p>A slightly different, but related problem occurs when we have two
variants, but neither of them has been defined in Unicode.  In this
case, we need to define one as a new character using <gi>char</gi>,
and the other as a glyph using <gi>glyph</gi>:
<egXML>
  <charDecl>
    <char xml:id="newchar1">
      <!-- more properties here -->
    </char>

    <glyph xml:id="varofnewchar1">
      <!-- more properties here -->
      <mapping type="Standard"><g ref="#newchar1"/></mapping>
    </glyph>
  </charDecl>
</egXML>
The <gi>char</gi> defines the character, to which the <gi>glyph</gi>
element then adds a variant glyph of the same character.  In real life
they would need to have more properties to make them identifiable.
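As a rough sketch of what such properties might look like, the
following declarations use a different pair of identifiers and add a
name, a description, one property and an image.  All names, values
and the image file are invented for illustration only; an actual
encoding would supply whatever properties the project needs.
<egXML>
  <charDecl>
    <char xml:id="newchar2">
      <!-- hypothetical name and property, for illustration only -->
      <charName>EXAMPLE NEW CHARACTER</charName>
      <desc>A character not yet defined in Unicode</desc>
      <charProp>
        <localName>strokeCount</localName>
        <value>12</value>
      </charProp>
    </char>

    <glyph xml:id="varofnewchar2">
      <glyphName>EXAMPLE NEW CHARACTER, VARIANT FORM</glyphName>
      <desc>Variant form as found in the source document</desc>
      <mapping type="Standard"><g ref="#newchar2"/></mapping>
      <!-- hypothetical image file showing the variant glyph -->
      <graphic url="varofnewchar2.png"/>
    </glyph>
  </charDecl>
</egXML>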
</p>
------

All the best,

Christian Wittern

-- 
 Christian Wittern
 Institute for Research in Humanities, Kyoto University
 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN

