[tei-council] question about <char>

Daniel O'Donnell daniel.odonnell at uleth.ca
Sun Apr 15 23:12:11 EDT 2007


Not having been part of the earlier discussions, the main reasons I can see for
the g are:

1) To provide a mechanism for describing non-unicode characters
2) To keep the content model of mapping the same whether the character
is unicode or not.
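
A rough sketch of what I mean by point 2. The first mapping is taken from the Guidelines example quoted below. The second mapping and the id "myGlyph" are invented purely for illustration and do not come from the Guidelines. In both cases the content of mapping is a g, whether it holds a Unicode character or points to a locally defined glyph:

```xml
<!-- Fragment only, not a complete TEI document. -->

<!-- Unicode target: g contains the character itself
     (as in the Guidelines example for char). -->
<mapping type="standard">
  <g ref="#U4EBA">人</g>
</mapping>

<!-- Non-Unicode target: g points to a locally defined glyph.
     HYPOTHETICAL: the id "myGlyph" is made up for this sketch. -->
<mapping>
  <g ref="#myGlyph"/>
</mapping>
```

If the g were dropped for the Unicode case, the first mapping would contain bare character data while the second would still need an element, which is the inconsistency I would rather avoid.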

Having looked up mapping and g, I see that the definition of mapping supports my
hypothesis here, but the nomenclature and definition of g do not:

In mapping:

> The <g> elements contained by this element can point to either another
> <char> or <glyph> element or contain a character that is intended to be
> the target of this mapping.

In g:

> (character or glyph) represents a non-standard character or glyph.
...
> The name g is short for gaiji, which is the Japanese term for a
> non-standardized character or glyph.[1]

Personally, I really like consistent content models for structures
independent of specifics of the content--so I prefer requiring g for
both standard and non-standard characters. But the name of g is
misleading in that case. Do we have other places where you use cdata
if the content is one thing and a structural element if the content is
structurally the same thing but non-standard or the like?

[1] I notice that others use gaiji in a slightly different sense, as
"supplemental" or "any glyph that's valid in your written language but
is not in the font you are using" (both Adobe:
http://www.adobe.com/products/indesign/sing_gaiji.html); "Any computer
system can only provide a limited, finite set of characters. Additional
characters are handled as 'gaiji'"
(http://www.chibs.edu.tw/~chris/papers/ie/xml-gaiji/foil02.html); "A
character not included in a standard set of characters" (Wiktionary).

In these cases it is less of a stretch to use <g> for Unicode characters
that need mapping: presumably you only map the ones that are relatively
unusual and are not likely to be in a user's usual character set or font.

My preference is to leave it in and maybe amend the definition of
g/gaiji.

-dan


On Mon, 2007-04-16 at 10:52 +0900, Wittern Christian wrote:
> Lou Burnard wrote:
> > The description of the <char> element includes the following example:
> >
> >
> > <char xml:id="circledU4EBA">
> > <charName>CIRCLED IDEOGRAPH 4EBA</charName>
> > <charProp>
> > <unicodeName>character-decomposition-mapping</unicodeName>
> > <value>circle</value>
> > </charProp>
> > <charProp>
> > <localName>daikanwa</localName>
> > <value>36</value>
> > </charProp>
> > <mapping type="standard">
> > <g ref="#U4EBA">人</g>
> > </mapping>
> > </char>
> >
> > I am puzzled by the <g> within the <mapping>. If this is a standard 
> > mapping why is it using a <g> element? What's wrong with just using 
> > the character, or an NCR like this?
> >
> > <mapping type="standard">
> > &#x4EBA;
> > </mapping>
> >
> You are right, it is not necessary.  If I remember correctly, this was 
> put in to show to human readers both the character and the codepoint.  
> If you find it confusing, your suggestion for replacement seems 
> acceptable to me on this dark and rainy Monday morning.
> 
> Christian
> 
> 
-- 
Daniel Paul O'Donnell, PhD
Director, Digital Medievalist Project http://www.digitalmedievalist.org/
Associate Professor and Chair, Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Canada
Vox: +1 403 329-2378
Fax: +1 403 382-7191



