[tei-council] comments on CH from Brett Zamir

Tue Dec 25 19:06:15 EST 2007

The following comments on CH were sent to the Editors by Brett Zamir, 
along with a small number of other minor typos and fixes which Syd has 
acted on.

-------------

First of all, let me say, I found this chapter extremely helpful. I've 
read several introductory discussions on this topic, and this was by far 
the most thorough and informative.

[tx brett!]

General issues raised in my mind by the chapter:

1) In referring to written and spoken human language, might I suggest 
also referring to "signed" language (for which textual representations 
exist, albeit not necessarily in Unicode yet)? It is the only other 
category of modern human language, and as it is a full language in its 
own right, I'd suggest referring to it whenever also referring to spoken 
and written text. (e.g., 1st par. of Chapter vi.)

[I see no harm in adding this reference here, tho neither do I see any 
necessity for it:  LB]

2) You might search through all of the guidelines (or correct your 
transforming processor if that is the culprit, as seems more likely 
based on this document) so that ampersands are not displayed as 
&amp;amp; -- e.g., as at 
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html (even though 
the code seems correct)

[I think this is a rendering problem in the HTML, but I will check  LB]
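
A symptom like this commonly arises when text that is already escaped gets escaped a second time during transformation. The sketch below is only a generic illustration using Python's standard html module; the sample string is made up, and nothing here is taken from the actual TEI stylesheets, whose behaviour has not been checked.

```python
from html import escape, unescape

# Hypothetical sample text containing a bare ampersand.
source = "Smith & Jones"

# Escaping once gives correct markup.
once = escape(source)    # "Smith &amp; Jones"

# Escaping the already-escaped text gives the double-escaped form.
twice = escape(once)     # "Smith &amp;amp; Jones"

# A browser unescapes exactly once when rendering, so the double-escaped
# form leaves a literal "&amp;" visible to the reader.
shown_once = unescape(once)
shown_twice = unescape(twice)

print(shown_once)
print(shown_twice)
```

If the source XML is correct, as Brett observes, the second escape would have to happen somewhere between the source and the generated HTML.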

3) Why no text wrapping in these documents? (easier to search for 
distinct phrases that way) Don't most text processors which your editors 
might use allow wrapping?

[I have no idea what Brett means here. Which documents? LB]

vi. Languages and Character Sets

1) As "Entry of Characters" refers to entities, perhaps this list would 
also be of interest for those using mathematical entities: 
http://www.w3.org/TR/xml-entity-names/ . You are already using many of 
these sets; the ones you are not yet using are:

    * isomfrk <http://www.w3.org/TR/xml-entity-names/isomfrk.html>  Math
      Alphabets: Fraktur
    * isomopf <http://www.w3.org/TR/xml-entity-names/isomopf.html>  Math
      Alphabets: Open Face
    * isomscr <http://www.w3.org/TR/xml-entity-names/isomscr.html>  Math
      Alphabets: Script
    * mmlextra
      <http://www.w3.org/TR/xml-entity-names/mmlextra.html>  Additional
      MathML Symbols
    * mmlalias
      <http://www.w3.org/TR/xml-entity-names/mmlalias.html>  MathML Aliases
    * xhtml1-lat1
      <http://www.w3.org/TR/xml-entity-names/xhtml1-lat1.html>  Latin
      for HTML
    * xhtml1-special
      <http://www.w3.org/TR/xml-entity-names/xhtml1-special.html>  Special
      for HTML
    * xhtml1-symbol
      <http://www.w3.org/TR/xml-entity-names/xhtml1-symbol.html>  Symbol
      for HTML

[I think we should add this TR to the bibliography and reference it from 
here]

2) I find this paragraph under "Compatibility characters" confusing (I'm 
not sure if it is actually correct or not):

   However, by the time the Unicode standard
   was first being debated, it had become common practice to include
   single glyphs representing the more common ligatures in the
   repertoires of some typesetting devices and high-end printers, and
   for the coded character sets built into those devices to use a
   single code point for such glyphs, even though they represent two
   distinct abstract characters.

I thought the context was about items which "should not have been 
regarded as abstract characters in their own right", so I don't 
understand the last part of the above paragraph; it seems to me that it 
ought perhaps to be saying the opposite.

[It doesn't seem wrong to me: maybe "in their own right" is not quite the 
right phrase? LB]
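
The status of such ligatures can be inspected directly in the Unicode character database. The sketch below uses Python's standard unicodedata module with U+FB01 (LATIN SMALL LIGATURE FI) as an example; the choice of that character is an assumption made for illustration, since the chapter paragraph does not name a specific ligature.

```python
import unicodedata

# U+FB01 is a single code point encoding the "fi" ligature.
ligature = "\ufb01"

name = unicodedata.name(ligature)

# The <compat> tag marks a compatibility decomposition into f (0066) + i (0069).
decomp = unicodedata.decomposition(ligature)

# Canonical normalization (NFC) leaves the ligature untouched ...
nfc = unicodedata.normalize("NFC", ligature)

# ... while compatibility normalization (NFKC) replaces it with two characters.
nfkc = unicodedata.normalize("NFKC", ligature)

print(name, "|", decomp, "|", len(nfc), "|", len(nfkc))
```

This matches the reading of the quoted paragraph in which one code point stands for a glyph that represents two distinct abstract characters.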

3) From the XML Standard at http://www.w3.org/TR/2006/REC-xml-20060816/: 

    "*Note that non-validating processors are not obligated to 
<http://www.w3.org/TR/2006/REC-xml-20060816/#include-if-valid> read 
and process entity declarations occurring in parameter entities or in 
the external subset;* for such documents, the rule that an entity must 
be declared is a well-formedness constraint only if standalone='yes' 
<http://www.w3.org/TR/2006/REC-xml-20060816/#sec-rmd>."

This seems to me to contradict this statement in this chapter:

   The XML standard requires a
   non-validating parser to read and act on entity declarations
   only if they are located within the document's internal subset
   (which does not, of course, mean that the entity declarations
   have to be manually merged into the document instance in advance
   of processing: character entity sets, for instance, count as
   being in the internal subset if they are placed there via a
   parameter entity, as is normal TEI practice).

So it DOES seem to me from the above that in such non-validating 
parsers, parameter entities in the internal subset will also not 
work--the entities will need to be included manually. Am I wrong?

[I think these sentences are talking about different things and they 
don't contradict each other. The W3C statement says that a document is 
not well formed if it references undeclared entities (unless it 
explicitly says standalone="no"); the chapter is explaining that entity 
declarations don't necessarily appear in the doc instance. LB]
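
The two uncontroversial ends of this question can be tried out with a non-validating parser. The sketch below uses Python's xml.etree.ElementTree (built on expat) and a made-up one-element document; it covers only an entity declared directly in the internal subset and an entity with no declaration at all. It deliberately does not settle the disputed case of declarations pulled in through a parameter entity, where behaviour may differ between parsers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity is declared directly in the internal subset.
declared = """<!DOCTYPE p [
  <!ENTITY eacute "&#233;">
]>
<p>caf&eacute;</p>"""

# A non-validating parser still reads internal-subset declarations,
# so the reference is expanded.
text = ET.fromstring(declared).text

# The same reference with no declaration anywhere is a well-formedness error.
undeclared = "<p>caf&eacute;</p>"
try:
    ET.fromstring(undeclared)
    failed = False
except ET.ParseError:
    failed = True

print(text, failed)
```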

best wishes,
Brett

--------------------