[tei-council] comments on CH from Brett Zamir
Lou's Laptop
lou.burnard at oucs.ox.ac.uk
Tue Dec 25 19:06:15 EST 2007
The following comments on CH were sent to the Editors by Brett Zamir,
along with a small number of other minor typos and fixes which Syd has
acted on.
-------------
First of all, let me say, I found this chapter extremely helpful. I've
read several introductory discussions on this topic, and this was by far
the most thorough and informative.
[tx brett!]
General issues raised in my mind by the chapter:
1) In referring to written and spoken human language, might I suggest
also referring to "signed" language (for which textual representations
exist, albeit not necessarily in Unicode yet)? It is the only other
category of modern human language, and as it is a full language in its
own right, I'd suggest referring to it whenever also referring to spoken
and written text. (e.g., 1st par. of Chapter vi.)
[I see no harm in adding this reference here, tho neither do I see any
necessity for it: LB]
2) You might search through all of the guidelines (or correct your
transforming processor if that is the culprit, as seems more likely
based on this document) so that ampersands are not displayed as
&amp; -- e.g., as at
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html (even though
the code seems correct)
[I think this is a rendering problem in the HTML, but I will check LB]
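A literal entity showing up in rendered HTML is often a sign that text which was already escaped got escaped a second time somewhere in the pipeline. The following is only a sketch of that failure mode in Python; the sample string is hypothetical, and whether this is what actually happens in the TEI stylesheets is an assumption, not something established in this thread.

```python
import html

source = "AT&T"                            # hypothetical text content
escaped_once = html.escape(source)         # correct serialisation for HTML
escaped_twice = html.escape(escaped_once)  # the suspected double escaping

# A browser undoes exactly one level of escaping when it renders the page.
shown_ok = html.unescape(escaped_once)
shown_bad = html.unescape(escaped_twice)

print(escaped_once, "->", shown_ok)
print(escaped_twice, "->", shown_bad)
```

If the page source contains the doubly escaped form, the markup can look plausible at a glance while the reader still sees the entity name instead of the ampersand.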
3) Why no text wrapping in these documents? (easier to search for
distinct phrases that way) Don't most text processors which your editors
might use allow wrapping?
[I have no idea what Brett means here. Which documents? LB ]
vi. Languages and Character Sets
1) As "Entry of Characters" refers to entities, perhaps this list would
also be of interest for those using mathematical entities:
http://www.w3.org/TR/xml-entity-names/ , many of which besides the
following you are already using:
* isomfrk <http://www.w3.org/TR/xml-entity-names/isomfrk.html> Math
Alphabets: Fraktur
* isomopf <http://www.w3.org/TR/xml-entity-names/isomopf.html> Math
Alphabets: Open Face
* isomscr <http://www.w3.org/TR/xml-entity-names/isomscr.html> Math
Alphabets: Script
* mmlextra
<http://www.w3.org/TR/xml-entity-names/mmlextra.html> Additional
MathML Symbols
* mmlalias
<http://www.w3.org/TR/xml-entity-names/mmlalias.html> MathML Aliases
* xhtml1-lat1
<http://www.w3.org/TR/xml-entity-names/xhtml1-lat1.html> Latin
for HTML
* xhtml1-special
<http://www.w3.org/TR/xml-entity-names/xhtml1-special.html> Special
for HTML
* xhtml1-symbol
<http://www.w3.org/TR/xml-entity-names/xhtml1-symbol.html> Symbol
for HTML
[I think we should add this TR to the bibliography and reference it from
here]
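For a rough idea of what the mathematical alphabet sets contain, the names can be looked up in Python's `html.entities.html5` table. This is a sketch only: that table is the HTML5 named character reference list, and using it as a stand-in for the W3C TR is an assumption. The three names chosen (`Afr`, `Aopf`, `Ascr`) are just sample entries from the Fraktur, Open Face and Script sets.

```python
import unicodedata
from html.entities import html5


def describe(name):
    """Return (code point, Unicode name) for a named character reference."""
    char = html5[name + ";"]  # keys in this table include the trailing ';'
    return "U+%04X" % ord(char), unicodedata.name(char)


# One sample name from each of the isomfrk, isomopf and isomscr sets.
for name in ("Afr", "Aopf", "Ascr"):
    print("&%s;" % name, *describe(name))
```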
2) I find this paragraph under "Compatibility characters" confusing (I'm
not sure if it is actually correct or not):
However, by the time the Unicode standard
was first being debated, it had become common practice to include
single glyphs representing the more common ligatures in the
repertoires of some typesetting devices and high-end printers, and
for the coded character sets built into those devices to use a
single code point for such glyphs, even though they represent two
distinct abstract characters.
The context I thought was about items which "should not have been
regarded as abstract characters in their own right", so I don't
understand the last part of the above paragraph, as it seems to me it
ought perhaps to be saying the opposite.
[It doesn't seem wrong to me: maybe "in their own right" is not quite the
right phrase?
LB]
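The point under discussion can be seen with the "fi" ligature, which Unicode encodes at a single code point for compatibility while recording that it stands for two abstract characters. A minimal sketch using Python's standard `unicodedata` module (the choice of U+FB01 as the example is mine, not the chapter's):

```python
import unicodedata

ligature = "\ufb01"  # U+FB01, a single code point for the "fi" glyph

# The character database marks it as a compatibility character and
# lists the two characters it stands for.
mapping = unicodedata.decomposition(ligature)

# Compatibility normalisation (NFKC) replaces it with those characters;
# canonical normalisation (NFC) leaves it alone.
as_nfkc = unicodedata.normalize("NFKC", ligature)
as_nfc = unicodedata.normalize("NFC", ligature)

print(unicodedata.name(ligature))
print(mapping)
print(len(ligature), len(as_nfkc), as_nfkc)
```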
3) From the XML Standard at http://www.w3.org/TR/2006/REC-xml-20060816/:
"*Note that non-validating processors are not obligated to
<http://www.w3.org/TR/2006/REC-xml-20060816/#include-if-valid> read
and process entity declarations occurring in parameter entities or in
the external subset;* for such documents, the rule that an entity must
be declared is a well-formedness constraint only if standalone='yes'
<http://www.w3.org/TR/2006/REC-xml-20060816/#sec-rmd>."
This seems to me to contradict this statement in this chapter:
The XML standard requires a
non-validating parser to read and act on entity declarations
only if they are located within the document's internal subset
(which does not, of course, mean that the entity declarations
have to be manually merged into the document instance in advance
of processing: character entity sets, for instance, count as
being in the internal subset if they are placed there via a
parameter entity, as is normal TEI practice).
So it DOES seem to me from the above that in such non-validating
parsers, parameter entities in the internal subset will also not
work--the entities will need to be included manually. Am I wrong?
[I think these sentences are talking about different things and they
don't contradict each other. The W3C statement says that a document is
not well formed if it references undeclared entities (unless it
explicitly says standalone="no"); the chapter is explaining that entity
declarations don't necessarily appear in the doc instance. LB]
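Brett's question can be tried out against one particular non-validating parser. The sketch below uses Python's `xml.etree.ElementTree` (built on expat) and compares an entity declared directly in the internal subset with one that would only arrive through an external parameter entity. The file name `hypothetical-entities.ent` and the helper `text_or_none` are made up for illustration, and the result says something about this parser in its default configuration only, not about what the XML standard requires.

```python
import xml.etree.ElementTree as ET


def text_or_none(doc):
    """Parse doc and return the root element's text, or None on a parse error."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# Entity declared directly in the internal subset.
declared_inline = """<!DOCTYPE p [
  <!ENTITY eacute "&#233;">
]>
<p>caf&eacute;</p>"""

# Entity that would only be declared inside an external file pulled in
# by a parameter entity reference in the internal subset.
via_parameter_entity = """<!DOCTYPE p [
  <!ENTITY % ents SYSTEM "hypothetical-entities.ent">
  %ents;
]>
<p>caf&eacute;</p>"""

print(repr(text_or_none(declared_inline)))
print(repr(text_or_none(via_parameter_entity)))
```

A parser configured to fetch external parameter entities could behave differently on the second document, so it is worth running the same pair against whatever processor is actually in use.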
best wishes,
Brett
--------------------