[tei-council] review of IM
Syd Bauman
Syd_Bauman at Brown.edu
Sat Oct 27 21:51:55 EDT 2007
I read revision r3778 of IM. I note, however, that since then the
source files have been re-arranged a bit. The filename of the chapter
should be USE-UsingTEI.xml.
It's not clear to me that this section needs to be in the Guidelines
at all. (It's very clear that it needs to exist -- it is important
documentation.) It is currently part of the chapter "Using the TEI",
but really has very little to do with _using_ the TEI. It is probably
better positioned as an appendix.
The entire section does not mention the inclusion of Schematron rules
in an ODD, nor their extraction.
* #IM/p[1]: At the moment, the second word should be "section" not
"chapter". However, as I say above, I think maybe it should be
"appendix", if anything.
* Passim: very often the adverbial forms "firstly", "secondly", and
"thirdly" are used where I think we usually use "first", "second",
and "third".
* #IM/p[4]: deserves a re-write; here's a starting point:
<p>An ODD processor is not mandated to perform these two stages
in sequence, but this may well be the simplest approach. The
ODD processing tools provided by the TEI Consortium and used to
process the source of these Guidelines take this approach.</p>
* #IM-unified/p[1], 1st sentence, "... which specifies the name and
default namespace of the result.": I'm not sure what is meant by
"the name" (name of what? the schema? -- it doesn't have a name,
does it? the file that holds the schema? the identifier used to
refer to the schema?) or how it is specified (the ident=, I
presume?); it is not stated how the namespace is specified. Here is
a suggested re-work of what I think this sentence is trying to
convey.
<p>Merging an ODD customization with the TEI P5 ODD
specification is driven by a <gi>schemaSpec</gi> element:
<specDesc key="schemaSpec" atts="ident ns"/>
The <att>ident</att> attribute is required; it provides a name
for the generated schema. Other components of the ODD
processing system may use this name to refer to the schema
being generated, e.g. in issuing error messages or as the base
name of the generated output schema file or files. The
<att>ns</att> attribute may be used to specify the default
namespace within which elements valid against the resulting
schema belong, as discussed in <ptr ref="#MDNS"/>.
* #IM-unified/p[1], 2nd sentence (pre-list): should perhaps be a bit
more descriptive: "The main content of the <gi>schemaSpec</gi>
element consists of a series of specialized elements, in any order,
each of which falls into one of four types."
* #IM-unified/p[1]/list[1]: suggested revision follows. Note the PIs,
which mark spots that someone who understands this process better
than I do should check.
<list type="ordered">
<label>specifications</label>
<item>The TEI ODD specification elements <gi>elementSpec</gi>,
<gi>classSpec</gi>, and <gi>macroSpec</gi> may appear as direct children
of <gi>schemaSpec</gi>. Each occurrence must bear a <att>mode</att>
attribute which determines how it will be processed.<note place="foot">We
do not here say what happens in case of errors; a specification in
<val>add</val> mode which is also present in an imported module should
obviously be flagged as an error.</note> If the value of <att>mode</att>
is <val>add</val>, then the object is simply copied to the output, but if
it is <val>change</val>, <val>delete</val>, or <val>replace</val>, then it
will be looked at by other parts of the process.</item>
<label>reference to specifications</label>
<item><gi>specGrpRef</gi> elements refer to <gi>specGrp</gi> elements that
occur elsewhere in the current ODD document or even in another document
entirely. A <gi>specGrp</gi> element, in turn, groups together a set of
ODD specifications (among other things, including further
<gi>specGrpRef</gi> elements). The use of <gi>specGrp</gi> and
<gi>specGrpRef</gi> permits ODD specifications to appear at the points in
the documentation where they are discussed, rather than all inside
<gi>schemaSpec</gi>. The <att>target</att> attribute of any
<gi>specGrpRef</gi> should be followed, and the <gi>elementSpec</gi>,
<gi>classSpec</gi>, and <gi>macroSpec</gi> elements in the
corresponding <gi>specGrp</gi> should be processed as described in the
previous item; <gi>specGrpRef</gi> elements should be processed as
described here.</item>
<label>references to TEI modules</label>
<item><gi>moduleRef</gi> elements with <att>key</att> attributes refer to
components of the TEI. The value of the <att>key</att> attribute matches
the <att>ident</att> attribute of a TEI module. The <att>key</att> must be
dereferenced by some means, such as reading an XML file with the TEI ODD
specification (either from the local hard drive or off the web), or
looking up the reference in an XML database (again, locally or remotely);
whatever means is used, it should return a stream of XML containing the
element, class, and macro specifications <?tei is this right? --sb ?>
belonging to the specified module. These specification elements can then
be checked against overrides in the <gi>schemaSpec</gi> being processed.</item>
<label>references to external modules</label>
<item><gi>moduleRef</gi> elements with <att>url</att> attributes refer to
external schemas written in RELAX
NG<?tei do these have to be in XML or compact syntax? --sb?>. These
should remain untouched, and be passed directly to the output schema when
it is created. </item>
</list>
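To make the four categories concrete, here is a minimal Python sketch (not taken from the TEI tools) that sorts the direct children of a <schemaSpec> using the standard library's ElementTree. The function name, the category labels, and the example URL are hypothetical; the sketch only sorts, and leaves out dereferencing key= values and following specGrpRef targets.

```python
import xml.etree.ElementTree as ET

# TEI specification elements that may appear directly inside schemaSpec.
SPEC_ELEMENTS = {"elementSpec", "classSpec", "macroSpec"}

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.split("}")[-1]

def classify_schemaspec_children(schema_spec):
    """Sort the direct children of a schemaSpec into the four categories."""
    categories = {"specifications": [], "specGrpRefs": [],
                  "teiModules": [], "externalModules": []}
    for child in schema_spec:
        name = local_name(child.tag)
        if name in SPEC_ELEMENTS:
            categories["specifications"].append(child)
        elif name == "specGrpRef":
            categories["specGrpRefs"].append(child)
        elif name == "moduleRef" and child.get("key") is not None:
            categories["teiModules"].append(child)       # resolved against the TEI source
        elif name == "moduleRef" and child.get("url") is not None:
            categories["externalModules"].append(child)  # passed through untouched
    return categories

example = ET.fromstring(
    '<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="mySchema">'
    '<moduleRef key="core"/>'
    '<moduleRef url="http://example.org/other.rng"/>'
    '<elementSpec ident="p" mode="change"/>'
    '<specGrpRef target="#myGroup"/>'
    '</schemaSpec>')
found = classify_schemaspec_children(example)
```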
* #IM-unified/p[2]:
- I don't believe the term "object" has been defined, and it should be.
- insert "with the <att>key</att> attribute" between
"<gi>moduleRef</gi>" and "must"
- in the list, the values of mode= are mis-encoded as <att> instead of <val>
- the list says what to do with objects of the same ident= when mode=
is 'delete', 'replace', or 'change', but not 'add' (footnote 87
above notwithstanding -- it is too far removed from this list,
I'd say)
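On the 'add' point: a minimal Python sketch of how the four values of mode= might be applied to objects with the same ident=. It assumes specs are plain dicts keyed by ident. Treating 'add' of an existing ident as an error follows the footnote in the proposed list; silently ignoring a 'delete' of something absent and rejecting a 'change' of something absent are assumptions made for this sketch. The change_spec helper is a placeholder, not the real merge rules.

```python
def change_spec(original, custom):
    """Placeholder for mode="change": later keys win (real merging is per child)."""
    return {**original, **custom}

def merge_specs(module_specs, custom_specs):
    """Apply customization specs to the specs imported from TEI modules.

    module_specs: dict mapping ident -> spec (here just a dict of properties).
    custom_specs: list of (ident, mode, spec) tuples from the customization.
    """
    result = dict(module_specs)  # leave the imported specs untouched
    for ident, mode, spec in custom_specs:
        if mode == "add":
            if ident in result:
                raise ValueError(f"{ident}: mode 'add' but already defined by a module")
            result[ident] = spec
        elif mode == "delete":
            result.pop(ident, None)
        elif mode == "replace":
            result[ident] = spec
        elif mode == "change":
            if ident not in result:
                raise KeyError(f"{ident}: mode 'change' but nothing to change")
            result[ident] = change_spec(result[ident], spec)
        else:
            raise ValueError(f"{ident}: unknown mode {mode!r}")
    return result
```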
* #IM-unified/p[3], 2nd sentence, before the <list>: how about
"Each component could fall into one of four categories:"?
* #IM-unified/p[3]/list[1]/item[1]: Last I knew, this included the
xml:id= attribute, with the result that you could not use an
xml:id= value in your customization that occurs on an object in the
TEI ODD specification (at least, if that module is included). I'm
wondering if xml:id= should be excluded from this rule, so that
other ODD processors may get around this restriction. Or does that
lead to madness?
* #IM-unified/p[3]/list[1]/item[3], parenthetical: missing ", and",
but moreover I'm uncomfortable with the loose use of "elements",
"macros", and "attributes" for "when the ODD processor is building
an element" or whatever. Would the following do?
(<gi>equiv</gi>, <gi>desc</gi>, <gi>gloss</gi>,
<gi>exemplum</gi>, <gi>remarks</gi>, and <gi>listRef</gi> in
the specifications of elements or macros, and
<gi>datatype</gi> and <gi>defaultVal</gi> in the
specification of attributes)
* #IM-unified/p[3]/list[1]/item[3], after parenthetical: insert
"occurrences" after "all".
* #IM-unified/p[3]/list[1]/item[4]: "i.e." -> "e.g."; make
"attribute" plural:
<item>identified objects (i.e. those with an <att>ident</att>
attribute, e.g. <gi>attDef</gi> and <gi>valItem</gi>) are
processed according to their <att>mode</att> attributes,
following the rules in this list.</item>
It would be better, I think, to reword the whole list to use
singular subjects, e.g. "Each object which can occur ... is taken
..."
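A rough sketch of the change-mode rules in item[3] and item[4]. It assumes (this is not confirmed by the text under review) that any occurrences of a documentation element in the customization replace all occurrences of that element in the original, and that attDef objects follow their own mode=. The dict layout and function name are hypothetical; <datatype> and <defaultVal> inside <attDef> are not modelled.

```python
# Documentation-like children of element and macro specifications,
# as listed in the parenthetical proposed above.
DOC_ELEMENTS = ("equiv", "desc", "gloss", "exemplum", "remarks", "listRef")

def merge_changed_spec(original, custom):
    """Merge a mode="change" spec into the original, returning a new spec.

    Specs are dicts: {"docs": {name: [occurrences]}, "attDefs": ...}.
    original["attDefs"] maps ident -> properties; custom["attDefs"] is a
    list of (ident, mode, properties) tuples.
    """
    merged = {"docs": dict(original["docs"]), "attDefs": dict(original["attDefs"])}
    for name in DOC_ELEMENTS:
        # Assumed rule: customization occurrences replace all original ones.
        if custom["docs"].get(name):
            merged["docs"][name] = list(custom["docs"][name])
    for ident, mode, props in custom["attDefs"]:
        # Identified objects are processed according to their own mode.
        if mode == "delete":
            merged["attDefs"].pop(ident, None)
        elif mode == "change":
            merged["attDefs"][ident] = {**merged["attDefs"].get(ident, {}), **props}
        else:  # "add" or "replace"
            merged["attDefs"][ident] = dict(props)
    return merged
```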
* #IM-unified/p[4], sentence 2: Should we be pointing out that the
example demonstrates a non-conformant customization? In any case,
the term "element" should probably be more specific:
Consider this simple example of a non-conformant customization
to the <gi>p</gi> element:
* #IM-unified/p[4], between the <egXML>s: s/affect/effect/; reverse
"not" and "to"; also probably good to expand "the att.typed class";
thus perhaps
The effect of making <gi>p</gi> a member of the <name
type="class">att.typed</name> class is to provide it with both
the <att>type</att> and <att>subtype</att> attributes. If we
want <gi>p</gi> <emph>not</emph> to have the
<att>subtype</att> attribute, ...
* #IM-unified/p[4], after 2nd <egXML>: change <code> to <tag>
* #IM-unified/p[6]: s/entire/entirely/; but moreover, why is it
easier to deal with multiple examples?
* #IM-unified/p[7]: delete first comma; I'm not fond of the
"whether to take account of" construct. How about "<p>When
processing the content models of elements and the content of
macros, the processor has to decide whether to take deleted
elements into account or not."?
* #IM-unified/p[7]/note, sentence 1: s/PizzaChef/Pizza Chef/;
* #IM-unified/p[7]/note, sentence 2: would "The roma program behind
the P5 Roma application is not as sophisticated, ..." be incorrect?
It reads better.
* #IM-unified/p[7], between the <egXML>s --
"... the <gi>choice</gi> is simply <att>model.global</att>.":
should be more like "... then <name
type="class">model.global</name> is left as the only child of
<gi>rng:choice</gi>".
Notice that <choice> needs to be qualified, as it is also the name
of a TEI element. (In general, I think we should qualify all
elements not from the TEI namespace, except perhaps in SG.)
"is itself inside an <gi>zeroOrMore</gi> inside a <gi>group</gi>":
the "an" should be an "a".
* #IM-unified/p[7], right after 2nd <egXML>: before the example we
were talking in generic terms, but after it we switch to a specific
element name. Perhaps something like:
"and it has been deleted (for example, if <gi>figDesc</gi> had
been deleted in the customization in which the above example
occurs)"
That's not too good, but you get the idea.
BTW, I'm curious: why is it necessary to remove the reference?
Couldn't it just be resolved to the pattern "empty"?
* #IM-unified/p[7]/note: How about the following:
Note that deleting required elements will produce a schema
that accepts as valid some documents which cannot be TEI
Conformant, since they break the TEI abstract model.
Conformance topics are addressed in more detail in <ptr
target="#CF"/>.
* Same para, next sentence, "consequentially": I wonder if the
word "consequently" is what is intended, in which case it should be
moved to be the 1st word of the sentence:
Consequently, surrounding constructs, such as a
<gi>rng:zeroOrMore</gi>, may also have to be removed.</p>
If "consequentially" is what was meant, we need to explain what
consequence is of concern.
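For the removal described in the last few bullets, a minimal Python sketch using ElementTree: it deletes rng:ref elements naming deleted patterns, then drops any zeroOrMore, oneOrMore, optional, group, choice, or interleave left with no children. The figure content model in the example is made up for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
# Wrappers that make no sense once all their children are gone.
CONTAINERS = {RNG + n for n in
              ("zeroOrMore", "oneOrMore", "optional", "group", "choice", "interleave")}

def prune_deleted(node, deleted):
    """Remove refs to deleted patterns below node, then drop emptied wrappers."""
    for child in list(node):            # iterate over a copy: we remove as we go
        if child.tag == RNG + "ref" and child.get("name") in deleted:
            node.remove(child)
            continue
        prune_deleted(child, deleted)   # depth first, so emptiness propagates up
        if child.tag in CONTAINERS and len(child) == 0:
            node.remove(child)

figure = ET.fromstring(
    '<element xmlns="http://relaxng.org/ns/structure/1.0" name="figure">'
    '<group>'
    '<zeroOrMore><choice><ref name="model.global"/><ref name="figDesc"/></choice></zeroOrMore>'
    '<zeroOrMore><ref name="figDesc"/></zeroOrMore>'
    '</group></element>')
prune_deleted(figure, {"figDesc"})
```

One edge case bears on the "empty" question above: if every child of an rng:element is pruned, the result is not valid RELAX NG, whereas substituting <rng:empty/> would keep the pattern valid.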
* #IM-unified/p[8]: "flat set" is not explained. (I think it would
be good to explain it, but low priority.)
* #IMGS: In this section the voice switches from making the ODD
processor the active party ("an ODD processor must ...") and things
like "it will be necessary to remove" (what is that -- 'impersonal
passive'?) to the first person plural.
* #IMGS/p[1]: The fact that order matters in order to give "the best
chance of successfully supporting all the schema languages" perhaps
should be mentioned before the actual sequence of events. Although
I have to admit, I have not quite figured out why processing order
matters with respect to schema language. (It is very clear that
output order matters for DTDs: see #IM-makeDTD.)
* #IMGS/p[2], 1st 2 sentences, "Firstly, a decision must be made
about which schema language is going to be used. The TEI ODD
specification, using RELAX NG to express content models, is
slightly biased towards this language,": The first sentence seems
odd -- I would kinda hope software engineers designing an ODD
processor know what output they want. I also would hope that we
consider ourselves a wee bit more than _slightly_ biased towards
RELAX NG.
An ODD processor may use any desired schema language or
languages as its schema output. The TEI ODD specification uses
RELAX NG to express content models, and is therefore biased
towards this language. However, the current TEI ODD processing
system is capable of producing schema output in the three main
schema languages, as follows:
* #IMGS/p[2]/list/item[1]: s/direct/directly/; also "a RELAX NG
compact version" should be "a version in the compact syntax" or
some such.
* #IMGS: In this section the `trang` program is encoded as an
<ident>; in the previous section Roma, I think it was, was not
encoded at all. I think that all references to programs, utilities,
commands, etc. should be encoded as <name type='pgm'>. (After all,
"trang" is the name of a program.)
* #IMGS/p[3]: if the rewrite of the beginning of para 2 is accepted,
then this should be deleted.
* #IMGS/p[4], last sentence: is "Roma processors" (plural) correct?
Also, to anyone who has read a schema, "in as simple a style as
possible" seems like an exaggeration. (E.g., much of the indirection
could be resolved -- not that I think this is a good idea, mind
you.) How about "in a comparatively simple style"?
* #IMGS/p[5]/eg[1] and eg[2]: Since there is no markup in the
examples, the CDATA marked sections are superfluous.
* #IMGS/p[5] text in between the two <eg>: The idea that "the
knowledge that the attributes such as <att>n</att> and
<att>rend</att> come from the global attribute class is lost" seems
pretty counter-intuitive: everyone and anyone can see that n= and
rend= come from the global attribute class, because the patterns
used are named "att.global.n" and "att.global.rend". Here is a
suggested re-wording:
In the above, a redefinition of an attribute class will have no
effect, as each class has already been expanded to its
constituent attributes.
* #IMGS/p[5] text after the 2nd <eg>:
- change "class attributes" to "attribute classes", no?
- change "with a pointer" to "via a reference"
* #IMGS/p[6], last sentence, "An ODD processor is not required to
support both.": Perhaps we should mention that for processing TEI
ODDs, the simple schema output is at least vastly preferred, if not
required.
* #IMGS/p[7]: the example <sp> declaration is not simplified, it is
completely different (there is no place to put the speech!). If we
want to keep this example, I'd change "simplified" to "fictitious".
* #IMGS/p[7], after <eg>s: I'm not fond of the wording here (no
reason not to use more precise industry-wide term "deterministic";
the last sentence makes it sound like it is a problem that RELAX NG
does not require determinism), but I think it is low priority and
can await 1.1, unless someone can re-word this a lot faster than I.
* IMGS/p[8], "... mandate any particular schema, but it is ...":
s/schema/mechanism/;
* IMGS/p[8], rest of para: Why are we recommending this only for
DTDs? Just because it is hard for us to do for RELAX NG doesn't
mean we should not recommend ODD processors do this.
* #IM-naming/head: how about "Names and Documentation in Generated
Schemas"?
* #IM-naming/p[1], sentence 1: insert Oxford comma after "element".
* #IM-naming/p[1]/list/item[1]
- "... value of the <att>ident</att> attribute, prefixed ...":
insert "corresponding" after "the"
- "... distinctive prefix such as e.g. <val>tei_</val>.": remove
either "such as" or "e.g.".
- "(compact)": we haven't mentioned that examples are in the
compact syntax before, but I think it is a good idea that we do.
I suggest we standardize on "RELAX NG (compact syntax)" both here
and at #IMGS/p[5], just before the <egXML>. (Anywhere else?)
- I think "Referring strings have to be adjusted accordingly."
should be expanded. What exactly is a "referring string"?
Would something like "References to these patterns (or, in DTDs,
parameter entities) also need to be prefixed with the same
value." be correct?
* #IM-naming/p[1]/list/item[2], "... <gi>altIdent</gi> child, the
value of that is ...": re-word: "... <gi>altIdent</gi> child, its
content is ...".
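A minimal sketch of the two naming rules as read here: the name used in the generated schema comes from the <altIdent> content when present and otherwise from ident=, and a prefix, if any, is applied to every pattern defined in the schema and to every reference to one. The data layout, function names, and the external pattern name "svg.svg" are hypothetical; leaving references to externally defined patterns unprefixed is an assumption.

```python
def element_name(ident, alt_ident=None):
    """Name used in the generated schema: altIdent content if present, else ident."""
    return alt_ident.strip() if alt_ident else ident

def prefix_patterns(definitions, prefix):
    """Prefix every pattern defined here, and every reference to one.

    definitions maps pattern name -> list of pattern names it references.
    References to patterns not defined here (e.g. from an external schema)
    are left alone.
    """
    local = set(definitions)
    return {prefix + name: [prefix + ref if ref in local else ref for ref in refs]
            for name, refs in definitions.items()}
```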
* #IM-naming/p[1]/list/item[3], 2nd sentence: suggested re-wording:
If there is only one occurrence of either of these elements, it
should be used; however, if there are two or more occurrences with
different values of <att>xml:lang</att>, a locale indication in
the processing environment should be used to decide which to
use.
Note that this does not give advice on what to do when there are
two or more with the same value of xml:lang=. Fodder for a 1.1
improvement.
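A sketch of the proposed <gloss>/<desc> selection rule. Beyond what the suggested text says, it assumes that an exact xml:lang match beats a match on the primary language subtag, that the first candidate is the fallback, and that duplicates with the same xml:lang resolve to the first one (the open question just noted). The function name and the (lang, text) layout are hypothetical.

```python
def pick_by_locale(candidates, locale):
    """Choose one <gloss> or <desc> text for the given locale.

    candidates: list of (xml_lang, text) pairs; xml_lang may be None.
    locale: a tag such as "fr" or "fr_FR" taken from the environment.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0][1]          # only one occurrence: use it
    wanted = locale.replace("_", "-").lower()
    for lang, text in candidates:        # exact match first
        if lang and lang.lower() == wanted:
            return text
    primary = wanted.split("-")[0]
    for lang, text in candidates:        # then primary language subtag
        if lang and lang.lower().split("-")[0] == primary:
            return text
    return candidates[0][1]              # assumed fallback: first occurrence
```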
* #IM-naming/p[2]/list/item[2]: there is an exception: colons are
removed first, so that the namespace prefix and attribute name are
run together, as in 'att.global.attribute.xmlid'.
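The colon-removal exception as a one-line sketch. The general shape (owner ident, then ".attribute.", then the attribute name) is inferred from the single example 'att.global.attribute.xmlid' and should be checked against actual processor output; the function name is hypothetical.

```python
def attribute_pattern_name(owner_ident, att_ident):
    """Pattern name for attribute att_ident declared by class or element owner_ident.

    Colons are removed first, so a namespace prefix runs into the local name.
    """
    return f"{owner_ident}.attribute.{att_ident.replace(':', '')}"
```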
* #IMMA, after the <egXML>: reword to something like the following.
Note that in many places in these Guidelines, RELAX NG schema
fragments are shown in the compact syntax; both the content of the
<gi>contents</gi> element and the unified ODD specification
generated by the TEI ODD processing software store RELAX NG in
the more verbose XML format. However, the two formats are
interchangeable.
* #IMCL/p[1], sentence 1: actually, a definition is generated, not
just an alternation. suggested rewording:
An ODD model class generates a RELAX NG pattern definition
listing all the members of the class present in the ODD in
alternation.
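A minimal sketch of the proposed wording, producing RELAX NG compact syntax. Returning notAllowed for a class with no members is an assumption made here (notAllowed is the identity for choice in RELAX NG, so references to the class still resolve); it is not necessarily what the TEI stylesheets emit, and the function name is hypothetical.

```python
def model_class_rnc(class_ident, members):
    """RELAX NG compact definition listing a model class's members in alternation."""
    if not members:
        # Assumption: an empty class is still defined so that references resolve.
        return f"{class_ident} = notAllowed"
    return f"{class_ident} = " + " | ".join(members)
```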
* #IMCL/p[2]/egXML[2]:
- I expected to see an <a:documentation> element; am I crazy?
- it would probably be a good idea to explain the reason behind two
definitions, one as 'empty' (I do not understand well enough to
explain it)
* #IMCL/p[2]/quote/following-sibling::text(), "Naturally, this
sort of use of the documentation elements is not mandatory, and
other ODD processors may ignore them when creating schemas.": other
ODD processors could do something else, too, so I'd suggest
something like:
Naturally, this sort of use of the documentation elements is
not mandatory, and other ODD processors may generate
alternate documentation or ignore them when creating schemas.
* #IMCL/p[3], before the <egXML>s: this paragraph does not follow
house style in referring to elements and attributes.
<p>An individual attribute consists of a <gi>rng:attribute</gi>
element with a <att>name</att> attribute derived according to
the naming rules described above. In addition, the ODD model
supports a <gi>defaultVal</gi> element, which is transformed to
a <att>defaultValue</att> attribute in the <ident
type="ns">http://relaxng.org/ns/compatibility/annotations/1.0</ident>
namespace on the <gi>rng:attribute</gi> element. The body of
the attribute definition is taken from the <gi>datatype</gi>
child, unless there is a supporting <gi>valList</gi> element
with a <att>type</att> attribute with a value of
<val>closed</val>. In that case a <gi>rng:choice</gi> is
generated, listing the allowed values.
* #IMCL/p[3], after the <egXML>s: <ident> needs type="ns"; need to
cite the recommendation for marking up annotations this way.
(http://relaxng.org/compatibility-20011203.html, is it?)
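A minimal ElementTree sketch of the paragraph proposed above: build an rng:attribute, copy <defaultVal> into a:defaultValue in the annotations namespace, and generate rng:choice for a closed <valList>. Treating the datatype as a bare rng:data type name and falling back to rng:text when there is none are simplifying assumptions; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
ANN_NS = "http://relaxng.org/ns/compatibility/annotations/1.0"
ET.register_namespace("rng", RNG_NS)
ET.register_namespace("a", ANN_NS)

def rng(local):
    """Clark-notation tag name in the RELAX NG namespace."""
    return f"{{{RNG_NS}}}{local}"

def attribute_pattern(name, datatype=None, default=None, closed_values=None):
    """Build an rng:attribute from a simplified attDef description."""
    attr = ET.Element(rng("attribute"), {"name": name})
    if default is not None:
        # defaultVal becomes a:defaultValue on the rng:attribute element
        attr.set(f"{{{ANN_NS}}}defaultValue", default)
    if closed_values:
        # closed valList: enumerate the allowed values
        choice = ET.SubElement(attr, rng("choice"))
        for value in closed_values:
            ET.SubElement(choice, rng("value")).text = value
    elif datatype is not None:
        ET.SubElement(attr, rng("data"), {"type": datatype})
    else:
        ET.SubElement(attr, rng("text"))
    return attr

place = attribute_pattern("place", default="inline", closed_values=["inline", "margin"])
print(ET.tostring(place, encoding="unicode"))
```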
* #IM-makeDTD/p[1], "... classes generate DTD entities,
the TEI ...": insert "parameter" after "DTD".
* #IM-makeDTD/p[1]/list/following-sibling::text(): I think this
sentence is far too colloquial for use in the Guidelines. I think
it can just be deleted.
* #IM-makeDTD/p[2]/eg[1]: I realize this is probably correctly
copied-and-pasted from some real DTD output, but I'm thinking that
the xmlns attribute should be declared with #FIXED.
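A small sketch of the DTD output discussed in the #IM-makeDTD bullets: a parameter entity for a model class, and an ATTLIST declaring xmlns as #FIXED so that a DTD-aware processor supplies the namespace even when the instance omits it. Function names are hypothetical and only one element is shown; a real generator would emit such an ATTLIST for every element.

```python
def dtd_class_entity(class_ident, members):
    """DTD parameter entity holding the alternation of a model class's members."""
    alternation = " | ".join(members)
    return f'<!ENTITY % {class_ident} "{alternation}">'

def dtd_xmlns_attlist(element, namespace):
    """Declare xmlns as #FIXED on one element."""
    return f'<!ATTLIST {element} xmlns CDATA #FIXED "{namespace}">'
```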
* #IM-makeDTD/p[2], last sentence: "... the document is processing
by a DTD-aware ...": s/ing/ed/;
* #IMGD/p[1], 1st sentence:
- need a citation for Knuth's literate programming.
- latter half of sentence a bit wordy; suggested revision:
... the previous sections have dealt with the
<term>tangle</term> process; to generate documentation, we now
turn to the <term>weave</term> process.
* #IMGD/p[2]: suggested revision:
An ODD customization may consist largely of general
documentation and examples, which should be processed
normally, but in addition it will contain a
<gi>schemaSpec</gi> and possibly some <gi>specGrp</gi>
fragments.
* #ref-faith: probably would be good to come up with a more recent
image.
* #STPE: This section deals with instructions on how to "stitch
together" the RELAX NG or DTD schema fragments into a usable
schema. My recollection is that Council decided this information
should not be included in the Guidelines themselves, so I am
recommending we delete the entire section, and I am not giving it a
closer reading.