[tei-council] review of IM

Sat Oct 27 21:51:55 EDT 2007

I read revision r3778 of IM. I note, however, that since then the
source files have been re-arranged a bit. The filename of the chapter
should be USE-UsingTEI.xml.

It's not clear to me that this section needs to be in the Guidelines
at all. (It's very clear that it needs to exist -- it is important
documentation.) It is currently part of the chapter "Using the TEI",
but really has very little to do with _using_ the TEI. It is probably
better positioned as an appendix.

The entire section does not mention the inclusion of Schematron rules
in an ODD, nor their extraction.

* #IM/p[1]: At the moment, the second word should be "section" not
  "chapter". However, as I say above, I think maybe it should be
  "appendix", if anything.

* Passim: very often the adverbial "firstly", "secondly", and
  "thirdly" are used where I think we usually use "first", "second",
  and "third".

* #IM/p[4]: deserves a re-write; here's a starting point:
      <p>An ODD processor is not mandated to perform these two stages
      in sequence, but this may well be the simplest approach. The
      ODD processing tools provided by the TEI Consortium and used to
      process the source of these Guidelines take this approach.</p>

* #IM-unified/p[1], 1st sentence, "... which specifies the name and
  default namespace of the result.": I'm not sure what is meant by
  "the name" (name of what? the schema? -- it doesn't have a name,
  does it? the file that holds the schema? the identifier used to
  refer to the schema?) or how it is specified (the ident=, I
  presume?); it is not stated how the namespace is specified. Here is
  a suggested re-work of what I think this sentence is trying to
  convey. 
      <p>Merging an ODD customization with the TEI P5 ODD
      specification is driven by a <gi>schemaSpec</gi> element:
      <specDesc key="schemaSpec" atts="ident ns"/> 
      The <att>ident</att> attribute is required; it provides a name
      for the generated schema. Other components of the ODD
      processing system may use this name to refer to the schema
      being generated, e.g. in issuing error messages or as the base
      name of the generated output schema file or files. The
      <att>ns</att> attribute may be used to specify the default
      namespace within which elements valid against the resulting
      schema belong, as discussed in <ptr target="#MDNS"/>.</p>
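
  For concreteness, here is a minimal sketch of the kind of <schemaSpec>
  the suggested paragraph describes. The ident= and ns= values are
  invented for illustration and are not taken from the chapter:

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="myTEI"
            ns="http://www.example.org/ns/myTEI">
  <!-- ident= names the generated schema; a processor might reuse it
       in error messages or as the base name of its output files -->
  <!-- ns= gives the default namespace for elements valid against
       the resulting schema (hypothetical value here) -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```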

* #IM-unified/p[1], 2nd sentence (pre-list): should perhaps be a bit
  more descriptive: "The main content of the <gi>schemaSpec</gi>
  element consists of a series of specialized elements, in any order,
  each of which falls into one of four types."

* #IM-unified/p[1]/list[1]: suggested revision follows. Note the PIs,
  which indicate spots someone who understands this process better
  than I should check.

  <list type="ordered">
    <label>specifications</label>
    <item>The TEI ODD specification elements <gi>elementSpec</gi>,
      <gi>classSpec</gi>, and <gi>macroSpec</gi> may appear as direct children
      of <gi>schemaSpec</gi>. Each occurrence must bear a <att>mode</att>
      attribute which determines how it will be processed.<note place="foot">We
        do not here say what happens in case of errors; a specification in
          <val>add</val> mode which is also present in an imported module should
        obviously be flagged as an error.</note> If the value of <att>mode</att>
      is <val>add</val>, then the object is simply copied to the output, but if
      it is <val>change</val>, <val>delete</val>, or <val>replace</val>, then it
      will be looked at by other parts of the process.</item>
    <label>reference to specifications</label>
    <item><gi>specGrpRef</gi> elements refer to <gi>specGrp</gi> elements that
      occur elsewhere in the current ODD document or even in another document
      entirely. A <gi>specGrp</gi> element, in turn, groups together a set of
      ODD specifications (among other things, including further
      <gi>specGrpRef</gi> elements). The use of <gi>specGrp</gi> and
        <gi>specGrpRef</gi> permits the ODD markup to occur at the points in
      documentation where they are discussed, rather than all inside
        <gi>schemaSpec</gi>. The <att>target</att> attribute of any
        <gi>specGrpRef</gi> should be followed, and the <gi>elementSpec</gi>,
        <gi>classSpec</gi>, and <gi>macroSpec</gi> elements in the
      corresponding <gi>specGrp</gi> should be processed as described in the
      previous item; <gi>specGrpRef</gi> elements should be processed as
      described here.</item>
    <label>references to TEI modules</label>
    <item><gi>moduleRef</gi> elements with <att>key</att> attributes refer to
      components of the TEI. The value of the <att>key</att> attribute matches
      the <att>ident</att> attribute of a TEI module. The <att>key</att> must be
      dereferenced by some means, such as reading an XML file with the TEI ODD
      specification (either from the local hard drive or off the web), or
      looking up the reference in an XML database (again, locally or remotely);
      whatever means is used, it should return a stream of XML containing the
      element, class, and macro specifications <?tei is this right? --sb ?>
      belonging to the specified module. These specification elements can then
      be checked against overrides in the <gi>schemaSpec</gi> being processed.</item>
    <label>references to external modules</label>
    <item><gi>moduleRef</gi> elements with <att>url</att> attributes refer to
      external schemas written in RELAX
      NG<?tei do these have to be in XML or compact syntax? --sb?>. These
      should remain untouched, and be passed directly to the output schema when
      it is created. </item>
  </list>
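
  To make the four types easier to picture, here is a sketch of a
  <schemaSpec> containing one child of each type. The ident= value, the
  #localChanges target, and the example.org URL are hypothetical:

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="example">
  <!-- 1. specification: processed according to its mode= -->
  <elementSpec ident="mentioned" module="core" mode="delete"/>
  <!-- 2. reference to specifications grouped in a specGrp elsewhere -->
  <specGrpRef target="#localChanges"/>
  <!-- 3. reference to a TEI module, dereferenced via key= -->
  <moduleRef key="core"/>
  <!-- 4. reference to an external RELAX NG schema, passed through -->
  <moduleRef url="http://www.example.org/schemas/external.rng"/>
</schemaSpec>
```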

* #IM-unified/p[2]: 
  - I don't believe the term "object" has been defined, and it should be
  - insert "with the <att>key</att> attribute" between
    "<gi>moduleRef</gi>" and "must"
  - in the list, the values of mode= are mis-encoded as <att> instead of <val>
  - the list says what to do with objects of same ident= when mode=
    is 'delete', 'replace', or 'change', but not 'add' (footnote 87
    above notwithstanding -- it is too far removed from this list,
    I'd say)

* #IM-unified/p[3], 2nd sentence, before the <list>: how about
  "Each component could fall into one of four categories:"?

* #IM-unified/p[3]/list[1]/item[1]: Last I knew, this included the
  xml:id= attribute, with the result that you could not use an
  xml:id= value in your customization that occurs on an object in the
  TEI ODD specification (at least, if that module is included). I'm
  wondering if xml:id= should be excluded from this rule, so that
  other ODD processors may get around this restriction. Or does that
  lead to madness?

* #IM-unified/p[3]/list[1]/item[3], parenthetical: missing ", and",
  but moreover I'm uncomfortable with the loose use of "elements",
  "macros", and "attributes" for "when the ODD processor is building
  an element" or whatever. Would the following do?

        (<gi>equiv</gi>, <gi>desc</gi>, <gi>gloss</gi>,
        <gi>exemplum</gi>, <gi>remarks</gi>, and <gi>listRef</gi> in
        the specifications of elements or macros, and
        <gi>datatype</gi> and <gi>defaultVal</gi> in the
        specification of attributes)

* #IM-unified/p[3]/list[1]/item[3], after parenthetical: insert
  "occurrences" after "all".

* #IM-unified/p[3]/list[1]/item[4]: "i.e." -> "e.g."; make
  "attribute" plural:
     <item>identified objects (i.e. those with an <att>ident</att>
     attribute, e.g. <gi>attDef</gi> and <gi>valItem</gi>) are
     processed according to their <att>mode</att> attributes,
     following the rules in this list.</item>
  It would be better, I think, to reword the whole list to use
  singular subjects, e.g. "Each object which can occur ... is taken
  ..." 

* #IM-unified/p[4], sentence 2: Should we be pointing out that the
   example demonstrates a non-conformant customization? In any case,
   the term "element" should probably be more specific:
      Consider this simple example of a non-conformant customization
      to the <gi>p</gi> element:

* #IM-unified/p[4], between the <egXML>s: s/affect/effect/; reverse
  "not" and "to"; also probably good to expand "the att.typed class";
  thus perhaps
       The effect of making <gi>p</gi> a member of the <name
       type="class">att.typed</name> class is to provide it with both
       the <att>type</att> and <att>subtype</att> attributes. If we
       want <gi>p</gi> <emph>not</emph> to have the
       <att>subtype</att> attribute, ...
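
  A sketch of the kind of customization under discussion, reconstructed
  from the prose here rather than copied from the <egXML>s in r3778, so
  the details (e.g. the module= value) are assumptions:

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="p" module="core" mode="change">
  <!-- add p to att.typed, which brings in type= and subtype= -->
  <classes mode="change">
    <memberOf key="att.typed"/>
  </classes>
  <!-- then remove the inherited subtype= from p only -->
  <attList>
    <attDef ident="subtype" mode="delete"/>
  </attList>
</elementSpec>
```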

* #IM-unified/p[4], after 2nd <egXML>: change <code> to <tag>

* #IM-unified/p[6]: s/entire/entirely/; but moreover, why is it
  easier to deal with multiple examples? 

* #IM-unified/p[7]: delete first comma; I'm not fond of the
  "whether to take account of" construct. How about "<p>When
  processing the content models of elements and the content of
  macros, the processor has to decide whether to take deleted
  elements into account or not."?

* #IM-unified/p[7]/note, sentence 1: s/PizzaChef/Pizza Chef/; 

* #IM-unified/p[7]/note, sentence 2: would "The roma program behind
  the P5 Roma application is not as sophisticated, ..." be incorrect?
  It reads better.

* #IM-unified/p[7], between the <egXML>s --

  "... the <gi>choice</gi> is simply <att>model.global</att>.":
  should be more like "... then <name
  type="class">model.global</name> is left as the only child of
  <gi>rng:choice</gi>".

  Notice that <choice> needs to be qualified, as it is also the name
  of a TEI element. (In general, I think we should qualify all
  elements not from the TEI namespace, except perhaps in SG.)

  "is itself inside an <gi>zeroOrMore</gi> inside a <gi>group</gi>":
  the "an" should be an "a".
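
  A sketch of the situation being described, with the RELAX NG elements
  qualified as suggested. The content model and the pattern names
  content.before and content.after are invented for illustration (this
  is not the chapter's <egXML>), <figDesc> is assumed to be the deleted
  element, and the referenced patterns are not defined in this fragment:

```xml
<rng:grammar xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <!-- before: figDesc is still available -->
  <rng:define name="content.before">
    <rng:zeroOrMore>
      <rng:choice>
        <rng:ref name="model.global"/>
        <rng:ref name="figDesc"/>
      </rng:choice>
    </rng:zeroOrMore>
  </rng:define>
  <!-- after: the reference to the deleted figDesc has been removed,
       leaving model.global as the only child of rng:choice -->
  <rng:define name="content.after">
    <rng:zeroOrMore>
      <rng:choice>
        <rng:ref name="model.global"/>
      </rng:choice>
    </rng:zeroOrMore>
  </rng:define>
</rng:grammar>
```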

* #IM-unified/p[7], right after 2nd <egXML>: before the example we
   were talking in generic terms, but after it we refer to a specific
   element name.
      "and it has been deleted (for example, if <gi>figDesc</gi> had
      been deleted in the customization in which the above example
      occurs)"
   That's not too good, but you get the idea.
   BTW, I'm curious: why is it necessary to remove the reference?
   Couldn't it just be resolved to the pattern "empty"?

* #IM-unified/p[7]/note: How about the following:
      Note that deletion of required elements will cause the schema
      specification to mark as valid instances that cannot be TEI
      Conformant documents since they break the TEI abstract model.
      Conformance topics are addressed in more detail in <ptr
      target="#CF"/>.

* Same para, next sentence, "consequentially": I wonder if the
  word "consequently" is what is intended, in which case it should be
  moved to be the 1st word of the sentence:
    Consequently, surrounding constructs, such as a
    <gi>rng:zeroOrMore</gi>, may also have to be removed.</p>
  If "consequentially" is what was meant, we need to explain what
  consequence is of concern.

* #IM-unified/p[8]: "flat set" is not explained. (I think it would
  be good to explain it, but low priority.)

* #IMGS: In this section the voice switches from making the ODD
  processor the active party ("an ODD processor must ...") and things
  like "it will be necessary to remove" (what is that -- 'impersonal
  passive'?) to the first person plural.

* #IMGS/p[1]: The fact that order matters in order to give "the best
  chance of successfully supporting all the schema languages" perhaps
  should be mentioned before the actual sequence of events. Although
  I have to admit, I have not quite figured out why processing order
  matters with respect to schema language. (It is very clear that
  output order matters for DTDs: see #IM-makeDTD.)

* #IMGS/p[2], 1st 2 sentences, "Firstly, a decision must be made
  about which schema language is going to be used. The TEI ODD
  specification, using RELAX NG to express content models, is
  slightly biased towards this language,": The first sentence seems
  odd -- I would kinda hope software engineers designing an ODD
  processor know what output they want. I also would hope that we
  consider ourselves a wee bit more than _slightly_ biased towards
  RELAX NG. 

     An ODD processor may use any desired schema language or
     languages as its schema output. The TEI ODD specification uses
     RELAX NG to express content models, and is therefore biased
     towards this language. However, the current TEI ODD processing
     system is capable of producing schema output in the three main
     schema languages, as follows:

* #IMGS/p[2]/list/item[1]: s/direct/directly/; also "a RELAX NG
  compact version" should be "a version in the compact syntax" or
  some such.

* #IMGS: In this section the `trang` program is encoded as an
  <ident>; in the previous section Roma, I think it was, was not
  encoded at all. I think that all references to programs, utilities,
  commands, etc. should be encoded as <name type='pgm'>. (After all,
  "trang" is the name of a program.)

* #IMGS/p[3]: if the rewrite of the beginning of para 2 is accepted,
  then this should be deleted.

* #IMGS/p[4], last sentence: is "Roma processors" (plural) correct?
  Also, to anyone who has read a schema, "in as simple a style as
  possible" seems like an exaggeration. (E.g., much of the indirection
  could be resolved -- not that I think this is a good idea, mind
  you.) How about "in a comparatively simple style"?

* #IMGS/p[5]/eg[1] and eg[2]: Since there is no markup in the
  examples, the CDATA marked sections are superfluous.

* #IMGS/p[5] text in between the two <eg>: The idea that "the
  knowledge that the attributes such as <att>n</att> and
  <att>rend</att> come from the global attribute class is lost" seems
  pretty counter-intuitive: everyone and anyone can see that n= and
  rend= come from the global attribute class, because the patterns
  used are named "att.global.n" and "att.global.rend". Here is a
  suggested re-wording:
    In the above, a redefinition of an attribute class will have no
    effect, as each class has already been expanded to its
    constituent attributes.

* #IMGS/p[5] text after the 2nd <eg>:
  - change "class attributes" to "attribute classes", no?
  - change "with a pointer" to "via a reference"

* #IMGS/p[6], last sentence, "An ODD processor is not required to
  support both.": Perhaps we should mention that for processing TEI
  ODDs, the simple schema output is at least vastly preferred, if not
  required.

* #IMGS/p[7]: the example <sp> declaration is not simplified, it is
  completely different (there is no place to put the speech!). If we
  want to keep this example, I'd change "simplified" to "fictitious".

* #IMGS/p[7], after <eg>s: I'm not fond of the wording here (no
  reason not to use more precise industry-wide term "deterministic";
  the last sentence makes it sound like it is a problem that RELAX NG
  does not require determinism), but I think it is low priority and
  can await 1.1, unless someone can re-word this a lot faster than I.

* #IMGS/p[8], "... mandate any particular schema, but it is ...":
  s/schema/mechanism/;

* #IMGS/p[8], rest of para: Why are we recommending this only for
  DTDs? Just because it is hard for us to do for RELAX NG doesn't
  mean we should not recommend ODD processors do this.

* #IM-naming/head: how about "Names and Documentation in Generated
  Schemas"? 

* #IM-naming/p[1], sentence 1: insert Oxford comma after "element".

* #IM-naming/p[1]/list/item[1]
  - "... value of the <att>ident</att> attribute, prefixed ...":
    insert "corresponding" after "the"
  - "... distinctive prefix such as e.g. <val>tei_</val>.": remove
    either "such as" or "e.g.".
  - "(compact)": we haven't mentioned that examples are in the
    compact syntax before, but I think it is a good idea that we do.
    I suggest we standardize on "RELAX NG (compact syntax)" both here
    and at #IMGS/p[5], just before the <egXML>. (Anywhere else?)
  - I think "Referring strings have to be adjusted accordingly."
    should be expanded. What exactly is a "referring string"?
    Would something like "References to these patterns (or, in DTDs,
    parameter entities) also need to be prefixed with the same
    value." be correct?
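
  If that reading is correct, the generated RELAX NG (XML syntax) would
  look something like the sketch below. The tei_ prefix comes from the
  item under review; the trivial content model is invented to keep the
  grammar complete:

```xml
<rng:grammar xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ns="http://www.tei-c.org/ns/1.0">
  <rng:start>
    <!-- the reference carries the same prefix as the definition -->
    <rng:ref name="tei_p"/>
  </rng:start>
  <!-- the pattern name is prefixed; the element name is not -->
  <rng:define name="tei_p">
    <rng:element name="p">
      <rng:text/>
    </rng:element>
  </rng:define>
</rng:grammar>
```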

* #IM-naming/p[1]/list/item[2], "... <gi>altIdent</gi> child, the
  value of that is ...": re-word: "... <gi>altIdent</gi> child, its
  content is ...".

* #IM-naming/p[1]/list/item[3], 2nd sentence: suggested re-wording: 
     If there is only one occurrence of either of these elements, it
     should be used; however if there are two or more occurrences with
     different values of <att>xml:lang</att>, a locale indication in
     the processing environment should be used to decide which to
     use.
  Note that this does not give advice on what to do when there are
  two or more with the same value of xml:lang=. Fodder for a 1.1
  improvement. 

* #IM-naming/p[2]/list/item[2]: there is an exception: colons are
  removed first, so that the namespace prefix and attribute name are
  run together, as in 'att.global.attribute.xmlid'.

* #IMMA, after the <egXML>: reword to something like the following.
     Note that in much of these Guidelines, RELAX NG schema fragments
     are shown in the compact syntax; both the content of the
     <gi>contents</gi> element and the unified ODD specification
     generated by the TEI ODD processing software store RELAX NG in
     the more verbose XML format. However, the two formats are
     interchangeable.

* #IMCL/p[1], sentence 1: Actually, a definition is generated, not
  just an alternation. Suggested rewording:
     An ODD model class generates a RELAX NG pattern definition
     listing all the members of the class present in the ODD in
     alternation.
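
  In other words, something along the lines of the sketch below. The
  class name and membership are picked only for illustration, and the
  real output may differ in detail:

```xml
<rng:define xmlns:rng="http://relaxng.org/ns/structure/1.0"
            name="model.measureLike">
  <!-- one rng:ref per class member present in the ODD -->
  <rng:choice>
    <rng:ref name="num"/>
    <rng:ref name="measure"/>
  </rng:choice>
</rng:define>
```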

* #IMCL/p[2]/egXML[2]:
  - I expected to see an <a:documentation> element; am I crazy?
  - it would probably be a good idea to explain the reason behind two
    definitions, one as 'empty' (I do not understand well enough to
    explain it)

* #IMCL/p[2]/quote/following-sibling::text(), "Naturally, this
  sort of use of the documentation elements is not mandatory, and
  other ODD processors may ignore them when creating schemas.": other
  ODD processors could do something else, too, so I'd suggest
  something like:
        Naturally, this sort of use of the documentation elements is
        not mandatory, and other ODD processors may generate
        alternate documentation or ignore them when creating schemas.

* #IMCL/p[3], before the <egXML>s: this paragraph does not follow
  house style in referring to elements and attributes.
      <p>An individual attribute consists of a <gi>rng:attribute</gi>
      element with a <att>name</att> attribute derived according to
      the naming rules described above. In addition, the ODD model
      supports a <gi>defaultVal</gi> element, which is transformed to
      a <att>defaultValue</att> attribute in the <ident
      type="ns">http://relaxng.org/ns/compatibility/annotations/1.0</ident>
      namespace on the <gi>rng:attribute</gi> element. The body of
      the attribute definition is taken from the <gi>datatype</gi>
      child, unless there is a supporting <gi>valList</gi> element
      with a <att>type</att> attribute with a value of
      <val>closed</val>. In that case a <gi>rng:choice</gi> is
      generated, listing the allowed values.</p>
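
  As a sketch of the output that paragraph describes, for a
  hypothetical attribute status= with a closed value list and a default
  (the attribute name and values are invented; only the two namespaces
  are real):

```xml
<rng:optional xmlns:rng="http://relaxng.org/ns/structure/1.0"
              xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <!-- a:defaultValue is derived from the defaultVal element -->
  <rng:attribute name="status" a:defaultValue="draft">
    <!-- the closed valList becomes a choice of literal values -->
    <rng:choice>
      <rng:value>draft</rng:value>
      <rng:value>final</rng:value>
    </rng:choice>
  </rng:attribute>
</rng:optional>
```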

* #IMCL/p[3], after the <egXML>s: <ident> needs type="ns"; need to
  cite the recommendation for marking up annotations this way.
  (http://relaxng.org/compatibility-20011203.html, is it?)

* #IM-makeDTD/p[1], "... classes generate DTD entities,
  the TEI ...": insert "parameter" after "DTD".

* #IM-makeDTD/p[1]/list/following-sibling::text(): I think this
  sentence is far too colloquial for use in the Guidelines. I think
  it can just be deleted.

* #IM-makeDTD/p[2]/eg[1]: I realize this is probably correctly
  copied-and-pasted from some real DTD output, but I'm thinking that
  the xmlns attribute should be declared with #FIXED.
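
  A sketch of what such a declaration might look like, cut down to a
  single empty element so that it stands alone; the real generated DTD
  obviously declares far more:

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI [
  <!ELEMENT TEI EMPTY>
  <!-- #FIXED: the attribute may be omitted, but if it is present it
       must have exactly this value -->
  <!ATTLIST TEI xmlns CDATA #FIXED "http://www.tei-c.org/ns/1.0">
]>
<TEI xmlns="http://www.tei-c.org/ns/1.0"/>
```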

* #IM-makeDTD/p[2], last sentence: "... the document is processing
  by a DTD-aware ...": s/ing/ed/;

* #IMGD/p[1], 1st sentence:
  - need a citation for Knuth's literate programming.
  - latter half of sentence a bit wordy; suggested revision:
       ... the previous sections have dealt with the
       <term>tangle</term> process; to generate documentation, we now
       turn to the <term>weave</term> process.

* #IMGD/p[2]: suggested revision:
         An ODD customization may consist largely of general
         documentation and examples, which should be processed
         normally, but in addition it will contain a
         <gi>schemaSpec</gi> and possibly some <gi>specGrp</gi>
         fragments.

* #ref-faith: probably would be good to come up with a more recent
  image. 

* #STPE: This section deals with instructions on how to "stitch
  together" the RELAX NG or DTD schema fragments into a usable
  schema. My recollection is that Council decided this information
  should not be included in the Guidelines themselves, so I am
  recommending we delete the entire section, and I am not giving it a
  closer reading.