[tei-council] comments on ST

Wed Oct 17 14:40:18 EDT 2007

Finished first pass through ST. It certainly reads better than it did
before.

* entire Guidelines, Passim: when we refer to a chapter, section, or
  subsection with a <ptr>, we should probably ditch the word
  "chapter" or "section" or whatever in the source prose file, and
  let the stylesheet generate the level of the division. Probably a
  P5 1.1 thing.

* 1st para: I'm wondering if the entire 2nd half of the 1st para is
  necessary. I.e. "Other chapters supply ... new TEI-based system."
  seems like overkill. All this stuff is discussed better a few
  paragraphs later, and plopping it right up front may scare the more
  timid off. (I understand the logic of telling the reader up front
  what topics they might be looking for that this chapter *doesn't*
  cover and where to find them -- I'm just not sure it's worth it in
  this case.)

* 3rd para, last sentence: I don't like using "mix and match" in this
  way, since it's the other modules that can be mixed and matched,
  not the four that are the subject of the sentence. How about

     Most schemas will therefore need to include these four modules,
     but are free to add most any combination of others.

* 4th para, last sentence, "... as well as to generate documentation
  such as the <title>Guidelines</title> and their associated
  website.": I'd like to shift the emphasis to the possibilities.
  Something more like:

     ... as well as to generate associated documentation in a variety
     of formats. (At the time of this writing TEI and XHTML formats
     are supported by TEI-supplied software.) The web-accessible
     version of these <title>Guidelines</title> is generated in this
     manner.

* Should STECAT (1.4 Attribute Classes) and STECCM (1.5 Model
  Classes) be subsections of STEC (1.3 The TEI Class System)? (Didn't
  they use to be?)

* //div[@xml:id='STMA']/p[1]/list[1]/item[2], "a formal declaration,
  expressed for reference by means of the ISO schema language RELAX
  NG": The "for reference" does not seem to belong, but moreover, our
  formal declarations are, in part, expressed not in RELAX NG but in
  TEI.

* //div[@xml:id='STMA']/p[2]: I am not sure that the first two
  sentences of this paragraph are needed at all, but if you think the
  ideas expressed are important to explain, then they need to be
  re-written, as they confused me, let alone a new user!

* #STMA, 1st comment, "": where is this modules.xsl
  stylesheet? Generation of this table should be made part of the
  build process (I think we should do so before 1.0.)

* #tab-mods: This is the first (and almost only) use of "formal
  public identifier" in the Guidelines. Is there any reason to retain
  this information here? I'm also wondering about the arrangement of
  the table. It seems to me much more likely that a user will come in
  asking the question "what is the formal name of the module
  associated with chapter X" than "what is the chapter associated
  with module NAME". In which case, the table should be sorted
  numerically by chapter, rather than by module name, and the chapter
  column should be first.

* Last para before #STIN: I don't like how this sentence is phrased,
  but was not able to come up with something better off the top of my
  head. Seems to me part of the point is to explicitly say that for
  each module there is a chapter that describes it, but that there
  are some other chapters not associated with modules.

* //div[@xml:id='STIN']/p[1], last sentence: "Local systems may allow
  their schema or DTD to be implicit, but for interchange purposes
  the schema associated with a document <emph>must</emph> be made
  explicit."
  1) " or DTD" should be deleted
  2) I realize I've lost the battle to define a mechanism for such an
     association already, but I have a problem requiring that an
     association be made explicit without *any* advice on how to do
     that.

* #STINsimpleExample/p[1]: The antecedent of "it" in second sentence
  is murky. Suggestion: "In ODD format, the heart of such a
  customization looks like this:".

* #STINsimpleExample/p[2], "The schema specification itself is also
  given an identifier (<ident>TEI-minimal</ident>) and the start
  point, or root element, is specified by means of the
  <att>start</att> attribute.": it's not clear what the root element
  is the root element of. Suggestion:

     The schema specification itself is also given an identifier
     (<ident>TEI-minimal</ident>) and the start point, or root
     element, of instances valid against the schema being defined is
     specified by means of the <att>start</att> attribute.

* #STINsimpleExample/p[2]: typo: "declarationsm"

* #STINsimpleExample/p[2], "... or in principal any another schema
  language."
  1) shouldn't "principal" be "principle"?
  2) I don't know that this is true; could the TEI (or any other
     complex closed schema) be expressed in an open schema language
     like Schematron? I'd be more comfortable saying "... or in
     principle any other closed schema language.", although I'm not
     sure that "closed" is still the jargon that's used for this -- I
     may be out-of-date.

* //div[@xml:id='STINlargerExample']/p[4]/x:egXML[1]: I think it
  would be good to insert comments as one might in a real
  customization:
    <schemaSpec ident="TEI-PROJECT" start="TEI">
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="core"/>
      <moduleRef key="textstructure"/>
      <moduleRef key="msdescription"/> <!-- Manuscript Description -->
      <moduleRef key="transcr"/>       <!-- Transcription of Primary Sources -->
      <moduleRef key="namesdates"/>    <!-- Names, Dates, People, and Places -->
    </schemaSpec>

* Last para of #STINlargerExample, "change their names or even add":
  I'd put in the comma after "names". Also this paragraph is missing
  a close-paren at the end.

* #STEC/p[1], last sentence: I found the use of the terms
  "superclass" and "subclass" confusing throughout, and think that
  perhaps the way to address them is to define them better here. Here
  is a first stab, although I'm sure you can do better, Lou:

     A class (call it A) may also have as a member another class
     (call it B), in which case B inherits properties from A, and
     an element that is a member of B inherits both those properties
     belonging to A and those belonging to B. In these cases A is
     sometimes called the <term>superclass</term> of B, and B a
     <term>subclass</term> of A.

* #STGA//specDesc: Only 5 of the 7 global attributes are mentioned.
  As for xml:space=, I think that's a good thing, because I do not
  think that xml:space= should be in the TEI in the first place. If
  it does remain, it really needs a lot more work, including
  explanations, examples, etc. As for xml:base=, it may be used
  rarely enough that it should just not be mentioned here; I'm not
  sure.
  Also, I thought we had agreed that the new rendition-pointer
  attribute would be named rendRef=, not rendition=.

* I think that the discussion following the first two paras in #STGA
  might usefully be further divided into divisions:
    <div xml:id="STGAid"><head>Element Identification and Labeling</head>
      <p>The value supplied ... therefore probably redundant.</p>
    </div>
    <div xml:id="STGAla"><head>Indicating Language</head>
      <p>The <att>xml:lang</att> attribute ... header (see section
        <ptr target="#HD41"/>).</p>
    </div>
    <div xml:id="STGAre"><head>Indicating Rendition</head>
      <p>The <att>rend</att> attribute ... be closely related.  </p>
    </div>

* //div[@xml:id='STGA']/p[4], last sentence "used to denote two
  different element types": although it is true that 'p' and 'P'
  denote two different element types, what's important for this para
  is that they denote two distinctly different values of xml:id= and
  thus two different occurrences of XML elements. Easy fix is to
  change "types" to "occurrences".
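
  A minimal sketch of the point about case, in Python (not part of
  the chapter; the sample markup and the function name
  `duplicate_ids` are made up for illustration). It compares xml:id
  values as case-sensitive strings, so 'p' and 'P' count as two
  distinct identifiers, while a repeated 'p' is reported. ElementTree
  is not a validating parser, so the uniqueness check is done by hand.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Clark-notation name under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text):
    """Return the xml:id values that occur more than once (case-sensitive)."""
    root = ET.fromstring(xml_text)
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    return sorted(value for value, count in Counter(ids).items() if count > 1)


# Hypothetical samples: identifiers differing only in case, then a true clash.
DISTINCT = '<body><p xml:id="p">one</p><p xml:id="P">two</p></body>'
CLASHING = '<body><p xml:id="p">one</p><p xml:id="p">two</p></body>'

print(duplicate_ids(DISTINCT))  # "p" and "P" are different values
print(duplicate_ids(CLASHING))  # the repeated "p" is reported
```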

* //div[@xml:id='STGA']/p[5], "the same identifier,a validating XML":
  typo, missing space after comma.

* //div[@xml:id='STGA']//x:egXML[1]: since the example is invalid, I
  think rather than escaping part of it to avoid an error message, we
  should either tolerate the error message or encode the whole thing
  as <eg>. (Having them look different from each other is icky;
  having the whole example look different because it is an <eg> and
  most others are <egXML> is good: it *is* different.)

* //div[@xml:id='STGA']//x:egXML[2]: whole example and numbers in
  preceding sentence need to be replaced. Here is a possible
  replacement: 

  transcribed from a faulty original in which the number 3 is
  used twice, and 4 is omitted:
  <egXML xmlns="http://www.tei-c.org/ns/Examples"><list type="ordered">
    <item n="1">The Bride</item>
    <item n="2">The Groom</item>
    <item n="3">The Courtship</item>
    <item n="3">The Preparations</item>
    <item n="5">The Announcement</item>
    <item n="6">The Festivities</item>
    <item n="7">The Wedding</item>
    <item n="8">Honeymoon</item>
  </list></egXML>
  <!-- adapted from Goldman, William, _The Princess Bride_ -->
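
  To keep the prose and the example in step, the numbering can be
  checked mechanically. This is only a sketch in Python: the
  function name `numbering_anomalies` is invented, and the list is
  copied from the proposal above without its <egXML> wrapper. It
  reports which n= values repeat and which are skipped.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The proposed replacement list, minus the <egXML> wrapper.
SAMPLE = """<list type="ordered">
  <item n="1">The Bride</item>
  <item n="2">The Groom</item>
  <item n="3">The Courtship</item>
  <item n="3">The Preparations</item>
  <item n="5">The Announcement</item>
  <item n="6">The Festivities</item>
  <item n="7">The Wedding</item>
  <item n="8">Honeymoon</item>
</list>"""


def numbering_anomalies(list_xml):
    """Return (repeated, missing) n= values among the <item> children."""
    root = ET.fromstring(list_xml)
    numbers = [int(item.get("n")) for item in root.iter("item")]
    counts = Counter(numbers)
    repeated = sorted(n for n, count in counts.items() if count > 1)
    expected = set(range(min(numbers), max(numbers) + 1))
    missing = sorted(expected - set(numbers))
    return repeated, missing


repeated, missing = numbering_anomalies(SAMPLE)
print("repeated:", repeated)
print("missing:", missing)
```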

* //div[@xml:id='STGA']/p[16]/x:egXML[1] (has string "text-style"),
  and moreover the <rendition> tagdoc: didn't we discuss using MIME
  types instead of schemes? Did we agree one way or the other? I
  don't think there is a MIME type for XSLFO (other than text/xml),
  is there? If we decide to keep scheme= over mimeType= (aka
  att.internetMedia), I really think this should be a "semi" list
  without "other", not a closed list.

* //div[@xml:id='STGA']/p[17], last sentence "mechanisms for
  describing font families, weight, and styles": this is my age-old
  training in typesetting coming through, bucking against the modern
  (now decades old, I'm sure) mis-use of the term "font", but I'd
  prefer this read "mechanisms for describing type faces, weight,
  and styles".

* //div[@xml:id='STGA']/p[18]: I don't like the construction
  "X/HTML". I'd prefer to just say "XHTML", period. But if you feel
  we really have to make it clear that it's true of HTML as well,
  then spell it out: "HTML or XHTML".

* #STECCM/p[4]: I'd be inclined to change "... its structural
  location. Those elements (or ..." to "... its structural location.
  E.g., those elements (or ..."; also this paragraph is missing a
  close-paren at the end.

* #STECCM/p[5]: I'd be inclined to change "The same class will
  contain different members ... of model classes) will differ
  depending on ..." to "The same class may contain different members
  ... of model classes) may differ depending on ..." as some classes
  do not change their membership based on which modules are loaded
  (except for the base 4 modules, of course :-)

* #STECCM/p[7]: Change "at all the <gi>element</gi> must be
  available" to "at all the <gi>g</gi> element must be available".

* #STECCM/p[9]: Change "Just as there a few classes" to "Just as
  there are a few classes".

* <div><head>The Basic TEI Class Structure</head> is missing an
  xml:id=.

* I noticed that the <desc> of macro.limitedContent does not end in a
  period, as it probably should. There must be others. I will send a
  list shortly.
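
  A rough sketch of how such a list could be produced, in Python.
  Everything here is an assumption for illustration: the function
  name `descs_missing_period`, the sample idents, and the sample
  <desc> text are invented, and the real spec files may be laid out
  differently. It looks at every element carrying an ident=
  attribute and flags those whose child <desc> does not end in a
  period.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample; the idents and descriptions are placeholders.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <macroSpec ident="macro.exampleA">
    <desc>a description that ends properly.</desc>
  </macroSpec>
  <macroSpec ident="macro.exampleB">
    <desc>a description with <gi>p</gi> and no full stop</desc>
  </macroSpec>
  <classSpec ident="model.exampleC">
    <desc>another unterminated
      description </desc>
  </classSpec>
</div>"""


def descs_missing_period(spec_xml):
    """Return idents of specs whose <desc> text does not end in a period."""
    root = ET.fromstring(spec_xml)
    offenders = []
    for spec in root.iter():
        ident = spec.get("ident")
        if ident is None:
            continue
        for desc in spec.findall(f"{{{TEI_NS}}}desc"):
            # Flatten nested markup and normalize whitespace before testing.
            text = " ".join("".join(desc.itertext()).split())
            if text and not text.endswith("."):
                offenders.append(ident)
    return offenders


for ident in descs_missing_period(SAMPLE):
    print(ident)
```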

* macro.schemapattern tagdoc: the name should be macro.schemaPattern,
  and the <desc> is almost gibberish. How about "Contains a RELAX NG
  content model which permits the elements necessary to define XML
  content models and attribute value constraints in a given schema
  language." or some such?

* #STECST/table: Inconsistent use of initial cap in description
  column (I prefer no initial caps here). "Empty elements" might be
  more usefully described as "Elements that cannot have any content"
  or some such. (Not a big deal, as "empty element" is defined in SG
  I believe.)

* #DTYPES/p[5]: scale= of <graphic> is not data.probability (so it
  should be moved), and there should be no comma between
  "data.numeric" and "include". (Smells like a copy-and-paste
  boo-boo! :-)

* #DTYPES/p[7] (immediately after <specList> that starts with
  data.duration.w3c): The <specList> does not list the .iso versions
  of the temporal datatypes, which I think I would prefer to see
  included. But we certainly shouldn't attribute an ISO 8601 format
  to a W3C datatype as this para does. Also RFC 3066 should either be
  RFC 4646 (more precise, but will need to be updated when RFC
  changes) or BCP 47 (a tad less precise, but should endure for
  foreseeable future). Suggested rewording:

     <p>Note that in each of these cases the values used are those
     recommended by existing international standards: ISO 8601 as
     profiled by <title>XML Schema Part 2: Datatypes Second
     Edition</title> in the cases of durations, times, and dates; W3C
     Schema datatypes in the case of truth values; BCP 47 in the case
     of language; and ISO 5218 in the case of sex.</p>

* The <desc> of the data.pattern tagdoc is at best ambiguous. What
  this datatype is really doing is declaring that the attribute value
  (although I'm still uncomfortable with our limiting datatypes to
  attribute values -- a P5 1.5 issue, I'm sure :-) should be
  interpreted as a regular expression. Didn't we also use to say
  explicitly which regular expression language (it was W3C)?

* The <desc> of the data.name tagdoc: change to
     <desc>defines the range of attribute values expressed as an <ref
       target="http://www.w3.org/TR/REC-xml/#dt-name">XML
       Name</ref></desc>

* The <desc> of data.enumerated: change "a single word or token" to
  "an XML Name".

* //div[@xml:id='DTYPES']/p[10] (right after the data.code
  <specDesc>): 
  1) "only attribute defined as of type": delete "of"
  2) "their possible values: and any string of": delete "and"
  3) last clause: I am really uncomfortable with this. The
     implication is that a user couldn't redefine data.key in order
     to get some nice validation constraints. But that's one of the
     main reasons we want the indirection of using data.key instead
     of just 'text' directly! Furthermore, users really really want
     to know where to document such things. If we say it goes in the
     TEI Header, we should say where in the TEI Header.
     How about something like:
        Any constraints on their values, such as the rules for
        constructing a valid database key in a particular system, may
        be documented in a <tagUsage> element in the TEI Header,
        but are not enforced by the datatype as defined here. The TEI
        may be customized using the methods described in <ptr
        target="#MD"/> such that some constraints will be enforced by
        validation software.

* //div[@xml:id='DTYPES']/p[13], "Attributes of type <ident
  type="datatype">data.enumerated</ident>, such as
  <att>anchored</att> on <gi>note</gi>": anchored= of <note> is no
  longer data.enumerated.

* I think this section of DTYPES (between 13th & 14th paras, i.e. a
  new third-to-last para) might be a good place to reiterate that
  users are expected to customize TEI at least to supply value lists
  for attributes that are data.enumerated for which the TEI does not
  supply a list, if not to close lists TEI has left open, add values,
  etc. 

* As discussed previously, the list of classes & macros defined
  should not be in the chapter.

* #STOV/p[2], "this TEI module make extensive use of": add 's' after
  "make". 

* #STOV/p[3]: I am not sure that it is important to map out the
  organization of the RELAX NG schema fragments in the chapters. I'd
  be inclined not to do so. But surely we have already agreed (in
  Berlin) that the Guidelines should not discuss how to stitch
  together DTD fragments, so the map of the DTD fragment should be
  deleted.