[tei-council] project <gi>

Syd Bauman Syd_Bauman at Brown.edu
Sat Dec 15 20:12:29 EST 2007


I went through the Guidelines today rooting out problematic instances
of the <gi> element. I found and fixed dozens.[1,2] Most were
elements not from the TEI scheme (I added a scheme= attribute),
mis-spelled element names, things that should have been <tag>, or
inclusion of the translated gloss inside the <gi>.

However, there were a few that I am a bit unsure about, or that
raised other issues with which I need help:

* The 2nd <exemplum> of stringVal.xml needs fixing or explaining (or
  both). Given that this element is not documented at all anywhere
  else, this has to be as clear as can be. I figure if I can't figure
  it out ...

* The first paragraph of the <remarks> of specGrp.xml needs to be
  re-written, I think. It refers to a non-existent <module> element,
  saying that <specGrpRef> but not <specGrp> is allowed within it.
  I'm not sure what element that would be.

* The <remarks> of gram.xml refer to term entries and <otherForm>,
  which no longer exist in the Guidelines. I think probably the first
  two sentences should simply be deleted, but would like someone else
  to affirm that first.

* The <desc xml:lang="zh-tw"> in TEI.xml: is that <gi> a "teiCorpus"?
  In which case, for consistency, I think we should probably re-word
  it to "<gi>teiCorpus</gi> (tei&#x6587;&#x96C6;)" or whatever the
  right thing would be. If no one on Council can answer this (CW?)
  I'll plan to ask Marcus or Weining.

* The files hand.xml and handList.xml still exist in the repository.
  Any good reason to keep them? What's more important, though, is
  that the files
  http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-hand.html
  and
  http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-handList.html
  still exist. There is good reason to delete them. Seems to me it is
  important to figure out how it is that they're still there, so that
  further updates don't leave orphans like that.


Notes
-----
[1] In some ways, it was bittersweet vindication: I have always
    favored a content model of xsd:Name for <gi>, rather than text. I
    think the sheer number of errors I found confirms not only that
    position, but actually a stronger stance: our build process
    should validate the content of <gi> against the list of TEI
    element names, unless scheme= is specified as something other
    than "TEI" or "tei".
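
    A minimal sketch of the kind of check proposed above, written in
    Python rather than as part of the real build process. The function
    name, the sample document, and the "DBK" scheme value are
    hypothetical; the set of known names would in practice come from
    the TEI schema itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def find_unknown_gis(xml_text, known_names):
    """Return the content of each TEI-scheme <gi> that is not a known name."""
    root = ET.fromstring(xml_text)
    unknown = []
    for gi in root.iter(f"{{{TEI_NS}}}gi"):
        scheme = gi.get("scheme")
        # Skip <gi> elements that declare a scheme other than TEI.
        if scheme is not None and scheme not in ("TEI", "tei"):
            continue
        name = "".join(gi.itertext()).strip()
        if name not in known_names:
            unknown.append(name)
    return unknown


# Hypothetical sample: one good name, one misspelling, one non-TEI
# element, and one element that no longer exists.
sample = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <p><gi>teiCorpus</gi> <gi>teiCoprus</gi>
     <gi scheme="DBK">para</gi> <gi scheme="TEI">otherForm</gi></p>
</div>"""

print(find_unknown_gis(sample, {"teiCorpus", "TEI", "p"}))
# prints ['teiCoprus', 'otherForm']
```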

[2] I did most of the seeking via command-line searches on my Mac OS
    X system (same would have worked in GNU/Linux), some using the
    xmlstarlet command. I'm posting them here just in case anyone is
    interested. Feel free to ask for an explanation of how these
    commands work.
       $ echo "<gis>" > /tmp/gis.xml
       $ xml sel -N t="http://www.tei-c.org/ns/1.0" \
                 -t -m "//t:gi[not(@scheme) or @scheme='tei' or @scheme='TEI']" \
                 -c "." -o " {" -v "ancestor::t:div[1]/@xml:id" -o "}" -n \
             Source/Guidelines/en/guidelines-en.xml \
         | perl -pe 's/ xmlns(:[a-z]+)?="[^"]+"//g;' \
         | perl -pe 's/\{/<!-- /; s/\}/ -->/;' \
         | sort >> /tmp/gis.xml
       $ echo "</gis>" >> /tmp/gis.xml
    (I used the curly-braces and perl stage to change to comments
    because I haven't figured out how to get a comment out of
    xmlstarlet.) 
    Then I validated the resulting XML file against an RNC file that
    I wrote using the output of the following as the basis:
       $ xml sel -N r=http://relaxng.org/ns/structure/1.0 \
                 -t -m "//r:element" -v "@name" -n \
             [Sourceforge]/P5/Exemplars/tei_all.rng \
             > /tmp/gis.rnc_starter_file
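
    For anyone without xmlstarlet, the same name extraction can be
    sketched in Python. This is an illustration only, not the command
    actually used; the function name and the tiny sample grammar are
    hypothetical, and unlike the command above it also de-duplicates
    and sorts the names.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"


def element_names(rng_text):
    """Return the sorted, de-duplicated @name values of all RELAX NG <element>s."""
    root = ET.fromstring(rng_text)
    names = {
        el.get("name")
        for el in root.iter(f"{{{RNG_NS}}}element")
        if el.get("name")  # skip elements defined by a name class instead
    }
    return sorted(names)


# Hypothetical stand-in for a real schema such as tei_all.rng.
sample_rng = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="p"><element name="p"><text/></element></define>
  <define name="gi"><element name="gi"><text/></element></define>
  <define name="p2"><element name="p"><empty/></element></define>
</grammar>"""

print(element_names(sample_rng))
# prints ['gi', 'p']
```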


