[tei-council] project <gi>
Syd Bauman
Syd_Bauman at Brown.edu
Sat Dec 15 20:12:29 EST 2007
I went through the Guidelines today rooting out problematic instances
of the <gi> element. I found and fixed dozens.[1,2] Most were
elements not from the TEI scheme (I added a scheme= attribute),
mis-spelled element names, things that should have been <tag>, or
inclusion of the translated gloss inside the <gi>.
However, there were a few that I am a bit unsure about, or that
raised other issues with which I need help:
* The 2nd <exemplum> of stringVal.xml needs fixing or explaining (or
both). Given that this element is not documented at all anywhere
else, this has to be as clear as can be. I figure if I can't figure
it out ...
* The first paragraph of the <remarks> of specGrp.xml needs to be
re-written, I think. It refers to a non-existent <module> element,
saying that <specGrpRef> but not <specGrp> is allowed within it.
I'm not sure what element that would be.
* The <remarks> of gram.xml refer to term entries and <otherForm>,
which no longer exist in the Guidelines. I think probably the first
two sentences should simply be deleted, but would like someone else
to affirm that first.
* The <desc xml:lang="zh-tw"> in TEI.xml: is that <gi> a "teiCorpus"?
In which case, for consistency, I think we should probably re-word
it to "<gi>teiCorpus</gi> (tei文集)" or whatever the
right thing would be. If no one on Council can answer this (CW?)
I'll plan to ask Marcus or Weining.
* The files hand.xml and handList.xml still exist in the repository.
Any good reason to keep them? What's more important, though, is
that the files
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-hand.html
and
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-handList.html
still exist. There is good reason to delete them. Seems to me it is
important to figure out how it is that they're still there, so that
further updates don't leave orphans like that.
Notes
-----
[1] In some ways, it was bittersweet vindication: I have always
favored a content model of xsd:Name for <gi>, rather than text. I
think the sheer number of errors I found confirms not only that
position, but actually a stronger stance: our build process
should validate the content of <gi> against the list of TEI
element names, unless scheme= is specified as something other
than "TEI" or "tei".
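The kind of build-time check suggested in this note could be sketched
in shell along the following lines. This is only an illustration, not
part of the current build: the function name check_gis and its two
arguments are hypothetical, and it assumes unprefixed <gi> tags,
double-quoted scheme= values, no markup inside <gi>, and a names file
holding one TEI element name per line.

```shell
# Hypothetical helper (not part of the TEI build): print every <gi> content
# that is not in the list of known element names.
#   $1 = file with one valid element name per line (assumed format)
#   $2 = XML file to check
check_gis() {
  names_file=$1
  xml_file=$2
  # Emit each <gi> start-tag together with its text content, one per line.
  grep -oE '<gi( [^>]*)?>[^<]*' "$xml_file" |
  awk -v names="$names_file" '
    # Load the known element names into a lookup table.
    BEGIN { while ((getline n < names) > 0) known[n] = 1 }
    # Skip <gi> elements whose scheme= is something other than TEI or tei.
    /scheme="/ && !/scheme="(TEI|tei)"/ { next }
    # Strip the start-tag; report the content if it is not a known name.
    { sub(/^<gi[^>]*>/, ""); if (!($0 in known)) print }
  ' | sort -u
}
```

Called as, say, check_gis names.txt guidelines-en.xml, it prints the
suspect names, so a build script could fail whenever the output is
non-empty.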
[2] I did most of the seeking via command-line searches on my Mac OS
X system (same would have worked in GNU/Linux), some using the
xmlstarlet command. I'm posting them here just in case anyone is
interested. Feel free to ask for an explanation of how these
commands work.
    $ echo "<gis>" > /tmp/gis.xml
    $ xml sel -N t="http://www.tei-c.org/ns/1.0" \
        -t -m "//t:gi[not(@scheme) or @scheme='tei' or @scheme='TEI']" \
        -c "." -o " {" -v "ancestor::t:div[1]/@xml:id" -o "}" -n \
        Source/Guidelines/en/guidelines-en.xml \
        | perl -pe 's/ xmlns(:[a-z]+)?="[^"]+"//g;' \
        | perl -pe 's/{/<!-- /; s/}/ -->/;' \
        | sort >> /tmp/gis.xml
    $ echo "</gis>" >> /tmp/gis.xml
(I used the curly-braces and perl stage to change to comments
because I haven't figured out how to get a comment out of
xmlstarlet.)
Then I validated the resulting XML file against an RNC file that
I wrote using the output of the following as the basis:
    $ xml sel -N r="http://relaxng.org/ns/structure/1.0" \
        -t -m "//r:element" -v "@name" -n \
        [Sourceforge]/P5/Exemplars/tei_all.rng \
        > /tmp/gis.rnc_starter_file
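
For anyone wanting to try the same thing, here is one way such a
starter file might be turned into a schema. It is a sketch only: the
function name make_gi_rnc is hypothetical, and the schema shape (a
<gis> root holding <gi> elements, each with an optional scheme=
attribute and content restricted to the listed names) is an assumption
based on the /tmp/gis.xml file built above, not a copy of the RNC
actually used.

```shell
# Hypothetical helper: turn a list of element names (one per line) into a
# small RELAX NG compact schema for the <gis> wrapper file.
#   $1 = starter file with one element name per line; blank lines are ignored
make_gi_rnc() {
  sort -u "$1" | awk '
    # Collect each non-blank name as a quoted value, joined with " | ".
    NF { names = names (names == "" ? "" : " | ") "\"" $1 "\"" }
    END {
      print "start = element gis { gi* }"
      print "gi = element gi { attribute scheme { text }?, ( " names " ) }"
    }'
}
```

Something like make_gi_rnc /tmp/gis.rnc_starter_file > /tmp/gis.rnc
would then give a schema that a RELAX NG validator can apply to
/tmp/gis.xml (for example jing -c /tmp/gis.rnc /tmp/gis.xml, if jing
is installed).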