[tei-council] datatypes updated

Syd Bauman Syd_Bauman at Brown.edu
Sun Sep 25 17:05:10 EDT 2005


> I have now checked new versions of all tagdocs, classdocs, and
> chapter files into CVS. These versions now use the new datatypes
> only (more or less!)
> 
> There follow some random notes on what I needed to change (or not
> change) to get this revised version to validate both P5 itself and
> our current test suite.

I am getting two massive sets of errors. 

* It seems that at least fileents.dtd still points to a directory
  called "newSource/". I've changed 'em to "Source/", will check in
  shortly, presuming it works.

* For reasons I can't explain, jing is calling the wrong java. I
  have to manually reset JAVA_HOME to get it to work.

Oh. I see someone's already checked in a fixed fileents.dtd now.
Good. However, now that I've fixed these two, I'm still getting quite
a few validation problems, but am probably too tired to do much about
it tonight.


> 1. textLang used to use langKey and otherLang to reference a
> <language> - I have changed them to tei.data.language to do this
> which means they dont need to be pointers any more, but they must
> use valid language codes. I also changed "langKey" to "mainLang".

I don't understand. They are listed as tei.data.language in the EDW90
table, and always have been (well, otherLangs= was list {
xsd:language } at one point).

Ah well, as long as they've been fixed.


> 2. I have left most things defined as tei.data.names unchanged. But
> they need to be reviewed, and changed to either
> list{tei.data.name*} or tei.data.token (or some other name to map
> on to xsd:token), depending on whether they are actually distinct
> items like "up down" or single items which just happen to have a
> space in them like "fat possum"

This review had already been done -- in the EDW90 table, as of last
week at least, those that had actual distinct items were either
tei.data.tokens or list { something }.


> 3.  tei.data.probability : I have made this consistently a number 
> between 0 and 1, and I have decided on using it for certainty at degree, 
> and as an alternation with tei.data.certainty for  damage at degree -- 
> sorry, no percentages.

So the net effect is:
1) removed alternation of tei.data.certainty from recommendation for
   degree= of <certainty>
2) no percentages in tei.data.probability
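
A minimal sketch of the constraint as described above, assuming the
endpoints 0 and 1 are themselves allowed (the message does not say).
The function name is hypothetical, and Python's float() is looser than
a schema datatype would be (it accepts exponent notation, for one).

```python
def is_probability(value: str) -> bool:
    """Return True if value parses as a number in [0, 1].

    Percentages such as "50%" are rejected because they do not
    parse as plain numbers. Inclusive endpoints are an assumption.
    """
    try:
        number = float(value)
    except ValueError:
        return False
    # NaN and infinities fail this comparison and are rejected.
    return 0.0 <= number <= 1.0
```

So "0.5" would pass, while "50%" and "1.5" would not.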


> 4. tei.data.certainty : this needs a better name to indicate that
> it means "approximate designation of quantity" : i used it for
> purpose at degree

I.e., you've removed alternation of tei.data.probability from
recommendation for degree= of <purpose>. How come? (I don't think I
have ever seen a <purpose> element used in real life, so I'm not sure
I understand how one is intended to combine either probabilities or
"high", "medium", et al., let alone intertwine them.)


> 4. schemaSpec at start is oneOrMore tei.idents, rather than zeroOrMore as 
> proposed by Syd

I proposed the plural class "tei.data.idents", at your suggestion,
which should boil down to 
   list { tei.data.ident+ }
Perhaps you are confusing this with the requirement that the atts=
attribute be allowed to be empty.
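
To make the distinction concrete, here is a rough sketch of the two
readings: a list that needs at least one identifier versus one that may
be empty. The regular expression is only a crude stand-in for
tei.data.ident, and the function and parameter names are hypothetical.

```python
import re

# Crude approximation of an identifier; NOT the real tei.data.ident.
IDENT = re.compile(r"[A-Za-z_][\w.\-]*")

def is_ident_list(value: str, allow_empty: bool = False) -> bool:
    """Check a whitespace-separated list of identifiers.

    allow_empty=False mimics list { ident+ } (one or more);
    allow_empty=True mimics list { ident* } (zero or more).
    """
    tokens = value.split()
    if not tokens:
        return allow_empty
    return all(IDENT.fullmatch(token) for token in tokens)
```

Under the one-or-more reading an empty value fails; passing
allow_empty=True models an attribute that may be left empty.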


> 5. scheme attribute on att, tag, gi : as these have a vallist, i
> have made them tei.data.enumerated and also made the list open to
> cater for the various names used in SA

I'm not convinced an enumeration is the best way to go. What are
namespaces for if not to indicate the scheme an attribute or element
belongs to? While I agree the EDW90 recommendation (pointer|ident)
may be a bit dorky, I think associating an <att> or <gi> with a
namespace is a better idea.


> 6. In abolishing the reg= attribute i had to revise the text of CO
> a bit and ND quite a lot. The latter in particular contains several
> examples best characterized as a little eccentric... 

So what did you put into the Guidelines about how regularizations
should be encoded?

> ... and two problems remain: how to represent a glossed geographic
> feature like <geogName>Mont (Mountain) St Michel</geogName>

The other?


> 8. metsym at value changed back to text, since examples use / and @

But text allows whitespace, which you said it can't have.

I think it should be tei.data.token -- it can't have white space. It 
should be a single character in fact for any sensible kind of notation.


> 9. ruledLines changed to choice of single or list of two tei.data.count 
> to allow for range (this meant also changing several egs in MS

Why not just "list { tei.data.count, tei.data.count? }" ?
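
A small sketch of what that pattern would accept, assuming
tei.data.count boils down to a non-negative integer; the function name
is hypothetical.

```python
import re

# Assumption: a count is a non-negative integer written in ASCII digits.
COUNT = re.compile(r"[0-9]+")

def parse_ruled_lines(value: str) -> tuple:
    """Parse one count, or two counts giving a range.

    Mirrors list { count, count? }: the first item is required,
    the second is optional, and nothing else is allowed.
    """
    tokens = value.split()
    if len(tokens) not in (1, 2):
        raise ValueError("expected one or two counts")
    if not all(COUNT.fullmatch(token) for token in tokens):
        raise ValueError("each item must be a non-negative integer")
    return tuple(int(token) for token in tokens)
```

A single pattern covers both the single value ("25") and the range
("20 25") without needing an explicit choice.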


> 10. removed several cases where reg= was being used to mean equiv=
> for quantities etc. this needs thought

Indeed.


> 11.  xsd:regexp doesnt allow use of +- as in example for metdecl: 
> changed example

I could swear I had fixed that. Sorry.


> 12. attributes width, height, scale on (at least) graphic need some
> more thought; height and width should include units ("10in" etc) ;
> scale should be tei.data.probability (which maybe needs a better
> name?)

So you can't scale a graphic larger than 100%?


> 13. fragmentPattern at pat changed to text pro tem (shd be child
>     element?)

Why would this be a child element? Characters outside of Unicode are
not permitted, big time; and it cannot be said to be in any natural
language, so you can't want to indicate which one. (Note: I recently
discovered that the XPath 2.0 regular expression language includes
back references, which would allow it to be the language for this.)
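
For anyone unfamiliar with back references, here is a tiny illustration.
It uses Python's re module purely as a stand-in; the XPath 2.0 regular
expression syntax differs in its details, and the pattern and function
name are made up for the example.

```python
import re

# (\w+) captures a run of word characters; \1 then requires the
# very same text to appear again after the hyphen.
REPEATED = re.compile(r"(\w+)-\1")

def has_repeated_part(text: str) -> bool:
    """Return True if text has the form X-X for some word X."""
    return REPEATED.fullmatch(text) is not None
```

Here "abc-abc" matches and "abc-abd" does not, which a pattern language
without back references cannot express in general.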


> 14. state at delim -> reverted to text (names doesnt allow space)

?? tei.data.names (plural) should allow space.


> 15. handNote at script should be tei.data.names, not pointer

I had thought the control offered by tei.data.code would be
beneficial. Having never seen this attribute used, let alone used it
myself, I'll be happy to bow to your wisdom.



