[tei-council] Datatypes.... continued

Syd Bauman Syd_Bauman at Brown.edu
Tue Sep 20 12:20:28 EDT 2005


> 1. add at place 
>    addSpan at place
>            -->  tei.data.enumerated

What do you propose the value list be? The list{} method proposed
permits things like "opposite top left" (which is also permitted in
P4). 

>    metDecl at type    
>            -->  tei.data.enumerated

Again, we need to account for the fact that more than 1 "value" is
permitted. Thus the value list would have to account for the various
combinations. Luckily, since there are only 3 of 'em, in this case
it's not hard:
  met
  real
  rhyme
  met;real
  met;rhyme
  real;rhyme
  met;real;rhyme
(In EDW90 I just copied what P4 says to do: "One or more of the three
attribute names met, real, or rhyme, separated by whitespace", thus
   list { ("met" | "real" | "rhyme")+ }
Both P4 and EDW90 permit silly combinations like "met real met
real".)
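
To make the difference between the two approaches concrete, here is a rough Python sketch. The function names are made up for illustration and appear nowhere in P4 or EDW90. It generates the seven enumerated values listed above and compares the list{} rule with a stricter variant that rejects repeats.

```python
from itertools import combinations

NAMES = ("met", "real", "rhyme")  # the three attribute names from P4


def enumerated_values(names=NAMES, sep=";"):
    """Every non-empty combination of names, in a fixed order, joined by sep."""
    return [sep.join(combo)
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]


def valid_as_list(value, names=NAMES):
    """Mimics list { ("met" | "real" | "rhyme")+ }: one or more known tokens."""
    tokens = value.split()
    return bool(tokens) and all(t in names for t in tokens)


def valid_as_list_no_repeats(value, names=NAMES):
    """Same as valid_as_list, but each name may appear at most once."""
    tokens = value.split()
    return valid_as_list(value, names) and len(set(tokens)) == len(tokens)
```

With only three names the enumeration stays at seven values. With n names it grows to 2^n - 1, which is why this only works for short lists.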


> 2. tei.pointerGroup at domains  
>           -->  tei.data.pointers

I don't really see it as a plus to permit values that the prose says
are invalid just to say we used a datatype directly. The EDW90
recommendation matches the prose perfectly. (It is
  list { tei.data.pointer, tei.data.pointers }
.)

On the other hand, if everyone really thinks it is extremely
important to use an abstract TEI datatype instead of a perfectly
reasonable combination of 'em like the above, I suppose we could move
the "must be 2 or more" check into a Schematron rule. Seems like the
lesser of two evils, at least.
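
For what it's worth, the "must be 2 or more" check is tiny wherever it ends up living. Here is a minimal Python sketch of it, assuming tei.data.pointers boils down to a whitespace-separated list of one or more pointers. The function name is hypothetical, and the syntax of the individual pointers is not checked.

```python
def has_two_or_more_pointers(value):
    """True when the whitespace-separated value holds at least two tokens."""
    return len(value.split()) >= 2
```

A Schematron rule would express the same count test as an assertion on the attribute value.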

 
> 3. schemaSpec at start
>     specDesc at atts   
>          --> tei.data.names

Again, why use a lax constraint when the proper one is readily
available just to say you used a datatype? There are three possible
declarations for tei.data.names on the table, and only one of them
actually constrains this attribute properly:
* list { xsd:NCName+ }  ->  does not permit "musicML:note", but since 
                            there is an ns= attribute, I'm betting
                            that's considered a good thing, right
                            Sebastian?
* list { xsd:NMTOKEN+ }  ->  permits "--notAllowed"
* list { xsd:token { pattern="\S+" } }  ->  permits "${notAllowed}"

If we decide tei.data.names should boil down to something other than
the first, then I think we should just use the proper constraint
without a TEI datatype and not fret it.
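
Here is a quick way to see how the three candidates differ on a single token, sketched in Python. The regular expressions are ASCII-only approximations of my own (the real XML name classes admit many more characters), and the names below are hypothetical.

```python
import re

# ASCII-only approximations of each candidate's per-token rule.
CANDIDATES = {
    "NCName": re.compile(r"[A-Za-z_][A-Za-z0-9._-]*"),   # no colon, no leading "-"
    "NMTOKEN": re.compile(r"[A-Za-z0-9._:-]+"),          # name characters in any order
    "non-space": re.compile(r"\S+"),                     # anything without whitespace
}


def accepted_by(token):
    """Names of the candidate rules that accept the whole token."""
    return [name for name, rx in CANDIDATES.items() if rx.fullmatch(token)]
```

Under these approximations accepted_by("musicML:note") and accepted_by("--notAllowed") both leave out "NCName", and accepted_by("${notAllowed}") returns only "non-space".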


> 4. date at precision
>        --> tei.data.certainty

I think this is a really bad idea. First off, <date> should probably
not have a precision= attribute. The precision should be expressed in
the value=. Furthermore, while I suppose vague terms like "high",
"medium", and "low" are occasionally applied to the precision, it is
much more common, and far more useful, to express the unit to which a
measurement is precise. So if we really wanted to separate precision
from the value=, we would want precision= of date to have values like
"century", "decade", "year", "month", "week", and "day".
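
As a rough illustration of "the precision should be expressed in the value=", here is a Python sketch that reads the unit of precision off the shape of the value. It assumes, purely for illustration, that values look like YYYY, YYYY-MM, or YYYY-MM-DD. It does not cover century, decade, or week, and the function name is made up.

```python
import re

# Lexical shapes assumed for illustration only.
_SHAPES = [
    (re.compile(r"[0-9]{4}"), "year"),
    (re.compile(r"[0-9]{4}-[0-9]{2}"), "month"),
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "day"),
]


def implied_precision(value):
    """Return the unit implied by the value's shape, or None if unrecognised."""
    for rx, unit in _SHAPES:
        if rx.fullmatch(value):
            return unit
    return None
```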


> 5. tei.datable at dateAttrib
>        --> tei.data.enumerated

Yes, that's what is recommended: tei.data.enumerated with a value list of
"datable" | "dated" | "unknown".


> 6. locus at scheme
>        --> tei.data.name

I presume that's because you don't want to argue with Matthew and
David over the possible values? :-) Seriously, I don't currently
really understand the purpose of this attribute. <locus> describes a
location in the current manuscript, which (supposedly) has only one
foliation scheme which should be described in //supportDesc/foliation
(of which only 0 or 1 are permitted, right?). So what does scheme=
buy us?


> 7. fragmentPattern at pat
>       --> tei.data.notation

As above, if "tei.data.notation" is for "notations TEI made up", this
doesn't belong.

> 
> 8. schemaSpec at namespace
>     elementSpec at ns (why isnt it "namespace" btw?)
>        --> these could be xsd:uri as proposed, or tei.data.pointer, but 
> maybe since they have to be
>              real namespace names (i.e. "#foo" won't do) maybe shd be 
> their own datatype?

Yes, all our ns= and namespace= attributes need to be brought into
alignment. I don't care which.

However, as I read the spec, "#foo" is a perfectly valid namespace.
Stupid perhaps, but valid.


> 9.   tei.declarable at default
>      tei.identifiable at predeclare
>      metSym at terminal
>      numeric at trunc
>      binary at value                  
>         --> are all xsd:boolean (so "unknown" not allowed) ; could just 
> be tei.data.truthValue with extra rule

Could be. And at one point in the history of that EDW90 table, they
were. But a week or two ago we agreed to go directly with
xsd:boolean. 
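
For anyone who wants the consequence spelled out: the lexical space of xsd:boolean is exactly "true", "false", "1", and "0", so "unknown" is rejected. A minimal Python sketch follows (the function name is hypothetical).

```python
_TRUE = {"true", "1"}
_FALSE = {"false", "0"}


def parse_xsd_boolean(text):
    """Map an xsd:boolean lexical form to a Python bool; reject anything else."""
    token = text.strip()  # surrounding whitespace is collapsed for xsd:boolean
    if token in _TRUE:
        return True
    if token in _FALSE:
        return False
    raise ValueError("not an xsd:boolean: %r" % text)
```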


> 10.  timeline at interval
>        when at interval   
>        --> tei.data.numeric | -1  (or think of a better way of
>            doing the -1)

a) We'd go back to needing unit= (I'm not saying this is so horrible,
   just want to make sure everyone understands the implications) and
   violate the policy of using W3C Schema datatypes where applicable.

b) All of the proposed declarations of tei.data.numeric already
   permit -1.

c) This permits all other negative numbers, which P4 explicitly
   disallows. 

d) If you meant 'tei.data.count', it won't do as fractions may be
   needed. 

e) Now that I think about it, the pattern EDW90 recommends has a
   problem, too: it permits "-0.5".

Thus, I am now leaning towards
   xsd:long { minInclusive = "0" }  |  xsd:token "unknown"
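
Here is a Python sketch of what that pattern would accept, so the implications are clear. The function name is made up, and the lexical check for xsd:long is my own simplification (optional sign followed by ASCII digits).

```python
import re

_LONG = re.compile(r"[+-]?[0-9]+")   # simplified lexical form of xsd:long
_MAX_LONG = 2**63 - 1                # upper bound of xsd:long


def valid_interval(text):
    """Mimics xsd:long { minInclusive = "0" } | xsd:token "unknown"."""
    token = text.strip()
    if token == "unknown":
        return True
    if not _LONG.fullmatch(token):
        return False                 # rejects "-0.5", and also "0.5"
    return 0 <= int(token) <= _MAX_LONG
```

Note that, being integer-only, this rules out fractional values along with the negative ones.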


> 11. several attributes with proposed  datypes of "xsd:NCName" -> 
>     tei.data.name

Again, only if we agree tei.data.name -> xsd:NCName, of which I've
yet to be convinced.


> 12. several attributes with proposed datatypes of
>     "xsd:nonNegativeInteger" --> there are enough of these that I
>     propose a new datatype "tei.data.count"

Fine with me, although I'm not sure what it gains us.


> 13. sense at level -> tei.data.count

Fine.
(Just so everyone understands, the only real difference between
  xsd:unsignedShort
and
  tei.data.count -> xsd:nonNegativeInteger
is that the software engineer designing an application that reads a
TEI-encoded dictionary knows that in the former case whenever she
comes across a level= of <sense> she need only set aside 16 bits to
hold the value; with the latter she gets no such assurances. The
other difference, that xsd:unsignedShort has a maximum value of
65,535, isn't a practical problem.)
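
The 16-bit point can be seen in a few lines of Python, using the standard struct module to stand in for the hypothetical application's storage. The function name is made up.

```python
import struct


def fits_unsigned_short(n):
    """True when n can be stored in 16 unsigned bits (0 through 65535)."""
    try:
        struct.pack("<H", n)   # "H" is a 2-byte unsigned short
    except struct.error:
        return False
    return True
```

With xsd:unsignedShort every valid level= passes this test. With xsd:nonNegativeInteger a value such as 65536 is still schema-valid but no longer fits.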



