[tei-council] datatype issues (part 1)

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Sat Sep 10 14:12:04 EDT 2005


Working through the datatype definitions, I have so far identified the 
following issues. Comments and corrections on these would be much 
appreciated, especially if they come in the next few days.

1. <specDesc ident="tei.data.xxxx"/> will extract the <desc> from the 
referenced macroSpec. It would be nice also to be able to extract the 
<content> or <stringVal> part for display.

2. The definition of <macroSpec> allows it to contain multiple <content> 
or <stringVal> children. Why?

3. tei.data.certainty is defined as either an enumeration ("high", 
"low", "medium", "unknown") or a reference to tei.data.probability, 
which is a real value between 0 and 1 or an integer between 0 and 100. 
I wonder if it wouldn't be less confusing to restrict the values for 
tei.data.certainty to the literals only, since any attribute for which 
we want to allow either kind of value can do so by giving an 
alternation of datatypes (I think).

4. tei.datatype.language is isomorphic with xsd:language: do we need it?

5. tei.data.regexp is used only in two rather obscure places: do we 
need it? If we do, is the reference to appx. F of the xsd spec really 
the canonical place to define what sort of regexp we mean?

6. tei.data.sex defines four alphabetic values (m f x u) which 
correspond to ISO 5218 numeric codes 1 2 0 and 9. Should we not rather 
use the ISO codes?

7. Furthermore, where (as with sex) the datatype is a closed 
enumeration, it makes sense to represent this in the macroSpec as a 
<rng:choice> containing several <rng:value>s. But there is currently no 
scope to provide a gloss for what each value means, since <valList> is 
not allowed within <macroSpec>.

8. In earlier discussion I had proposed that tei.data.token should 
differ from rng:token in that the former should not permit included 
whitespace.  Thinking about this again, I think I might have been wrong: 
it might be less confusing to use <rng:token> directly wherever we want 
a "tei.data.token", thus allowing people to use XML whitespace 
normalization in attribute values in the same way as they can in 
content. If we do define tei.data.token as proposed (i.e. as an 
xsd:token with a facet saying that whitespace is not allowed), we should 
really give it a different name, or expect to spend the rest of eternity 
explaining why our usage differs from W3C and RNG's (ok, we were there 
first, but still). The same applies, mutatis mutandis, to tei.data.tokens: 
it might indeed be simpler to define that as xsd:token rather than as a 
list of our weird tei.data.token values. On the other hand....
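
To make the difference in item 8 concrete, here is a rough sketch in Python of the two readings. It is only an illustration, not a proposed implementation: the function names are hypothetical, and the "strict" check (non-empty, no whitespace left after collapsing) is my assumption about what the whitespace-free tei.data.token would amount to.

```python
import re

# The four XML whitespace characters: space, tab, line feed, carriage return.
_XML_WS = re.compile(r"[ \t\n\r]+")


def collapse(value):
    """Collapse whitespace as xsd:token does: reduce each run of
    whitespace to a single space and trim both ends."""
    return _XML_WS.sub(" ", value).strip(" ")


def is_strict_token(value):
    """Hypothetical stricter reading: after collapsing, the value must be
    non-empty and must contain no internal whitespace (an assumption)."""
    collapsed = collapse(value)
    return collapsed != "" and " " not in collapsed


def split_tokens(value):
    """Treat a value as a whitespace-separated list of strict tokens."""
    collapsed = collapse(value)
    return collapsed.split(" ") if collapsed else []
```

Under the first reading "two  words" is simply normalized to "two words" and accepted; under the stricter one it is rejected as a single token, though it would still be usable as a list of two tokens.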

That's something to be going on with. I'm going to have a cup of tea AND 
A BISCUIT now.

Lou












